The world is being quietly rearranged by people who write very long documents.


The title they went with AIRA_2: Overcoming Bottlenecks in AI Research Agents Noisy translates that to

Research AI agents solve their own bottlenecks, improve faster


Researchers built a system that lets AI agents run many experiments in parallel instead of one at a time, and added better ways to tell if an experiment actually worked — letting the AI keep improving even over long search periods. This means AI can now solve harder research problems on its own without getting stuck or fooled by bad measurement signals.
This is a demonstration that AI research agents can cross a real capability threshold — moving from toy benchmarks to sustained performance gains — by fixing structural constraints rather than just scaling up. The ablation work showing that prior 'overfitting' was evaluation noise, not memorization, is honest failure documentation that matters outside the lab.

If you insist
Read the original →