The world is being quietly rearranged by people who write very long documents.


The title they went with
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

Noisy translates that to

AI systems are getting better at biology research, but the tests are getting harder


Researchers created a new, tougher benchmark for AI systems that perform biology research. This new test, LABBench2, has nearly 1,900 tasks and is significantly harder than previous versions. As a result, current AI models score much lower on these more realistic tasks, showing there is still a lot of room for improvement.
The ability to measure AI progress in scientific discovery is critical. Tests that are too easy create a false sense of progress. This new benchmark pushes AI systems to perform more complex, real-world biology tasks, so tools developed against it should be more capable in actual labs. It shifts the focus from theoretical knowledge to practical application, which is essential if AI is to genuinely accelerate scientific breakthroughs.
Watch for new AI models that show significant performance improvements on LABBench2, especially across multiple subtasks, as this would indicate a real leap in practical AI capabilities for biology.
