Biology AI trains on real research papers instead of textbook problems — performance jumps 9%

What happened

Researchers built a dataset of 345,000 question-and-answer pairs extracted directly from biology research papers, then used it to train a reasoning model. The resulting model scores 9% higher on biology tasks than models trained on standard academic benchmarks, which don't reflect what modern biology research actually looks like.

Why it matters

Biology reasoning models have lagged behind AI systems trained for math and coding because their training datasets are built from textbook problems rather than real research, so they don't reflect what biologists actually do. The result suggests that the bottleneck for AI reasoning in science is the training data, not the model architecture. If the pattern holds across biology and extends to other sciences, AI systems trained on real domain work will outperform general models, and the researchers or institutions that control access to good training data will control the capability frontier.

The signal

Watch whether other teams building biology reasoning models adopt the 345K dataset, and whether similar dataset-extraction pipelines emerge for chemistry, materials science, or drug discovery. Either would indicate that this approach is becoming standard rather than a one-off.