AI training method rewards useful intermediate signals, not just final outcomes

What happened

Researchers found that training AI systems only on whether they got the right final answer misses what happened in between: the intermediate steps where the reasoning actually happens. They built a method that rewards the moment an AI first reaches the right answer, not just the end result, and reported 13-24% better performance on smaller models.

Why it matters

Current AI training treats reasoning like a black box: you feed in a question, check the final answer, and adjust the model accordingly. The AI therefore gets no signal about whether its intermediate steps were smart or merely lucky. The new method opens the box by rewarding the point at which the AI first arrives at the right answer, which preserves information about how it got there. That is useful because it shows not just what the model knows, but which path through its reasoning actually led to the answer.
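To make the contrast concrete, here is a toy sketch in Python. It is not the researchers' implementation, which this summary does not describe. It assumes a reasoning trace can be reduced to the tentative answer the model holds after each step, and the function names and the 0/1 reward values are hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def outcome_reward(step_answers: List[Optional[str]], target: str) -> List[float]:
    """Outcome-only credit: every step shares one reward based on the final answer."""
    final_correct = bool(step_answers) and step_answers[-1] == target
    return [1.0 if final_correct else 0.0] * len(step_answers)


def first_correct_reward(step_answers: List[Optional[str]], target: str) -> List[float]:
    """Hypothetical step-level credit: reward only the first step that hits the target."""
    rewards = [0.0] * len(step_answers)
    for i, answer in enumerate(step_answers):
        if answer == target:
            rewards[i] = 1.0
            break  # later repeats of the right answer earn nothing extra
    return rewards


# Hypothetical trace: the tentative answer after each reasoning step (None = no answer yet).
trace = [None, "12", "15", "15"]
print(outcome_reward(trace, "15"))        # [1.0, 1.0, 1.0, 1.0]
print(first_correct_reward(trace, "15"))  # [0.0, 0.0, 1.0, 0.0]
```

In this sketch the outcome-only scheme gives every step the same credit, so a lucky detour looks as good as the step that actually solved the problem. The step-level scheme singles out the third step. It also still assigns credit when a trace reaches the right answer and then drifts away from it, a case the outcome-only scheme scores as a total failure.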

The signal

The open question is whether larger language models (the ones actually deployed in products) show the same 13-24% gains when trained this way, or whether the improvement appears only on smaller test models.