A smarter way to teach smaller AI models: let them pick what they learn during training
What happened
Researchers found that transferring a large reasoning model's skills to a smaller model works better when the smaller model chooses which reasoning steps to learn from while the training data is being generated, instead of learning from everything the large model produces. The reported result is better reasoning performance from smaller models with less wasted training: the student actively guides what the teacher shows it, rather than passively absorbing everything.
Why it matters
For years, the standard way to make smaller AI models useful at complex tasks has been to copy the reasoning of much larger models, then filter out the bad examples afterward. The problem is that by then the damage is done: the large model has already spent computation generating reasoning paths the small model can't actually learn from. This paper shows that if you let the small model reject unhelpful reasoning in real time, essentially saying 'no, don't go there' while the large model is still generating, it learns faster and better. The practical promise is smaller, cheaper models that reason as well as much larger ones, because less of the training is wasted. That opens up deployment in places where a massive model won't fit.
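To make the idea concrete, here is a toy Python sketch of generation-time selection. It is an illustration under stated assumptions, not the paper's actual method: the names `build_trace`, `toy_propose`, and `toy_student_score` are hypothetical, the "teacher" is a hard-coded list of candidate steps, and the "student" score is a stand-in that simply prefers shorter steps. The only point it shows is that the student's preference decides each step as the trace is built, rather than a filter being applied after the whole trace exists.

```python
from typing import Callable, List

# Hypothetical interfaces, chosen only for this illustration.
Proposer = Callable[[List[str]], List[str]]   # teacher: steps so far -> candidate next steps
Scorer = Callable[[List[str], str], float]    # student: (steps so far, candidate) -> score


def build_trace(propose: Proposer, student_score: Scorer,
                max_steps: int, min_score: float) -> List[str]:
    """Build one training trace, letting the student pick each step as it is generated."""
    trace: List[str] = []
    for _ in range(max_steps):
        candidates = propose(trace)
        if not candidates:
            break  # the teacher has nothing more to offer
        # The student, not the teacher, decides which candidate is kept.
        best = max(candidates, key=lambda step: student_score(trace, step))
        if student_score(trace, best) < min_score:
            break  # the student rejects every option, so stop instead of wasting compute
        trace.append(best)
    return trace


def toy_propose(prefix: List[str]) -> List[str]:
    """Stand-in teacher: two candidate steps at each of two positions."""
    options = [
        ["expand the product", "apply an advanced lemma"],
        ["collect like terms", "invoke a long case analysis"],
    ]
    return options[len(prefix)] if len(prefix) < len(options) else []


def toy_student_score(prefix: List[str], step: str) -> float:
    """Stand-in student: pretends that shorter steps are easier to learn from."""
    return 1.0 / len(step.split())


trace = build_trace(toy_propose, toy_student_score, max_steps=5, min_score=0.2)
print(trace)
```

In a real pipeline the scoring function would presumably come from the student model itself (for example, how likely it finds a candidate step), but that detail is an assumption here and is not taken from the paper.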
The signal
Watch whether commercial AI companies adopt this generation-time selection approach in their model-distillation pipelines over the next 18 months, and whether smaller open-source reasoning models improve accordingly.