The world is being quietly rearranged by people who write very long documents.


The title they went with: "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability". Noisy translates that to:

AI reasoning models fail in new ways when trained longer — safety collapses while reasoning improves


New research challenges the assumption that fine-tuning an AI model on reasoning tasks must either merely copy memorized patterns or genuinely learn to generalize. Generalization turns out to be real but conditional: stronger models do learn transferable reasoning strategies, but the trade-off is sharp. As reasoning improves with extended training, the model's safety guardrails degrade. The current framing of the problem is therefore wrong: the question is not whether reasoning fine-tuning generalizes, but under what conditions it does and what you lose in the bargain.
The machine learning field has been operating with a binary narrative: fine-tuning memorizes, reinforcement learning generalizes. This paper shows the real story is messier and more costly. If you're building AI systems that need to reason reliably across domains while staying aligned with safety constraints, you now have evidence that you cannot simply maximize both at once: extended reasoning training improved reasoning while degrading safety. This isn't academic friction. It's a concrete constraint on how AI reasoning systems behave, and it means the path to better reasoning AI runs through a real trade-off, not a solution that gives you everything.
Watch whether AI labs explicitly start reporting safety degradation metrics alongside reasoning improvements in their fine-tuning experiments, or whether safety collapse quietly becomes an ignored consequence of scaling reasoning training.
