The world is being quietly rearranged by people who write very long documents.


The title they went with
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers

Noisy translates that to
Transformers trained to loop can reason deeper than they were taught


Researchers found that transformer neural networks struggle to combine knowledge to solve multi-step problems, especially when asked to go deeper than their training data showed them. By letting the same neural network layers run multiple times in a single forward pass (instead of just once), the network can learn to chain reasoning steps together and generalize to problems it never saw during training.
Large language models store facts and rules but fail at the reasoning that connects them. This paper proposes a structural fix: recurrent-depth transformers can decompose reasoning into reusable steps and apply those steps to problems harder than anything in their training set. The practical limit is overthinking: too many recurrence steps degrade performance. This matters because, if the mechanism holds up in larger models, it suggests a path toward AI systems that generalize reasoning rather than memorize patterns.
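A toy sketch of the looping idea, not the paper's model: one shared step is applied repeatedly, the way a recurrent-depth transformer reuses the same layers within a forward pass. The names FACTS, one_hop, and run_looped are hypothetical, and a dictionary lookup stands in for whatever the learned layers actually compute.

```python
from typing import Callable, Dict, TypeVar

State = TypeVar("State")

# Hypothetical toy knowledge base: each name points to exactly one other name.
FACTS: Dict[str, str] = {
    "alice": "bob",
    "bob": "carol",
    "carol": "dave",
    "dave": "erin",
    "erin": "frank",
}


def one_hop(entity: str) -> str:
    """One reasoning step: follow a single stored fact."""
    return FACTS[entity]


def run_looped(step: Callable[[State], State], state: State, num_loops: int) -> State:
    """Apply the same step num_loops times, like reusing one block of layers."""
    for _ in range(num_loops):
        state = step(state)
    return state


# The same step answers a 2-hop and a 4-hop question; only the loop count changes.
two_hop_answer = run_looped(one_hop, "alice", 2)
four_hop_answer = run_looped(one_hop, "alice", 4)

# Looping 3 times on the 2-hop question walks past the intended answer.
overshoot = run_looped(one_hop, "alice", 3)
```

In this toy, going from 2 hops to 4 needs no new step, only more loops, which is the intuition behind generalizing past the training depth. Running 3 loops on the 2-hop question overshoots the answer, which is only a loose analogy for the overthinking the paper reports.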
Check whether larger language models adopt a recurrent-depth architecture, and whether they show measurable improvement on compositional reasoning tasks (multi-hop question answering, symbolic reasoning) compared to vanilla transformers of the same size.
