Transformers trained to loop can reason deeper than they were taught

What happened

Researchers found that transformers struggle to combine stored knowledge to solve multi-step problems, especially when a problem needs more reasoning steps than any example in the training data. Letting the same layers run multiple times in a single forward pass, instead of just once, allows the network to learn to chain reasoning steps together and generalize to problems it never saw during training.
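To make the looping idea concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: `make_block` is a hypothetical stand-in that uses a tiny residual update in place of a real transformer layer, and the names `standard_forward` and `recurrent_forward` are invented for illustration. The only point it shows is the structural difference between a stack of distinct layers and one shared block applied repeatedly.

```python
import math
import random


def make_block(dim, seed=0):
    """Build a toy residual block x -> x + tanh(W x).

    Assumption: this stands in for a full transformer layer; the real
    thing would contain attention and an MLP.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        # Residual update using this block's single weight matrix.
        return [
            xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, weights)
        ]

    return block


def standard_forward(x, blocks):
    """Vanilla stack: every layer in the list runs exactly once."""
    for block in blocks:
        x = block(x)
    return x


def recurrent_forward(x, block, n_loops):
    """Recurrent depth: one shared block applied n_loops times in one pass."""
    for _ in range(n_loops):
        x = block(x)
    return x


shared = make_block(dim=4, seed=0)
x0 = [1.0, 0.0, -1.0, 0.5]

# Same weights, different effective depth, chosen at call time.
shallow = recurrent_forward(x0, shared, n_loops=2)
deep = recurrent_forward(x0, shared, n_loops=8)
```

Because the block is shared, the loop count becomes a setting rather than a fixed property of the model, which is what lets a looped network spend more steps on a harder problem without adding parameters.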

Why it matters

Large language models store facts and rules but often fail at the reasoning that connects them. This paper points to a structural fix: recurrent-depth transformers can decompose reasoning into reusable steps and apply those steps to problems harder than anything in their training set. The practical limit is overthinking: too many recurrence steps degrade performance. If the mechanism holds up in larger models, it suggests a path toward AI systems that generalize reasoning rather than memorize patterns.
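A symbolic toy can illustrate what "reusable steps" means here. This is only an analogy under stated assumptions: the `FACTS` table, the `hop` step, and the `answer` function are hypothetical, and a real model would learn the step inside its weights rather than read it from a dictionary. The sketch shows one step, defined once, resolving queries of different depths.

```python
# Hypothetical single-hop facts: each entity points to one related entity.
FACTS = {"alice": "bob", "bob": "carol", "carol": "dave", "dave": "erin"}


def hop(entity):
    """One reusable reasoning step: follow a single stored fact."""
    return FACTS[entity]


def answer(start, n_hops):
    """Resolve an n-hop query by applying the same step n_hops times."""
    entity = start
    for _ in range(n_hops):
        entity = hop(entity)
    return entity


print(answer("alice", 2))  # carol
print(answer("alice", 4))  # erin
```

Nothing in `hop` depends on the depth of the query, so a 4-hop question costs no more machinery than a 2-hop one. The toy does not model overthinking; it only shows why a reusable step separates reasoning depth from what was seen in training.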

The signal

Watch whether larger language models adopt a recurrent-depth architecture, and whether they show measurable gains on compositional reasoning tasks (multi-hop question answering, symbolic reasoning) over vanilla transformers of the same size.