AI models training each other hit a wall: adding random noise to their outputs keeps them learning
What happened
When one AI model generates problems for another to solve, both models quickly get stuck repeating the same patterns, which stalls learning. Adding random noise to the problem generator's outputs forces it to keep varying what it produces, so training can continue, and solver performance improves by about 4 points on standard benchmarks.
Why it matters
Self-play training (where models compete or cooperate without human feedback) is supposed to be cheap and scalable, but it fails in practice when both models converge on narrow, repetitive patterns. This paper shows that a simple constraint on the output space, literally masking random tokens (vocabulary dropout), prevents that collapse. The finding matters because it suggests that even autonomous AI systems need structural rules analogous to the game rules of chess or Go, not just raw reward signals. If this holds across domains, it changes how you'd actually build systems that learn without human-written curricula.
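For a concrete picture of what masking random tokens can look like, here is a minimal sketch. It assumes the problem generator exposes a list of per-token logits at each decoding step. The function names, the drop probability, and the rule that at least one token always survives are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def vocabulary_dropout(logits, drop_prob, rng):
    """Return a copy of `logits` with a random subset of tokens masked out.

    Masked entries become -inf, so they get zero probability after softmax.
    At least one token is always kept so sampling stays well defined
    (an assumption made for this sketch).
    """
    keep = [rng.random() >= drop_prob for _ in logits]
    if not any(keep):
        keep[rng.randrange(len(logits))] = True
    return [x if k else -math.inf for x, k in zip(logits, keep)]


def softmax(logits):
    """Numerically stable softmax that tolerates -inf entries."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng):
    """Sample one token id from the distribution defined by `logits`."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # A sharply peaked toy distribution: token 0 dominates, mimicking a
    # generator that has collapsed onto one pattern.
    logits = [6.0, 1.0, 0.5, 0.2, 0.1]

    plain = {sample_token(logits, rng) for _ in range(50)}
    masked = {
        sample_token(vocabulary_dropout(logits, 0.5, rng), rng)
        for _ in range(50)
    }
    print("distinct tokens without masking:", sorted(plain))
    print("distinct tokens with masking:   ", sorted(masked))
```

The idea is that whenever the dominant token happens to be masked, the generator has to pick something else, which is how a constraint like this could keep its outputs varied. In a real pipeline the mask would be applied inside the model's decoding loop, and how often it is resampled (per token, per problem, or per batch) is a design choice this sketch leaves open.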
The signal
Watch whether vocabulary dropout (or similar output-space constraints) gets adopted in real co-evolutionary training pipelines at scale, and whether the roughly 4-point gains hold on harder benchmarks beyond math reasoning.