The world is being quietly rearranged by people who write very long documents.


The title they went with: Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

Noisy translates that to:

AI models training each other hit a wall — adding random noise to their outputs keeps them learning


When one AI model generates problems for another to solve, both models quickly get stuck repeating the same patterns, which kills the learning process. Adding random noise to the problem-generator's outputs forces it to keep varying what it produces, which lets training continue and improves solver performance by about 4 points on standard benchmarks.
Self-play training (where models compete or cooperate without human feedback) is supposed to be cheap and scalable, but it fails in practice when both models converge on narrow, repetitive patterns. This paper shows that a simple constraint on the output space—literally masking random tokens—prevents that collapse. The finding matters because it suggests that even autonomous AI systems need structural rules analogous to game rules in chess or Go, not just raw reward signals. If this holds across domains, it changes how you'd actually build systems that learn without human-written curricula.
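For readers who want to see what "masking random tokens" could look like, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names `vocabulary_dropout` and `sample_token`, the `drop_rate` parameter, and the choice to draw one mask per call are all hypothetical, since this summary does not specify those details.

```python
import math
import random


def vocabulary_dropout(logits, drop_rate, rng):
    """Return a copy of `logits` with a random subset of token ids set to -inf.

    Assumption: the mask is drawn uniformly at random over the vocabulary.
    How and when the paper draws its mask is not stated in this summary.
    """
    vocab_size = len(logits)
    # Always leave at least one token available so sampling stays well-defined.
    n_drop = min(int(drop_rate * vocab_size), vocab_size - 1)
    dropped = set(rng.sample(range(vocab_size), n_drop))
    return [(-math.inf if i in dropped else x) for i, x in enumerate(logits)]


def sample_token(logits, rng):
    """Sample a token id from softmax(logits); masked ids get probability 0."""
    m = max(logits)  # finite, because at least one token is never masked
    weights = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy usage: an 8-token vocabulary with half of the ids dropped.
rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, 0.1, -1.0, 0.0, 0.3, 1.5]
masked_logits = vocabulary_dropout(toy_logits, drop_rate=0.5, rng=rng)
token_id = sample_token(masked_logits, rng)
```

Because the dropped ids change with each new mask, the generator cannot keep leaning on the same few tokens, which is the intuition behind the diversity effect described above.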
It remains to be seen whether vocabulary dropout (or similar output-space constraints) gets adopted in real co-evolutionary training pipelines at scale, and whether the +4 point gains hold on harder benchmarks beyond math reasoning.
