The world is being quietly rearranged by people who write very long documents.


The title they went with Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling Noisy translates that to

AI model training method fixes itself without human feedback — works on math and geometry tasks


Researchers developed a technique that lets multimodal AI models improve their own reasoning by sampling multiple solutions and learning from patterns in their own outputs, rather than relying on human-labeled correct answers. In practice, this means AI systems can now self-correct on tasks like visual math problems without needing expensive human feedback to guide them.
Training AI models has always required humans to label correct answers so the model learns what right looks like. This paper shows the model can identify its own correct reasoning patterns by checking which answers appear consistently across multiple attempts, then using those patterns to retrain itself. The constraint shifts: you no longer need human annotators, but the model still has to actually get things right on its own before it can learn from itself. This only works if the model's correct outputs outnumber its wrong ones.
Whether this technique improves performance on real multimodal tasks at scale, or whether it mostly helps with well-defined mathematical problems where right and wrong answers are unambiguous.

If you insist
Read the original →