AI training method lets models improve without human feedback: works on math and geometry tasks

What happened

Researchers developed a technique that lets multimodal AI models improve their reasoning by sampling multiple solutions to each problem and learning from patterns in those outputs, rather than relying on human-labeled correct answers. In practice, this means AI systems can self-correct on tasks like visual math problems without expensive human feedback to guide them.

Why it matters

Training AI models to reason has typically required humans to label correct answers so the model learns what right looks like. This paper shows the model can identify its own likely-correct reasoning by checking which answers appear consistently across multiple attempts, then retrain on those outputs. The constraint shifts: you no longer need human annotators, but the model still has to get things right on its own before it can learn from itself. This only works if, on a given problem, the model's correct outputs outnumber its wrong ones.
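The consistency check can be illustrated with a minimal Python sketch, assuming each sampled solution reduces to a final answer string that can be compared by exact match. The function names, the `min_share` threshold, and the toy `fake_outputs` model are hypothetical. This is generic majority-vote pseudo-labeling, not the paper's implementation, and the retraining step itself is omitted.

```python
import itertools
from collections import Counter

# Hypothetical helper names throughout: a generic majority-vote sketch,
# not the paper's code. Retraining on the resulting pairs is not shown.

def majority_pseudo_label(answers, min_share=0.5):
    """Return the most common answer if it accounts for more than
    `min_share` of all samples; otherwise return None (no pseudo-label)."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    if count / len(answers) > min_share:
        return answer
    return None


def build_self_training_set(problems, sample_fn, n_samples=8, min_share=0.5):
    """Sample `n_samples` answers per problem from `sample_fn` and keep only
    the problems whose answers agree strongly enough to yield a pseudo-label."""
    dataset = []
    for problem in problems:
        answers = [sample_fn(problem) for _ in range(n_samples)]
        label = majority_pseudo_label(answers, min_share)
        if label is not None:
            dataset.append((problem, label))
    return dataset


# Toy stand-in for a model: each problem cycles through fixed answers.
fake_outputs = {
    "2+2": itertools.cycle(["4", "4", "4", "5"]),       # mostly consistent
    "area?": itertools.cycle(["12", "10", "14", "9"]),  # no agreement
}
data = build_self_training_set(
    ["2+2", "area?"], lambda p: next(fake_outputs[p]), n_samples=4
)
print(data)  # [('2+2', '4')]
```

The strict-majority threshold mirrors the condition stated above: when wrong answers are at least as common as right ones, the problem is skipped rather than labeled. Agreement is not the same as correctness, though; a model that is consistently wrong on a problem would produce a confident wrong pseudo-label.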

The signal

Watch whether this technique improves performance on real multimodal tasks at scale, or whether it mostly helps with well-defined mathematical problems where right and wrong answers are unambiguous.