What happened
A new system teaches AI language models to repair formal proofs (mathematical arguments that verify code is correct) by analyzing concrete counterexamples, meaning specific cases where the proof breaks, rather than trying to generate correct proofs from scratch. The approach makes proof generation faster and more accurate while using fewer tokens. That matters because formal verification is the gold standard for ensuring that critical software (in airplanes, medical devices, and financial systems) behaves as specified.
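The repair-from-counterexample idea can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the system's actual method: a bounded search stands in for a real proof checker, a one-line rule stands in for the language model that proposes repairs, and every name here (holds, find_counterexample, propose, repair) is hypothetical. The toy goal is to find a threshold t such that 2**x > x*x for every x >= t in a finite range.

```python
from typing import List, Optional, Tuple


def holds(threshold: int, x: int) -> bool:
    # Candidate claim: for every x at or above the threshold, 2**x > x*x.
    return x < threshold or 2 ** x > x * x


def find_counterexample(threshold: int, domain: range) -> Optional[int]:
    # Stand-in for a proof checker: bounded search for a concrete failing case.
    for x in domain:
        if not holds(threshold, x):
            return x
    return None


def propose(counterexamples: List[int]) -> int:
    # Stand-in for the model (an assumption): exclude every failure seen so far.
    return max(counterexamples) + 1 if counterexamples else 0


def repair(domain: range, max_rounds: int = 10) -> Tuple[Optional[int], List[int]]:
    # Propose, check, and feed each concrete failure back into the next proposal.
    counterexamples: List[int] = []
    for _ in range(max_rounds):
        threshold = propose(counterexamples)
        failing = find_counterexample(threshold, domain)
        if failing is None:
            return threshold, counterexamples
        counterexamples.append(failing)
    return None, counterexamples


if __name__ == "__main__":
    print(repair(range(50)))
```

Each round narrows the claim using only the concrete failure from the previous round, instead of regenerating a candidate with no feedback. A bounded search like this is not a formal proof; it only plays the checker's role in the loop.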
Why it matters
For years, AI approaches to formal verification have largely treated proof-writing as a one-shot prediction task, ignoring the specific ways proofs fail. This work shows that reasoning backward from failure, the way human mathematicians actually work, makes AI significantly better at a task whose difficulty currently limits adoption of formal verification in industry. If this pattern holds at scale, it could reduce the cost and time required to mathematically guarantee correctness in safety-critical systems.