LLMs can now catch their own math mistakes before they're published

What happened

A new hybrid system has a language model write a math proof in a compact format, then automatically expands it into a fully rigorous proof that a lightweight checker verifies. This means LLM-generated arguments can be checked mechanically, without humans having to formalize every step by hand.

Why it matters

Language models are notoriously confident about wrong answers in math and logic. They'll present a proof that sounds plausible but skips steps, misapplies rules, or invokes lemmas that don't actually follow from the setup. Today, catching these errors takes either a human expert reading carefully or a formal verification system so tedious to use that it rarely appears outside academic papers. This system splits the difference: the LLM does the creative work (sketching the proof), and a small trusted program does the checking (filling in gaps and verifying each step). It doesn't make LLMs intelligent about math, but it makes their output verifiable without requiring mathematicians to hand-translate everything into formal languages like Lean or Coq.
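To make the division of labor concrete, here is a minimal sketch of the "small trusted program" half, written in Python. It is not the system's actual proof format or checker. The `Step` structure, the rule names, and the tuple encoding of implications are all assumptions made for illustration, and the gap-filling expansion step is left out entirely. The point is only that the checker can be a few lines that accept nothing they cannot justify.

```python
from dataclasses import dataclass

# Hypothetical encoding: an atom is a string such as "p";
# an implication "a implies b" is the tuple ("->", a, b).

@dataclass(frozen=True)
class Step:
    formula: object   # the statement this step claims to establish
    rule: str         # "premise" or "mp" (modus ponens)
    refs: tuple = ()  # for "mp": (index of antecedent, index of implication)


def check(premises, steps, goal):
    """Accept the proof only if every step is justified and the last step is the goal."""
    proved = []
    for step in steps:
        if step.rule == "premise":
            ok = step.formula in premises
        elif step.rule == "mp":
            # Both references must point at steps that were already verified.
            if len(step.refs) != 2 or any(not 0 <= r < len(proved) for r in step.refs):
                return False
            antecedent, implication = (proved[r] for r in step.refs)
            ok = implication == ("->", antecedent, step.formula)
        else:
            ok = False  # unknown rules are rejected, never trusted
        if not ok:
            return False
        proved.append(step.formula)
    return bool(proved) and proved[-1] == goal


premises = ["p", ("->", "p", "q"), ("->", "q", "r")]

# A sketch in which every step follows from earlier ones.
valid_sketch = [
    Step("p", "premise"),
    Step(("->", "p", "q"), "premise"),
    Step("q", "mp", (0, 1)),
    Step(("->", "q", "r"), "premise"),
    Step("r", "mp", (2, 3)),
]

# A plausible-sounding sketch that skips the step through q.
flawed_sketch = [
    Step("p", "premise"),
    Step("r", "mp", (0, 0)),
]

print(check(premises, valid_sketch, "r"))
print(check(premises, flawed_sketch, "r"))
```

In this toy setup the model's job is to propose the list of steps and the checker's job is to reject anything it cannot justify, so a confident but broken argument fails instead of passing as a proof.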

The signal

Watch whether systems using this approach actually catch and prevent errors that would otherwise slip through. The threshold: does it cut the number of confidently wrong proofs by more than it slows down proof generation?