Cheaper AI models can verify math proofs just as well as expensive ones — if you ask them the right way
What happened
Smaller, cheaper AI models trail frontier models by only 10% in accuracy when verifying mathematical proofs, but they give inconsistent answers 25% more often. Prompts tailored to each model's weaknesses close both gaps, letting a 35-billion-parameter model match the performance of models that cost millions to train.
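To make the two measures concrete, here is a minimal sketch of how accuracy and inconsistency could be computed for a proof verifier. It rests on assumptions not stated in the source: each proof is checked several times, accuracy is scored on the majority verdict, and "inconsistent" means the repeated runs disagree with each other. The function names and the toy data are hypothetical, invented purely for illustration.

```python
from collections import Counter


def majority_verdict(verdicts):
    """Return the most common verdict among repeated runs on one proof."""
    return Counter(verdicts).most_common(1)[0][0]


def accuracy(runs, labels):
    """Fraction of proofs whose majority verdict matches the ground-truth label."""
    correct = sum(majority_verdict(v) == labels[pid] for pid, v in runs.items())
    return correct / len(runs)


def inconsistency_rate(runs):
    """Fraction of proofs on which repeated runs disagree with each other."""
    split = sum(len(set(v)) > 1 for v in runs.values())
    return split / len(runs)


# Toy data, invented for illustration: True means "proof is valid".
labels = {"p1": True, "p2": False, "p3": True, "p4": False}
runs = {
    "p1": [True, True, True],     # consistent and correct
    "p2": [False, True, False],   # correct by majority, but inconsistent
    "p3": [False, False, False],  # consistent but wrong
    "p4": [False, False, False],  # consistent and correct
}

print(accuracy(runs, labels))    # 0.75
print(inconsistency_rate(runs))  # 0.25
```

The toy example shows why the two numbers are reported separately: a verifier can be right by majority on a proof (p2) while still giving a different answer on some runs.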
Why it matters
Math proof verification is one of the few tasks where we can actually check whether an AI got the answer right, and it is becoming the standard way to test whether AI systems can reason at all. If cheaper models can do the job with the right prompting, the cost of verifying any AI's mathematical claims drops by orders of magnitude. That changes who can afford to run verification systems: it would no longer be only the companies building frontier models checking their own work.
The signal
Watch whether researchers and companies actually adopt cheaper verifier models in their production pipelines, or whether frontier models remain the default even as the performance gap closes.