AI can now write software specs that pass verification, and when one fails, a second AI repairs it

What happened

Researchers built a system in which one AI writes formal specifications for code and, if a specification fails verification, a second AI uses the verifier's error message to repair it. In tests on 72 real programs, this two-stage approach succeeded 93% of the time and ran 27% faster than earlier methods.
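
The generate, verify, repair loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the function names, the VerifyResult type, the max_repairs parameter, and the toy stand-ins for the two models and the verifier are all hypothetical and exist only to show how the verifier's error message feeds the second stage.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerifyResult:
    ok: bool
    error: str = ""  # verifier's message when ok is False


def generate_and_repair(
    program: str,
    generate: Callable[[str], str],               # stage 1: program -> spec
    verify: Callable[[str, str], VerifyResult],   # (program, spec) -> result
    repair: Callable[[str, str, str], str],       # stage 2: (program, spec, error) -> spec
    max_repairs: int = 1,                         # assumption: repair budget is not given in the source
) -> Optional[str]:
    """Return a spec that verifies, or None if the repair budget runs out."""
    spec = generate(program)
    for attempt in range(max_repairs + 1):
        result = verify(program, spec)
        if result.ok:
            return spec
        if attempt == max_repairs:
            break
        # Hand the failing spec and the verifier's error to the second stage.
        spec = repair(program, spec, result.error)
    return None


# Toy stand-ins (hypothetical) so the sketch runs without any model or verifier.
GOOD_SPEC = "ensures result == x + 1"


def toy_generate(program: str) -> str:
    return "ensures result >= 0"  # plausible but wrong first draft


def toy_verify(program: str, spec: str) -> VerifyResult:
    if spec == GOOD_SPEC:
        return VerifyResult(ok=True)
    return VerifyResult(ok=False, error="postcondition might not hold")


def toy_repair(program: str, spec: str, error: str) -> str:
    return GOOD_SPEC  # a real second stage would condition on the error text


if __name__ == "__main__":
    print(generate_and_repair("def inc(x): return x + 1",
                              toy_generate, toy_verify, toy_repair))
```

In this sketch the first draft fails, the error message goes to the repair stage, and the repaired spec passes on the second check. With max_repairs=0 the same call returns None, which corresponds to running the generator alone.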

Why it matters

Writing formal specifications (precise mathematical descriptions of what code should do) is a bottleneck in software correctness. It is tedious, requires expertise, and most teams skip it entirely. If AI can generate specs that actually verify, and repair its own failures without human intervention, the cost of formal methods drops from 'hire a specialist' to 'run the pipeline.' The catch is that the result so far covers only 72 benchmark programs in a lab setting. The open question is whether the pattern holds on production codebases.

The signal

Watch whether software teams begin using AI-generated specs in their development pipelines, and whether the failure rate stays low when programs are bigger and messier than the benchmark.