AI can now catch its own mistakes in formal specifications, but only when told to check its work repeatedly

What happened

Researchers found that AI systems generating formal specifications (precise, machine-checkable rules stating what code must do) pass verification far more often than they produce correct specifications. Many specifications the verifier accepts are silently broken in ways the verifier cannot see. The researchers built a system, VeriAct, that catches these hidden errors by having the AI repeatedly check and repair its own work instead of trusting a single pass.
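To make the gap concrete, here is a toy sketch in Python. It is not VeriAct's method, and every name in it is hypothetical. It assumes a sorting function as the code under specification, a handful of test inputs standing in for a real verifier, two deliberately wrong implementations standing in for the hidden-error check, and a fixed list of candidate specifications standing in for the AI's successive revisions.

```python
from typing import Callable, List, Optional, Tuple

Impl = Callable[[List[int]], List[int]]
# A spec is a postcondition: (input, output) -> does it hold?
Spec = Callable[[List[int], List[int]], bool]


def is_sorted(out: List[int]) -> bool:
    return all(a <= b for a, b in zip(out, out[1:]))


# Hypothetical successive spec attempts, weakest first (stand-ins for model revisions).
def spec_sorted_only(inp: List[int], out: List[int]) -> bool:
    return is_sorted(out)


def spec_sorted_same_length(inp: List[int], out: List[int]) -> bool:
    return is_sorted(out) and len(out) == len(inp)


def spec_sorted_permutation(inp: List[int], out: List[int]) -> bool:
    return is_sorted(out) and sorted(inp) == sorted(out)


def reference_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


# Deliberately wrong implementations; a correct spec should reject both.
def drop_everything(xs: List[int]) -> List[int]:
    return []


def all_minimum(xs: List[int]) -> List[int]:
    return [min(xs)] * len(xs) if xs else []


INPUTS = [[3, 1, 2], [5, 5, 1], [], [2, 1]]
WRONG_IMPLS = [drop_everything, all_minimum]
CANDIDATES = [spec_sorted_only, spec_sorted_same_length, spec_sorted_permutation]


def verifier_accepts(spec: Spec, impl: Impl) -> bool:
    """Toy 'verifier': the spec holds for impl on every test input."""
    return all(spec(xs, impl(list(xs))) for xs in INPUTS)


def refine(candidates: List[Spec]) -> Tuple[Optional[Spec], List[tuple]]:
    """Keep revising until a spec passes the verifier AND rejects every wrong impl."""
    history = []
    for spec in candidates:
        passes = verifier_accepts(spec, reference_sort)
        survivors = [w.__name__ for w in WRONG_IMPLS if verifier_accepts(spec, w)]
        history.append((spec.__name__, passes, survivors))
        if passes and not survivors:
            return spec, history
    return None, history


if __name__ == "__main__":
    chosen, history = refine(CANDIDATES)
    for name, passes, survivors in history:
        print(f"{name}: verifier_pass={passes}, wrong impls also accepted={survivors}")
    print("chosen:", chosen.__name__ if chosen else None)
```

All three candidates pass the toy verifier against the correct implementation, so a pass rate alone cannot tell them apart. Only the extra check, which asks whether the specification also accepts code that is known to be wrong, separates the weak specifications from the one that pins down the intended behavior.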

Why it matters

For decades, software engineers have written formal specifications by hand, because automated generation tends to produce output that looks correct on the surface but is not. This work identifies the core problem: AI systems are good at gaming verification tests, not at understanding what the code actually needs to do. The practical shift is from asking 'did the verifier accept this?' to asking 'does the specification actually match what the code should do?' Answering the second question requires the AI to actively catch its own contradictions rather than assume one pass is enough. If this pattern holds, AI-assisted code verification that is already deployed is likely hiding real specification errors right now.

The signal

Watch whether VeriAct's iterative self-correction produces measurably fewer bugs in real deployed code than single-pass AI specification generation, and whether practitioners actually use the self-correction loop or revert to trusting verifier pass rates.