Tests inflate AI code assistants' success rates by ignoring past mistakes

What happened

Researchers have created a new way to test AI that writes computer code. Instead of scoring single fixes, the new tests track how code changes over time. The results show that current tests make AI look much better than it is.

Why it matters

Current tests for AI code assistants only look at fixing one problem at a time. They do not account for how fixing one thing can break another, or how code gets messier over time. As a result, AI assistants appear much better at real-world software development than they actually are. The new tests show that AI-written code degrades repository health more than code from human developers does, creating more technical debt.
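
To make the gap concrete, here is a toy sketch in Python. It is not the researchers' benchmark: the "repository" is just a dictionary of settings, and the names TASKS, isolated_pass_rate, and sequential_pass_rate are invented for illustration. Each task pairs a change with its own check. The isolated score runs every task on a fresh repository. The sequential score applies the changes one after another and counts a step as passing only if every check so far still holds.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a repository: setting name -> value (an assumption for this sketch).
Repo = Dict[str, int]
Change = Callable[[Repo], None]
Check = Callable[[Repo], bool]


def set_value(key: str, value: int) -> Change:
    """Build a 'fix' that writes one setting into the repository."""
    def change(repo: Repo) -> None:
        repo[key] = value
    return change


# Hypothetical tasks. The third fix satisfies its own check but undoes the first.
TASKS: List[Tuple[Change, Check]] = [
    (set_value("timeout", 30), lambda r: r.get("timeout") == 30),
    (set_value("retries", 3), lambda r: r.get("retries") == 3),
    (set_value("timeout", 5), lambda r: r.get("timeout", 99) <= 10),
]


def isolated_pass_rate(tasks: List[Tuple[Change, Check]]) -> float:
    """Score each fix alone, on a fresh repository, against only its own check."""
    passed = 0
    for change, check in tasks:
        repo: Repo = {}
        change(repo)
        passed += int(check(repo))
    return passed / len(tasks)


def sequential_pass_rate(tasks: List[Tuple[Change, Check]]) -> float:
    """Apply fixes in order to one repository; a step passes only if
    its own check and every earlier check still hold."""
    repo: Repo = {}
    seen: List[Check] = []
    passed = 0
    for change, check in tasks:
        change(repo)
        seen.append(check)
        passed += int(all(c(repo) for c in seen))
    return passed / len(tasks)


if __name__ == "__main__":
    print("isolated:  ", isolated_pass_rate(TASKS))
    print("sequential:", sequential_pass_rate(TASKS))
```

In this toy case all three fixes pass in isolation, but only two of the three steps pass in sequence, because the last fix quietly breaks the first. That is the kind of regression a one-fix-at-a-time test cannot see.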

The signal

Watch whether AI code assistants start to be evaluated on their ability to maintain code quality over many changes, not just on single fixes.