The world is being quietly rearranged by people who write very long documents.


The title they went with: Measuring LLM Trust Allocation Across Conflicting Software Artifacts. Noisy translates that to:

AI code assistants trust documentation over actual code — and miss bugs when they disagree


Researchers tested how well AI programming assistants spot problems when code, documentation, and tests conflict. It turns out they're much better at catching bugs in written documentation than at detecting when code quietly drifts away from what the documentation says it does.
AI code assistants are already in widespread use — they're built into most major development environments and used by millions of programmers daily. This finding shows a specific failure mode: when documentation looks correct but the actual code is subtly broken, the AI trusts the documentation and misses the bug. For critical systems (banking, medical devices, infrastructure), that's a real problem. Programmers using these tools need to know when the assistant is reliable and when it's confidently wrong.
Watch whether major code-assistant vendors (GitHub Copilot, Claude, etc.) add explicit consistency checking between documentation and implementation before offering suggestions on correctness-critical code.

If you insist
Read the original →