The world is being quietly rearranged by people who write very long documents.


The title they went with: An Empirical Analysis of Static Analysis Methods for Detection and Mitigation of Code Library Hallucinations. Noisy translates that to:

LLMs make up library code 8-40% of the time. Static analysis catches only 16-70% of it.


When large language models write code that uses external libraries, they invent features that don't exist in those libraries in 8 to 40 percent of responses. Static analysis tools (automated code inspection) catch 16 to 70 percent of these fake-library errors, leaving a floor of real problems that no static analysis tool can catch.
That floor marks the gap between what automation can and cannot solve. LLMs routinely hallucinate library features, and the paper puts the absolute upper bound on what code inspection tools can ever catch at around 77 percent. Even a perfect detector would therefore let roughly 23 percent of hallucinated code slip through, because those errors are invisible to static analysis. This matters because it clarifies the real cost of using LLMs to write code: you cannot automate your way out of the problem with static analysis alone. Someone still has to read and verify the generated code.
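To make the idea of "catching a fake-library error" concrete, here is a minimal sketch of one simple kind of static check. It is not the paper's method, and the function name find_hallucinated_names and the sample snippet are hypothetical. It parses generated code without running it, imports only the real libraries the code names, and flags modules or members that do not exist. The last line of the sample also shows how a checker misses whatever it does not model: a made-up keyword argument passes because this sketch never looks at call arguments.

```python
from __future__ import annotations

import ast
import importlib
from types import ModuleType
from typing import Optional


def _load(name: str) -> Optional[ModuleType]:
    """Import a real library by name; return None if it is not installed/doesn't exist."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def _has_member(module: ModuleType, name: str) -> bool:
    """True if `name` is an attribute or an importable submodule of `module`."""
    if hasattr(module, name):
        return True
    try:
        importlib.import_module(f"{module.__name__}.{name}")
        return True
    except ImportError:
        return False


def find_hallucinated_names(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, dotted name) pairs for modules/members that don't exist.

    The generated code is only parsed, never executed. Hypothetical sketch:
    it ignores shadowed names, relative imports, nested attribute chains,
    and call arguments.
    """
    tree = ast.parse(source)
    problems: list[tuple[int, str]] = []
    bound: dict[str, ModuleType] = {}  # local name -> real module object

    # Pass 1: resolve imports against the installed libraries.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                module = _load(alias.name)
                if module is None:
                    problems.append((node.lineno, alias.name))
                elif alias.asname:
                    bound[alias.asname] = module
                else:
                    # `import a.b` binds the top-level name `a`.
                    top = alias.name.split(".")[0]
                    bound[top] = importlib.import_module(top)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            module = _load(node.module)
            if module is None:
                problems.append((node.lineno, node.module))
                continue
            for alias in node.names:
                if alias.name != "*" and not _has_member(module, alias.name):
                    problems.append((node.lineno, f"{node.module}.{alias.name}"))

    # Pass 2: check direct `module.attr` references.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in bound
            and not _has_member(bound[node.value.id], node.attr)
        ):
            module = bound[node.value.id]
            problems.append((node.lineno, f"{module.__name__}.{node.attr}"))

    return sorted(problems)


# Usage on a hypothetical LLM-generated snippet.
generated = """
import json
import os.path
from collections import OrderedDict, FrozenDict

data = json.loads_safe('{"a": 1}')
path = os.path.join("a", "b")
value = json.loads('{"a": 1}', strict_mode=True)
"""

for line, name in find_hallucinated_names(generated):
    print(f"line {line}: {name} does not exist")
```

On this sample the sketch flags collections.FrozenDict and json.loads_safe, but not the invented strict_mode argument, which would only fail when the code runs. Real static analysis tools model much more than this, yet every tool has some boundary of this kind.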
Track whether production code-generation systems (GitHub Copilot, Claude, etc.) add mandatory static analysis checks to their output before showing it to users, and whether those checks actually reduce hallucinated-library code in real deployed codebases.
