The world is being quietly rearranged by people who write very long documents.


The title they went with: "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering"

Noisy translates that to:

Study reveals AI struggles to understand real software codebases


Researchers created the first benchmark testing whether AI language models can answer questions about entire software projects—not just isolated code snippets—by collecting 1,318 real developer questions across 134 open-source projects. The AI systems performed only moderately well, and when they did answer correctly, they were often just repeating answers found online rather than actually understanding how the code worked together.
This is the first empirical evidence showing that current AI tools can't reliably understand how real software systems actually work at scale, which matters because companies are increasingly betting on AI to help developers navigate large codebases—but the AI is essentially pattern-matching answers rather than reasoning through code.
