New way to catch what AI leaves out — not just what it gets wrong
What happened
Researchers built a measurement method that checks whether LLMs cover all the important facts they should, not just whether the facts they do mention are correct. Until now, evaluation has focused on precision (did the AI get it right?) but ignored recall (did the AI mention it at all?).
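To make the precision/recall distinction concrete, here is a minimal sketch using the standard set-based definitions. It is not the researchers' method, which this summary does not describe. The function name `precision_recall` and the fact labels are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def precision_recall(mentioned: set[str], reference: set[str]) -> tuple[float, float]:
    """Fact-level precision and recall for a single response.

    mentioned: facts the response states (hypothetical labels).
    reference: facts a complete answer should cover (hypothetical labels).
    """
    correct = mentioned & reference
    # Precision: share of stated facts that are correct.
    precision = len(correct) / len(mentioned) if mentioned else 0.0
    # Recall: share of required facts that were actually stated.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Illustrative data only: a response that is fully correct but covers 2 of 5 facts.
reference_facts = {"dose", "side effects", "interactions", "contraindications", "storage"}
response_facts = {"dose", "side effects"}

precision, recall = precision_recall(response_facts, reference_facts)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.40
```

In this toy case every stated fact is correct, so a precision-only check would pass the response even though it omits three of the five facts it should cover.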
Why it matters
For years, AI evaluation has had a blind spot: it catches hallucinations and errors, but not omissions. This paper shows that current LLMs fall short on recall: they skip entire categories of relevant facts, so a generated response can sound complete and accurate while being substantially incomplete. The measurement method itself is what matters here. If it is adopted into standard evaluation, it could push model developers to build systems that not only avoid making things up but also cover what they are supposed to cover.
The signal
Watch whether major LLM developers start reporting recall scores alongside precision scores in their model cards and technical reports over the next 12 months.