New metric reveals AI chatbots remain dangerously overconfident even when they should stay silent

What happened

Researchers built a test that measures whether AI language models know when to shut up and admit uncertainty instead of confidently giving wrong answers. The test shows that even the best current models fail at this, and that standard confidence measurements miss the problem entirely.

Why it matters

AI systems are already deployed in places where wrong answers have real costs: medical diagnosis, legal research, financial advice. A confident wrong answer is worse than an uncertain one, because people act on it. This paper shows that current ways of measuring AI reliability don't catch the cases where the model is most dangerous: when it sounds sure but shouldn't be. The gap between how well calibrated a model looks on standard tests and how often it fails with high confidence in specific cases is large enough that two models can look identical on existing metrics yet behave completely differently in the world.
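
To make the last point concrete, here is a toy sketch of how two models can tie on an aggregate calibration score while differing in how often they are confidently wrong. It is not the paper's metric, which this summary does not specify. The scores used are expected calibration error (ECE), a common aggregate measure, and a hypothetical "confident error rate" defined only for this illustration. The two models, their confidence values, and the 0.9 threshold are all invented.

```python
from typing import List, Tuple

# Each prediction is (stated confidence, whether the answer was correct).
Prediction = Tuple[float, bool]


def expected_calibration_error(preds: List[Prediction], n_bins: int = 10) -> float:
    """Standard binned ECE: weighted average of |accuracy - mean confidence| per bin."""
    bins: List[List[Prediction]] = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, correct))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, correct in bucket if correct) / len(bucket)
        ece += (len(bucket) / len(preds)) * abs(accuracy - mean_conf)
    return ece


def confident_error_rate(preds: List[Prediction], threshold: float = 0.9) -> float:
    """Hypothetical illustration metric: share of all answers that are wrong
    while the stated confidence is at or above `threshold`."""
    wrong = sum(1 for conf, correct in preds if conf >= threshold and not correct)
    return wrong / len(preds)


# Invented model A: always 80% confident, right 80% of the time.
model_a = [(0.8, True)] * 80 + [(0.8, False)] * 20

# Invented model B: same overall accuracy (80 of 100), but some answers come
# with 95% confidence, and two of those are wrong.
model_b = (
    [(0.95, True)] * 38 + [(0.95, False)] * 2
    + [(0.7, True)] * 42 + [(0.7, False)] * 18
)

if __name__ == "__main__":
    for name, preds in (("A", model_a), ("B", model_b)):
        print(
            f"model {name}: "
            f"ECE={expected_calibration_error(preds):.4f}, "
            f"confident error rate={confident_error_rate(preds):.2f}"
        )
```

Both toy models score an ECE of essentially zero and answer 80 of 100 questions correctly, so they tie on the aggregate numbers. Only model B ever gives a wrong answer at 95% confidence (2 of its 100 answers). A metric that looks specifically at confident errors separates the two, while the aggregate score does not.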

The signal

Watch whether deployment teams at OpenAI, Anthropic, and Google start using this metric (or variants of it) in their own safety evaluations, and whether they publish results showing their models improved.