Aligned AI models give same answers too often, breaking uncertainty detection

What happened

When AI language models are trained to be helpful and harmless, they start producing nearly identical responses to the same question — on some tasks, 40-79% of repeated queries yield the same answer. This makes it impossible for uncertainty-detection methods to work, because there's no variation to measure; the system appears confident even when it shouldn't be.

Why it matters

AI systems are being deployed in high-stakes domains like medical diagnosis and legal research where knowing when you're uncertain is as important as getting the right answer. If alignment training (the process that makes models safer and more useful) inadvertently breaks the signals that tell you when a model is guessing, you've traded safety in one dimension for blindness in another. This reveals a hidden cost of current alignment methods that affects how reliably we can use these systems in domains where wrong answers need to be flagged, not confidently delivered.