Language models hide reasoning they can't explain — and we still don't know what that means
What happened
Researchers probed the internal representations of large language models and found that those representations encode analogies the models' explicit answers miss. This suggests models may be reasoning in ways that their outputs do not expose when you ask them direct questions.
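To make the idea of probing concrete, here is a minimal sketch of a linear probe. It runs on synthetic data, not on a real model, and is not the researchers' method. The "hidden states", the "explicit answers", the 60% answer accuracy, and the names fit_linear_probe and probe_accuracy are all assumptions invented for illustration. It shows only how a simple classifier trained on internal vectors can recover a label more reliably than the answers given for the same inputs.

```python
import numpy as np

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of label 1
        grad = p - y                            # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, X, y):
    """Fraction of examples where the probe's prediction matches the label."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
n, d = 400, 16

# Synthetic label: 1 means "this pair is a valid analogy" (assumption).
y = rng.integers(0, 2, size=n)

# Synthetic stand-in for hidden states: Gaussian noise plus a
# label-dependent shift along one fixed direction (assumption).
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
X = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * y - 1, direction)

# Synthetic stand-in for explicit answers: each one matches the label
# with probability 0.6 (assumption).
answers = np.where(rng.random(n) < 0.6, y, 1 - y)

# Train the probe on the first 300 examples, evaluate on the held-out rest.
train, test = slice(0, 300), slice(300, n)
w, b = fit_linear_probe(X[train], y[train])
probe_acc = probe_accuracy(w, b, X[test], y[test])
answer_acc = float((answers[test] == y[test]).mean())

print(f"probe accuracy on held-out examples:  {probe_acc:.2f}")
print(f"answer accuracy on the same examples: {answer_acc:.2f}")
```

In a real study the rows of X would be activations read from a model layer and the answers would come from prompting the same model. The gap here is built into the synthetic data, so the sketch illustrates the measurement, not the finding.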
Why it matters
For years, a common working assumption has been that what a language model says is what it knows. This work suggests the relationship is messier: models may contain latent reasoning that prompting fails to access. The practical consequence is blunt. You cannot treat a model's explicit answer as proof of what it actually understands, and you cannot assume its failures reflect genuine cognitive limits rather than access problems.
The signal
Watch whether follow-up work confirms this gap holds across different domains and model sizes, or whether it's an artifact of how these specific analogies were designed and measured.