Language models hide reasoning they can't explain — and we still don't know what that means

What happened

Researchers probed the internal representations of large language models and found that the models detect analogies internally that their explicit answers miss. This suggests models may be reasoning in ways that direct questions do not expose.

Why it matters

For years, researchers have assumed that what a language model says is what it knows. This work suggests the relationship is messier: models may contain latent reasoning that prompting fails to access. The practical consequence is blunt. You cannot treat a model's explicit answer as proof of what it actually understands, and you cannot assume its failures reflect genuine cognitive limits rather than access problems.

The signal

Watch whether follow-up work confirms this gap holds across different domains and model sizes, or whether it's an artifact of how these specific analogies were designed and measured.