Researchers find evidence that an AI model encodes beliefs as geometric structures, echoing human probabilistic reasoning
What happened
A research team developed a method for finding hidden structure in how large language models represent information, then applied it to Gemma-2-9B and found five clusters that appear to encode belief states in geometric form. If confirmed, this would suggest that the model's internal reasoning resembles human probabilistic judgment more closely than previously assumed.
Why it matters
For years, mechanistic interpretability research has tried to determine what actually happens inside transformer models: whether they are only pattern-matching or whether they develop something like internal models of the world. This paper reports evidence that at least one production model, Gemma-2-9B, encodes probabilistic beliefs as geometric structures, in line with what researchers had theorized. That narrows the interpretability problem from 'we have no idea what's in there' to 'we can point to a specific encoding scheme.' The practical implication is significant: if models reason using geometric representations of belief, it might be possible to steer model behavior by manipulating those geometries directly, rather than by fine-tuning weights blindly.
The signal
Watch whether other research groups can reproduce these five clusters in Gemma-2-9B, and whether the same geometric structures appear in other models of similar size. The latter would indicate that this is a general property of large models, not a quirk of this one.