The world is being quietly rearranged by people who write very long documents.


The title they went with Finding Belief Geometries with Sparse Autoencoders Noisy translates that to

Researchers find evidence that AI models develop geometric representations of beliefs, like human reasoning


A research team developed a method to find hidden structures in how large language models represent information, then applied it to Gemma-2-9B and found 5 clusters that appear to encode belief states in geometric form. If confirmed, this would mean the model's internal reasoning works more like human probabilistic judgment than anyone realized.
For years, mechanistic interpretability research has been trying to understand what's actually happening inside transformer models — whether they're just pattern-matching or whether they develop something like internal models of the world. This paper shows that at least one production model (Gemma) appears to encode probabilistic beliefs as geometric structures, similar to how researchers theorized they should work. That narrows the interpretability problem from 'we have no idea what's in there' to 'we can point to a specific encoding scheme.' The practical implication is stark: if models reason using geometric representations of belief, then steering model behavior might be possible by manipulating those geometries directly, rather than fine-tuning weights blindly.
Watch whether other research groups can reproduce these five clusters in Gemma-2-9B, and whether the same geometric structures appear in other models of similar size — that would indicate this is a general property of large models, not a quirk of this one.

If you insist
Read the original →