The world is being quietly rearranged by people who write very long documents.


The title they went with Linear Representations of Hierarchical Concepts in Language Models Noisy translates that to

Language models store concept hierarchies in simple, readable mathematical patterns


Researchers found that language models encode hierarchical relationships (like 'Japan is in Asia') using straightforward linear patterns that can be recovered and read directly from the model's internal numbers. This means the abstract knowledge in these models isn't buried in noise — it's organized in a legible way that lets you trace exactly how a model knows what belongs inside what.
For years, researchers have treated language model internals like a black box — you feed in text, you get output, and what happens in between stays opaque. This paper shows that at least for hierarchical knowledge, the encoding is clean enough to read. That matters because it's the first concrete evidence that some of what models know isn't scrambled or distributed across billions of parameters in an irretrievable way. If hierarchy is stored linearly, other structured knowledge might be too. That opens a real question: can you now audit what a model actually knows, or correct what it knows, by editing these representations directly instead of retraining?
Watch whether other research teams can use these linear representations to do something practical — catch false beliefs a model holds, fix systematic errors, or transfer knowledge between models — before the next wave of larger models makes these representations obsolete.

If you insist
Read the original →