Language models store facts backward and forward as separate memories, not as one unified understanding

What happened

Researchers tested whether language models that learn a fact in both directions (A implies B, and B implies A) develop a single unified understanding of it or just memorize the two routes separately. The models memorized the two routes as distinct entries with different internal structures, rather than forming a direction-agnostic concept of the relationship.
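As a rough analogy for the distinction, here is a toy Python sketch. It uses plain dictionaries, not anything resembling real network internals, and every name in it (`forward`, `backward`, `capital_of`, `country_for`) is made up for illustration. The first style stores each direction as its own entry; the second stores one relation and derives both directions from it.

```python
# Toy analogy only: plain data structures, not a model of real network internals.

# Style 1: each direction is memorized as its own independent entry.
forward = {"France": "Paris"}    # country -> capital
backward = {"Paris": "France"}   # capital -> country

# Style 2: one stored relation; both directions are derived from it.
capital_of = {("Paris", "France")}  # set of (capital, country) pairs


def capital_for(country):
    """Look up a capital by scanning the single shared relation."""
    return next((cap for cap, ctry in capital_of if ctry == country), None)


def country_for(capital):
    """Look up a country by scanning the same shared relation."""
    return next((ctry for cap, ctry in capital_of if cap == capital), None)


# Teach a new fact in one direction only.
forward["Japan"] = "Tokyo"          # Style 1: only the forward entry is added
capital_of.add(("Tokyo", "Japan"))  # Style 2: one relation entry is added

separate_reverse = backward.get("Tokyo")  # reverse route was never stored
unified_reverse = country_for("Tokyo")    # reverse is derived from the relation

print(separate_reverse, unified_reverse)
```

In the first style the reverse lookup only works if the reverse entry was stored separately; in the second it follows from the single stored relation. The finding described above is that the models behaved more like the first style even when both directions were learned.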

Why it matters

The result points to a gap between how language models appear to work and how they work internally. A model that seems to understand "Paris is the capital of France" may have stored that phrase as a lookup-table entry, not as a genuine concept. The implication is uncomfortable: a model producing correct answers does not necessarily understand anything; it may only have learned to retrieve the right output for the right input. That undercuts a common assumption in AI development, namely that better performance on test benchmarks corresponds to deeper, more flexible internal reasoning.

The signal

Watch whether later training methods designed to fix this by explicitly forcing unified representations make models more reliable on tasks they have not seen before, or whether the models still rely on memorization.