The world is being quietly rearranged by people who write very long documents.


The title they went with:
Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Noisy translates that to:

AI safety monitoring fails silently in most models — one has a 57-token warning window before it breaks


Researchers tested seven language models to see if they give any internal warning before they break a rule or hallucinate, and found almost none do. Only one model showed a detectable signal 57 tokens before failure, meaning current safety approaches (checking outputs after the fact) miss the moment when the model actually decides to violate its training.
For three years, AI companies have relied on monitoring outputs after a model generates text. This paper shows that approach is fundamentally reactive: the model's internal state has already committed to the error before you see the words. The finding splits into two distinct problems: rule violations leave a measurable trace (sometimes, in one model), but hallucinations leave none at all. That means you cannot catch a hallucination before it happens. Detecting made-up facts requires external verification: you have to check the answer against reality, not the model's internals.
Watch whether any deployed system actually uses internal monitoring of 'trajectory tension' (the metric this paper introduces) to catch rule violations before generation, versus continuing to rely on output filtering.
