The world is being quietly rearranged by people who write very long documents.


The title they went with Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space Noisy translates that to

AI models recognize a consistent 'self' even when it's described differently


Researchers found that large language models form a stable internal representation of an agent's identity, even when that identity is rephrased. This 'identity attractor' means the model's internal state converges to a similar point regardless of how the identity is described.
This paper suggests that AI models can develop a persistent sense of 'self' or identity. If an AI can maintain a stable internal representation of who it is, even through varied inputs, it changes how we think about building and controlling AI agents. It means that an AI's core identity might be more robust and less easily swayed by minor changes in prompts than previously thought.
Watch for future research that tests whether this 'identity attractor' can be intentionally manipulated or if it makes AI agents more resistant to adversarial attacks on their core programming.

If you insist
Read the original →