The world is being quietly rearranged by people who write very long documents.


The title they went with ATANT: An Evaluation Framework for AI Continuity Noisy translates that to

First test to measure whether AI systems actually remember context across conversations


Researchers built the first standardized test to measure whether AI systems can reliably remember and retrieve facts from past conversations without mixing them up. This matters because AI companies have added memory tools (vector databases, long context windows) but nobody could actually verify they work — now someone can.
Right now, companies claim their AI systems have 'continuity' or 'memory,' but there's no agreed way to measure whether that's true or marketing. This framework gives you a ruler. The test runs 250 different life stories through the system and checks whether it retrieves the correct fact for the correct person without cross-contamination — a deceptively hard problem that degrades as the database grows. If this gets adopted, it stops being possible to ship memory systems without proving they actually work at scale.
Watch whether major AI labs (OpenAI, Anthropic, Claude, etc.) start running their systems against ATANT and publishing results, or whether they ignore it because the scores would be embarrassing.

If you insist
Read the original →