The world is being quietly rearranged by people who write very long documents.


The title they went with: "MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization". Noisy translates that to:

First real-world benchmark for AI agents remembering users across years and domains


Researchers created the first large-scale test of whether AI language models can actually remember and recognize individual users over long periods and across different topics, using real shopping behavior from Amazon instead of fake scripted conversations. Most AI memory systems today fail this test badly — they can't reliably track what an actual person cares about over months or years, which matters because companies want AI assistants that feel personalized and remember you.
Until now, AI memory systems have only been tested on short, artificial conversations that don't reflect how real people interact with AI over time. This benchmark reveals that current memory methods don't work well enough for the personalization that companies are betting on, meaning either AI assistants will need a fundamentally different memory architecture, or personalization claims are getting ahead of what the technology can actually do.

If you insist
Read the original →