The world is being quietly rearranged by people who write very long documents.


The title they went with: "HippoCamp: Benchmarking Contextual Agents on Personal Computers"

Noisy translates that to:

AI assistants struggle to search personal computers: benchmark finds only 48% accuracy


Researchers built a test where AI assistants try to find files and answer questions about a person's computer — the kind of thing a personal AI assistant would actually need to do. Current best-in-class AI systems only succeed about half the time, especially when files are spread across many formats and the answer requires connecting information from multiple documents.
This is the first serious measurement of whether AI can actually work on your personal files: not on generic web tasks or simulated environments, but on real computers with real messy data. The gap matters because personal AI assistants (the products companies are now shipping) need to handle exactly this kind of work, and right now they get it wrong about half the time. If deployed today, a system with 48% accuracy on file search would be unreliable enough that most users would stop trusting it within days.
Monitor whether commercial AI companies release personal assistant products in the next 12 months that attempt this task, and whether their public accuracy claims on file retrieval exceed the 48% ceiling found here.

If you insist
Read the original →