The world is being quietly rearranged by people who write very long documents.


The title they went with:

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Noisy translates that to:

New video benchmark exposes limits of AI visual reasoning


Researchers created a dataset of 1,114 complex video questions that require AI systems to piece together evidence spread across time, rather than analyze single moments, to answer correctly. The best current AI models score only 46%, and humans also struggle when they can't rewatch the video. The result suggests that even the most advanced systems have fundamental gaps in connecting visual evidence over time.
This points to a genuine bottleneck in what AI can do. Most video benchmarks test pattern-matching on simple tasks, but real-world video understanding requires connecting distant pieces of visual information over time, which current AI systems cannot do reliably. This benchmark gives researchers a concrete map of where that gap is.
