The world is being quietly rearranged by people who write very long documents.


The title they went with: "Spatial Competence Benchmark." Noisy translates that to:

AI models build fake worlds that break their own rules


Researchers built a new, tougher way to test how well AI models understand physical space. It turns out top AI models fail these tests, making errors that look right up close but break the whole picture.
AI models have seemed good at spatial tasks because the tests were too easy. This new benchmark shows that when tasks require a consistent understanding of an entire environment, current AI models fall apart. That suggests AI systems for robotics, autonomous vehicles, or even virtual assistants for physical tasks may not be as reliable as their developers thought.
Watch whether major AI labs adopt this benchmark for their next models, or if new research emerges that specifically addresses these spatial reasoning failures.
