The world is being quietly rearranged by people who write very long documents.


The title they went with: "ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?" Noisy translates that to:

New test reveals major reasoning gaps in image-generating AI models


Researchers built a new benchmark to stress-test visual AI systems—revealing that even top models fail basic reasoning tasks like understanding physics, cause-and-effect, and spatial relationships, despite producing visually realistic images. This matters because it exposes a gap between what these models appear to do (generate convincing pictures) and what they actually understand (very little about how the world works).
For years, AI image generators have been evaluated mainly on whether humans think the output looks good—a metric that hides whether the model actually understands what it's generating. This benchmark makes that gap visible and measurable, which changes what builders and buyers can actually claim about these systems' capabilities.
