AI vision models still struggle with spatial reasoning; new training method shows promise

What happened

Researchers found that AI vision models given 3D geometry information alongside 2D images still ignore the geometry and rely instead on surface-level visual patterns. To counter this, they developed a training technique that masks parts of the 2D input, forcing the model to draw on the geometry. The technique improved performance on spatial reasoning tasks.
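
The source does not say exactly how the 2D information is masked. As a rough illustration only, the following sketch assumes random token-level masking: a fixed fraction of 2D image tokens is zeroed out while the 3D geometry tokens pass through unchanged, so the remaining signal leans more heavily on geometry. The function names, the token representation (plain lists of floats) and the default 50% mask ratio are all hypothetical choices, not the researchers' method.

```python
import random
from typing import List, Optional

Token = List[float]


def mask_2d_tokens(image_tokens: List[Token], mask_ratio: float,
                   rng: random.Random) -> List[Token]:
    """Zero out a random fraction of 2D image tokens (hypothetical scheme)."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    n = len(image_tokens)
    num_masked = int(round(n * mask_ratio))
    # Choose which token positions to hide for this training example.
    masked_idx = set(rng.sample(range(n), num_masked))
    return [[0.0] * len(tok) if i in masked_idx else list(tok)
            for i, tok in enumerate(image_tokens)]


def build_training_input(image_tokens: List[Token],
                         geometry_tokens: List[Token],
                         mask_ratio: float = 0.5,
                         seed: Optional[int] = None) -> List[Token]:
    """Combine masked 2D tokens with untouched 3D geometry tokens.

    Assumption: both modalities are already tokenized; a real model would
    project them to a shared width before concatenating.
    """
    rng = random.Random(seed)
    masked_image = mask_2d_tokens(image_tokens, mask_ratio, rng)
    # Geometry is never masked, so it remains available to the model.
    return masked_image + [list(g) for g in geometry_tokens]
```

With four image tokens and `mask_ratio=0.5`, exactly two image tokens are zeroed in every call; only which two changes with the seed. Masking at the input, rather than removing the 2D modality entirely, keeps some visual context while still penalizing over-reliance on it.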

Why it matters

The finding documents a real limitation in how current AI vision systems understand space and position: they can describe what they see but struggle to reason about where objects sit relative to one another. That gap matters for robotics, autonomous systems, and any application where spatial accuracy is functional rather than cosmetic.