AI vision models learn to catch themselves making stuff up

What happened

Researchers found that image-analyzing AI models tend to ignore the picture and fall back on guessing when generating long text. The models already have the ability to double-check their work against the image; they just don't use it automatically. A new training method teaches them to pause and verify what they see in the image before continuing, which cuts down on hallucinations and keeps their reasoning grounded in the actual evidence.

Why it matters

This is a lab result showing one path to reducing a specific failure mode in multimodal AI. It hasn't been deployed at scale, doesn't measure real-world impact, and doesn't demonstrate that the improvement persists when models encounter genuinely novel situations outside controlled benchmarks.