What happened
Researchers discovered that self-supervised vision models like DINO hide object-detection ability scattered throughout their internal layers, not just in the final output — and a new method called Object-DINO can extract this information without retraining. In practice, this means image analysis systems can now locate objects in photos more accurately and reduce hallucinations (false object detections) in AI systems that combine images with text.
Why it matters
For years, researchers assumed object recognition happened in one place (the final layer). This shows the capability is distributed across the entire network, which changes how we should design and extract information from these models — potentially unlocking better performance from systems already trained and deployed.