AI learns to follow where humans look in videos

What happened

Researchers added a lightweight module to an AI language model that lets it take into account where a person's eyes are looking in a video, making the model dramatically better at answering questions about what is happening on screen. In practice, AI systems could eventually watch video alongside a human and track their visual attention, which would be useful for accessibility tools, attention analysis, and interactive video systems where what someone is looking at matters.

Why it matters

This is a proof of concept that eye-gaze data can measurably improve how AI understands video, a sign that multimodal AI systems are moving beyond text and images toward richer, embodied information streams. If gaze data becomes routinely available from consumer devices (phones, AR glasses, webcams), AI systems trained to use it could open new categories of applications. They would also raise privacy questions about what gaze data gets collected and used for training.