The world is being quietly rearranged by people who write very long documents.


The title they went with: "GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding". Noisy translates that to:

AI learns to follow where humans look in videos


Researchers added a lightweight module to an AI language model that lets it take into account where a person's eyes are looking in a video, making the AI dramatically better at answering questions about what is happening on screen. In practice, this means AI systems could eventually watch video alongside a human and understand where their visual attention is going, which could be useful for accessibility tools, attention analysis, or interactive video systems where what someone is looking at matters.
This is a proof of concept that eye-gaze data can measurably improve how AI understands video, and a signal that multimodal AI systems are moving beyond text and images toward richer, embodied information streams. If gaze data becomes routinely available in consumer devices (phones, AR glasses, webcams), AI systems trained to use it could open new categories of applications, but they would also raise privacy questions about what data gets collected and used for training.

If you insist
Read the original →