The world is being quietly rearranged by people who write very long documents.


The title they went with:
GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Noisy translates that to:

AI still struggles to understand what users actually want from software


Researchers built a benchmark testing whether AI can watch someone use software like Photoshop and understand not just what they're clicking, but why they're doing it and when they need help. Current AI models fail badly at this — getting the user's intent right only 44% of the time — but performance jumps dramatically when you give the AI context about what the user is trying to accomplish.
Every AI assistant that tries to help you with creative software (design tools, video editors, spreadsheets) can currently only react to your clicks; it does not understand your actual goal. This benchmark measures whether AI can shift from automation to genuine collaboration, which requires solving a fundamentally harder problem than just predicting the next action.
