The world is being quietly rearranged by people who write very long documents.


The title they went with: When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

Noisy translates that to:

AI agents still can't handle plan changes mid-task — first systematic test shows the gap


Researchers created the first rigorous test of whether AI language models can adapt when a user changes their request halfway through a long, complex task like web navigation. The test reveals that even the most powerful AI models struggle to abandon their original plan, adjust course efficiently, or understand revised goals without starting over.
Right now, AI agents are being deployed for real work (booking travel, managing schedules, filing documents), but they're fragile. If a user needs to change course mid-task, most agents get confused or start over from scratch, wasting work already done. This paper documents exactly where and how they fail, which means companies building AI assistants now know what they're shipping: tools that break under a very normal condition (people changing their minds). The gap is big enough that it should change how companies think about deploying these systems for high-value work where interruptions are inevitable.
Monitor whether major AI agent products (ChatGPT with tasks, Claude with computer use, browser automation tools) add explicit interruption-handling to their product roadmaps within the next 12 months, or whether users report frustration with mid-task changes in support forums.
