AI agents still can't handle plan changes mid-task — first systematic test shows the gap

What happened

Researchers created the first rigorous test of whether AI language models can adapt when a user changes their request halfway through a long, complex task like web navigation. The test reveals that even the most powerful AI models struggle to abandon their original plan, adjust course efficiently, or understand revised goals without starting over.

Why it matters

AI agents are already being deployed for real work such as booking travel, managing schedules, and filing documents, but they are fragile. When a user changes course mid-task, most agents get confused or start over from scratch, wasting the work already done. This paper documents exactly where and how they fail, so companies building AI assistants now know what they are shipping: tools that break under a very normal condition (people changing their minds). The gap is large enough that it should change how companies think about deploying these systems for high-value work where interruptions are inevitable.

The signal

Watch whether the makers of major AI agent products (ChatGPT with tasks, Claude with computer use, browser automation tools) put explicit interruption handling on their roadmaps within the next 12 months, or whether users report frustration with mid-task changes in support forums.