AI agents can now plan once, then execute — cutting language model calls from dozens to two or three

What happened

Most tool-using AI agents rebuild their entire plan after every single step, which wastes compute and lets errors compound. This paper introduces a system in which the model plans the full workflow upfront, then hands it to a deterministic executor that calls back to the model only if a step breaks. In practice, AI agents that use external tools could run faster and fail less often.
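
To make the pattern concrete, here is a minimal sketch of a plan-once executor with a repair fallback. It is not the paper's implementation: Step, run_plan, the "$prev" placeholder, the toy tools, and the repair function (which stands in for a language model call) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    tool: str  # name of the tool to call (hypothetical schema)
    arg: str   # literal argument, or "$prev" to reuse the previous result


def run_plan(
    plan: List[Step],
    tools: Dict[str, Callable[[str], str]],
    repair: Callable[[List[Step], Exception], List[Step]],
    max_repairs: int = 2,
) -> Tuple[str, int]:
    """Run a fixed plan step by step; call `repair` only when a step fails.

    Returns the final result and the number of repair calls made.
    """
    repairs = 0
    prev = ""
    i = 0
    while i < len(plan):
        step = plan[i]
        arg = prev if step.arg == "$prev" else step.arg
        try:
            prev = tools[step.tool](arg)  # deterministic: no model involved
            i += 1
        except Exception as err:
            if repairs >= max_repairs:
                raise
            # Stand-in for a model call that rewrites the remaining steps.
            plan = plan[:i] + repair(plan[i:], err)
            repairs += 1
    return prev, repairs


# Toy tools, assumed purely for the example.
def search(query: str) -> str:
    return f"results for {query}"


def summarize(text: str) -> str:
    return text.upper()


def flaky(text: str) -> str:
    raise RuntimeError("tool unavailable")


TOOLS = {"search": search, "summarize": summarize, "flaky": flaky}


def toy_repair(remaining: List[Step], err: Exception) -> List[Step]:
    # A real system would ask the model; here we swap in a working tool.
    return [Step("summarize", remaining[0].arg)] + remaining[1:]


clean_plan = [Step("search", "agent planning"), Step("summarize", "$prev")]
broken_plan = [Step("search", "agent planning"), Step("flaky", "$prev")]

clean_result, clean_repairs = run_plan(clean_plan, TOOLS, toy_repair)
fixed_result, fixed_repairs = run_plan(broken_plan, TOOLS, toy_repair)
```

In this sketch the clean plan finishes with zero repair calls, and the broken plan finishes after exactly one. The model is consulted only at planning time and at the point of failure, never between healthy steps.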

Why it matters

Current tool-using AI systems are computationally expensive because they re-plan constantly. This is a real bottleneck for deployment: every extra language model call adds cost and latency. The paper shows that planning the workflow upfront and invoking repair only when necessary can match or beat the current approach on most tasks, which means these systems could be deployed more cheaply and run faster. The catch: it works well only on tasks with clear structure, and it fails when the agent has to adapt on the fly to surprising results.
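
A back-of-the-envelope way to see the savings is to count model calls under each pattern. The sketch below assumes a simplified model that is not taken from the paper: a reactive agent makes one model call per step, while a plan-once agent makes one call to write the plan plus one per repair. The step and repair counts are made-up inputs, not reported results.

```python
def reactive_calls(num_steps: int) -> int:
    # Assumption: the agent consults the model once before every step.
    return num_steps


def planned_calls(num_repairs: int) -> int:
    # Assumption: one call to write the plan, plus one call per repair.
    return 1 + num_repairs


# Hypothetical workload: a 24-step task where one step needs repair.
steps, repairs = 24, 1
reactive = reactive_calls(steps)
planned = planned_calls(repairs)
saved = reactive - planned
print(f"reactive: {reactive} calls, planned: {planned} calls, saved: {saved}")
```

Under these assumptions the reactive cost grows with the length of the task, while the planned cost grows only with the number of failures, which is why the gap widens on long, well-structured workflows.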

The signal

Watch whether this execution pattern gets adopted in production AI agents at scale. If companies building chatbots or code-writing agents start implementing pre-planned workflows instead of reactive re-planning, you'll see the latency and cost savings show up in their benchmarks or deployment timelines within 6–12 months.