The world is being quietly rearranged by people who write very long documents.


The title they went with
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents

Noisy translates that to

AI agents can now plan once, then execute — cutting language model calls from dozens to two or three


Most AI agents that use tools remake their entire plan after every single step, wasting compute and stacking errors. This paper introduces a system where the AI plans the full workflow upfront, then hands it off to deterministic execution that only calls back to the AI if something breaks. In practice, this means AI agents using external tools could run faster and fail less often.
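To make the pattern concrete, here is a minimal sketch of plan-once execution with repair on failure. It is not the paper's implementation: `CountingPlanner`, `Step`, `run`, and the toy tools are all hypothetical names invented for illustration, and the "planner" is a stub standing in for a language model so that the number of model calls can be counted.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tool: str  # name of the tool to invoke
    arg: str   # argument passed to that tool


class CountingPlanner:
    """Stub standing in for a language model; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def plan(self, task):
        # One model call produces the whole workflow upfront.
        self.calls += 1
        return [Step("fetch", task), Step("parse", "raw"), Step("summarize", "parsed")]

    def repair(self, failed, error):
        # Called only when a step breaks. Hypothetical fix: try a fallback tool.
        self.calls += 1
        return Step(failed.tool + "_fallback", failed.arg)


def run(task, planner, tools):
    """Plan once, execute deterministically, call the planner back only on failure."""
    results = []
    for step in planner.plan(task):
        try:
            results.append(tools[step.tool](step.arg))
        except Exception as err:
            fixed = planner.repair(step, err)
            results.append(tools[fixed.tool](fixed.arg))
    return results


def broken_parse(arg):
    raise ValueError("parser crashed")


# Toy tools (assumptions for the demo); "parse" always fails to trigger one repair.
tools = {
    "fetch": lambda a: f"fetched:{a}",
    "parse": broken_parse,
    "parse_fallback": lambda a: f"parsed:{a}",
    "summarize": lambda a: f"summary:{a}",
}

planner = CountingPlanner()
results = run("report", planner, tools)
print(planner.calls)  # 2: one planning call plus one repair call
```

A step-by-step agent would call the model at least once per step in the same workflow. Here the model is called once to plan and once more only because a tool failed.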
Current tool-using AI systems are computationally expensive because they re-plan constantly. This is a real bottleneck for deployment: every extra language model call adds cost and latency. The paper shows that pre-planning the workflow and invoking repair only when necessary can match or beat the current approach on most tasks, which means these systems could be deployed more cheaply and run faster. The catch: it only works well on tasks with clear structure, and it fails when the agent needs to adapt on the fly to surprising results.
Watch whether this execution pattern gets adopted in production AI agents at scale. If companies building chatbots or code-writing agents start implementing pre-planned workflows instead of reactive re-planning, you'll see the latency and cost savings show up in their benchmarks or deployment timelines within 6–12 months.
