Researchers teach AI agents to fix themselves while deciding which agent to ask

What happened

A new system lets multiple AI agents improve their own performance while simultaneously learning which agent is best for each question. Previously, systems either improved the agents or improved the routing between them, but not both at once. This means AI systems can get better at self-correction while getting smarter about which tool to use for which problem.

Why it matters

This is an architecture problem, not a capability breakthrough. The signal is structural: it removes the choice between optimizing the router or optimizing the agents. In practice, this matters because real deployed multi-agent systems today work with the agents they have and accept fixed collaboration schemes. If this approach scales to production systems, it means AI systems could continuously adapt their own reasoning without human retraining. The question is whether this makes multi-agent reasoning reliable enough to use outside research benchmarks, where real-world questions don't fit the test cases.

The signal

Whether production AI systems (customer support, technical troubleshooting, medical advisors) that use multiple agents start implementing co-evolution or whether the overhead of continuous self-improvement makes it impractical compared to periodic human retraining.