AI agents explore faster by skipping policy training

What happened

Researchers found that training AI agents to explore and execute tasks at the same time is inefficient: the system spends computation on policy optimization when, during exploration, all it needs to do is discover new states. They built a simpler method that uses tree search and uncertainty measurement to explore much faster, then distills the discovered paths into working policies afterward. The method achieves better results on notoriously hard exploration problems.
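To make the two-phase idea concrete, here is a minimal toy sketch in Python. It is not the researchers' implementation. The chain environment, the names `step`, `explore`, `distill` and `rollout`, and the use of expansion counts as a stand-in for uncertainty measurement are all assumptions made for illustration. Phase one searches a resettable, deterministic environment with no policy at all. Phase two copies the discovered path into a lookup-table policy.

```python
N = 8             # hypothetical chain length; states are 0 .. N-1
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy transition on a chain of N states (assumed environment)."""
    return max(0, state - 1) if action == 0 else min(N - 1, state + 1)


def explore(start, budget=20):
    """Phase 1: policy-free search that always expands the least-expanded state.

    Expansion counts are a crude proxy for uncertainty: a state that has been
    expanded fewer times is treated as less well understood.
    """
    paths = {start: []}      # first action sequence found to each discovered state
    expansions = {start: 0}  # how many times each state has been expanded
    for _ in range(budget):
        # Pick the least-expanded known state; dict insertion order breaks ties.
        state = min(paths, key=lambda s: expansions[s])
        expansions[state] += 1
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in paths:
                paths[nxt] = paths[state] + [action]
                expansions[nxt] = 0
    return paths


def distill(start, path):
    """Phase 2: turn one discovered action sequence into a state -> action table."""
    policy, state = {}, start
    for action in path:
        policy[state] = action
        state = step(state, action)
    return policy


def rollout(policy, start, max_steps=50):
    """Run the distilled policy until it reaches a state it has no action for."""
    state = start
    for _ in range(max_steps):
        if state not in policy:
            break
        state = step(state, policy[state])
    return state


paths = explore(start=0)
policy = distill(0, paths[N - 1])
final_state = rollout(policy, 0)
```

In this toy setting the search needs no gradient updates to find the far end of the chain, and the policy is built only once a path exists. A real system would replace the lookup table with a learned policy and the expansion counts with a proper uncertainty estimate.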

Why it matters

The result points to a structural inefficiency in how current AI systems are built: forcing exploration and task execution through the same machinery adds overhead that slows discovery by an order of magnitude. If the decoupled approach generalizes beyond the test benchmarks, it could change how exploration-heavy AI systems are designed and make some previously unsolvable tasks tractable.