Cheaper AI answers by reusing past answers instead of running the full model

What happened

Researchers built a system that caches answers from expensive AI models and routes new questions to cheaper models when possible, escalating only genuinely hard queries to the powerful (and costly) model. This lowers the total cost of running AI services by reducing how often the expensive model has to run. That matters because inference is increasingly the dominant operating cost for companies running chatbots and search engines.
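
For readers who want a concrete picture, here is a minimal Python sketch of the cache-then-route idea described above. It is an illustration, not the researchers' implementation: the class name CascadeRouter, the confidence score returned by the cheap model, the 0.8 threshold, and the exact-match cache key are all assumptions made for this example. A real system would likely match similar questions rather than identical ones and would need a more careful test of query difficulty.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class CascadeRouter:
    """Hypothetical router: cache first, cheap model second, expensive model last."""

    # Assumed interface: the cheap model returns (answer, confidence in [0, 1]).
    cheap_model: Callable[[str], Tuple[str, float]]
    expensive_model: Callable[[str], str]
    confidence_threshold: float = 0.8  # assumed cutoff for "good enough"
    cache: Dict[str, str] = field(default_factory=dict)
    expensive_calls: int = 0

    @staticmethod
    def _key(query: str) -> str:
        # Simplification: normalize case and whitespace, then match exactly.
        return " ".join(query.lower().split())

    def answer(self, query: str) -> str:
        key = self._key(query)

        # 1. Reuse a stored expensive-model answer if one exists.
        if key in self.cache:
            return self.cache[key]

        # 2. Try the cheap model and keep its answer if it is confident.
        draft, confidence = self.cheap_model(query)
        if confidence >= self.confidence_threshold:
            return draft

        # 3. Escalate hard queries and cache the costly result for reuse.
        self.expensive_calls += 1
        result = self.expensive_model(query)
        self.cache[key] = result
        return result


# Stand-in models for demonstration only; real models would be API calls.
def toy_cheap(query: str) -> Tuple[str, float]:
    # Pretend short questions are easy and long ones are hard.
    return f"cheap answer to: {query}", 0.9 if len(query) < 20 else 0.1


def toy_expensive(query: str) -> str:
    return f"expensive answer to: {query}"


router = CascadeRouter(toy_cheap, toy_expensive)
router.answer("What is 2 + 2?")                            # handled by the cheap model
router.answer("Summarize the causes of the 2008 crisis")   # escalated, then cached
router.answer("summarize the causes of the 2008 crisis")   # served from the cache
```

In this toy run the expensive model is called once for three questions, which is the saving the system is after; how much a real deployment saves depends on how often questions repeat and how reliably hard queries can be detected.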

Why it matters

This is an engineering optimization, not a breakthrough, but it points to a real structural shift: as AI inference becomes cheaper and more ubiquitous, the economics of serving it flip from 'run the same model for everyone' to 'route intelligently based on query difficulty and past answers.' If the approach generalizes, it could change how AI companies price and operate their services, making those services viable at much lower prices.