What happened
Researchers found that when speech-to-text AI systems re-process prior conversation turns to understand context, the audio data balloons and slows everything down. To address this, they built a compression technique that shrinks the historical audio into a few learned tokens while keeping the transcripts readable. This makes conversational speech recognition faster and cheaper while retaining most of the accuracy gains that come from understanding what was said before.
Why it matters
As voice AI systems move from handling isolated single utterances (like voice commands) to actual conversation, efficiency becomes the bottleneck. This work shows one way to keep context-awareness without the computational cost that would make real-time conversation prohibitively expensive.