What happened
A new scheduling system makes AI services respond faster when users send requests of different media types (videos, images, and text) at the same time. Today, a single large video request can bog down the entire system. The new scheduler lets smaller text requests go through quickly while larger video requests are processed in the background, cutting wait times for interactive requests by 78%.
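To make the idea concrete, here is a minimal sketch of cost-aware ordering. It is not the system's actual algorithm, which this summary does not describe. It assumes each request carries a rough cost estimate based on its media type, and that cheaper requests are served first. The names `ASSUMED_COST`, `CostAwareQueue`, and `cost_aware_order`, as well as the cost values, are hypothetical and chosen only for illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical relative processing costs per media type (illustrative values only).
ASSUMED_COST = {"text": 1, "image": 10, "video": 100}


@dataclass(order=True)
class Request:
    cost: int                        # primary sort key: cheaper requests first
    seq: int                         # tie-breaker: earlier arrivals first
    name: str = field(compare=False)


class CostAwareQueue:
    """Serves the cheapest pending request first (a simplifying assumption)."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, name, media_type):
        request = Request(ASSUMED_COST[media_type], next(self._seq), name)
        heapq.heappush(self._heap, request)

    def drain(self):
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


def fifo_order(requests):
    # Baseline: serve requests strictly in arrival order.
    return [name for name, _ in requests]


def cost_aware_order(requests):
    queue = CostAwareQueue()
    for name, media_type in requests:
        queue.submit(name, media_type)
    return queue.drain()


# A video arrives first, followed by lighter requests.
arrivals = [("clip", "video"), ("chat-1", "text"), ("photo", "image"), ("chat-2", "text")]

print(fifo_order(arrivals))        # ['clip', 'chat-1', 'photo', 'chat-2']
print(cost_aware_order(arrivals))  # ['chat-1', 'chat-2', 'photo', 'clip']
```

In arrival order, both text requests wait behind the video. With cost-aware ordering they are served first and the video runs last. A production scheduler would also need a safeguard, such as raising the priority of requests that have waited a long time, so that large requests are not delayed indefinitely. That part is omitted here.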
Why it matters
As AI services handle richer inputs (video, not just text), the bottleneck has shifted from compute to scheduling, that is, the order in which requests are processed. This work shows that the bottleneck can be solved with software alone, so companies can deploy multimodal AI without buying expensive new hardware, and users get responsive service instead of frustrating delays.