AI now isolates individual sounds from videos using text commands

What happened

Researchers built an AI system that watches a video and generates only the audio you describe in a text command, ignoring all other sounds in the scene. This matters for film and music production, where sound engineers need to control each audio element separately instead of recording or extracting everything at once.

Why it matters

Until now, video-to-audio generation has been all-or-nothing: a system produced every sound in a scene or none of them. Selecting sounds by text description opens a new workflow for professional media production, where individual sound tracks are currently separated by hand or re-recorded at considerable expense.