The world is being quietly rearranged by people who write very long documents.


The title they went with: "Hear What Matters! Text-conditioned Selective Video-to-Audio Generation". Noisy translates that to:

AI now isolates individual sounds from videos using text commands


Researchers built an AI system that watches a video and generates only the audio you ask for, ignoring all other sounds in the scene. This matters for film and music production, where sound engineers need to control each audio element separately instead of recording or extracting everything at once.
Video-to-audio generation has so far been all-or-nothing: every sound in the scene, or none. Selecting sounds by text description opens a new workflow for professional media production, where individual sound tracks are currently separated manually or through expensive re-recording.

If you insist
Read the original →