The world is being quietly rearranged by people who write very long documents.


The title they went with
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

Noisy translates that to

New pipeline makes real-time two-way speech AI training easier


Researchers released an open-source data processing pipeline that makes it much easier to prepare training data for AI systems that can listen and speak at the same time in natural conversation, including overlapping speech and interruptions. Today, speech AI mostly trains on single-speaker recordings, so building systems that feel like real back-and-forth conversation means solving hard problems, such as figuring out who is speaking when, that existing tools struggle with.
Full-duplex speech systems remain bottlenecked by the scarcity of good training data and the difficulty of processing it cleanly. Removing that data-preparation bottleneck is what would let this class of AI move from lab experiments to deployment at scale.
