The world is being quietly rearranged by people who write very long documents.


The title they went with: "Voxtral TTS". Noisy translates that to:

Speech AI can now clone voices from 3 seconds of audio — and sounds more natural than commercial tools


A new text-to-speech model, Voxtral TTS, can generate realistic speech in multiple languages from only a brief audio sample, and in human evaluations listeners preferred it to existing commercial products. This means voice cloning just became easier to do and harder to detect.
Voice cloning has been getting cheaper and faster for years, but this crosses a threshold: you now need less than five seconds of audio to create a convincing synthetic voice. The model is being released openly under a noncommercial license, which means researchers, hobbyists, and bad actors all get access immediately. The practical problem isn't the technology itself; it's that the barrier to entry for voice impersonation just dropped significantly, and detection tools haven't kept pace.
Watch whether synthetic voice detection tools can reliably identify Voxtral-generated speech, or whether it becomes another arms race where detection lags behind generation.
