AI language models can now generate multiple words at once, speeding up output

What happened

AI language models usually generate text one token (a word or word fragment) at a time. New research shows they can generate several tokens in parallel. According to the research, this makes models faster and more efficient to train and use.

Why it matters

Generating text one token at a time is a fundamental design choice that has limited how fast and efficiently language models can operate, because each new token requires its own pass through the model. This new method loosens that constraint by letting models process and output language in multi-token chunks. That could make AI significantly faster and cheaper to run, opening up new uses where speed is critical.
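
To make the speed argument concrete, here is a minimal toy sketch, not the method from the research. It assumes a stand-in "model" (the hypothetical `toy_model` function) that simply reads tokens off a fixed sentence, and it only counts how many model calls are needed when each call returns one token versus several. The names `TARGET`, `toy_model`, and `generate` are invented for illustration.

```python
from typing import List, Tuple

# Fixed sentence the toy "model" reproduces (an assumption for illustration).
TARGET: List[str] = "the quick brown fox jumps over the lazy dog".split()


def toy_model(context: List[str], k: int) -> List[str]:
    """Stand-in for a language model: return the next k tokens of TARGET."""
    start = len(context)
    return TARGET[start:start + k]


def generate(k: int) -> Tuple[List[str], int]:
    """Produce TARGET with k tokens per model call.

    Returns the generated tokens and the number of model calls used.
    """
    out: List[str] = []
    calls = 0
    while len(out) < len(TARGET):
        out.extend(toy_model(out, k))
        calls += 1
    return out, calls


one_at_a_time, calls_1 = generate(1)   # classic one-token-per-step decoding
four_at_a_time, calls_4 = generate(4)  # four tokens per step

print(f"1 token per call:  {calls_1} calls")
print(f"4 tokens per call: {calls_4} calls")
```

Both runs produce the same nine tokens, but the second needs a third as many model calls. A real system would only see that gain to the extent that each multi-token call costs about the same as a single-token call and the extra tokens are as accurate, which this toy does not model.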

The signal

Watch for major AI labs to incorporate this multi-token generation method into their next models, or for new open-source models to adopt it.