What happened
Computer scientists have shown how to train small language models with ternary weights (each weight is -1, 0, or +1) from the start, rather than compressing an already-trained model afterward. The approach cuts memory use by a factor of 2.4 and makes it practical to run useful language models on phones, laptops, and other devices that normally cannot hold them.
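To make the idea of ternary weights concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the researchers' method: the function names (`ternarize`, `pack_ternary`, `unpack_ternary`), the mean-absolute-value scaling rule, and the five-codes-per-byte packing are all hypothetical choices made for this example. It shows only how a list of weights might be mapped to -1, 0, or +1 and stored compactly; it does not show the training procedure itself.

```python
def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight; other rules are possible.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def pack_ternary(codes):
    """Pack 5 ternary codes per byte as base-3 digits (3**5 = 243 fits in a byte)."""
    out = bytearray()
    for i in range(0, len(codes), 5):
        value = 0
        # The first code of each chunk becomes the least significant digit.
        for c in reversed(codes[i:i + 5]):
            value = value * 3 + (c + 1)
        out.append(value)
    return bytes(out)


def unpack_ternary(packed, n):
    """Recover the first n ternary codes from bytes produced by pack_ternary."""
    codes = []
    for byte in packed:
        for _ in range(5):
            codes.append(byte % 3 - 1)
            byte //= 3
    return codes[:n]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, 2.0, -0.7]
codes, scale = ternarize(weights)
packed = pack_ternary(codes)
restored = unpack_ternary(packed, len(codes))
approx = [c * scale for c in restored]  # dequantized approximation of the weights
```

In this sketch, seven weights fit in two bytes plus one shared scale value. Packing five codes per byte works out to 1.6 bits per weight, close to the theoretical minimum of about 1.58 bits for a three-valued symbol.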
Why it matters
The main bottleneck for deploying language models on edge devices has been memory: most useful models are too big to fit. Models that are trained to be compact from the outset, rather than compressed after the fact, remove a major cost barrier to scaling AI inference on consumer hardware. There is also a side benefit: the discrete weights act as a regularizer, which helps limit overfitting on small training datasets.