The world is being quietly rearranged by people who write very long documents.


The title they went with
Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction

Noisy translates that to

You can now train giant AI models on a handheld gaming device — if you compress the weights the right way


A new training method stores the dense weight matrices inside AI models as truncated SVD factors, reducing memory use by up to 199x per layer without materializing the full dense matrix at any point. This means you can train a 70-billion-parameter model on a Steam Deck using 7.2 GB of memory instead of 1,245 GB, which makes full model training feasible on consumer hardware rather than specialized clusters.
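
To make the per-layer saving concrete, here is a minimal sketch of the parameter arithmetic behind a rank-k factorization W ≈ U diag(s) Vᵀ. The layer shape (8192 x 28672) and the rank (32) are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def dense_params(m: int, n: int) -> int:
    """Number of entries in a dense m x n weight matrix."""
    return m * n


def factored_params(m: int, n: int, k: int) -> int:
    """Entries in rank-k factors: U (m x k), s (k), V (n x k)."""
    return k * (m + n + 1)


def compression_ratio(m: int, n: int, k: int) -> float:
    """How many times smaller the factored form is than the dense one."""
    return dense_params(m, n) / factored_params(m, n, k)


if __name__ == "__main__":
    # Assumed shape and rank, chosen only for illustration.
    m, n, k = 8192, 28672, 32
    print(f"dense:    {dense_params(m, n):,} parameters")
    print(f"factored: {factored_params(m, n, k):,} parameters")
    print(f"ratio:    {compression_ratio(m, n, k):.1f}x")
```

Under these assumed values the ratio works out to roughly 199x. The summary above does not say which layer or rank produced its 199x figure, so treat the match as illustrative rather than as a reconstruction of the paper's setup. The ratio falls as the rank grows, since the factored cost scales linearly with k.
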
For years, training large language models required expensive GPU or TPU clusters because the memory needed to store and update all the weights was the limiting factor. This method breaks that bottleneck by keeping weights in a compressed spectral form throughout training, which means a researcher or small team can now iterate on model architectures without renting cloud compute by the hour. The catch: rank-sweep experiments show that learning rate matters far more than the compression level, meaning the limiting factor has shifted from memory toward hyperparameter tuning. That is useful information about where to optimize next.
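
As a rough illustration of what "keeping weights in a compressed spectral form" can look like, the sketch below applies a factored layer without ever forming the dense matrix, then uses a QR decomposition to pull an updated factor back to orthonormal columns (a point on the Stiefel manifold). This is a generic NumPy sketch under stated assumptions, not the paper's implementation: the function names, the shapes, and the random stand-in for a gradient are all hypothetical.

```python
import numpy as np


def init_factors(m, n, k, rng):
    """Random rank-k factors with orthonormal columns in U and V (assumed init)."""
    U, _ = np.linalg.qr(rng.standard_normal((m, k)))
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    s = np.ones(k)
    return U, s, V


def forward(x, U, s, V):
    """Compute x @ (U diag(s) V^T) using only the factors.

    x has shape (batch, m); the result has shape (batch, n).
    The m x n dense matrix is never built.
    """
    return ((x @ U) * s) @ V.T


def qr_retract(A):
    """Map a tall matrix back to orthonormal columns via reduced QR.

    Column signs are flipped so that diag(R) is non-negative, which makes
    the result unique for full-rank input.
    """
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U, s, V = init_factors(m=64, n=48, k=8, rng=rng)
    x = rng.standard_normal((4, 64))
    y = forward(x, U, s, V)

    # Stand-in for a gradient step on U; a real trainer would use autograd.
    fake_grad = rng.standard_normal(U.shape)
    U = qr_retract(U - 0.01 * fake_grad)
    print(y.shape, np.allclose(U.T @ U, np.eye(8)))
```

The retraction step is what keeps the factors well conditioned after each update: a plain gradient step moves U off the set of orthonormal matrices, and the QR decomposition returns the nearest-in-span orthonormal basis at a cost that depends on the rank rather than on the full layer size.
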
The open question is whether rank-128 spectral compression (the efficiency sweet spot identified here) actually scales to training 70B+ models end-to-end without convergence degradation, and whether the losses measured at 2,000 steps on a small model hold up when full models are trained for hundreds of billions of tokens.
