The world is being quietly rearranged by people who write very long documents.


The title they went with: "Optimal Projection-Free Adaptive SGD for Matrix Optimization." Noisy translates that to:

Math for training AI just got faster — without cutting corners on accuracy


Researchers proved that One-sided Shampoo, a specific mathematical technique for training neural networks, can skip an expensive projection step that was thought necessary and still converge reliably, with guarantees the paper's title describes as optimal. In principle, this means training large AI models could take less compute time without sacrificing the quality of the final result.
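For readers who want to see what the technique looks like, here is a minimal NumPy sketch. It assumes a common textbook form of One-sided Shampoo: keep a single left preconditioner that accumulates G Gᵀ for each gradient matrix G, and scale every step by that preconditioner's inverse square root. Many classical analyses of AdaGrad-style methods also project the iterate back onto a bounded set after every step; this sketch simply leaves that projection out. The function names, hyperparameters, and toy quadratic objective are hypothetical illustrations, not the paper's code or settings.

```python
import numpy as np


def inv_sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Return mat^(-1/2) for a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by 1/sqrt(eigenvalue), then rotate back.
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T


def one_sided_shampoo(grad_fn, w0, lr=1.0, eps=1e-6, steps=300):
    """Hypothetical one-sided Shampoo loop for a matrix parameter W (m x n).

    Assumed update (a common textbook form, not taken from the paper):
        L_t     = eps * I + sum_{s <= t} G_s @ G_s.T   (m x m, left side only)
        W_{t+1} = W_t - lr * L_t^(-1/2) @ G_t
    """
    w = np.array(w0, dtype=float)  # copy so the caller's array is untouched
    left = eps * np.eye(w.shape[0])  # running left preconditioner
    for _ in range(steps):
        g = grad_fn(w)
        left += g @ g.T
        w = w - lr * inv_sqrt_psd(left) @ g
        # No projection back onto a bounded set after the step: this is the
        # operation a projection-free analysis lets the method skip.
    return w


# Toy problem (illustrative): minimize 0.5 * ||W - A||_F^2, whose gradient is W - A.
A = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0, 0.5],
              [0.5, 0.0, 0.0, 3.0]])
w0 = np.zeros_like(A)
w_final = one_sided_shampoo(lambda w: w - A, w0)
print("initial loss:", 0.5 * np.sum((w0 - A) ** 2))
print("final loss:  ", 0.5 * np.sum((w_final - A) ** 2))
```

The loop keeps only a left preconditioner because that is what "one-sided" refers to; full Shampoo also maintains a right preconditioner built from Gᵀ G.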
Training large neural networks is expensive partly because of mathematical overhead built into the optimization process: steps that seemed necessary but turn out not to be. This paper removes one of those steps. That matters because the economics of AI training come down to cost per experiment, so even a small reduction in overhead adds up across thousands of training runs. The open question is whether the result translates into a measurable speedup when deployed at scale; the math says it should, but researchers have been wrong about practical speedups before.
Watch whether practitioners adopt this approach in open-source optimization libraries, and whether benchmark training times for standard models drop noticeably over the next year. That would indicate the theory mattered outside the paper.
