The world is being quietly rearranged by people who write very long documents.


The title they went with:
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Noisy translates that to:

Training the biggest AI models just got 1.8 times faster with less memory


A new method makes training the largest AI models much more efficient. Developers can now build these complex models faster, using less computing power and memory.
Building the biggest AI models, many of which use a design called Mixture of Experts (MoE), takes a lot of money and hardware. This new method cuts those costs significantly, so companies can develop larger, more capable models without buying proportionally more hardware. That lowers the barrier for anyone trying to build state-of-the-art AI.
Watch whether major AI labs or open-source frameworks announce that they are using the SonicMoE kernels to train their next generation of large language models.

If you insist
Read the original →