The world is being quietly rearranged by people who write very long documents.


The title they went with mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT Noisy translates that to

New algorithm reduces wasted training when teaching AI multiple tasks


Researchers found that when training large language models on multiple different tasks at once, some tasks learn faster than others—the fast ones overfit (memorize their data) while slow ones don't learn enough. They built an algorithm that detects which task is overfitting earliest, temporarily removes it, and continues training on the others, then brings it back at a better point. This means the same amount of computational work produces better-performing models.
Most AI training budgets are fixed and allocated equally across all tasks, which is wasteful—it's like trying to teach a group of students with one standardized lesson plan when they learn at different speeds. This algorithm makes training more efficient by letting tasks get individualized attention timing, so you either get better performance for the same cost or the same performance for less computing power.

If you insist
Read the original →