What happened
Researchers found that when large language models are trained on several different tasks at once, some tasks learn faster than others: the fast ones overfit (memorize their data) while the slow ones don't learn enough. They built an algorithm that detects which task starts overfitting first, temporarily removes it, keeps training on the others, and then brings the removed task back at a better point. As a result, the same amount of computational work produces better-performing models.
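To make the pause-and-resume idea concrete, here is a minimal sketch in Python. It is not the researchers' algorithm: the summary does not say how overfitting is detected or how the return point is chosen, so this sketch assumes a simple rule (validation loss rising on several checks in a row) and a fixed waiting period. The names `is_overfitting`, `TaskScheduler`, `patience`, and `cooldown` are all hypothetical.

```python
def is_overfitting(val_losses, patience=2):
    """Return True if validation loss rose on each of the last `patience` checks.

    Assumed detection rule; the actual method may differ.
    """
    if len(val_losses) < patience + 1:
        return False
    recent = val_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


class TaskScheduler:
    """Hypothetical scheduler: pauses an overfitting task, resumes it after a cooldown."""

    def __init__(self, tasks, patience=2, cooldown=3):
        self.history = {task: [] for task in tasks}    # validation losses per task
        self.resume_at = {task: 0 for task in tasks}   # first step a task may train again
        self.patience = patience
        self.cooldown = cooldown

    def active(self, step):
        """Tasks that should receive training at this step."""
        return [t for t in self.history if self.resume_at[t] <= step]

    def update(self, step, val_losses):
        """Record new validation losses and pause any active task that is overfitting."""
        active_now = set(self.active(step))
        for task, loss in val_losses.items():
            if task not in active_now:
                continue  # ignore measurements for paused tasks
            self.history[task].append(loss)
            if is_overfitting(self.history[task], self.patience):
                # Skip the next `cooldown` steps, then let the task rejoin.
                self.resume_at[task] = step + 1 + self.cooldown
                # Start fresh so old readings don't trigger another pause on return.
                self.history[task].clear()
```

In a training loop, one would call `active(step)` to pick which tasks to train on, then call `update(step, losses)` after each validation check. A fixed cooldown is the simplest possible choice for the return point; a real system would presumably pick it more carefully.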
Why it matters
Most AI training budgets are fixed and split equally across all tasks, which is wasteful. It's like teaching a group of students from one standardized lesson plan when they learn at different speeds. This algorithm makes training more efficient by giving each task its own schedule, so you get either better performance for the same cost or the same performance for less computing power.