What happened
Researchers propose a small change to how large language models decide which computational experts to use. Instead of giving every word (token) the same number of experts, the system assigns more experts to harder words and fewer to easy ones, and it does this without retraining the model. The same language model can therefore run more efficiently by concentrating computational effort on the words that need it.
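To make the idea concrete, here is a minimal sketch of one way a per-word expert count could vary. It is an illustration under stated assumptions, not the researchers' actual method: it assumes the model already produces a score per expert for each word, and it keeps adding experts until their combined probability passes a cutoff. The function name `adaptive_experts` and the parameters `mass` and `max_experts` are hypothetical.

```python
import math

def softmax(logits):
    # Turn raw router scores into probabilities that sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_experts(router_logits, mass=0.7, max_experts=4):
    # Assumed rule (not from the source): pick the highest-scoring experts
    # until their combined probability reaches `mass`, capped at `max_experts`.
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, covered = [], 0.0
    for i in ranked:
        chosen.append(i)
        covered += probs[i]
        if covered >= mass or len(chosen) == max_experts:
            break
    return chosen

easy = adaptive_experts([6.0, 0.5, 0.2, 0.1])  # router is confident -> [0]
hard = adaptive_experts([1.0, 0.9, 0.8, 0.7])  # router is unsure -> [0, 1, 2]
print(len(easy), len(hard))  # prints: 1 3
```

In this sketch only the selection rule changes. The scores come from the existing router, which is why a scheme of this kind would not need the model to be retrained.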
Why it matters
Large language models are becoming more efficient by activating only parts of themselves for each task, but current methods waste computation by treating every word equally. This work shows how to fix that in a way that works with models already in use, and the efficiency gains compound when applied to the billions of words processed daily.