The world is being quietly rearranged by people who write very long documents.


The title they went with Self-Routing: Parameter-Free Expert Routing from Hidden States Noisy translates that to

AI researchers eliminate the routing mechanism in large language models without losing performance


Researchers found that large language models can route computational work to specialized subsystems without needing a separate learned routing component — the hidden state itself contains enough information to do the job. This suggests that some of the complexity engineers have added to make models work faster might be unnecessary overhead.
For the past several years, the standard way to speed up large language models has been to add a routing layer — essentially a small neural network that decides which expert module should process each token. This research shows the routing decision can come directly from the model's internal representation, cutting out parameters and computation. If this holds up across different model sizes and domains, it means some of the complexity built into state-of-the-art systems was solving a problem that didn't need solving, which could simplify how companies build and deploy large models.
Whether subsequent research finds Self-Routing scaling efficiency breaks down on larger models (GPT-3 scale and above) or whether it remains competitive, which would indicate whether this is a genuine simplification or a trick that only works at smaller scales.

If you insist
Read the original →