AI researchers still don't know which tricks actually work when growing smaller language models into larger ones

What happened

Researchers tested different ways to reuse smaller AI models as starting points for bigger ones and found no single best method: what works depends on whether you're optimizing for speed or accuracy. This matters because scaling up an existing model can be cheaper than training a larger one from scratch, but the field doesn't yet have reliable rules for which shortcuts to take.

Why it matters

For years, AI labs have assumed that the simplest approach to scaling, copying the smaller model's weights directly into the larger one, would preserve what the smaller model had already learned. The study shows that this assumption holds only in some scenarios: exact copies work best in some situations, while structured changes win in others. The practical consequence is that teams building larger models from smaller checkpoints will need to test several approaches instead of relying on one. That adds time and compute to a process whose whole appeal is being cheaper than starting over.

The signal

The open question is whether larger AI labs will make this kind of multi-method testing standard practice, or keep betting on a single heuristic and learn this lesson only when they hit scaling walls.