A team of mathematicians figured out why gradient descent, the algorithm that trains neural networks, finds good solutions even though the math says it shouldn't be able to. The key: the network's weight matrices follow hidden conservation laws that constrain the paths the algorithm can take, making the landscape navigable despite being mathematically intractable in theory.
Why it matters
For 15 years, neural network training has worked in practice while failing in theory — nobody could explain why. This paper shows the mechanism: gradient descent isn't exploring a chaotic landscape, it's following rails laid down by spectral geometry in the network's weight structure. The practical implication is narrower than it sounds — this applies to specific architectures (ReLU networks, no bias) under clean conditions — but the theoretical closure matters. It means researchers can stop treating network optimization as a mystery and start building on actual mathematical ground. That distinction matters because you can't improve something you don't understand.
The signal
The question is whether this spectral framework generalizes to the networks people actually train — modern networks have biases, skip connections, normalization layers, and other features this model doesn't cover, so watch whether the authors or others extend the math to realistic architectures.