The world is being quietly rearranged by people who write very long documents.


The title they went with: Towards Initialization-dependent and Non-vacuous Generalization Bounds for Overparameterized Shallow Neural Networks. Noisy translates that to:

Mathematicians tighten the bounds on why neural networks don't just memorize, but only in theory


Researchers derived tighter mathematical bounds on how well overparameterized shallow neural networks generalize to new data. Instead of measuring the overall size of the network's weights, the bounds track how far the trained network has drifted from its random starting point. In practice, this means theorists can now make non-vacuous predictions (bounds tight enough to say something, rather than capping the error rate at a trivially true 100% or more) about when a network won't just memorize its training examples, but only for a narrow class of networks that don't match what practitioners actually build.
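To make the central quantity concrete, here is a minimal numpy sketch (not the paper's method, and it computes no bound) that trains a wide one-hidden-layer network and compares the two things contrasted above: how far the weights travel from initialization, ||W - W0||_F, versus their overall norm, ||W||_F. The architecture (ReLU hidden layer, frozen random ±1 output weights, 1/sqrt(m) output scaling), the toy data, the learning rate, and helper names such as init_shallow_net are hypothetical choices for illustration only.

```python
import numpy as np

def init_shallow_net(d, m, rng):
    """Hypothetical setup: m hidden ReLU units, frozen random +/-1 output weights."""
    W0 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))  # trainable hidden weights
    a = rng.choice([-1.0, 1.0], size=m)                  # frozen output weights
    return W0, a

def forward(W, a, X):
    """f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x) for each row x of X."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])

def mse(W, a, X, y):
    """Squared loss (1/2n) * sum_i (f(x_i) - y_i)^2."""
    return 0.5 * np.mean((forward(W, a, X) - y) ** 2)

def train(W0, a, X, y, lr=1.0, steps=200):
    """Full-batch gradient descent on the squared loss, updating only W."""
    W = W0.copy()
    n, m = X.shape[0], W.shape[0]
    for _ in range(steps):
        pre = X @ W.T                                       # (n, m) pre-activations
        resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y   # (n,) residuals
        active = (pre > 0).astype(float)                    # ReLU derivative
        coeff = resid[:, None] * active * a[None, :] / np.sqrt(m)  # (n, m)
        W -= lr * (coeff.T @ X) / n                         # gradient step, shape (m, d)
    return W

rng = np.random.default_rng(0)
n, d, m = 50, 10, 2000                    # far more parameters (m*d) than examples (n)
X = rng.normal(size=(n, d)) / np.sqrt(d)  # inputs with norm close to 1
y = np.sign(X[:, 0])                      # toy +/-1 labels

W0, a = init_shallow_net(d, m, rng)
W = train(W0, a, X, y)

dist_from_init = np.linalg.norm(W - W0)  # Frobenius distance ||W - W0||_F
weight_norm = np.linalg.norm(W)          # Frobenius norm ||W||_F
print(f"loss: {mse(W0, a, X, y):.3f} -> {mse(W, a, X, y):.3f}")
print(f"||W - W0||_F = {dist_from_init:.2f}, ||W||_F = {weight_norm:.2f}")
```

In a wide setup like this one, the weights typically end up much closer to W0 than their overall norm would suggest, which is why a bound that scales with the distance from initialization can be far smaller than one that scales with the raw weight norm. Whether such a bound is actually non-vacuous depends on its constants and other terms, which this sketch does not attempt to reproduce.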
For years, theoretical machine learning has struggled to explain why neural networks with millions of parameters perform well on datasets with only thousands of examples. This paper makes progress on one specific explanation: if training leaves a network close to its initialization, its effective complexity stays small, so it should generalize well. The catch is that the bound only works for shallow networks with certain activation functions, and the deep networks used in production are nothing like this. The math works, but it doesn't touch the architectures that actually matter.
Watch whether this approach to initialization-dependent bounds scales to deeper networks or gets abandoned as a dead end for explaining practical neural network behavior.
