Rademacher complexity

tags: Machine learning

Definition

Given a function class \(f_w\) and random iid \(y_\mu \in \{\pm 1\}\), the Rademacher complexity is \[ \mathscr{R}_n = \mathbb{E}_{y, X }\text{sup}_w \frac{1}{n} \sum_{\mu = 1}^n y_\mu f_w(X_\mu) \]

It measures how well a function can approximate a dataset with random labels.

(Bartlett, Mendelson 2002) shows bounds for the Rademacher complexity in terms of \(\ell_1\) norm bounds on the weights of the network. However, (Zhang et al. 2017) showed that these bound are mostly void as deep neural networks have no trouble fitting random labels but still generalize relatively well.

Bibliography

Peter L. Bartlett, Shahar Mendelson. 2002. "Rademacher and Gaussian Complexities: Risk Bounds and Structural Results". J. Mach. Learn. Res. 3:463–82. http://jmlr.org/papers/v3/bartlett02a.html.
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. 2017. "Understanding Deep Learning Requires Rethinking Generalization". In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=Sy8gdB9xx.

Definition

Bibliography

Comments