Rademacher complexity

Machine learning


Given a function class \(f_w\) and random iid \(y_\mu \in \{\pm 1\}\), the Rademacher complexity is \[ \mathscr{R}_n = \mathbb{E}_{y, X }\text{sup}_w \frac{1}{n} \sum_{\mu = 1}^n y_\mu f_w(X_\mu) \]

It measures how well a function can approximate a dataset with random labels.

(Bartlett, Mendelson 2002) shows bounds for the Rademacher complexity in terms of \(\ell_1\) norm bounds on the weights of the network. However, (Zhang et al. 2017) showed that these bound are mostly void as deep neural networks have no trouble fitting random labels but still generalize relatively well.


  1. . . "Rademacher and Gaussian Complexities: Risk Bounds and Structural Results". J. Mach. Learn. Res. 3:463–82. http://jmlr.org/papers/v3/bartlett02a.html.
  2. . . "Understanding Deep Learning Requires Rethinking Generalization". In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=Sy8gdB9xx.
Last changed | authored by


← Back to Notes