- tags
- Machine learning

## Two-layer neural network

Mathematically, a simple two-layer neural network with ReLU
non-linearities can be written as follows. For an input vector \(x
\in \mathbb{R}^D\), let \(\mathbf{a} = (a_1, \cdots, a_m)\in
\mathbb{R}^m\) be the *output weights* and \(\mathbf{b} =
(b_1, \cdots, b_m)\), with each \(b_i \in \mathbb{R}^D\), be the *input
weights*, where \(m\) is the number of hidden units. Then

\[ h(x) = \frac{1}{m} \sum_{i=1}^m a_i \max\{ b_i^\top x,0\}. \]
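As a concrete illustration of the formula for \(h(x)\), here is a minimal NumPy sketch. The function name `two_layer_relu` and the convention of stacking the input weights \(b_i\) as the rows of a matrix `B` are assumptions made for this example, not notation from the text.

```python
import numpy as np


def two_layer_relu(x, a, B):
    """Evaluate h(x) = (1/m) * sum_i a_i * max(b_i^T x, 0).

    x: input vector, shape (D,)
    a: output weights, shape (m,)
    B: input weights, shape (m, D); row i is b_i (assumed layout)
    """
    pre_activation = B @ x                     # b_i^T x for every hidden unit
    hidden = np.maximum(pre_activation, 0.0)   # ReLU
    return float(a @ hidden) / a.shape[0]      # weighted sum scaled by 1/m


# Small worked example with D = 2 inputs and m = 3 hidden units.
x = np.array([1.0, -2.0])
a = np.array([1.0, 2.0, -1.0])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

h = two_layer_relu(x, a, B)
print(h)  # pre-activations are [1, -2, -1], so h = (1 * 1) / 3
```

Only the first hidden unit is active for this input, so the output reduces to \(a_1 b_1^\top x / m = 1/3\).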

## Universal approximation theorem

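Cybenko showed in 1989 that a neural network with a single hidden layer of arbitrary width and a sigmoid activation function can uniformly approximate any continuous function on a compact domain, such as the unit hypercube \([0,1]^D\), to arbitrary accuracy (Cybenko 1989).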
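To make the statement tangible, the sketch below builds a one-hidden-layer sigmoid network by hand for a one-dimensional target on \([0,1]\). It is a simple staircase construction chosen for illustration, not Cybenko's argument (which is non-constructive), and the names `staircase_network`, `target`, and the `steepness` parameter are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Logistic function written via tanh to avoid overflow for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def staircase_network(f, n, steepness=20.0):
    """Return a one-hidden-layer sigmoid network with n units approximating f on [0, 1].

    Each hidden unit is a steep sigmoid centred at the midpoint of a grid cell;
    its output weight is the increment of f across that cell.
    """
    grid = np.linspace(0.0, 1.0, n + 1)
    centers = 0.5 * (grid[:-1] + grid[1:])  # one centre per hidden unit
    out_weights = np.diff(f(grid))          # f(t_k) - f(t_{k-1})
    offset = f(grid[0])                     # constant output bias
    scale = steepness * n                   # sharper steps as the grid refines

    def network(x):
        x = np.asarray(x, dtype=float)
        hidden = sigmoid(scale * (x[..., None] - centers))
        return offset + hidden @ out_weights

    return network


def target(x):
    return np.sin(2.0 * np.pi * x)


xs = np.linspace(0.0, 1.0, 2001)
errors = {}
for n in (10, 100):
    approx = staircase_network(target, n)
    errors[n] = float(np.max(np.abs(approx(xs) - target(xs))))
    print(n, errors[n])
```

Under these assumptions the maximum error on the evaluation grid shrinks as the width `n` grows, which is the qualitative behaviour the theorem guarantees.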

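Barron added rates of convergence by imposing a smoothness condition on the target function, namely that its Fourier transform has a finite first moment. Under this condition, a network with \(n\) sigmoidal hidden units achieves an integrated squared error of order \(O(1/n)\) (Barron 1993).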

## Bibliography

Barron, A. R. 1993. “Universal
Approximation Bounds for Superpositions of a Sigmoidal Function.”
*IEEE Transactions on Information Theory* 39 (3):930–45.

Cybenko, G. 1989. “Approximation by
Superpositions of a Sigmoidal Function.” *Mathematics of
Control, Signals, and Systems* 2 (4):303–14.