LayerNorm

tags: Neural networks
paper: (Ba et al. 2016)

Definition

Layer Normalization is a technique used in deep learning to normalize the inputs to a layer in a neural network.

In batch normalization, the mean and variance of each batch of inputs to a layer are used to normalize the inputs. In layer normalization, the mean and variance of all the features in a layer (i.e., all the inputs for a given instance) are used to normalize the inputs.

For inputs \(x\), the output of LayerNorm is computed as follows:

\begin{equation} y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta \end{equation}

Each feature in the layer is normalized independently, which can help when the features have different ranges and distributions. Layer normalization is also less sensitive to the size of the batch, and it can be used with batch sizes of 1.

It has been found to be particularly effective in reducing the internal covariate shift problem, which is the tendency of the distribution of inputs to a layer to change as the parameters of previous layers are updated during training. By normalizing the inputs to each layer, layer normalization can help stabilize the neural network learning process and improve convergence.

Bibliography

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. July 21, 2016. "Layer Normalization". arXiv. DOI.

Definition

Bibliography

Links to this note

Comments