Layer Normalization

What is Layer Normalization?

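Layer Normalization is a normalization technique that rescales the activations of a layer so that, for each input sample, they have mean 0 and variance 1 across the layer's $H$ features. Each activation $x_i$ is normalized as: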

$$ \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \tag{1} $$

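Where $H$ is the number of features (hidden units) in the layer, $\epsilon$ is a small constant added for numerical stability, and the mean $\mu$ and variance $\sigma^2$ are: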

$$ \mu = \frac{1}{H} \sum_{i=1}^{H} x_i \tag{2} $$

$$ \sigma^2 = \frac{1}{H} \sum_{i=1}^{H} (x_i - \mu)^2 \tag{3} $$
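
As a quick worked example of the formulas above, assume a hypothetical layer with $H = 4$ activations $x = (1, 2, 3, 4)$ (values chosen only for illustration) and ignore $\epsilon$ because it is negligible here:

$$ \mu = \frac{1 + 2 + 3 + 4}{4} = 2.5 $$

$$ \sigma^2 = \frac{(-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2}{4} = 1.25 $$

$$ \hat{x} = \frac{x - 2.5}{\sqrt{1.25}} \approx (-1.342,\ -0.447,\ 0.447,\ 1.342) $$

The normalized values have mean 0 and variance 1, as intended.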

After normalization, apply learnable scale ($\gamma$) and shift ($\beta$) parameters, so the layer can still produce outputs with a mean and variance other than 0 and 1:

$$ y_i = \gamma \hat{x}_i + \beta \tag{4} $$
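
Equations (1) through (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `layer_norm`, the default `eps=1e-5`, and the sample inputs are assumptions made for this example, and normalization is taken over the last axis (the $H$ features of each sample).

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each sample over its last axis, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)                   # Eq. (2): per-sample mean
    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)    # Eq. (3): biased variance
    x_hat = (x - mu) / np.sqrt(var + eps)                 # Eq. (1): normalize
    return gamma * x_hat + beta                           # Eq. (4): scale and shift

# Two samples with H = 4 features each (illustrative values).
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
gamma = np.ones(4)   # identity scale
beta = np.zeros(4)   # no shift
y = layer_norm(x, gamma, beta)
print(y)
```

With `gamma` set to ones and `beta` to zeros, each row of `y` has mean approximately 0 and variance approximately 1, even though the two input rows differ in scale by a factor of 10.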

| Benefit | Description |
| --- | --- |
| Activation Stabilization | Prevents values from becoming too large or small during forward propagation |
| Gradient Protection | Mitigates gradient explosion/vanishing problems |
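
The activation-stabilization benefit can be illustrated with a small experiment. The sketch below assumes a hypothetical helper `normalize` (the normalization step only, without $\gamma$ and $\beta$), random inputs, and arbitrarily chosen scales. It shows only that the normalized activations keep roughly the same spread regardless of the input scale, and it does not demonstrate the gradient behavior.

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Normalization step only: zero mean, unit variance over the last axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)   # biased variance (divides by H)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
base = rng.normal(size=(1, 64))           # one sample with H = 64 features

stats = {}
for scale in (1.0, 100.0, 10000.0):
    x = scale * base                      # same pattern at different magnitudes
    x_hat = normalize(x)
    stats[scale] = (float(x.std()), float(x_hat.std()))
    print(f"scale={scale:g}  std(x)={stats[scale][0]:.3g}  std(x_hat)={stats[scale][1]:.3g}")
```

The standard deviation of the raw input grows with the scale, while the standard deviation of the normalized output stays close to 1 in every case.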