What is Layer Normalization?

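Layer Normalization is a normalization technique that rescales the values in a specific layer so that, for each individual input sample, they have a mean of 0 and a variance of 1. The statistics are computed across the features of one sample, not across the batch.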

Formula

$$ \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \tag{1} $$

Where:

  • \(\mu\): Mean of all values in the layer
  • \(\sigma^2\): Variance of all values in the layer
  • \(\epsilon\): Small constant added for numerical stability (avoids division by zero)
  • \(H\): Number of values (hidden units) in the layer

$$ \mu = \frac{1}{H} \sum_{i=1}^{H} x_i \tag{2} $$

$$ \sigma^2 = \frac{1}{H} \sum_{i=1}^{H} (x_i - \mu)^2 \tag{3} $$
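Equations (1) to (3) can be sketched in a few lines of plain Python. This is a minimal illustration and not a production implementation: the function name `normalize`, the default `eps=1e-5`, and the sample input are assumptions chosen for the example.

```python
import math


def normalize(x, eps=1e-5):
    """Normalize one sample's feature vector (hypothetical helper)."""
    h = len(x)                                    # H: number of values in the layer
    mu = sum(x) / h                               # Eq. (2): mean
    var = sum((v - mu) ** 2 for v in x) / h       # Eq. (3): biased variance
    # Eq. (1): subtract the mean, divide by the stabilized standard deviation
    return [(v - mu) / math.sqrt(var + eps) for v in x]


# Assumed sample input: four activations of a single sample
x_hat = normalize([1.0, 2.0, 3.0, 4.0])
```

After this step the values in `x_hat` have a mean of roughly 0 and a variance slightly below 1, because \(\epsilon\) is added inside the square root.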

Learnable Parameters

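After normalization, apply the learnable scale (\(\gamma\)) and shift (\(\beta\)) parameters, which let the network adjust the output distribution when exact zero mean and unit variance are not ideal: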

$$ y_i = \gamma \hat{x}_i + \beta \tag{4} $$
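As a rough sketch, Equation (4) can be combined with the normalization step as follows. The function name `layer_norm` is hypothetical, and the code assumes one \(\gamma\) and one \(\beta\) value per feature, which is a common convention but is not stated in the formula above. In a real model these parameters would be updated by training; here they are fixed by hand.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization of one sample with scale and shift (illustrative)."""
    h = len(x)
    mu = sum(x) / h                               # Eq. (2)
    var = sum((v - mu) ** 2 for v in x) / h       # Eq. (3)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Eq. (1) followed by Eq. (4): y_i = gamma_i * x_hat_i + beta_i
    return [g * (v - mu) * inv_std + b for v, g, b in zip(x, gamma, beta)]


x = [1.0, 2.0, 3.0, 4.0]

# gamma = 1 and beta = 0 leave the normalized values unchanged
identity = layer_norm(x, gamma=[1.0] * 4, beta=[0.0] * 4)

# gamma = 2 and beta = 0.5 stretch and shift the normalized values
shifted = layer_norm(x, gamma=[2.0] * 4, beta=[0.5] * 4)
```

With \(\gamma = 1\) and \(\beta = 0\) the output equals \(\hat{x}_i\), which is why these are typical initial values.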

Key Benefits

| Benefit | Description |
| --- | --- |
| Activation Stabilization | Prevents values from becoming too large or small during forward propagation |
| Gradient Protection | Mitigates gradient explosion/vanishing problems |

Batch Norm vs Layer Norm

| Property | Batch Norm | Layer Norm |
| --- | --- | --- |
| Normalization axis | Batch direction | Feature direction |
| Batch size dependency | Yes | No |
| RNN/Transformer | Not suitable | Suitable |
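The difference in normalization axis can be illustrated with a small sketch. All names here (`_standardize`, `layer_norm_batch`, `batch_norm_batch`) are hypothetical, the scale and shift parameters are omitted, and the batch norm variant uses only the statistics of the current batch (no running averages), which is a simplification.

```python
import math


def _standardize(values, eps=1e-5):
    """Rescale a list of numbers to roughly zero mean and unit variance."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return [(v - mu) / math.sqrt(var + eps) for v in values]


def layer_norm_batch(batch):
    # Feature direction: each row (one sample) is normalized on its own
    return [_standardize(row) for row in batch]


def batch_norm_batch(batch):
    # Batch direction: each column (one feature) is normalized across samples
    columns = [_standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


# Assumed toy batch: 2 samples with 3 features each
batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]

ln_full = layer_norm_batch(batch)
ln_single = layer_norm_batch(batch[:1])    # same first sample, batch size 1

bn_full = batch_norm_batch(batch)
bn_single = batch_norm_batch(batch[:1])    # same first sample, batch size 1
```

In this sketch the layer norm output for the first sample is the same whether it is processed alone or with the second sample, while the batch norm output changes with the batch contents. With a batch of one, each feature's batch statistics come from a single value, so the output collapses to zeros.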