Table of Contents

Loss Functions
#

Mean Square Error (MSE):

$$ L_{MSE} = \frac{1}{n} \sum_{i} (y_i - q_i)^2 \tag{1} $$

Cross-Entropy (CE):

$$ L_{CE} = -\sum_{i} y_i \log(q_i) \tag{2} $$

Key Differences
#

Geometric Foundation
#

  • MSE: Based on Euclidean distance (L2 norm)
  • Cross-Entropy: Based on “Information Geometry”

Computational Approach
#

MSECross-Entropy
Distance calculation on flat 2D planeDirectional calculation on curved probability manifold
Additive errorMultiplicative probability

One-Hot Encoding Impact
#

With one-hot encoding:

  • MSE: Computes errors for all classes (including unnecessary calculations)

    $$ (0 - q_{incorrect})^2 $$
  • Cross-Entropy: Only computes probability of correct class

    $$ -\log(q_{correct}) $$

Convergence Behavior
#

As training progresses and correct probability approaches 1:

$$ \lim_{q_{correct} \to 1} -\log(q_{correct}) = -\log(1) = 0 $$

Cross-Entropy loss naturally converges to 0.