Deep Learning Point

R² Score (Coefficient of Determination)

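R² measures the fraction of the variance in the target values that the model explains. A value of 1 means perfect predictions, and a value of 0 means the model does no better than always predicting the mean $\bar{y}$.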

$$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2} $$
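As a minimal sketch of the formula above, the following computes R² in plain Python. The function name `r2_score` and the sample values are illustrative assumptions, not taken from any particular library or dataset.

```python
def r2_score(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the model's predictions.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the targets from their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical sample data, chosen only for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
score = r2_score(y_true, y_pred)
print(f"R^2 = {score:.4f}")
```

For this sample, SS_res is 0.75 and SS_tot is 20, so the score is 0.9625.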

| R² Value | Quality |
| --- | --- |
| > 0.8 | Very good model |
| 0.6 - 0.8 | Acceptable model |
| 0.4 - 0.6 | Needs improvement |
| < 0.4 | Requires significant enhancement |

Large, unnormalized input values can cause unstable gradients during backpropagation.

Example with Chain Rule:

For a simple linear layer $y = wx + b$, the gradient of the loss $L$ with respect to the weight $w$ is:

$$ \frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} \cdot x $$

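With the same upstream gradient $\frac{\partial L}{\partial y} = -0.2$ in both cases:

| Input x | Gradient $\frac{\partial L}{\partial w}$ |
| --- | --- |
| x = 2 | -0.4 (stable) |
| x = 1000 | -200 (unstable) |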

Because each weight gradient is scaled by the input to its layer, large inputs amplify gradients across layers, which can make them explode.
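The scaling effect can be reproduced with a few lines of Python. This is only a sketch: the upstream gradient of -0.2 is an assumed value (it is the one consistent with the numbers in the table), and `weight_gradient` is a hypothetical helper name.

```python
def weight_gradient(upstream_grad, x):
    """Chain rule for y = w*x + b: dL/dw = dL/dy * x."""
    return upstream_grad * x


# Assumed upstream gradient dL/dy, held fixed so only the input varies.
upstream_grad = -0.2

grads = {x: weight_gradient(upstream_grad, x) for x in (2.0, 1000.0)}
for x, g in grads.items():
    print(f"x = {x:>6}: dL/dw = {g}")

# The gradient grows linearly with the input magnitude.
ratio = grads[1000.0] / grads[2.0]
print(f"gradient ratio: {ratio}")
```

The gradient for x = 1000 is 500 times larger than for x = 2, so a learning rate tuned for the small input would produce a far larger weight update for the large one.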

Sigmoid function:

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

Problem: at extreme input values the sigmoid saturates, so its output is pinned near 0 or 1 and its derivative approaches zero:

$$ \sigma'(x) = \sigma(x)(1 - \sigma(x)) $$

When $\sigma(x) \approx 0$ or $\sigma(x) \approx 1$:

$$ \sigma'(x) \approx 0 \quad \text{(vanishing gradient)} $$
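To make the saturation concrete, here is a small sketch that evaluates the derivative at a few inputs. The helper names `sigmoid` and `sigmoid_grad` and the chosen inputs are assumptions for illustration. The sigmoid is written in a branch form that avoids overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(x):
    """Numerically stable sigmoid: never calls exp() on a large positive value."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Illustrative inputs: a normalized-scale value, then increasingly extreme ones.
for x in (0.0, 2.0, 10.0, 1000.0):
    print(f"x = {x:>6}: sigma = {sigmoid(x):.6f}, sigma' = {sigmoid_grad(x):.3e}")
```

The derivative peaks at 0.25 for x = 0 and shrinks rapidly as |x| grows. At x = 1000 the sigmoid rounds to exactly 1.0 in floating point, so the computed gradient is exactly zero and no learning signal passes through.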

Normalized inputs:

Input normalization is critical for: