NeRF Summary

Overview
#

NeRF (Neural Radiance Fields) represents 3D scenes as continuous functions learned by neural networks, enabling high-quality novel view synthesis from a set of input images.

Core Concept
#

NeRF learns a function, parameterized by the MLP weights $\Theta$:

$$ F_\Theta: (x, y, z, \theta, \phi) \rightarrow (r, g, b, \sigma) $$

Input:

Position: $(x, y, z)$
View direction: $(\theta, \phi)$

Output:

Color: $(r, g, b)$
Density: $\sigma$
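
Implementations often pass the view direction to the network as a 3D unit vector rather than as two angles. A minimal sketch of that conversion follows; the function name `direction_from_angles` and the angle convention (polar angle measured from +z, azimuth from +x) are assumptions made for illustration.

```python
import numpy as np

def direction_from_angles(theta, phi):
    """Convert viewing angles in radians to a 3D unit vector.

    Assumed convention: theta is the polar angle from the +z axis,
    phi is the azimuth in the x-y plane measured from the +x axis.
    """
    return np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])

# Looking along +x: polar angle of 90 degrees, azimuth of 0.
d = direction_from_angles(np.pi / 2, 0.0)
```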

Pipeline
#

Input Images → Camera Poses → Ray Casting → MLP Network → Volume Rendering → Output Image

1. Positional Encoding
#

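Map each input coordinate to a higher-dimensional vector of sinusoids so the MLP can represent high-frequency detail:

$$ \gamma(p) = (\sin(2^0\pi p), \cos(2^0\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)) $$

With $L = 10$, each of the 3 position coordinates yields $2L = 20$ values, so a 3D position becomes a $3 \times 20 = 60$-dimensional vector.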
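
The encoding can be sketched in NumPy as below. The function name `positional_encoding` and the output ordering (all frequencies of one coordinate, then the next coordinate) are illustrative choices, not a fixed convention.

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """Apply gamma(.) to each coordinate of p and concatenate the results.

    p: array of shape (..., D). Returns an array of shape (..., D * 2 * num_freqs).
    """
    p = np.asarray(p, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # 2^0*pi, ..., 2^(L-1)*pi
    angles = p[..., None] * freqs                   # (..., D, L)
    # Pair sin and cos for each frequency: (..., D, L, 2)
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)

encoded = positional_encoding([0.5, 0.0, -0.25])    # 3 coordinates, L = 10
```

For a single 3D point the result has 60 entries, matching the dimension stated above; a batch of shape `(N, 3)` gives `(N, 60)`.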

2. Ray Sampling
#

For each pixel, cast a ray from the camera through that pixel:

$$ \mathbf{r}(t) = \mathbf{o} + t\mathbf{d} $$

Where:

$\mathbf{o}$: Camera origin
$\mathbf{d}$: Ray direction
$t$: Distance along ray
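
A minimal sketch of evaluating $\mathbf{r}(t)$ at several distances is shown below. The helper name `sample_along_ray` and the near/far bounds are assumptions for illustration. Passing a random generator gives stratified sampling (one random sample per equal-width bin); without one, the bin midpoints are used.

```python
import numpy as np

def sample_along_ray(origin, direction, near, far, num_samples, rng=None):
    """Return (t, points) where points[i] = origin + t[i] * direction."""
    edges = np.linspace(near, far, num_samples + 1)
    lower, upper = edges[:-1], edges[1:]
    if rng is None:
        t = 0.5 * (lower + upper)                                # bin midpoints
    else:
        t = lower + (upper - lower) * rng.random(num_samples)    # one random t per bin
    points = origin[None, :] + t[:, None] * direction[None, :]
    return t, points

o = np.zeros(3)
d = np.array([0.0, 0.0, 1.0])
t, points = sample_along_ray(o, d, near=2.0, far=6.0, num_samples=4)
```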

3. Volume Rendering
#

Accumulate color along the ray between the near and far bounds $t_n$ and $t_f$:

$$ C(\mathbf{r}) = \int_{t_n}^{t_f} T(t) \cdot \sigma(\mathbf{r}(t)) \cdot \mathbf{c}(\mathbf{r}(t), \mathbf{d}) \, dt $$

Where the transmittance $T(t)$, the probability that the ray travels from $t_n$ to $t$ without being blocked, is:

$$ T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s)) \, ds\right) $$
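
In practice the integral is approximated with a sum over discrete samples along the ray. The sketch below assumes densities and colors have already been obtained at sorted distances `t`; the function name `composite` and the choice to reuse the previous spacing for the last interval are illustrative assumptions.

```python
import numpy as np

def composite(sigmas, colors, t):
    """Discrete volume rendering for one ray.

    sigmas: (N,) densities, colors: (N, 3) RGB values, t: (N,) sorted distances.
    Returns (rgb, weights), where weights[i] = T_i * alpha_i.
    """
    deltas = np.diff(t)
    deltas = np.append(deltas, deltas[-1])        # assume the last interval matches the previous one
    alphas = 1.0 - np.exp(-sigmas * deltas)       # opacity of each segment
    # T_i = product over j < i of (1 - alpha_j): light surviving up to sample i
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas[:-1])])
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

t = np.array([1.0, 2.0, 3.0])
colors = np.eye(3)                                 # red, green, blue samples
rgb_empty, _ = composite(np.zeros(3), colors, t)   # empty space renders black
rgb_wall, w_wall = composite(np.array([1e3, 1.0, 1.0]), colors, t)  # opaque first sample
```

With an opaque first sample, the later samples receive zero weight, so the rendered color is that of the first sample.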

Training
#

Loss Function
#

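Photometric loss between the rendered color $\hat{C}(\mathbf{r})$ and the ground-truth pixel color $C(\mathbf{r})$, summed over the rays $R$ in a batch: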

$$ L = \sum_{\mathbf{r} \in R} \| \hat{C}(\mathbf{r}) - C(\mathbf{r}) \|_2^2 $$
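
The loss translates directly into a few lines of NumPy. The function name `photometric_loss` is hypothetical; many implementations use the mean instead of the sum, which only changes the scale.

```python
import numpy as np

def photometric_loss(rendered, target):
    """Sum over rays of the squared L2 distance between rendered and true RGB."""
    diff = np.asarray(rendered, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.sum(diff ** 2))

rendered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = photometric_loss(rendered, target)   # only the second ray contributes
```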

Process
#

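1. Sample a batch of rays from the training images
2. Sample points along each ray
3. Query the MLP for color and density at each point
4. Render each pixel color via volume rendering
5. Compute the photometric loss and backpropagate it to update the MLP weights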
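
The first step can be sketched as drawing random pixel locations and their ground-truth colors, assuming rays are selected as a random batch of pixels (a common choice). The function name `sample_ray_batch` and the toy image are illustrative only.

```python
import numpy as np

def sample_ray_batch(image, batch_size, rng):
    """Pick batch_size distinct pixels from an (H, W, 3) image.

    Returns (rows, cols, target_rgb); each (row, col) identifies one ray.
    """
    h, w = image.shape[:2]
    flat = rng.choice(h * w, size=batch_size, replace=False)
    rows, cols = np.unravel_index(flat, (h, w))
    return rows, cols, image[rows, cols]

image = np.arange(4 * 5 * 3, dtype=np.float64).reshape(4, 5, 3)   # toy 4x5 image
rows, cols, target_rgb = sample_ray_batch(image, 8, np.random.default_rng(0))
```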

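Note: Each training step backpropagates through every sampled point on every ray in the batch. The original NeRF draws a random batch of rays per step rather than processing all pixels of an image at once.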

Implicit Representation
#

| Explicit (Voxels) | Implicit (NeRF) |
| --- | --- |
| Discrete coordinates | Continuous function |
| Fixed resolution | Arbitrary resolution |
| Memory intensive | Memory efficient |
| Fast inference | Slow inference |

Because the network can be queried at any real-valued coordinate, detail is not tied to a grid resolution, and no explicit points or voxels need to be stored.
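
A rough back-of-the-envelope comparison illustrates the memory difference. The grid resolution, the 4 stored values per voxel, and the layer sizes below are hypothetical; the MLP is a simplified stand-in, not the exact NeRF architecture.

```python
def voxel_grid_bytes(resolution, channels=4, bytes_per_value=4):
    """Memory for a dense cubic grid storing (r, g, b, sigma) as 32-bit floats."""
    return resolution ** 3 * channels * bytes_per_value

def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

grid_bytes = voxel_grid_bytes(512)          # dense 512^3 grid
layers = [60] + [256] * 8 + [4]             # hypothetical: 60-d input, 8 hidden layers, 4 outputs
mlp_bytes = mlp_param_count(layers) * 4     # 32-bit floats
```

Under these assumptions the dense grid takes 2 GiB, while the MLP weights take under 2 MB.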

Inference
#

1. Define a novel camera pose
2. Cast a ray through each pixel
3. Sample points along each ray
4. Query the network for colors and densities
5. Accumulate via volume rendering
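
The first two steps can be sketched for a pinhole camera as below. The function name `get_rays`, the focal length in pixels, and the camera convention (x right, y up, looking along -z) are assumptions for illustration; other codebases use different conventions. The returned directions are not normalized.

```python
import numpy as np

def get_rays(height, width, focal, cam_to_world):
    """Return per-pixel ray origins and directions, each of shape (H, W, 3).

    cam_to_world: (4, 4) or (3, 4) camera-to-world pose matrix.
    """
    i, j = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    # Direction of each pixel in the camera frame.
    dirs = np.stack([
        (i - 0.5 * width) / focal,
        -(j - 0.5 * height) / focal,
        -np.ones_like(i, dtype=np.float64),
    ], axis=-1)
    rays_d = dirs @ cam_to_world[:3, :3].T                        # rotate into the world frame
    rays_o = np.broadcast_to(cam_to_world[:3, 3], rays_d.shape)   # every ray starts at the camera center
    return rays_o, rays_d

pose = np.eye(4)                                   # camera at the world origin
rays_o, rays_d = get_rays(4, 6, focal=2.0, cam_to_world=pose)
```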

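Complexity: Cost scales with the number of pixels times the number of samples per ray, so higher resolution means more computation. Samples behind an opaque surface contribute almost nothing because the transmittance has already dropped to near zero, and some implementations stop accumulating at that point.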
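
A quick count of network queries shows how the cost grows with resolution. The image sizes and the 64 samples per ray are illustrative numbers, and `network_queries` is a hypothetical helper.

```python
def network_queries(height, width, samples_per_ray):
    """Number of MLP evaluations needed to render one image."""
    return height * width * samples_per_ray

q_low = network_queries(400, 400, 64)
q_high = network_queries(800, 800, 64)   # doubling each side quadruples the work
```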

Limitations
#

Slow training and inference
Requires accurate camera poses
Static scenes only (original NeRF)
Per-scene optimization

Extensions
#

| Method | Improvement |
| --- | --- |
| Instant-NGP | Fast training via hash encoding |
| Mip-NeRF | Anti-aliasing |
| NeRF-W | Handle varying lighting |
| D-NeRF | Dynamic scenes |
| 3D Gaussian Splatting | Real-time rendering |
