Overview#
Artificial Neural Networks (ANNs) like CNNs and Transformers pass continuous floating-point values between layers. But biological neurons do not work that way. Real neurons communicate through spikes — brief electrical pulses that either happen or do not. Spiking Neural Networks (SNNs) replicate this biological mechanism, and this fundamental difference changes everything about how the network computes and learns.
This post walks through the core ideas behind SNNs step by step: how a single spiking neuron works, how information is encoded in spikes, and how the network learns through Spike-Timing-Dependent Plasticity (STDP).
ANN vs SNN: What Is Different?#
Before diving into SNN internals, it helps to see the two paradigms side by side.
ANN Neuron:
Inputs (floats) → [Weighted Sum + Activation Function] → Output (float)
Example: 0.73, -0.12, 1.05 → ReLU(Σ wᵢxᵢ + b) → 0.84
SNN Neuron:
Inputs (spikes) → [Membrane Potential Accumulation] → Spike or Silence
Example: 1, 0, 1, 0, 0, 1 → V(t) accumulates → Spike! (if V ≥ Vth)

| Aspect | ANN | SNN |
|---|---|---|
| Information | Continuous real values | Binary spike events |
| Time | No inherent time axis | Time is fundamental |
| Core operation | Multiply-Accumulate (MAC) | Accumulate (AC) |
| Energy per operation | High (FP multiply) | Low (addition only on spike) |
| Target hardware | GPU | Neuromorphic chip |
| Biological plausibility | Low | High |
The key insight is that SNNs are event-driven. A neuron only does work when it receives a spike. If no spike arrives, no computation happens. This is why SNNs can be dramatically more energy-efficient.
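To make the MAC-versus-AC distinction concrete, here is a minimal Python sketch. The function names `dense_mac` and `event_driven_ac` and the weight values are illustrative assumptions, not taken from any library. With binary inputs, the weighted sum reduces to adding the weights of the inputs that spiked, so no multiplications are needed and silent inputs cost nothing.

```python
def dense_mac(weights, inputs):
    """ANN-style: one multiply-accumulate per input, whether or not it is zero."""
    total = 0.0
    ops = 0
    for w, x in zip(weights, inputs):
        total += w * x
        ops += 1
    return total, ops


def event_driven_ac(weights, spike_indices):
    """SNN-style: one addition per spike that actually arrived."""
    total = 0.0
    ops = 0
    for i in spike_indices:
        total += weights[i]
        ops += 1
    return total, ops


weights = [0.5, -0.25, 0.75, 0.125, 1.0, -0.5]   # hypothetical synaptic weights
spikes = [1, 0, 1, 0, 0, 1]                       # spike pattern from the SNN example above
spike_indices = [i for i, s in enumerate(spikes) if s == 1]

mac_total, mac_ops = dense_mac(weights, spikes)
ac_total, ac_ops = event_driven_ac(weights, spike_indices)
print(mac_total, mac_ops)
print(ac_total, ac_ops)
```

Both paths produce the same sum, but the event-driven path touches only the three inputs that spiked.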
The LIF Neuron Model#
The Leaky Integrate-and-Fire (LIF) neuron is the most widely used spiking neuron model. It captures the essential behavior of a biological neuron while remaining simple enough to simulate efficiently.
How a Biological Neuron Works#
A real neuron maintains an electrical potential across its cell membrane. When input signals (from other neurons) arrive, this potential changes. If the potential crosses a threshold, the neuron fires a spike and sends it to downstream neurons. After firing, the potential resets.
LIF Step by Step#
The LIF neuron follows this cycle:
Step 1: Receive input spikes
↓
Step 2: Integrate — add weighted input to membrane potential V(t)
↓
Step 3: Leak — V(t) decays toward resting potential over time
↓
Step 4: Check threshold — is V(t) ≥ Vth?
├── Yes → Fire a spike! Then reset V(t) → Vreset
└── No → Go back to Step 1, continue accumulating
The Continuous Equation#
The membrane potential \(V(t)\) evolves according to:
$$ \tau_m \frac{dV}{dt} = -(V(t) - V_{rest}) + R \cdot I(t) $$
Where:
- \(\tau_m\): Membrane time constant — controls how fast the neuron “forgets” (typical: 10–20 ms)
- \(V_{rest}\): Resting potential — the baseline when no input arrives (often 0 or −70 mV)
- \(R\): Membrane resistance
- \(I(t)\): Input current at time \(t\)
The first term \(-(V - V_{rest})\) is the leak: it always pulls the potential back toward rest. Without new input, the neuron gradually returns to its resting state.
The second term \(R \cdot I(t)\) is the drive: input current pushes the potential up (or down).
Firing Condition#
When the membrane potential reaches the threshold:
$$ V(t) \geq V_{th} \implies \text{spike at time } t, \quad \text{then } V \rightarrow V_{reset} $$
After firing, the neuron enters a refractory period during which it cannot fire again. This prevents runaway activity.
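The continuous equation and firing condition can be explored numerically. The sketch below uses forward-Euler integration, which is one simple choice among several. All parameter values (\(\tau_m = 10\) ms, \(R = 1\), \(V_{rest} = V_{reset} = 0\), \(V_{th} = 1\), a 0.1 ms step) and the function name `simulate_lif_euler` are assumptions for illustration, and the refractory period is left out to keep the code short.

```python
def simulate_lif_euler(current, dt=0.1, tau_m=10.0, v_rest=0.0,
                       v_reset=0.0, v_th=1.0, resistance=1.0):
    """Forward-Euler integration of tau_m * dV/dt = -(V - V_rest) + R * I(t).

    `current` holds one input current value per time step of length dt (ms).
    Returns the voltage trace and the list of spike times (ms).
    No refractory period is modeled (simplifying assumption).
    """
    v = v_rest
    trace, spike_times = [], []
    for step, i_t in enumerate(current):
        dv = (-(v - v_rest) + resistance * i_t) * (dt / tau_m)
        v += dv
        if v >= v_th:                      # firing condition
            spike_times.append((step + 1) * dt)
            v = v_reset                    # reset after the spike
        trace.append(v)
    return trace, spike_times


# Constant drive above threshold (R * I = 1.5 > V_th) for 50 ms: regular spiking.
trace, spikes = simulate_lif_euler([1.5] * 500)
print(len(spikes), spikes)

# Constant drive below threshold (R * I = 0.8 < V_th): the leak wins, no spikes.
_, no_spikes = simulate_lif_euler([0.8] * 500)
print(no_spikes)
```

With a constant drive, the potential relaxes toward \(V_{rest} + R \cdot I\), so the neuron fires only if that level lies above the threshold.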
Discrete-Time Version (For Simulation)#
In practice, we simulate SNNs in discrete time steps \(\Delta t\). Taking \(V_{rest} = 0\) and folding \(R\) and the other constant factors into the weights, the update becomes:
$$ V[t+1] = \beta \cdot V[t] + \sum_i w_i \cdot S_i[t+1] $$
Where:
- \(\beta = e^{-\Delta t / \tau_m}\): Leak factor, a number between 0 and 1
- \(w_i\): Synaptic weight from input neuron \(i\)
- \(S_i[t]\): Spike from input neuron \(i\) at time step \(t\) (either 0 or 1)
The leak factor \(\beta\) controls the neuron’s memory:
| \(\beta\) value | Behavior |
|---|---|
| Close to 1.0 | Slow leak — neuron remembers inputs for a long time |
| Close to 0.0 | Fast leak — neuron forgets quickly |
| Exactly 0.0 | No memory — each time step is independent |
| Exactly 1.0 | No leak — membrane potential never decays (Integrate-and-Fire) |
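To get a feel for these values, the sketch below computes \(\beta\) from \(\Delta t\) and \(\tau_m\) and converts it into a "half-life" in time steps. The 1 ms step size and the helper names `leak_factor` and `steps_to_half` are assumptions chosen for illustration.

```python
import math


def leak_factor(dt_ms, tau_m_ms):
    """beta = exp(-dt / tau_m), the per-step decay of the membrane potential."""
    return math.exp(-dt_ms / tau_m_ms)


def steps_to_half(beta):
    """Input-free steps until V decays to half its value (requires 0 < beta < 1)."""
    return math.log(0.5) / math.log(beta)


for tau_m in (10.0, 20.0):                 # typical range quoted above
    beta = leak_factor(1.0, tau_m)         # assumed step size: 1 ms
    print(f"tau_m={tau_m} ms  beta={beta:.4f}  half-life={steps_to_half(beta):.1f} steps")
```

A larger \(\tau_m\) gives a \(\beta\) closer to 1 and therefore a longer memory, matching the first row of the table.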
Numerical Example#
Let us trace through a concrete example. Suppose:
- \(\beta = 0.8\), \(V_{th} = 1.0\), \(V_{reset} = 0.0\)
- One input synapse with weight \(w = 0.5\)
| Time | Input Spike | Computation | \(V[t]\) | Output Spike |
|---|---|---|---|---|
| 0 | 0 | 0.8 × 0.0 + 0.5 × 0 | 0.00 | — |
| 1 | 1 | 0.8 × 0.0 + 0.5 × 1 | 0.50 | — |
| 2 | 1 | 0.8 × 0.5 + 0.5 × 1 | 0.90 | — |
| 3 | 1 | 0.8 × 0.9 + 0.5 × 1 | 1.22 | Spike! → Reset to 0 |
| 4 | 0 | 0.8 × 0.0 + 0.5 × 0 | 0.00 | — |
The neuron accumulated input over three time steps, crossed the threshold at \(t=3\), fired, and reset.
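The same trace can be reproduced in a few lines of Python. This is a minimal sketch of the discrete update with a single input synapse; the function name `lif_trace` is hypothetical, and the potential is recorded before the reset so that it lines up with the \(V[t]\) column.

```python
def lif_trace(input_spikes, beta=0.8, weight=0.5, v_th=1.0, v_reset=0.0):
    """Discrete LIF with one input synapse.

    Returns one (V before reset, fired) pair per time step.
    """
    v = v_reset
    rows = []
    for s in input_spikes:
        v = beta * v + weight * s          # leak the old potential, then integrate the input
        fired = v >= v_th
        rows.append((round(v, 2), fired))
        if fired:
            v = v_reset                    # reset after the spike
    return rows


rows = lif_trace([0, 1, 1, 1, 0])          # input spikes from the table
for t, (v, fired) in enumerate(rows):
    print(t, v, "spike" if fired else "-")
```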
Spike Coding: How Information Is Represented#
A fundamental question in SNNs is: how do spikes carry information? There are three main coding schemes, each with different trade-offs.
Rate Coding#
The simplest approach: information is encoded in the firing rate (number of spikes per unit time).
Strong stimulus: | | | | | | | | | | (high firing rate)
Weak stimulus: | | | | (low firing rate)
How it works step by step:
- Present a stimulus to the network
- Run the simulation for a time window \(\Delta T\)
- Count the total number of spikes each output neuron fires
- The neuron with the highest count is the network’s prediction
Pros:
- Robust to noise (missing one spike barely changes the rate)
- Easy to understand and implement
Cons:
- Slow: needs many time steps to get a reliable count
- Energy-inefficient: many spikes required
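The counting procedure can be sketched as follows. The Bernoulli spike generator (spike with a fixed probability at each step) is an assumption used here to produce spike trains; the text above only specifies the counting and arg-max decoding. The names `rate_encode` and `rate_decode` and the probabilities are hypothetical.

```python
import random


def rate_encode(intensity, n_steps, rng):
    """Bernoulli spike train: at each step, spike with probability `intensity` (0..1)."""
    return [1 if rng.random() < intensity else 0 for _ in range(n_steps)]


def rate_decode(spike_trains):
    """Count spikes per neuron and predict the neuron with the highest count."""
    counts = [sum(train) for train in spike_trains]
    return counts.index(max(counts)), counts


rng = random.Random(0)                     # fixed seed so the run is repeatable
intensities = [0.1, 0.8, 0.3]              # hypothetical firing probabilities of 3 output neurons
trains = [rate_encode(p, n_steps=100, rng=rng) for p in intensities]
prediction, counts = rate_decode(trains)
print(prediction, counts)
```

Shrinking `n_steps` makes the counts noisier, which is the speed-versus-reliability trade-off listed under the cons.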
Temporal Coding#
Information is encoded in the precise timing of spikes. In the common time-to-first-spike scheme, a neuron that fires earlier encodes a stronger stimulus.
Strong stimulus: | (fires at t = 2ms)
Medium stimulus: | (fires at t = 5ms)
Weak stimulus: | (fires at t = 8ms)
How it works step by step:
- Present a stimulus to the network
- Each output neuron fires at most once
- The neuron that fires first corresponds to the network’s prediction
- A single spike per neuron is enough
Pros:
- Extremely fast (single spike is sufficient)
- Energy-efficient
Cons:
- Sensitive to noise (one mistimed spike changes the result)
- Harder to train
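A minimal sketch of first-spike coding is shown below. The linear mapping from intensity to spike time and the 10 ms window are assumptions chosen so that the outputs line up with the 2, 5, and 8 ms illustration above; other mappings, such as logarithmic ones, are also used in practice. The function names are hypothetical.

```python
def latency_encode(intensity, t_max_ms=10.0):
    """Map an intensity in (0, 1] to a single spike time: stronger input fires earlier.

    Returns None for zero input, meaning the neuron stays silent.
    """
    if intensity <= 0.0:
        return None
    return t_max_ms * (1.0 - intensity)


def first_spike_decode(spike_times):
    """Predict the index of the neuron that fires first, ignoring silent neurons."""
    fired = [(t, i) for i, t in enumerate(spike_times) if t is not None]
    if not fired:
        return None
    return min(fired)[1]


intensities = [0.8, 0.5, 0.2, 0.0]         # strong, medium, weak, absent
times = [latency_encode(x) for x in intensities]
print(times)
print(first_spike_decode(times))
```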
Population Coding#
Information is encoded in the collective pattern of spikes across a group of neurons.
Neuron A: | | | |
Neuron B: | | | | |
Neuron C: | | | | | ← The combined pattern encodes information
Neuron D: | | |
This is how biological brains predominantly encode information. No single neuron carries the full picture; the population activity does.
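One common way to illustrate population coding is with overlapping tuning curves: each neuron responds most strongly to its own preferred stimulus value, and the stimulus is read out from the whole group. The sketch below works with firing rates rather than individual spikes. The Gaussian tuning shape, the preferred values, and the weighted-average readout are all assumptions made for illustration, not something specified above.

```python
import math


def population_response(stimulus, preferred_values, width=0.15, max_rate=50.0):
    """Firing rate (Hz) of each neuron from a Gaussian tuning curve around its preferred value."""
    return [max_rate * math.exp(-((stimulus - p) ** 2) / (2.0 * width ** 2))
            for p in preferred_values]


def population_decode(rates, preferred_values):
    """Estimate the stimulus as the rate-weighted average of the preferred values."""
    total = sum(rates)
    return sum(r * p for r, p in zip(rates, preferred_values)) / total


preferred = [0.2, 0.4, 0.6, 0.8]           # neurons A to D, hypothetical preferred values
rates = population_response(0.5, preferred)
print([round(r, 1) for r in rates])
print(round(population_decode(rates, preferred), 3))
```

No single neuron is tuned to the stimulus value 0.5, yet the combined activity recovers it.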
Comparison#
| Coding | Spikes needed | Speed | Noise robustness | Biological relevance |
|---|---|---|---|---|
| Rate | Many | Slow | High | Moderate |
| Temporal | One per neuron | Fast | Low | High |
| Population | Varies | Moderate | High | Very high |
STDP: The Core Learning Rule#
Spike-Timing-Dependent Plasticity (STDP) is the primary unsupervised learning rule for SNNs. It was discovered in biological experiments in the late 1990s and formalizes a simple but powerful idea about how synapses should change.
The Biological Motivation#
Donald Hebb proposed in 1949 the idea now often summarized as “Neurons that fire together, wire together.” STDP refines this idea by adding temporal order: it matters which neuron fires first.
The Rule in Plain Language#
Consider two neurons connected by a synapse — a pre-synaptic neuron (sender) and a post-synaptic neuron (receiver).
Case 1: Pre fires before Post (\(\Delta t > 0\))
Pre neuron spike Post neuron spike
| |
|←───── Δt > 0 ─────→|
Interpretation: Pre's spike contributed to Post's firing
Result: STRENGTHEN the synapse (Long-Term Potentiation, LTP)
The pre-synaptic spike was a cause of the post-synaptic spike. The connection was useful, so make it stronger.
Case 2: Post fires before Pre (\(\Delta t < 0\))
Post neuron spike Pre neuron spike
| |
|←───── Δt < 0 ─────→|
Interpretation: Pre's spike arrived too late to cause Post's firing
Result: WEAKEN the synapse (Long-Term Depression, LTD)
The pre-synaptic spike arrived after the post-synaptic neuron already fired. It did not contribute, so weaken the connection.
The Mathematical Formulation#
The change in synaptic weight depends on the time difference \(\Delta t = t_{post} - t_{pre}\):
$$ \Delta w = \begin{cases} A_+ \exp\left(-\frac{\Delta t}{\tau_+}\right) & \text{if } \Delta t > 0 \quad \text{(LTP: strengthen)} \\[8pt] -A_- \exp\left(\frac{\Delta t}{\tau_-}\right) & \text{if } \Delta t < 0 \quad \text{(LTD: weaken)} \end{cases} $$
Where:
- \(\Delta t = t_{post} - t_{pre}\): Time difference between post and pre spikes
- \(A_+\): Maximum potentiation amplitude (learning rate for strengthening)
- \(A_-\): Maximum depression amplitude (learning rate for weakening)
- \(\tau_+, \tau_-\): Time constants controlling the window width (typically ~20 ms)
STDP Window Shape#
Figure (STDP window): weight change Δw on the vertical axis against Δt on the horizontal axis. For Δt > 0 (pre before post) the curve starts at A₊ and decays toward zero (LTP, strengthen). For Δt < 0 (post before pre) it starts at −A₋ and decays toward zero (LTD, weaken).
Step-by-Step STDP Example#
Suppose \(A_+ = 0.1\), \(A_- = 0.12\), \(\tau_+ = \tau_- = 20\) ms.
Scenario: Pre fires at \(t = 100\) ms, Post fires at \(t = 110\) ms.
- Compute \(\Delta t = 110 - 100 = +10\) ms
- Since \(\Delta t > 0\), apply LTP: $$ \Delta w = 0.1 \times \exp\left(-\frac{10}{20}\right) = 0.1 \times 0.607 = 0.0607 $$
- Update weight: \(w_{new} = w_{old} + 0.0607\)
Scenario: Pre fires at \(t = 100\) ms, Post fires at \(t = 85\) ms.
- Compute \(\Delta t = 85 - 100 = -15\) ms
- Since \(\Delta t < 0\), apply LTD: $$ \Delta w = -0.12 \times \exp\left(\frac{-15}{20}\right) = -0.12 \times 0.472 = -0.0567 $$
- Update weight: \(w_{new} = w_{old} - 0.0567\)
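Both scenarios can be checked with a short function. This is a sketch of the pair-based rule exactly as written above; the function name `stdp_delta_w` is hypothetical, and returning zero for \(\Delta t = 0\) is an assumption, since the formula does not define that case.

```python
import math


def stdp_delta_w(t_pre_ms, t_post_ms, a_plus=0.1, a_minus=0.12,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for one pre/post spike pair."""
    dt = t_post_ms - t_pre_ms
    if dt > 0:                                   # pre before post: LTP
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:                                   # post before pre: LTD
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0                                   # dt == 0: assumed to cause no change


print(round(stdp_delta_w(100.0, 110.0), 4))      # first scenario (LTP)
print(round(stdp_delta_w(100.0, 85.0), 4))       # second scenario (LTD)
```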
Why Is Depression Stronger Than Potentiation?#
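Typically, \(A_- > A_+\). This is intentional. A synapse that helps drive the post-synaptic neuron tends to see more pre-before-post pairings, so it is potentiated, drives the neuron even more, and is potentiated again. If strengthening and weakening were perfectly balanced, this positive feedback could let weights drift upward unchecked. By making depression slightly stronger, the network becomes competitive: only the synapses that are consistently causal survive. The rest weaken and effectively prune themselves.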
This produces sparse, efficient connectivity — similar to what we observe in biological brains.
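A back-of-the-envelope check of this bias, under the simplifying assumption (an idealization, not a result quoted above) that a synapse's pre/post spike pairs are uncorrelated, so every \(\Delta t\) is equally likely: the expected weight change is then proportional to the net area under the STDP window.
$$ \int_{0}^{\infty} A_+ e^{-\Delta t/\tau_+}\, d(\Delta t) \; - \int_{-\infty}^{0} A_- e^{\Delta t/\tau_-}\, d(\Delta t) = A_+ \tau_+ - A_- \tau_- $$
With the example values \(A_+ = 0.1\), \(A_- = 0.12\), \(\tau_+ = \tau_- = 20\) ms, this gives \(0.1 \times 20 - 0.12 \times 20 = -0.4\). The result is negative, so synapses without a consistent causal relationship are depressed on average.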
Surrogate Gradient: Enabling Backpropagation in SNNs#
STDP is an unsupervised learning rule. But what if we want to do supervised learning with labeled data, like in standard deep learning? We need backpropagation. However, there is a fundamental problem.
The Problem#
The spike function is a Heaviside step function:
$$ S(t) = \Theta(V(t) - V_{th}) = \begin{cases} 1 & \text{if } V(t) \geq V_{th} \\ 0 & \text{otherwise} \end{cases} $$
Its derivative is zero everywhere except at the threshold, where it is undefined (a Dirac delta).
[Figure: the spike function Θ(x), a step from 0 to 1 at Vth, next to its derivative, a Dirac delta at Vth]
The gradient is 0 almost everywhere, so backpropagation gets no useful signal.
The Solution: Surrogate Gradients#
During the forward pass, we use the true Heaviside function (spikes are binary). During the backward pass, we swap in a smooth, differentiable surrogate function that approximates the step.
Common surrogate functions:
| Surrogate | Formula | Shape |
|---|---|---|
| Arctangent | \(\frac{1}{\pi} \cdot \frac{1}{1 + (\pi x)^2}\) | Smooth bell curve |
| Sigmoid | \(\sigma'(x) = \sigma(x)(1-\sigma(x))\) | Bell curve |
| Fast Sigmoid | \(\frac{1}{(1 + k|x|)^2}\) | Sharp bell curve |
| Triangular | \(\max(0, 1 - |x|)\) | Triangle |
The arctangent surrogate is one of the most popular:
$$ \frac{\partial S}{\partial V} \approx \frac{1}{\pi} \cdot \frac{1}{1 + (\pi (V - V_{th}))^2} $$
How It Works Step by Step#
- Forward pass: Compute \(V[t]\) using the LIF equation. Fire a real binary spike if \(V \geq V_{th}\).
- Loss computation: Compare output spikes (or spike counts) to the target label.
- Backward pass: When computing gradients through the spike function, replace \(\frac{\partial S}{\partial V}\) with the surrogate derivative.
- Weight update: Apply standard gradient descent using the surrogate gradients.
Because the network is unrolled across time steps, the gradients are propagated backward through every step. This is Backpropagation Through Time (BPTT) applied to SNNs.
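The four steps can be sketched for the smallest possible case: one neuron, one time step, and plain Python instead of a deep learning framework. This is a simplified illustration rather than a full training loop. There is no unrolling over time, the squared-error loss and learning rate are assumptions, and the function names are hypothetical. The forward pass uses the true Heaviside step, while the backward pass uses the arctangent surrogate from the table.

```python
import math


def heaviside(x):
    """Forward pass: true binary spike function."""
    return 1.0 if x >= 0.0 else 0.0


def atan_surrogate_grad(x):
    """Backward pass: arctangent surrogate for dS/dV, evaluated at x = V - V_th."""
    return (1.0 / math.pi) / (1.0 + (math.pi * x) ** 2)


def train_to_fire(inputs, weights, target=1.0, v_th=1.0, lr=0.5, n_updates=50):
    """Gradient descent on L = (S - target)^2 for one neuron and one time step."""
    weights = list(weights)
    for _ in range(n_updates):
        v = sum(w * x for w, x in zip(weights, inputs))    # forward: membrane potential
        s = heaviside(v - v_th)                            # forward: real binary spike
        dl_ds = 2.0 * (s - target)                         # loss gradient w.r.t. the spike
        ds_dv = atan_surrogate_grad(v - v_th)              # surrogate replaces the true derivative
        # dV/dw_i = x_i, so the chain rule gives dL/dw_i = dL/dS * dS/dV * x_i
        weights = [w - lr * dl_ds * ds_dv * x for w, x in zip(weights, inputs)]
    return weights


inputs = [1.0, 0.0, 1.0]
trained = train_to_fire(inputs, weights=[0.2, 0.2, 0.2])
v_final = sum(w * x for w, x in zip(trained, inputs))
print(trained, v_final, heaviside(v_final - 1.0))
```

With the true derivative, `ds_dv` would be zero for every sub-threshold potential and the weights would never move. The surrogate gives a small nonzero slope that grows as the potential approaches the threshold, so the active weights increase until the neuron fires, while the weight on the silent input stays unchanged.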
Neuromorphic Hardware#
Standard GPUs are designed for dense matrix multiplication. They compute every neuron at every time step, even when most neurons are silent. For SNNs, this is wasteful because typically only about 1–5% of neurons spike at any given time step.
Neuromorphic chips are designed specifically for event-driven computation.
Major Neuromorphic Chips#
| Chip | Developer | Neurons | Key Feature |
|---|---|---|---|
| Loihi 2 | Intel | 1 million | On-chip learning (STDP), asynchronous |
| TrueNorth | IBM | 1 million | Ultra-low power (~70 mW), 4096 cores |
| SpiNNaker 2 | TU Dresden / Univ. of Manchester | Millions | ARM core based, flexible |
| Akida | BrainChip | — | Edge AI, commercial deployment |
Why Neuromorphic Hardware Is Efficient#
GPU approach (synchronous):
Clock tick → Compute ALL neurons → Clock tick → Compute ALL neurons → ...
For 1 million neurons with 1% spike rate:
990,000 neurons: 0 × weight = 0 (wasted computation)
10,000 neurons: spike × weight (useful computation)
→ 99% of work is wasted
Neuromorphic approach (event-driven):
Spike arrives at neuron 42 → Update neuron 42 only
Spike arrives at neuron 7801 → Update neuron 7801 only
No spike at neuron 500 → No computation at all
→ Only ~1-5% of neurons compute at any moment
The energy savings can be substantial. As a first-order estimate that counts only spike-triggered operations:
$$ E_{SNN} \approx E_{ANN} \times \text{spike rate} \approx E_{ANN} \times (0.01 \sim 0.05) $$
Under this estimate, SNNs on neuromorphic hardware are roughly 20–100× more energy-efficient than equivalent ANNs on GPUs for suitable workloads.
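The arithmetic behind these figures is simple enough to check directly. The sketch below applies the same first-order model, in which work scales with the fraction of neurons that spike. It is an estimate under that assumption, not a hardware measurement, and the function name `event_driven_savings` is hypothetical.

```python
def event_driven_savings(n_neurons, spike_rate):
    """First-order estimate: work scales with the fraction of neurons that spike."""
    active = round(n_neurons * spike_rate)
    idle = n_neurons - active
    return {
        "active_neurons": active,
        "idle_neurons": idle,
        "wasted_fraction_dense": idle / n_neurons,     # share of dense work spent on silent neurons
        "estimated_energy_ratio": 1.0 / spike_rate,    # E_ANN / E_SNN under this model
    }


for rate in (0.01, 0.05):
    print(rate, event_driven_savings(1_000_000, rate))
```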
Where SNNs Excel Today#
| Application | Why SNN fits |
|---|---|
| Event camera (DVS) processing | Input is already spikes |
| Always-on keyword detection | Ultra-low power needed |
| Edge robotics | Battery-constrained |
| Anomaly detection | Sparse events, low latency |
| Biomedical signal processing | Temporal spike patterns |
Summary#
| Concept | Key Idea |
|---|---|
| LIF Neuron | Accumulate input → leak over time → fire when threshold is crossed |
| Rate Coding | Information = firing frequency over a time window |
| Temporal Coding | Information = precise spike timing (earlier = stronger) |
| Population Coding | Information = collective pattern across many neurons |
| STDP | Pre→Post = strengthen; Post→Pre = weaken |
| Surrogate Gradient | Replace non-differentiable spike with smooth approximation during backprop |
| Neuromorphic chips | Event-driven hardware → compute only when spikes occur → 20–100× energy savings |
SNNs do not yet match ANNs in raw accuracy on standard benchmarks like ImageNet. But in domains where low power, low latency, and temporal data matter — such as edge devices, event cameras, and always-on sensors — SNNs offer a compelling advantage that grows as neuromorphic hardware matures.