SNN Learning: STDP and Neuromorphic Computing

Overview

Artificial Neural Networks (ANNs) such as CNNs and Transformers pass continuous floating-point values between layers. Biological neurons do not work that way. Real neurons communicate through spikes: brief electrical pulses that either happen or do not. Spiking Neural Networks (SNNs) mimic this mechanism, and the difference shapes how the network represents information, how it computes, and how it learns.

This post walks through the core ideas behind SNNs step by step: how a single spiking neuron works, how information is encoded in spikes, and how the network learns through Spike-Timing-Dependent Plasticity (STDP).

ANN vs SNN: What Is Different?

Before diving into SNN internals, it helps to see the two paradigms side by side.

ANN Neuron:
  Inputs (floats) → [Weighted Sum + Activation Function] → Output (float)
  Example: 0.73, -0.12, 1.05  →  ReLU(Σ wᵢxᵢ + b)  →  0.84

SNN Neuron:
  Inputs (spikes) → [Membrane Potential Accumulation] → Spike or Silence
  Example: 1, 0, 1, 0, 0, 1  →  V(t) accumulates  →  Spike! (if V ≥ Vth)

Aspect	ANN	SNN
Information	Continuous real values	Binary spike events
Time	No inherent time axis	Time is fundamental
Core operation	Multiply-Accumulate (MAC)	Accumulate (AC)
Energy per operation	High (FP multiply)	Low (addition only on spike)
Target hardware	GPU	Neuromorphic chip
Biological plausibility	Low	High

The key insight is that SNNs are event-driven. A neuron only does work when it receives a spike. If no spike arrives, no computation happens. This is why SNNs can be dramatically more energy-efficient.
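
As a rough illustration of the MAC-versus-AC row in the table, the sketch below computes the input to one neuron in two ways: a dense weighted sum that touches every synapse, and an event-driven sum that only touches synapses whose input actually spiked. The function names and the weight values are made up for this example; only the spike pattern comes from the diagram above.

```python
def dense_input(weights, spikes):
    """ANN-style: one multiply-accumulate per synapse, spike or not."""
    total, ops = 0.0, 0
    for w, s in zip(weights, spikes):
        total += w * s
        ops += 1
    return total, ops


def event_driven_input(weights, spike_indices):
    """SNN-style: one addition per spike that actually arrived."""
    total, ops = 0.0, 0
    for i in spike_indices:
        total += weights[i]
        ops += 1
    return total, ops


weights = [0.2, -0.1, 0.4, 0.3, -0.5, 0.1]   # illustrative values
spikes = [1, 0, 1, 0, 0, 1]                  # same pattern as the SNN example
spike_indices = [i for i, s in enumerate(spikes) if s]

dense_total, dense_ops = dense_input(weights, spikes)
event_total, event_ops = event_driven_input(weights, spike_indices)

# Same result, but the event-driven path does half the operations here
# and never multiplies.
print(dense_total, dense_ops)
print(event_total, event_ops)
```

The sparser the spikes, the larger the gap between the two operation counts.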

The LIF Neuron Model

The Leaky Integrate-and-Fire (LIF) neuron is the most widely used spiking neuron model. It captures the essential behavior of a biological neuron while remaining simple enough to simulate efficiently.

How a Biological Neuron Works

A real neuron maintains an electrical potential across its cell membrane. When input signals (from other neurons) arrive, this potential changes. If the potential crosses a threshold, the neuron fires a spike and sends it to downstream neurons. After firing, the potential resets.

LIF Step by Step

The LIF neuron follows this cycle:

Step 1: Receive input spikes
         ↓
Step 2: Integrate — add weighted input to membrane potential V(t)
         ↓
Step 3: Leak — V(t) decays toward resting potential over time
         ↓
Step 4: Check threshold — is V(t) ≥ Vth?
         ├── Yes → Fire a spike! Then reset V(t) → Vreset
         └── No  → Go back to Step 1, continue accumulating

The Continuous Equation

The membrane potential $V(t)$ evolves according to:

$$ \tau_m \frac{dV}{dt} = -(V(t) - V_{rest}) + R \cdot I(t) $$

Where:

$\tau_m$: Membrane time constant — controls how fast the neuron “forgets” (typical: 10–20 ms)
$V_{rest}$: Resting potential — the baseline when no input arrives (often 0 or −70 mV)
$R$: Membrane resistance
$I(t)$: Input current at time $t$

The first term $-(V - V_{rest})$ is the leak: it always pulls the potential back toward rest. Without new input, the neuron gradually returns to its resting state.

The second term $R \cdot I(t)$ is the drive: input current pushes the potential up (or down).
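
To see the leak and drive terms in action, here is a minimal forward-Euler sketch of the equation above. The function name `simulate_membrane`, the step size, and the parameter values are assumptions chosen for illustration, and there is no threshold or reset yet.

```python
def simulate_membrane(current, steps, v0=0.0, dt=0.1, tau_m=10.0, v_rest=0.0, r=1.0):
    """Forward-Euler integration of tau_m * dV/dt = -(V - v_rest) + r * I.

    Time is in ms; V and I are in arbitrary units. No threshold or reset.
    """
    v = v0
    trace = [v]
    for _ in range(steps):
        leak = -(v - v_rest)          # pulls V back toward rest
        drive = r * current           # pushes V away from rest
        v += (leak + drive) * dt / tau_m
        trace.append(v)
    return trace


# Constant input: V climbs toward v_rest + r * I = 1.5 and levels off.
driven = simulate_membrane(current=1.5, steps=1000)

# No input, starting at V = 1.0: V decays back toward rest.
# After 100 steps (10 ms, one time constant) roughly 37% remains.
leaking = simulate_membrane(current=0.0, steps=100, v0=1.0)

print(round(driven[-1], 3), round(leaking[-1], 3))
```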

Firing Condition

When the membrane potential reaches the threshold:

$$ V(t) \geq V_{th} \implies \text{spike at time } t, \quad \text{then } V \rightarrow V_{reset} $$

After firing, many LIF implementations also enforce a short refractory period during which the neuron cannot fire again. This caps the maximum firing rate and prevents runaway activity.

Discrete-Time Version (For Simulation)

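In practice, we simulate SNNs in discrete time steps $\Delta t$. Taking $V_{rest} = 0$ and treating the input current as a sum of weighted spikes, the update applied before the threshold check and reset becomes: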

$$ V[t+1] = \beta \cdot V[t] + \sum_i w_i \cdot S_i[t] $$

Where:

$\beta = e^{-\Delta t / \tau_m}$: Leak factor, a number between 0 and 1
$w_i$: Synaptic weight from input neuron $i$
$S_i[t]$: Spike from input neuron $i$ at time step $t$ (either 0 or 1)

The leak factor $\beta$ controls the neuron’s memory:

$\beta$ value	Behavior
Close to 1.0	Slow leak — neuron remembers inputs for a long time
Close to 0.0	Fast leak — neuron forgets quickly
Exactly 0.0	No memory — each time step is independent
Exactly 1.0	No leak — membrane potential never decays (Integrate-and-Fire)
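
Where the exponential form of $\beta$ comes from: a short derivation, assuming no input arrives during the step and $V_{rest} = 0$. Solving the continuous equation over one step of length $\Delta t$ gives

$$ V(t + \Delta t) = V(t) \, e^{-\Delta t / \tau_m} = \beta \, V(t) $$

For example, with $\tau_m = 10$ ms and $\Delta t = 1$ ms (values chosen only for illustration), $\beta = e^{-0.1} \approx 0.905$, so the potential loses about 9.5% of its value per step.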

Numerical Example

Let us trace through a concrete example. Suppose:

$\beta = 0.8$, $V_{th} = 1.0$, $V_{reset} = 0.0$
One input synapse with weight $w = 0.5$

Time	Input Spike	Computation	$V[t]$	Output Spike
0	0	0.8 × 0.0 + 0.5 × 0	0.00	—
1	1	0.8 × 0.0 + 0.5 × 1	0.50	—
2	1	0.8 × 0.5 + 0.5 × 1	0.90	—
3	1	0.8 × 0.9 + 0.5 × 1	1.22	Spike! → Reset to 0
4	0	0.8 × 0.0 + 0.5 × 0	0.00	—

The neuron accumulated input over three time steps, crossed the threshold at $t=3$, fired, and reset.
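
The same trace can be reproduced with a few lines of Python. This is a minimal sketch of the discrete-time update with a hard reset, not a reference implementation; the function name `lif_run` is hypothetical, and the neuron is assumed to start at $V = 0$.

```python
def lif_run(input_spikes, w=0.5, beta=0.8, v_th=1.0, v_reset=0.0):
    """Simulate one discrete-time LIF neuron with a single input synapse.

    Returns (potentials, output_spikes). Each potential is the value
    reached before any reset, matching the V[t] column of the table.
    """
    v = v_reset
    potentials, out = [], []
    for s in input_spikes:
        v = beta * v + w * s          # leak, then integrate
        potentials.append(round(v, 2))
        if v >= v_th:                 # threshold check
            out.append(1)
            v = v_reset               # hard reset after firing
        else:
            out.append(0)
    return potentials, out


potentials, spikes = lif_run([0, 1, 1, 1, 0])
print(potentials)  # [0.0, 0.5, 0.9, 1.22, 0.0]
print(spikes)      # [0, 0, 0, 1, 0]
```

Changing `beta` or `w` is a quick way to see how the leak and the synaptic weight trade off against each other.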

Spike Coding: How Information Is Represented

A fundamental question in SNNs is: how do spikes carry information? There are three main coding schemes, each with different trade-offs.

Rate Coding

The simplest approach: information is encoded in the firing rate (number of spikes per unit time).

Strong stimulus:  | | | | | | | | | |   (high firing rate)
Weak stimulus:    |     |     |     |    (low firing rate)

$$ r = \frac{n_{spikes}}{\Delta T} $$

How it works step by step:

Present a stimulus to the network
Run the simulation for a time window $\Delta T$
Count the total number of spikes each output neuron fires
The neuron with the highest count is the network’s prediction

Pros:

Robust to noise (missing one spike barely changes the rate)
Easy to understand and implement

Cons:

Slow: needs many time steps to get a reliable count
Energy-inefficient: many spikes required
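
A minimal sketch of rate coding, assuming Bernoulli spike generation (one common choice, where each time step spikes independently with probability equal to the normalized intensity). The helper names `rate_encode` and `decode_by_count` are made up for this example, and the two encoded trains simply stand in for the output of a network.

```python
import random


def rate_encode(intensity, num_steps, rng):
    """Bernoulli rate coding: at each step, spike with probability `intensity`."""
    return [1 if rng.random() < intensity else 0 for _ in range(num_steps)]


def decode_by_count(output_trains):
    """Return (index of the neuron that fired most often, all spike counts)."""
    counts = [sum(train) for train in output_trains]
    return counts.index(max(counts)), counts


rng = random.Random(0)  # fixed seed so the sketch is repeatable
strong = rate_encode(0.9, num_steps=1000, rng=rng)
weak = rate_encode(0.1, num_steps=1000, rng=rng)

# Estimated rates: close to 0.9 and 0.1, but only because the window is long.
print(sum(strong) / 1000, sum(weak) / 1000)

prediction, counts = decode_by_count([weak, strong])
print(prediction, counts)
```

Shrinking `num_steps` to a handful of steps makes the estimated rates noisy, which is the speed cost listed above.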

Temporal Coding

Information is encoded in the precise timing of spikes. In the common time-to-first-spike (latency) scheme, a neuron that fires earlier encodes a stronger stimulus.

Strong stimulus:  | (fires at t = 2ms)

Medium stimulus:       | (fires at t = 5ms)

Weak stimulus:              | (fires at t = 8ms)

How it works step by step:

Present a stimulus to the network
Each output neuron fires at most once
The neuron that fires first corresponds to the network’s prediction
A single spike per neuron is enough

Pros:

Extremely fast (single spike is sufficient)
Energy-efficient

Cons:

Sensitive to noise (one mistimed spike changes the result)
Harder to train
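
A minimal sketch of time-to-first-spike coding, assuming a simple linear mapping from intensity to spike time (other mappings, such as logarithmic ones, are also used). The names `latency_encode` and `first_to_fire` and the 10 ms window are assumptions for illustration.

```python
def latency_encode(intensity, t_max=10.0):
    """Map an intensity in [0, 1] to a single spike time in ms.

    Assumed linear mapping: stronger input -> earlier spike.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return t_max * (1.0 - intensity)


def first_to_fire(spike_times):
    """Return the index of the neuron with the earliest spike time."""
    return min(range(len(spike_times)), key=lambda i: spike_times[i])


# Strong, medium, and weak stimuli land near 2 ms, 5 ms, and 8 ms.
times = [latency_encode(x) for x in (0.8, 0.5, 0.2)]
print([round(t, 2) for t in times])
print(first_to_fire(times))   # index of the strongest stimulus
```

Because the decision rests on one spike per neuron, a small timing error on a single spike can flip the result, which is the noise sensitivity noted above.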

Population Coding

Information is encoded in the collective pattern of spikes across a group of neurons.

Neuron A:  | |   | |
Neuron B:    | | |   | |
Neuron C:  |     | | | |   ← The combined pattern encodes information
Neuron D:      |   | |

This is thought to be how biological brains encode much of their information. No single neuron carries the full picture; the population activity does.
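
One way to picture population coding, as a sketch under stated assumptions: each neuron has a preferred stimulus value and responds with a Gaussian tuning curve, and the stimulus is read back as the rate-weighted average of the preferred values. The tuning-curve shape, the width `sigma`, and the function names are illustrative choices, not something the scheme requires.

```python
import math


def population_response(x, preferred, sigma=0.15):
    """Firing rate of each neuron: a Gaussian bump around its preferred value."""
    return [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for mu in preferred]


def population_decode(rates, preferred):
    """Estimate the stimulus as the rate-weighted mean of preferred values."""
    total = sum(rates)
    return sum(r * mu for r, mu in zip(rates, preferred)) / total


preferred = [0.0, 0.25, 0.5, 0.75, 1.0]   # five neurons tiling the input range
rates = population_response(0.6, preferred)
estimate = population_decode(rates, preferred)

# No neuron prefers 0.6, yet the population as a whole recovers it closely.
print([round(r, 3) for r in rates])
print(round(estimate, 3))
```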

Comparison

Coding	Spikes needed	Speed	Noise robustness	Biological relevance
Rate	Many	Slow	High	Moderate
Temporal	One per neuron	Fast	Low	High
Population	Varies	Moderate	High	Very high

STDP: The Core Learning Rule

Spike-Timing-Dependent Plasticity (STDP) is the primary unsupervised learning rule for SNNs. It was discovered in biological experiments in the late 1990s and formalizes a simple but powerful idea about how synapses should change.

The Biological Motivation

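Donald Hebb proposed in 1949 that a synapse strengthens when the pre-synaptic neuron repeatedly helps fire the post-synaptic one, an idea later summarized as “Neurons that fire together, wire together.” STDP refines this idea by adding temporal order: it matters which neuron fires first.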

The Rule in Plain Language

Consider two neurons connected by a synapse — a pre-synaptic neuron (sender) and a post-synaptic neuron (receiver).

Case 1: Pre fires before Post ($\Delta t > 0$)

Pre neuron spike          Post neuron spike
       |                         |
       |←───── Δt > 0 ─────→|

Interpretation: Pre's spike contributed to Post's firing
Result: STRENGTHEN the synapse (Long-Term Potentiation, LTP)

The pre-synaptic spike was a cause of the post-synaptic spike. The connection was useful, so make it stronger.

Case 2: Post fires before Pre ($\Delta t < 0$)

Post neuron spike         Pre neuron spike
       |                         |
       |←───── Δt < 0 ─────→|

Interpretation: Pre's spike arrived too late to cause Post's firing
Result: WEAKEN the synapse (Long-Term Depression, LTD)

The pre-synaptic spike arrived after the post-synaptic neuron already fired. It did not contribute, so weaken the connection.

The Mathematical Formulation

The change in synaptic weight depends on the time difference $\Delta t = t_{post} - t_{pre}$:

$$ \Delta w = \begin{cases} A_+ \exp\left(-\frac{\Delta t}{\tau_+}\right) & \text{if } \Delta t > 0 \quad \text{(LTP: strengthen)} \\[8pt] -A_- \exp\left(\frac{\Delta t}{\tau_-}\right) & \text{if } \Delta t < 0 \quad \text{(LTD: weaken)} \end{cases} $$

Where:

$\Delta t = t_{post} - t_{pre}$: Time difference between post and pre spikes
$A_+$: Maximum potentiation amplitude (learning rate for strengthening)
$A_-$: Maximum depression amplitude (learning rate for weakening)
$\tau_+, \tau_-$: Time constants controlling the window width (typically ~20 ms)

STDP Window Shape

                 Δw (weight change)
                      ↑
                   A₊ |╲
                      | ╲        LTP (strengthen)
                      |  ╲
                      |   ╲
 ─────────────────────┼───────────────────── Δt
                 ╲    |
                  ╲   |
                   ╲  |
   LTD (weaken)     ╲ |
                     ╲| -A₋
                      |
   ←── Δt < 0 ────────|──────── Δt > 0 ──→
   (Post before Pre)      (Pre before Post)

Step-by-Step STDP Example

Suppose $A_+ = 0.1$, $A_- = 0.12$, $\tau_+ = \tau_- = 20$ ms.

Scenario: Pre fires at $t = 100$ ms, Post fires at $t = 110$ ms.

Compute $\Delta t = 110 - 100 = +10$ ms
Since $\Delta t > 0$, apply LTP: $$ \Delta w = 0.1 \times \exp\left(-\frac{10}{20}\right) = 0.1 \times 0.607 = 0.0607 $$
Update weight: $w_{new} = w_{old} + 0.0607$

Scenario: Pre fires at $t = 100$ ms, Post fires at $t = 85$ ms.

Compute $\Delta t = 85 - 100 = -15$ ms
Since $\Delta t < 0$, apply LTD: $$ \Delta w = -0.12 \times \exp\left(\frac{-15}{20}\right) = -0.12 \times 0.472 = -0.0567 $$
Update weight: $w_{new} = w_{old} - 0.0567$
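
Both scenarios can be checked with a small function. This is a minimal sketch of the pair-based rule from the formula above; the name `stdp_dw` is hypothetical, and returning zero when the two spikes coincide is an assumption, since the formula does not define that case.

```python
import math


def stdp_dw(t_pre, t_post, a_plus=0.1, a_minus=0.12, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:                                    # pre before post: LTP
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:                                    # post before pre: LTD
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0                                    # dt == 0: assumed no change


print(round(stdp_dw(100, 110), 4))   # 0.0607
print(round(stdp_dw(100, 85), 4))    # -0.0567
```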

Why Is Depression Stronger Than Potentiation?

Typically, $A_- > A_+$. This is intentional. Because pre-synaptic spikes help trigger post-synaptic spikes, pre-before-post pairings occur somewhat more often than chance, so with perfectly balanced strengthening and weakening the weights would tend to drift upward. Making depression slightly stronger makes the network competitive: only the synapses that are consistently causal survive. The rest weaken and effectively prune themselves.

This produces sparse, efficient connectivity — similar to what we observe in biological brains.
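
One way to make the balance argument concrete, assuming pre/post spike pairs are uncorrelated so that every $\Delta t$ is equally likely: the average weight change is then proportional to the signed area under the STDP window,

$$ \int_{0}^{\infty} A_+ e^{-\Delta t/\tau_+} \, d\Delta t \; - \int_{-\infty}^{0} A_- e^{\Delta t/\tau_-} \, d\Delta t = A_+ \tau_+ - A_- \tau_- $$

With the values from the example above, this is $0.1 \times 20 - 0.12 \times 20 = -0.4$, which is negative. Synapses whose spikes are unrelated to the post-synaptic neuron's firing are depressed on average, and only pairings that are reliably pre-before-post can overcome that bias.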

Surrogate Gradient: Enabling Backpropagation in SNNs

STDP is an unsupervised learning rule. But what if we want to do supervised learning with labeled data, like in standard deep learning? We need backpropagation. However, there is a fundamental problem.

The Problem

The spike function is a Heaviside step function:

$$ S(t) = \Theta(V(t) - V_{th}) = \begin{cases} 1 & \text{if } V(t) \geq V_{th} \\ 0 & \text{otherwise} \end{cases} $$

Its derivative is zero everywhere except at the threshold, where it is undefined (a Dirac delta):

Spike function Θ(x):        Its derivative:
     1 ─────────                    ↑ ∞
     |                              │
     |                              │ (Dirac delta)
     0─────────               ──────┴──────
           Vth                      Vth

Gradient is 0 almost everywhere → backpropagation gets no useful signal

The Solution: Surrogate Gradients

During the forward pass, we use the true Heaviside function (spikes are binary). During the backward pass, we swap in a smooth, differentiable surrogate function that approximates the step.

Common surrogate functions:

Surrogate	Gradient formula ($x = V - V_{th}$)	Shape
Arctangent	$\frac{1}{\pi} \cdot \frac{1}{1 + (\pi x)^2}$	Smooth bell curve
Sigmoid	$\sigma'(x) = \sigma(x)(1-\sigma(x))$	Bell curve
Fast Sigmoid	$\frac{1}{(1 + k\lvert x \rvert)^2}$	Sharp bell curve
Triangular	$\max(0, 1 - \lvert x \rvert)$	Triangle

The arctangent surrogate is one of the most popular:

$$ \frac{\partial S}{\partial V} \approx \frac{1}{\pi} \cdot \frac{1}{1 + (\pi (V - V_{th}))^2} $$
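
The surrogates in the table are easy to write down directly. The sketch below is plain Python for illustration rather than any framework's API; `x` stands for $V - V_{th}$, and the slope `k=10.0` for the fast sigmoid is an assumed value.

```python
import math


def heaviside(x):
    """Forward pass: emit a spike (1.0) once x = V - V_th is non-negative."""
    return 1.0 if x >= 0 else 0.0


def atan_surrogate(x):
    return (1.0 / math.pi) / (1.0 + (math.pi * x) ** 2)


def sigmoid_surrogate(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def fast_sigmoid_surrogate(x, k=10.0):
    return 1.0 / (1.0 + k * abs(x)) ** 2


def triangular_surrogate(x):
    return max(0.0, 1.0 - abs(x))


# Every surrogate peaks at the threshold (x = 0) and falls off on both sides,
# so neurons close to firing receive the largest gradient.
for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(x, heaviside(x),
          round(atan_surrogate(x), 4),
          round(sigmoid_surrogate(x), 4),
          round(fast_sigmoid_surrogate(x), 4),
          round(triangular_surrogate(x), 4))
```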

How It Works Step by Step

Forward pass: Compute $V[t]$ using the LIF equation. Fire a real binary spike if $V \geq V_{th}$.
Loss computation: Compare output spikes (or spike counts) to the target label.
Backward pass: When computing gradients through the spike function, replace $\frac{\partial S}{\partial V}$ with the surrogate derivative.
Weight update: Apply standard gradient descent using the surrogate gradients.

Because the network is unrolled across time steps and gradients flow back through every step, this is Backpropagation Through Time (BPTT) applied to SNNs with surrogate gradients.
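
The four steps can be sketched for a single LIF neuron with one trainable weight. Several things here are assumptions made to keep the example small: the loss is the squared error on the spike count, the reset is treated as a constant when differentiating, and the gradient is accumulated forward in time rather than by an explicit backward sweep (for a single weight both give the same number). Function and variable names are hypothetical.

```python
import math


def atan_surrogate(x):
    """Surrogate for dS/dV, evaluated at x = V - V_th."""
    return (1.0 / math.pi) / (1.0 + (math.pi * x) ** 2)


def forward_and_grad(w, inputs, beta=0.8, v_th=1.0):
    """Run one LIF neuron and accumulate d(spike count)/dw through time.

    The forward pass uses real binary spikes. The gradient swaps in the
    surrogate at each step and treats the reset as a constant.
    """
    v, dv_dw = 0.0, 0.0
    spike_prev = 0.0
    count, dcount_dw = 0.0, 0.0
    for x in inputs:
        keep = 1.0 - spike_prev                 # 0 right after a spike (reset to 0)
        v = beta * v * keep + w * x             # LIF update
        dv_dw = beta * dv_dw * keep + x         # same recursion, differentiated
        spike = 1.0 if v >= v_th else 0.0       # true Heaviside going forward
        count += spike
        dcount_dw += atan_surrogate(v - v_th) * dv_dw   # surrogate for dS/dV
        spike_prev = spike
    return count, dcount_dw


inputs = [1.0] * 10          # one input spike at every time step
target_count = 3.0
w, lr = 0.1, 0.002           # start too weak to fire at all

for epoch in range(1000):
    count, dcount_dw = forward_and_grad(w, inputs)
    error = count - target_count              # loss = 0.5 * error**2
    if error == 0:
        break
    w -= lr * error * dcount_dw               # gradient descent step

print(count, round(w, 3))
```

With the true derivative of the step function the update would be zero at every epoch; the surrogate is what lets the weight grow until the neuron fires the target number of times.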

Neuromorphic Hardware

Standard GPUs are designed for dense matrix multiplication. They compute every neuron at every time step, even when most neurons are silent. For SNNs, this is wasteful because spike activity is typically sparse, with only around 1–5% of neurons firing at a given time step.

Neuromorphic chips are designed specifically for event-driven computation.

Major Neuromorphic Chips

Chip	Developer	Neurons	Key Feature
Loihi 2	Intel	1 million	On-chip learning (STDP), asynchronous
TrueNorth	IBM	1 million	Ultra-low power (~70 mW), 4096 cores
SpiNNaker 2	TU Dresden / Univ. of Manchester	Millions	ARM core based, flexible
Akida	BrainChip	—	Edge AI, commercial deployment

Why Neuromorphic Hardware Is Efficient

GPU approach (synchronous):

Clock tick → Compute ALL neurons → Clock tick → Compute ALL neurons → ...

For 1 million neurons with 1% spike rate:
  990,000 neurons: 0 × weight = 0  (wasted computation)
   10,000 neurons: spike × weight  (useful computation)
  → 99% of work is wasted

Neuromorphic approach (event-driven):

Spike arrives at neuron 42 → Update neuron 42 only
Spike arrives at neuron 7801 → Update neuron 7801 only
No spike at neuron 500 → No computation at all

  → Only ~1-5% of neurons compute at any moment

The energy savings are substantial:

$$ E_{SNN} \approx E_{ANN} \times \text{spike rate} \approx E_{ANN} \times (0.01 \sim 0.05) $$

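As a first-order estimate, this suggests SNNs on neuromorphic hardware can be roughly 20–100× more energy-efficient than equivalent ANNs on GPUs for suitable workloads. The estimate counts only synaptic operations and ignores factors such as the number of simulation time steps and memory access costs.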
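
The arithmetic behind that range can be sketched by counting synaptic updates per time step. This counts operations, not joules, and the fan-out of 100 synapses per neuron is an arbitrary assumption that cancels out of the ratio; the function name is hypothetical.

```python
def synaptic_updates(num_neurons, fan_out, spike_rate=None):
    """Synaptic updates in one time step.

    spike_rate=None models the dense case (every neuron processed);
    otherwise only the spiking fraction triggers updates.
    """
    active = num_neurons if spike_rate is None else num_neurons * spike_rate
    return active * fan_out


dense = synaptic_updates(1_000_000, fan_out=100)
for rate in (0.01, 0.05):
    sparse = synaptic_updates(1_000_000, fan_out=100, spike_rate=rate)
    # The ratio is simply 1 / spike_rate: about 100x at 1%, about 20x at 5%.
    print(rate, round(dense / sparse))
```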

Where SNNs Excel Today

Application	Why SNN fits
Event camera (DVS) processing	Input is already spikes
Always-on keyword detection	Ultra-low power needed
Edge robotics	Battery-constrained
Anomaly detection	Sparse events, low latency
Biomedical signal processing	Temporal spike patterns

Summary

Concept	Key Idea
LIF Neuron	Accumulate input → leak over time → fire when threshold is crossed
Rate Coding	Information = firing frequency over a time window
Temporal Coding	Information = precise spike timing (earlier = stronger)
Population Coding	Information = collective pattern across many neurons
STDP	Pre→Post = strengthen; Post→Pre = weaken
Surrogate Gradient	Replace non-differentiable spike with smooth approximation during backprop
Neuromorphic chips	Event-driven hardware → compute only when spikes occur → 20–100× energy savings

SNNs do not yet match ANNs in raw accuracy on standard benchmarks like ImageNet. But in domains where low power, low latency, and temporal data matter — such as edge devices, event cameras, and always-on sensors — SNNs offer a compelling advantage that grows as neuromorphic hardware matures.
