Table of Contents
Overview#
RTL (Register-Transfer Level) is the abstraction level at which most digital hardware is designed today. It describes a circuit in terms of registers (flip-flops that store data) and the combinational logic (operations) that transforms data as it moves between those registers.
Think of it this way: if you were describing how a factory works, you wouldn’t describe every gear and bolt — you’d describe the workstations (registers) and what happens to the product as it moves between them (logic). RTL is exactly that abstraction for digital circuits.
Abstraction Hierarchy:
System Level "A processor that runs Linux"
│
Algorithmic Level "Multiply A and B, accumulate result"
│
★ RTL Level ★ "On clock edge: REG_C <= REG_A * REG_B + REG_C"
│
Gate Level "AND gate output connects to OR gate input..."
│
Transistor Level "NMOS/PMOS with W=0.5μm, L=0.18μm..."
│
Physical Level "Metal layers, via connections, silicon doping..."RTL sits in the sweet spot: high enough to think about algorithms and data flow, low enough to precisely control timing and resource usage.
1. The Two Building Blocks#
Every digital circuit at the RTL level is built from exactly two types of elements:
1.1 Combinational Logic#
Combinational logic computes an output purely from its current inputs — it has no memory. The output changes immediately (after propagation delay) when inputs change.
Combinational Logic:
Inputs ──→ [Logic Function] ──→ Output
A, B, C f(A,B,C) Y
Y depends ONLY on current A, B, C
No clock, no memory, no state
Examples:
- Adder: Y = A + B
- MUX: Y = sel ? B : A
- ALU: Y = op(A, B) (add, sub, and, or, shift...)
- Decoder: 3-bit input → 8-bit one-hot output1.2 Sequential Logic (Registers)#
Sequential logic has memory — its output depends on both current inputs and previous state. In synchronous design, state changes happen only on clock edges.
Sequential Logic (D Flip-Flop):
┌─────────┐
D ────→│ │
│ D FF ├────→ Q (output = stored value)
CLK ──→│ │
└─────────┘
On rising edge of CLK: Q takes the value of D
Between clock edges: Q holds its previous value
This is the fundamental "register" in RTL.1.3 The RTL Pattern#
The fundamental RTL pattern is registers separated by combinational logic:
The RTL Paradigm:
CLK CLK CLK CLK
│ │ │ │
┌─────┴─┐ ┌┴──────┴─┐ ┌─┴─────┐
│ REG A │──│ Comb. │──│ REG B │──→ ...
│ │ │ Logic │ │ │
└───────┘ └──────────┘ └───────┘
Clock cycle 1: REG_A captures input data
Clock cycle 2: Combinational logic computes f(REG_A)
REG_B captures the result
Data "flows" from register to register,
transformed by combinational logic between them.This is why it’s called Register-Transfer Level — we describe how data transfers between registers through logic operations.
2. Describing RTL in HDL#
RTL is typically written in a Hardware Description Language (HDL) — either Verilog or VHDL. Here we use Verilog.
2.1 Combinational Logic in Verilog#
// Combinational: 2-to-1 MUX
// Output changes whenever ANY input changes
assign y = sel ? b : a;
// Combinational: Full Adder
assign {carry_out, sum} = a + b + carry_in;
// Combinational: ALU (using always block)
always @(*) begin // @(*) = "whenever any input changes"
case (op)
2'b00: result = a + b;
2'b01: result = a - b;
2'b10: result = a & b;
2'b11: result = a | b;
endcase
end2.2 Sequential Logic in Verilog#
// Sequential: Simple register (D flip-flop)
// Output changes ONLY on clock edge
always @(posedge clk) begin
q <= d; // "<=" is non-blocking assignment
end
// Sequential: Register with synchronous reset
always @(posedge clk) begin
if (reset)
counter <= 8'b0;
else
counter <= counter + 1;
end
// Sequential: Register with enable
always @(posedge clk) begin
if (enable)
data_reg <= data_in;
// else: data_reg retains its value (implicit)
end2.3 The Critical Distinction#
Combinational: Sequential:
always @(*) begin always @(posedge clk) begin
// sensitive to ALL inputs // sensitive to clock edge ONLY
y = a + b; q <= a + b;
end end
┌───────────┐ ┌─────────┐
│ a + b │──→ y │ a + b │──→ q
└───────────┘ │ on CLK↑ │
No clock, instant* └─────────┘
(* after propagation delay) Clocked, stores result3. The Datapath / Control Partition#
Real RTL designs split naturally into two parts:
┌──────────────────────────────────────────────────────┐
│ RTL Design │
│ │
│ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ Control Path │ │ Datapath │ │
│ │ (FSM) │ │ │ │
│ │ │ │ ┌─────┐ ┌─────────┐ │ │
│ │ Decides WHAT │───→│ │ MUX │──│ ALU │ │ │
│ │ to do next │ │ └──┬──┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │
│ │ Outputs: │ │ ┌──┴──┐ ┌───┴────┐ │ │
│ │ - MUX selects │ │ │ REG │ │ REG │ │ │
│ │ - REG enables │ │ └─────┘ └────────┘ │ │
│ │ - ALU opcodes │ │ │ │
│ │ │←───│ Status flags: │ │
│ │ Inputs: │ │ - zero, carry, overflow│ │
│ │ - status flags │ │ │ │
│ │ - external ctrl │ │ Does the COMPUTATION │ │
│ └──────────────────┘ └────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────┘- Datapath: The “muscles” — registers, adders, multipliers, multiplexers, shifters. Moves and transforms data.
- Control Path: The “brain” — typically a Finite State Machine (FSM) that generates control signals to orchestrate the datapath.
4. Finite State Machines (FSMs)#
The control path is almost always implemented as an FSM. Two standard types:
4.1 Moore Machine#
Output depends only on the current state (not on inputs directly).
Moore FSM:
┌──────────┐
Input ──→ [Next │ State │──→ [Output Logic] ──→ Output
State │ Register │
Logic]──│ │
└──────────┘
↑
CLK
Output = f(state) ← only state
Next state = g(state, input) ← state + input4.2 Mealy Machine#
Output depends on current state AND current inputs — can react faster but may create timing issues.
Mealy FSM:
┌──────────┐
Input ─┬──→[Next │ State │──┬──→ [Output Logic] ──→ Output
│ State │ Register │ │ ↑
│ Logic]─│ │ │ │
│ └──────────┘ │ Input ┘
│ ↑ │
│ CLK │
└───────────────────────┘
Output = f(state, input) ← state + input
Next state = g(state, input) ← state + input4.3 FSM in Verilog#
// Example: Simple traffic light controller
// States: RED, GREEN, YELLOW
localparam RED = 2'b00;
localparam GREEN = 2'b01;
localparam YELLOW = 2'b10;
reg [1:0] state, next_state;
// Sequential: State register
always @(posedge clk or posedge reset) begin
if (reset)
state <= RED;
else
state <= next_state;
end
// Combinational: Next state logic
always @(*) begin
case (state)
RED: next_state = (timer_done) ? GREEN : RED;
GREEN: next_state = (timer_done) ? YELLOW : GREEN;
YELLOW: next_state = (timer_done) ? RED : YELLOW;
default: next_state = RED;
endcase
end
// Combinational: Output logic (Moore)
always @(*) begin
case (state)
RED: begin red = 1; green = 0; yellow = 0; end
GREEN: begin red = 0; green = 1; yellow = 0; end
YELLOW: begin red = 0; green = 0; yellow = 1; end
default: begin red = 1; green = 0; yellow = 0; end
endcase
end5. Timing: The Clock’s Role#
5.1 Setup and Hold Time#
For a flip-flop to correctly capture data, the input must be stable during two critical windows:
Setup Time Hold Time
◄────────► ◄───────►
│ │ │ │
D ─────┤ Stable ├────────┤Stable ├──── D can change
│ │ │ │
▲
│
CLK rising edge
Setup time (t_su): D must be stable BEFORE the clock edge
Hold time (t_h): D must be stable AFTER the clock edge
Violation → metastability → unpredictable output5.2 Critical Path and Clock Frequency#
The critical path is the longest combinational delay between any two registers:
REG ──→ [Logic A] ──→ [Logic B] ──→ [Logic C] ──→ REG
5 ns 3 ns 4 ns
Critical path delay = 5 + 3 + 4 = 12 ns
Minimum clock period = 12 ns + t_su + t_clk_to_q
Maximum frequency ≈ 1 / (12 ns + margins) ≈ ~75 MHzTo increase clock frequency, you must either:
- Simplify the combinational logic (reduce delay)
- Pipeline — insert registers to break long paths into shorter stages
5.3 Pipelining#
Before Pipelining:
REG ──→ [Logic A + B + C] ──→ REG (12 ns path, ~75 MHz)
After Pipelining:
REG ──→ [Logic A] ──→ REG ──→ [Logic B] ──→ REG ──→ [Logic C] ──→ REG
5 ns 3 ns 4 ns
Critical path = 5 ns → ~180 MHz!
Trade-off: Higher frequency, but +2 clock cycles of latency
and more register resources used.6. Common RTL Design Patterns#
6.1 Counter#
reg [7:0] count;
always @(posedge clk) begin
if (reset)
count <= 8'd0;
else if (enable)
count <= count + 1;
end6.2 Shift Register#
reg [7:0] shift_reg;
always @(posedge clk) begin
if (load)
shift_reg <= parallel_in;
else if (shift_en)
shift_reg <= {shift_reg[6:0], serial_in};
end6.3 FIFO (Simplified)#
Write Side: Read Side:
┌───┬───┬───┬───┬───┐
data_in ──→ │ 0 │ 1 │ 2 │ 3 │ 4 │ ──→ data_out
wr_en ──→ └───┴───┴───┴───┴───┘ ←── rd_en
↑ ↑
wr_ptr rd_ptr
Full = (wr_ptr + 1 == rd_ptr)
Empty = (wr_ptr == rd_ptr)6.4 Memory Interface#
// Simple synchronous RAM
reg [7:0] mem [0:255]; // 256 x 8-bit memory
always @(posedge clk) begin
if (we) // Write
mem[addr] <= data_in;
data_out <= mem[addr]; // Read (1-cycle latency)
end7. The RTL Design Flow#
From RTL code to a working chip or FPGA:
┌──────────────────────────────────┐
│ 1. Specification │ "What should it do?"
└──────────────┬───────────────────┘
▼
┌──────────────────────────────────┐
│ 2. RTL Design (Verilog/VHDL) │ Write the hardware description
└──────────────┬───────────────────┘
▼
┌──────────────────────────────────┐
│ 3. Functional Simulation │ Testbench verifies correctness
│ (ModelSim, VCS, Verilator) │ "Does the logic work?"
└──────────────┬───────────────────┘
▼
┌──────────────────────────────────┐
│ 4. Synthesis │ RTL → Gate-level netlist
│ (Synopsys DC, Yosys) │ Maps to actual gates/LUTs
└──────────────┬───────────────────┘
▼
┌──────────────────────────────────┐
│ 5. Place and Route (P&R) │ Gates → physical locations
│ (Cadence Innovus, Vivado) │ and wire connections
└──────────────┬───────────────────┘
▼
┌──────────────────────────────────┐
│ 6. Timing Analysis (STA) │ "Can it run at target frequency?"
│ Setup/hold violations? │ Critical path analysis
└──────────────┬───────────────────┘
▼
┌──────────────────────────────────┐
│ 7. Fabrication (ASIC) │ GDSII → foundry → silicon
│ or Programming (FPGA) │ Bitstream → FPGA device
└──────────────────────────────────┘Synthesis: What Actually Happens#
RTL Code: Gate-Level Netlist:
always @(posedge clk) ┌─────┐ ┌─────┐
if (sel) │ MUX │───→│ DFF │──→ q
q <= a; a──→ │ │ │ │
else b──→ │ │ └──┬──┘
q <= b; sel──→ └─────┘ │
CLKThe synthesis tool automatically:
- Infers flip-flops from
always @(posedge clk)blocks - Maps combinational logic to gates (AND, OR, MUX, etc.)
- Optimizes for area, speed, or power based on constraints
8. FPGA vs ASIC#
| Aspect | FPGA | ASIC |
|---|---|---|
| Development time | Hours to days | Months to years |
| Unit cost | High ($10–$10,000) | Very low at scale ($0.10–$10) |
| NRE cost | Low ($0–$10K) | Very high ($1M–$100M+) |
| Performance | Good | Best (custom silicon) |
| Power | Higher | Lower (optimized) |
| Reconfigurable | Yes (reprogram anytime) | No (fixed at fabrication) |
| Use case | Prototyping, low volume, signal processing | Mass production (phones, SoCs) |
FPGAs implement RTL using Look-Up Tables (LUTs) instead of fixed gates:
FPGA Logic Element:
┌───────────────────────────────┐
│ ┌──────────┐ ┌─────────┐ │
│ │ 4-input │───→│ D FF │─┤──→ Output
│ │ LUT │ │ │ │
│ │ (16-bit │ └─────────┘ │
│ │ SRAM) │ │
│ └──────────┘ │
│ │
│ A LUT can implement ANY │
│ 4-input boolean function │
└───────────────────────────────┘9. Common Pitfalls#
9.1 Unintended Latches#
If a combinational block doesn’t assign a value in all paths, synthesis infers a latch — almost always a bug:
// BAD: Missing else → latch inferred!
always @(*) begin
if (sel)
y = a;
// What is y when sel=0? → Latch!
end
// GOOD: All paths covered → no latch
always @(*) begin
if (sel)
y = a;
else
y = b;
end
// ALSO GOOD: Default assignment
always @(*) begin
y = 0; // default
if (sel)
y = a;
end9.2 Blocking vs Non-Blocking#
// Sequential logic: ALWAYS use non-blocking (<=)
always @(posedge clk) begin
b <= a; // All assignments happen "simultaneously"
c <= b; // c gets OLD value of b (correct pipeline)
end
// Combinational logic: ALWAYS use blocking (=)
always @(*) begin
temp = a + b; // temp updated immediately
result = temp * c; // uses new temp (correct)
end9.3 Clock Domain Crossing#
When data moves between different clock domains, you must use synchronizers to avoid metastability:
Clock Domain A (50 MHz) Clock Domain B (100 MHz)
REG ──→ data ──→ [FF1] ──→ [FF2] ──→ REG
↑ ↑
CLK_B CLK_B
Two flip-flops in series (double synchronizer)
reduce metastability probability to negligible levels.
For multi-bit signals: use Gray coding or async FIFO.10. Summary#
RTL Design in One Picture:
Specification
│
▼
┌─────────────────────────────────────────────┐
│ RTL Description │
│ │
│ ┌────────┐ ┌──────────┐ ┌────────┐ │
│ │ REG │──→│ Comb. │──→│ REG │ │
│ │ │ │ Logic │ │ │ │
│ └────────┘ └──────────┘ └────────┘ │
│ ↑ ↑ ↑ │
│ CLK ─── CLK │
│ │
│ Control (FSM) ←──→ Datapath (ALU, MUX) │
│ │
└─────────────────────────────────────────────┘
│
▼
Synthesis → Gates → Place & Route → Silicon/FPGARTL design is fundamentally about organizing data movement through registers and logic across clock cycles. Master the concepts of combinational vs. sequential logic, the datapath/control split, timing constraints, and the synthesis flow — and you have the foundation to design anything from a simple counter to a complex multi-core processor.