Table of Contents

Overview
#

RTL (Register-Transfer Level) is the abstraction level at which most digital hardware is designed today. It describes a circuit in terms of registers (flip-flops that store data) and the combinational logic (operations) that transforms data as it moves between those registers.

Think of it this way: if you were describing how a factory works, you wouldn’t describe every gear and bolt — you’d describe the workstations (registers) and what happens to the product as it moves between them (logic). RTL is exactly that abstraction for digital circuits.

Abstraction Hierarchy:

  System Level        "A processor that runs Linux"
  Algorithmic Level   "Multiply A and B, accumulate result"
  ★ RTL Level ★       "On clock edge: REG_C <= REG_A * REG_B + REG_C"
  Gate Level          "AND gate output connects to OR gate input..."
  Transistor Level    "NMOS/PMOS with W=0.5μm, L=0.18μm..."
  Physical Level      "Metal layers, via connections, silicon doping..."

RTL sits in the sweet spot: high enough to think about algorithms and data flow, low enough to precisely control timing and resource usage.


1. The Two Building Blocks
#

Every digital circuit at the RTL level is built from exactly two types of elements:

1.1 Combinational Logic
#

Combinational logic computes an output purely from its current inputs — it has no memory. The output changes immediately (after propagation delay) when inputs change.

Combinational Logic:

  Inputs ──→ [Logic Function] ──→ Output
  A, B, C         f(A,B,C)         Y

  Y depends ONLY on current A, B, C
  No clock, no memory, no state

  Examples:
  - Adder: Y = A + B
  - MUX:   Y = sel ? B : A
  - ALU:   Y = op(A, B)  (add, sub, and, or, shift...)
  - Decoder: 3-bit input → 8-bit one-hot output

1.2 Sequential Logic (Registers)
#

Sequential logic has memory — its output depends on both current inputs and previous state. In synchronous design, state changes happen only on clock edges.

Sequential Logic (D Flip-Flop):

         ┌─────────┐
  D ────→│         │
         │   D FF  ├────→ Q (output = stored value)
  CLK ──→│         │
         └─────────┘

  On rising edge of CLK: Q takes the value of D
  Between clock edges: Q holds its previous value

  This is the fundamental "register" in RTL.

1.3 The RTL Pattern
#

The fundamental RTL pattern is registers separated by combinational logic:

The RTL Paradigm:

       CLK    CLK    CLK    CLK
        │      │      │      │
  ┌─────┴─┐  ┌┴──────┴─┐  ┌─┴─────┐
  │ REG A │──│  Comb.   │──│ REG B │──→ ...
  │       │  │  Logic   │  │       │
  └───────┘  └──────────┘  └───────┘

  Clock cycle 1: REG_A captures input data
  Clock cycle 2: Combinational logic computes f(REG_A)
                 REG_B captures the result

  Data "flows" from register to register,
  transformed by combinational logic between them.

This is why it’s called Register-Transfer Level — we describe how data transfers between registers through logic operations.


2. Describing RTL in HDL
#

RTL is typically written in a Hardware Description Language (HDL) — either Verilog or VHDL. Here we use Verilog.

2.1 Combinational Logic in Verilog
#

// Combinational: 2-to-1 MUX
// Output changes whenever ANY input changes
assign y = sel ? b : a;

// Combinational: Full Adder
assign {carry_out, sum} = a + b + carry_in;

// Combinational: ALU (using always block)
always @(*) begin       // @(*) = "whenever any input changes"
    case (op)
        2'b00: result = a + b;
        2'b01: result = a - b;
        2'b10: result = a & b;
        2'b11: result = a | b;
    endcase
end

2.2 Sequential Logic in Verilog
#

// Sequential: Simple register (D flip-flop)
// Output changes ONLY on clock edge
always @(posedge clk) begin
    q <= d;             // "<=" is non-blocking assignment
end

// Sequential: Register with synchronous reset
always @(posedge clk) begin
    if (reset)
        counter <= 8'b0;
    else
        counter <= counter + 1;
end

// Sequential: Register with enable
always @(posedge clk) begin
    if (enable)
        data_reg <= data_in;
    // else: data_reg retains its value (implicit)
end

2.3 The Critical Distinction
#

Combinational:                    Sequential:
always @(*) begin                 always @(posedge clk) begin
    // sensitive to ALL inputs        // sensitive to clock edge ONLY
    y = a + b;                        q <= a + b;
end                               end

  ┌───────────┐                    ┌─────────┐
  │  a + b    │──→ y               │ a + b   │──→ q
  └───────────┘                    │ on CLK↑ │
  No clock, instant*               └─────────┘
  (* after propagation delay)      Clocked, stores result

3. The Datapath / Control Partition
#

Real RTL designs split naturally into two parts:

┌──────────────────────────────────────────────────────┐
│                   RTL Design                          │
│                                                      │
│  ┌──────────────────┐    ┌────────────────────────┐  │
│  │  Control Path     │    │     Datapath            │  │
│  │  (FSM)            │    │                        │  │
│  │                  │    │  ┌─────┐  ┌─────────┐  │  │
│  │  Decides WHAT    │───→│  │ MUX │──│ ALU     │  │  │
│  │  to do next      │    │  └──┬──┘  └────┬────┘  │  │
│  │                  │    │     │          │       │  │
│  │  Outputs:        │    │  ┌──┴──┐  ┌───┴────┐  │  │
│  │  - MUX selects   │    │  │ REG │  │ REG    │  │  │
│  │  - REG enables   │    │  └─────┘  └────────┘  │  │
│  │  - ALU opcodes   │    │                        │  │
│  │                  │←───│  Status flags:          │  │
│  │  Inputs:         │    │  - zero, carry, overflow│  │
│  │  - status flags  │    │                        │  │
│  │  - external ctrl │    │  Does the COMPUTATION  │  │
│  └──────────────────┘    └────────────────────────┘  │
│                                                      │
└──────────────────────────────────────────────────────┘
  • Datapath: The “muscles” — registers, adders, multipliers, multiplexers, shifters. Moves and transforms data.
  • Control Path: The “brain” — typically a Finite State Machine (FSM) that generates control signals to orchestrate the datapath.

4. Finite State Machines (FSMs)
#

The control path is almost always implemented as an FSM. Two standard types:

4.1 Moore Machine
#

Output depends only on the current state (not on inputs directly).

Moore FSM:
                    ┌──────────┐
  Input ──→ [Next   │  State   │──→ [Output Logic] ──→ Output
            State   │ Register │
            Logic]──│          │
                    └──────────┘
                        CLK

  Output = f(state)           ← only state
  Next state = g(state, input) ← state + input

4.2 Mealy Machine
#

Output depends on current state AND current inputs — can react faster but may create timing issues.

Mealy FSM:
                    ┌──────────┐
  Input ─┬──→[Next  │  State   │──┬──→ [Output Logic] ──→ Output
         │   State  │ Register │  │         ↑
         │   Logic]─│          │  │         │
         │          └──────────┘  │    Input ┘
         │               ↑       │
         │              CLK      │
         └───────────────────────┘

  Output = f(state, input)        ← state + input
  Next state = g(state, input)    ← state + input

4.3 FSM in Verilog
#

// Example: Simple traffic light controller
// States: RED, GREEN, YELLOW
localparam RED    = 2'b00;
localparam GREEN  = 2'b01;
localparam YELLOW = 2'b10;

reg [1:0] state, next_state;

// Sequential: State register
always @(posedge clk or posedge reset) begin
    if (reset)
        state <= RED;
    else
        state <= next_state;
end

// Combinational: Next state logic
always @(*) begin
    case (state)
        RED:    next_state = (timer_done) ? GREEN  : RED;
        GREEN:  next_state = (timer_done) ? YELLOW : GREEN;
        YELLOW: next_state = (timer_done) ? RED    : YELLOW;
        default: next_state = RED;
    endcase
end

// Combinational: Output logic (Moore)
always @(*) begin
    case (state)
        RED:    begin red = 1; green = 0; yellow = 0; end
        GREEN:  begin red = 0; green = 1; yellow = 0; end
        YELLOW: begin red = 0; green = 0; yellow = 1; end
        default: begin red = 1; green = 0; yellow = 0; end
    endcase
end

5. Timing: The Clock’s Role
#

5.1 Setup and Hold Time
#

For a flip-flop to correctly capture data, the input must be stable during two critical windows:

         Setup Time        Hold Time
         ◄────────►        ◄───────►
         │         │        │       │
  D ─────┤  Stable  ├────────┤Stable ├──── D can change
         │         │        │       │
              CLK rising edge

  Setup time (t_su): D must be stable BEFORE the clock edge
  Hold time (t_h):   D must be stable AFTER the clock edge

  Violation → metastability → unpredictable output

5.2 Critical Path and Clock Frequency
#

The critical path is the longest combinational delay between any two registers:

REG ──→ [Logic A] ──→ [Logic B] ──→ [Logic C] ──→ REG
         5 ns           3 ns          4 ns

Critical path delay = 5 + 3 + 4 = 12 ns
Minimum clock period = 12 ns + t_su + t_clk_to_q
Maximum frequency ≈ 1 / (12 ns + margins) ≈ ~75 MHz

To increase clock frequency, you must either:

  1. Simplify the combinational logic (reduce delay)
  2. Pipeline — insert registers to break long paths into shorter stages

5.3 Pipelining
#

Before Pipelining:
REG ──→ [Logic A + B + C] ──→ REG    (12 ns path, ~75 MHz)

After Pipelining:
REG ──→ [Logic A] ──→ REG ──→ [Logic B] ──→ REG ──→ [Logic C] ──→ REG
          5 ns                   3 ns                   4 ns

Critical path = 5 ns → ~180 MHz!

Trade-off: Higher frequency, but +2 clock cycles of latency
           and more register resources used.

6. Common RTL Design Patterns
#

6.1 Counter
#

reg [7:0] count;

always @(posedge clk) begin
    if (reset)
        count <= 8'd0;
    else if (enable)
        count <= count + 1;
end

6.2 Shift Register
#

reg [7:0] shift_reg;

always @(posedge clk) begin
    if (load)
        shift_reg <= parallel_in;
    else if (shift_en)
        shift_reg <= {shift_reg[6:0], serial_in};
end

6.3 FIFO (Simplified)
#

Write Side:                              Read Side:
              ┌───┬───┬───┬───┬───┐
  data_in ──→ │ 0 │ 1 │ 2 │ 3 │ 4 │ ──→ data_out
  wr_en   ──→ └───┴───┴───┴───┴───┘ ←── rd_en
                ↑                 ↑
             wr_ptr            rd_ptr

  Full  = (wr_ptr + 1 == rd_ptr)
  Empty = (wr_ptr == rd_ptr)

6.4 Memory Interface
#

// Simple synchronous RAM
reg [7:0] mem [0:255];  // 256 x 8-bit memory

always @(posedge clk) begin
    if (we)                         // Write
        mem[addr] <= data_in;
    data_out <= mem[addr];          // Read (1-cycle latency)
end

7. The RTL Design Flow
#

From RTL code to a working chip or FPGA:

┌──────────────────────────────────┐
│  1. Specification                 │  "What should it do?"
└──────────────┬───────────────────┘
┌──────────────────────────────────┐
│  2. RTL Design (Verilog/VHDL)    │  Write the hardware description
└──────────────┬───────────────────┘
┌──────────────────────────────────┐
│  3. Functional Simulation        │  Testbench verifies correctness
│     (ModelSim, VCS, Verilator)   │  "Does the logic work?"
└──────────────┬───────────────────┘
┌──────────────────────────────────┐
│  4. Synthesis                    │  RTL → Gate-level netlist
│     (Synopsys DC, Yosys)        │  Maps to actual gates/LUTs
└──────────────┬───────────────────┘
┌──────────────────────────────────┐
│  5. Place and Route (P&R)        │  Gates → physical locations
│     (Cadence Innovus, Vivado)    │  and wire connections
└──────────────┬───────────────────┘
┌──────────────────────────────────┐
│  6. Timing Analysis (STA)        │  "Can it run at target frequency?"
│     Setup/hold violations?       │  Critical path analysis
└──────────────┬───────────────────┘
┌──────────────────────────────────┐
│  7. Fabrication (ASIC)           │  GDSII → foundry → silicon
│     or Programming (FPGA)        │  Bitstream → FPGA device
└──────────────────────────────────┘

Synthesis: What Actually Happens
#

RTL Code:                          Gate-Level Netlist:

always @(posedge clk)              ┌─────┐    ┌─────┐
  if (sel)                         │ MUX │───→│ DFF │──→ q
    q <= a;                   a──→ │     │    │     │
  else                        b──→ │     │    └──┬──┘
    q <= b;                 sel──→ └─────┘       │
                                               CLK

The synthesis tool automatically:

  1. Infers flip-flops from always @(posedge clk) blocks
  2. Maps combinational logic to gates (AND, OR, MUX, etc.)
  3. Optimizes for area, speed, or power based on constraints

8. FPGA vs ASIC
#

AspectFPGAASIC
Development timeHours to daysMonths to years
Unit costHigh ($10–$10,000)Very low at scale ($0.10–$10)
NRE costLow ($0–$10K)Very high ($1M–$100M+)
PerformanceGoodBest (custom silicon)
PowerHigherLower (optimized)
ReconfigurableYes (reprogram anytime)No (fixed at fabrication)
Use casePrototyping, low volume, signal processingMass production (phones, SoCs)

FPGAs implement RTL using Look-Up Tables (LUTs) instead of fixed gates:

FPGA Logic Element:
┌───────────────────────────────┐
│  ┌──────────┐    ┌─────────┐ │
│  │ 4-input  │───→│  D FF   │─┤──→ Output
│  │   LUT    │    │         │ │
│  │ (16-bit  │    └─────────┘ │
│  │  SRAM)   │                │
│  └──────────┘                │
│                              │
│  A LUT can implement ANY     │
│  4-input boolean function    │
└───────────────────────────────┘

9. Common Pitfalls
#

9.1 Unintended Latches
#

If a combinational block doesn’t assign a value in all paths, synthesis infers a latch — almost always a bug:

// BAD: Missing else → latch inferred!
always @(*) begin
    if (sel)
        y = a;
    // What is y when sel=0? → Latch!
end

// GOOD: All paths covered → no latch
always @(*) begin
    if (sel)
        y = a;
    else
        y = b;
end

// ALSO GOOD: Default assignment
always @(*) begin
    y = 0;          // default
    if (sel)
        y = a;
end

9.2 Blocking vs Non-Blocking
#

// Sequential logic: ALWAYS use non-blocking (<=)
always @(posedge clk) begin
    b <= a;     // All assignments happen "simultaneously"
    c <= b;     // c gets OLD value of b (correct pipeline)
end

// Combinational logic: ALWAYS use blocking (=)
always @(*) begin
    temp = a + b;     // temp updated immediately
    result = temp * c; // uses new temp (correct)
end

9.3 Clock Domain Crossing
#

When data moves between different clock domains, you must use synchronizers to avoid metastability:

Clock Domain A (50 MHz)         Clock Domain B (100 MHz)

REG ──→ data ──→ [FF1] ──→ [FF2] ──→ REG
                  ↑          ↑
                 CLK_B      CLK_B

  Two flip-flops in series (double synchronizer)
  reduce metastability probability to negligible levels.

  For multi-bit signals: use Gray coding or async FIFO.

10. Summary
#

RTL Design in One Picture:

  Specification
  ┌─────────────────────────────────────────────┐
  │              RTL Description                 │
  │                                             │
  │   ┌────────┐   ┌──────────┐   ┌────────┐  │
  │   │  REG   │──→│  Comb.   │──→│  REG   │  │
  │   │        │   │  Logic   │   │        │  │
  │   └────────┘   └──────────┘   └────────┘  │
  │       ↑            ↑              ↑        │
  │      CLK          ───            CLK       │
  │                                             │
  │   Control (FSM) ←──→ Datapath (ALU, MUX)   │
  │                                             │
  └─────────────────────────────────────────────┘
  Synthesis → Gates → Place & Route → Silicon/FPGA

RTL design is fundamentally about organizing data movement through registers and logic across clock cycles. Master the concepts of combinational vs. sequential logic, the datapath/control split, timing constraints, and the synthesis flow — and you have the foundation to design anything from a simple counter to a complex multi-core processor.