Overview

Computer structure describes how hardware components are organized to execute programs. A working knowledge of it underpins systems programming and performance optimization.

Von Neumann Architecture

┌─────────────────────────────────────────┐
│                 Memory                  │
│         (Instructions and Data)         │
└───────────────┬─────────────────────────┘
                │ Bus
┌───────────────┴─────────────────────────┐
│                   CPU                   │
│  ┌──────────┐    ┌──────────────────┐   │
│  │ Control  │    │    Datapath      │   │
│  │  Unit    │    │  ┌────┐ ┌────┐   │   │
│  └──────────┘    │  │ALU │ │Regs│   │   │
│                  │  └────┘ └────┘   │   │
│                  └──────────────────┘   │
└───────────────┬─────────────────────────┘
                │ Bus
┌───────────────┴─────────────────────────┐
│               I/O Devices               │
└─────────────────────────────────────────┘

Key Principles

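  1. Stored program: instructions are held in memory, just like data
  2. Sequential execution: instructions run one at a time through the fetch-decode-execute cycle, in program order unless a branch changes the program counter
  3. Single memory: data and instructions share the same memory and the same bus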

CPU Components

Control Unit

  • Fetches instructions
  • Decodes opcodes
  • Generates control signals
  • Manages program counter

Datapath

  • ALU: Arithmetic Logic Unit
  • Registers: Fast storage
  • Multiplexers: Data routing
  • Buses: Data transfer

Registers

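Register  Name                     Purpose
PC        Program Counter          Address of the next instruction to fetch
IR        Instruction Register     Instruction currently being decoded and executed
MAR       Memory Address Register  Address of the memory location being accessed
MDR       Memory Data Register     Data read from, or waiting to be written to, memory
ACC       Accumulator              Result storage for ALU operations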

Instruction Cycle

┌────────┐
│ Fetch  │ ← Get instruction from memory
└───┬────┘
┌───┴────┐
│ Decode │ ← Interpret instruction
└───┬────┘
┌───┴────┐
│Execute │ ← Perform operation
└───┬────┘
┌───┴────┐
│ Store  │ ← Write results
└────────┘
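
The cycle above can be sketched as a toy accumulator machine. This is only an illustration, not a model of any real CPU: the four opcodes (LOAD, ADD, STORE, HALT), the tuple encoding, and the function name `run` are all made up for the example. The register names follow the Registers section.

```python
def run(memory):
    """Run a toy accumulator machine until HALT and return the final memory.

    Hypothetical encoding: an instruction is a tuple (opcode, address);
    data words are plain integers stored in the same list.
    """
    memory = list(memory)   # one memory holds both instructions and data
    pc = 0                  # program counter
    acc = 0                 # accumulator
    while True:
        # Fetch: MAR <- PC, MDR <- memory[MAR], IR <- MDR, then advance PC.
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1
        # Decode: split the instruction into opcode and operand address.
        opcode, addr = ir
        # Execute, then store the result where the instruction has one.
        if opcode == "LOAD":
            acc = memory[addr]
        elif opcode == "ADD":
            acc += memory[addr]
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")


program = [
    ("LOAD", 4),    # 0: ACC <- memory[4]
    ("ADD", 5),     # 1: ACC <- ACC + memory[5]
    ("STORE", 6),   # 2: memory[6] <- ACC
    ("HALT", 0),    # 3: stop
    2,              # 4: data
    3,              # 5: data
    0,              # 6: the result is written here
]
print(run(program)[6])  # 5
```

Keeping the program and its data in one list mirrors the single-memory principle: nothing but the program counter distinguishes an instruction from a data word.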

Memory Hierarchy

        ┌─────────┐
        │Registers│  ← Fastest, smallest
        ├─────────┤
        │ L1 Cache│
        ├─────────┤
        │ L2 Cache│
        ├─────────┤
        │ L3 Cache│
        ├─────────┤
        │  DRAM   │  ← Main memory
        ├─────────┤
        │  SSD    │
        ├─────────┤
        │  HDD    │  ← Slowest, largest
        └─────────┘

Memory Characteristics

Level      Typical size    Typical latency
Registers  ~1 KB           <1 ns
L1 Cache   32-64 KB        ~1 ns
L2 Cache   256 KB - 1 MB   ~4 ns
L3 Cache   2-32 MB         ~12 ns
DRAM       8-64 GB         ~100 ns
SSD        256 GB - 4 TB   ~100 μs
HDD        1-10 TB         ~10 ms
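
One way to see why the hierarchy pays off is to estimate the average time per memory access. The sketch below uses the approximate cache and DRAM latencies listed above. The hit rates are hypothetical values chosen only for illustration, and the model is a simplification: it treats each latency as the total time for an access served at that level.

```python
def average_access_time(levels):
    """Estimate the average access time in ns.

    `levels` is a list of (latency_ns, hit_rate) pairs ordered from the
    fastest level to the slowest. The last level should have hit_rate 1.0
    so that every access is eventually served.
    """
    total = 0.0
    reach = 1.0   # fraction of accesses that get as far as this level
    for latency_ns, hit_rate in levels:
        total += reach * hit_rate * latency_ns   # accesses served here
        reach *= 1.0 - hit_rate                  # the rest fall through
    return total


# Latencies from the section above; hit rates are made-up example values.
levels = [
    (1, 0.90),     # L1 cache
    (4, 0.80),     # L2 cache
    (12, 0.75),    # L3 cache
    (100, 1.00),   # DRAM
]
print(f"{average_access_time(levels):.2f} ns")  # 1.90 ns
```

Under these assumed hit rates, the average stays close to the L1 latency even though DRAM is about 100 times slower, because only a small fraction of accesses reach it.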

Instruction Set Architecture (ISA)

CISC vs RISC

Aspect            CISC                      RISC
Instructions      Complex, variable length  Simple, fixed length
Addressing modes  Many                      Few
Execution         Multi-cycle               Single cycle (pipelined)
Examples          x86                       ARM, RISC-V

Common Instructions

Type           Examples
Data transfer  LOAD, STORE, MOV
Arithmetic     ADD, SUB, MUL, DIV
Logic          AND, OR, XOR, NOT
Control        JMP, CALL, RET
Comparison     CMP, TEST

Pipelining

Cycle:    1     2     3     4     5     6     7     8
Inst 1: [IF ] [ID ] [EX ] [MEM] [WB ]
Inst 2:       [IF ] [ID ] [EX ] [MEM] [WB ]
Inst 3:             [IF ] [ID ] [EX ] [MEM] [WB ]
Inst 4:                   [IF ] [ID ] [EX ] [MEM] [WB ]

Pipeline Stages

  1. IF: Instruction Fetch
  2. ID: Instruction Decode
  3. EX: Execute
  4. MEM: Memory access
  5. WB: Write Back
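
The timing in the diagram can be reproduced with a few lines of arithmetic. In an ideal pipeline with no stalls, instruction i (counting from 0) occupies stage s (counting from 0) in cycle i + s + 1, so n instructions through k stages finish in k + n - 1 cycles instead of n × k. The function names below are hypothetical, and the sketch assumes every stage takes exactly one cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction index: {stage name: cycle}} for an ideal pipeline."""
    return {
        i: {stage: i + s + 1 for s, stage in enumerate(stages)}
        for i in range(num_instructions)
    }


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish all instructions when there are no stalls."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


schedule = pipeline_schedule(4)
print(schedule[3]["WB"])   # 8: the fourth instruction writes back in cycle 8
print(total_cycles(4))     # 8 cycles pipelined
print(4 * len(STAGES))     # 20 cycles if each instruction ran start to finish alone
```

As the instruction count grows, the ratio n × k / (k + n - 1) approaches k, which is why the ideal speedup of a pipeline is bounded by its number of stages.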

Hazards

Type        Cause              Solution
Structural  Resource conflict  More hardware
Data        RAW dependency     Forwarding, stall
Control     Branch             Prediction, delay slot
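
A read-after-write (RAW) dependency can be spotted by checking whether an instruction reads a register that a recent instruction writes. The sketch below is a simplified illustration: the instruction format (destination register, tuple of source registers), the function name `find_raw_hazards`, and the default `window` of 2 are all assumptions. How many following instructions are actually affected depends on the pipeline and on whether forwarding is available.

```python
def find_raw_hazards(instructions, window=2):
    """Return (writer index, reader index, register) for each RAW dependency.

    Only the most recent writer of each source register is considered, and
    only if it is at most `window` instructions before the reader.
    """
    hazards = []
    for j, (_, sources) in enumerate(instructions):
        for src in dict.fromkeys(sources):   # de-duplicate, keep order
            # Walk backwards from the previous instruction to find the writer.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if instructions[i][0] == src:
                    hazards.append((i, j, src))
                    break
    return hazards


program = [
    ("r1", ("r2", "r3")),   # 0: r1 = r2 + r3
    ("r4", ("r1", "r5")),   # 1: r4 = r1 + r5  (reads r1 written by 0)
    ("r6", ("r7", "r8")),   # 2: r6 = r7 + r8  (independent)
    ("r9", ("r4", "r1")),   # 3: r9 = r4 + r1  (reads r4 written by 1)
]
print(find_raw_hazards(program))  # [(0, 1, 'r1'), (1, 3, 'r4')]
```

Instruction 3 also reads r1, but its writer is three instructions back and falls outside the assumed window, so it is not reported.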

Parallelism

Instruction Level Parallelism (ILP)

  • Superscalar: Multiple instructions per cycle
  • Out-of-order execution
  • Branch prediction

Thread Level Parallelism (TLP)

  • Simultaneous multithreading (SMT)
  • Multi-core processors

Data Level Parallelism (DLP)

  • SIMD: Single Instruction Multiple Data
  • Vector processing
  • GPU computing

Performance Equation

$$ \text{CPU Time} = \text{Instructions} \times \text{CPI} \times \text{Clock Period} $$

Where:

  • Instructions: Number of instructions executed (instruction count)
  • CPI: Cycles Per Instruction (average)
  • Clock Period = 1 / Clock Frequency
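
As a worked example of the equation, the sketch below computes CPU time for a hypothetical program. The instruction count, CPI values, and clock frequency are made-up numbers chosen to keep the arithmetic easy, and the function name is an assumption.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instructions x CPI x clock period, with period = 1 / frequency."""
    clock_period = 1.0 / clock_hz
    return instruction_count * cpi * clock_period


# Hypothetical workload: 2 billion instructions on a 3 GHz clock.
baseline = cpu_time_seconds(2_000_000_000, 1.5, 3_000_000_000)
improved = cpu_time_seconds(2_000_000_000, 1.2, 3_000_000_000)
print(f"{baseline:.2f} s")  # 1.00 s
print(f"{improved:.2f} s")  # 0.80 s
```

Lowering the average CPI from 1.5 to 1.2 with the same instruction count and clock cuts the run time by 20%, since the three factors multiply.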

Improving Performance

Method                 Reduces
Better algorithms      Instruction count
Better ISA             Instruction count, CPI
Better implementation  CPI, clock period
Better circuits        Clock period
Better circuitsClock period