Overview

Benchmarks are standardized tests used to evaluate and compare computer system performance. Proper benchmarking is essential for making informed hardware decisions.

Types of Benchmarks

Synthetic Benchmarks

Artificial workloads designed to stress specific components:

| Benchmark | Measures |
| --- | --- |
| Dhrystone | Integer performance |
| Whetstone | Floating-point performance |
| LINPACK | Dense linear algebra |
| Stream | Memory bandwidth |
| IOzone | Disk I/O |

Pros: Reproducible, focused

Cons: May not reflect real workloads

Application Benchmarks

Real programs with defined workloads:

| Benchmark | Domain |
| --- | --- |
| SPEC CPU | General computing |
| SPEC JBB | Java server |
| TPC-C | Database transactions |
| MLPerf | Machine learning |
| Cinebench | 3D rendering |

Pros: Realistic

Cons: Complex, many variables

Microbenchmarks

Small, targeted tests that time a single operation in isolation:

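// Memory latency test (pointer chasing).
// Assumes `start` heads a cyclic chain in which every slot stores the
// address of the next slot, ideally in shuffled order to defeat prefetching.
void **p = start;
for (int i = 0; i < N; i++) {
    p = (void **)*p;  // Each load depends on the result of the previous one
}
// Elapsed time / N approximates cache or memory latency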
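
The loop above only works if the chain it follows has been prepared first. The following is a minimal sketch of one way to do that, not a reference implementation. It uses array indices instead of raw pointers, and the names `build_chain`, `chase`, and `chase_latency_ns` are illustrative. It assumes Sattolo's algorithm to link all slots into one random cycle, and it assumes the standard `clock()` timer, which measures CPU time at coarse resolution, so many steps are needed for a stable figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[] with a random single-cycle permutation (Sattolo's algorithm),
 * so following next[] from any slot visits all n slots before repeating. */
void build_chain(size_t *next, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i, never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads and return the final slot. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t i = 0; i < steps; i++)
        p = next[p];                     /* each load needs the previous result */
    return p;
}

/* Average nanoseconds per load. The result of chase() is stored in a
 * volatile sink so the compiler cannot discard the loop. */
double chase_latency_ns(const size_t *next, size_t steps)
{
    assert(steps > 0);
    volatile size_t sink;
    clock_t t0 = clock();
    sink = chase(next, 0, steps);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

To probe a particular level of the hierarchy, size the array to fit that level: a chain that fits in L1 should report a much lower figure than one several times larger than the last-level cache.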

SPEC Benchmarks

SPEC CPU 2017

Integer (SPECint):

  • Compilation (gcc)
  • Compression (xz)
  • Route planning and network simulation (mcf, omnetpp)
  • AI game-tree search (deepsjeng)

Floating-Point (SPECfp):

  • Physics simulation (lbm, cactuBSSN)
  • Molecular dynamics (namd, nab)
  • Weather and climate modeling (wrf, cam4)

Calculating SPEC Score

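Each benchmark's run time is compared with its run time on a fixed reference machine, so a higher ratio means a faster system:

$$ \text{Ratio} = \frac{\text{Reference Time}}{\text{System Time}} $$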

The overall score is the geometric mean of the n per-benchmark ratios:

$$ \text{Score} = \sqrt[n]{\prod_{i=1}^{n} \text{Ratio}_i} $$
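
As a worked illustration with made-up numbers (not real SPEC results), suppose three benchmarks have reference times of 100 s, 200 s, and 400 s, and the system under test finishes each in 50 s:

$$ \text{Ratio}_1 = \frac{100}{50} = 2, \quad \text{Ratio}_2 = \frac{200}{50} = 4, \quad \text{Ratio}_3 = \frac{400}{50} = 8 $$

$$ \text{Score} = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4 $$

The arithmetic mean of the same ratios would be about 4.67, pulled upward by the largest value.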

Why Geometric Mean?

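  • Normalizes different scales: each benchmark contributes through its relative change, so long-running tests do not dominate
  • Limits domination by a single very large ratio, compared with the arithmetic mean
  • Symmetric for speedups and slowdowns: a 2x gain on one test and a 2x loss on another cancel exactly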
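
A small worked case, using hypothetical ratios, shows the symmetry property. Suppose system X is twice as fast as system Y on one test and half as fast on another, so X's ratios relative to Y are 2 and 0.5:

$$ \text{Arithmetic mean} = \frac{2 + 0.5}{2} = 1.25, \qquad \text{Geometric mean} = \sqrt{2 \cdot 0.5} = 1 $$

Measured the other way around, Y's ratios relative to X are 0.5 and 2, which give the same two results. The arithmetic mean therefore claims each system is 25% faster than the other, which is contradictory. The geometric mean reports a tie in both directions.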

Memory Benchmarks

Bandwidth (Stream)

Copy:  a[i] = b[i]
Scale: a[i] = q * b[i]
Add:   a[i] = b[i] + c[i]
Triad: a[i] = b[i] + q * c[i]

STREAM reports the sustained bandwidth of each kernel. The reference code prints MB/s, and results are commonly quoted in GB/s.
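
As a rough sketch of how one of these kernels turns into a bandwidth figure, the C fragment below implements Triad and converts a measured time into GB/s. The function names are illustrative and not taken from the STREAM source. It assumes the usual STREAM accounting of three array transfers per Triad element (two reads, one write) and 1 GB = 10^9 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* STREAM-style Triad kernel: a[i] = b[i] + q * c[i]. */
void triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Bytes moved by one Triad pass: two arrays read and one written. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(double);
}

/* Convert bytes moved in `seconds` to GB/s (1 GB = 1e9 bytes). */
double bandwidth_gbs(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}
```

For example, with arrays of 10 million 8-byte doubles one pass moves 240 MB, so a hypothetical pass time of 0.02 s would correspond to 12 GB/s. The arrays should be several times larger than the last-level cache, otherwise the figure reflects cache bandwidth rather than memory bandwidth.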

Latency

Measure the access time at each level of the memory hierarchy:

| Level | Typical Latency |
| --- | --- |
| L1 cache | ~1 ns |
| L2 cache | ~4 ns |
| L3 cache | ~12 ns |
| DRAM | ~60-100 ns |

Graphics Benchmarks

| Benchmark | Focus |
| --- | --- |
| 3DMark | Gaming graphics |
| SPECviewperf | Professional graphics |
| Unigine | GPU stress testing |
| FurMark | GPU thermal testing |

Storage Benchmarks

Metrics

| Metric | Description |
| --- | --- |
| IOPS | I/O operations per second |
| Throughput | Transfer rate in MB/s |
| Latency | Time per operation |
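
IOPS and throughput are linked through the block size of each operation. As an illustration with a hypothetical drive that sustains 100,000 IOPS at a 4 KiB (4096-byte) block size:

$$ \text{Throughput} = \text{IOPS} \times \text{Block Size} = 100{,}000 \times 4096\ \text{B} \approx 409.6\ \text{MB/s} $$

This is why an IOPS figure means little unless the block size is stated alongside it.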

Tools

  • fio (Flexible I/O Tester)
  • CrystalDiskMark
  • ATTO Disk Benchmark

Benchmark Methodology

Best Practices

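  1. Warm-up: Run the benchmark at least once before measuring so that caches and CPU frequency reach a steady state
  2. Multiple runs: Repeat the measurement and report the mean and variance
  3. Controlled environment: Keep background processes to a minimum
  4. Full system: Record the OS, driver, and compiler versions, since each affects the result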
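
The first two practices can be sketched as a small timing harness. This is one possible shape under stated assumptions, not a prescribed method: `example_workload` is a hypothetical placeholder for the code under test, the function names are illustrative, and the standard `clock()` timer is used for portability even though it measures CPU time at coarse resolution (on POSIX systems a monotonic wall-clock timer such as `clock_gettime` is usually a better choice).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Placeholder workload (hypothetical): replace with the code under test. */
void example_workload(void)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        acc += i;
}

/* Run `fn` `warmup` times untimed, then store `runs` timings in seconds. */
void time_runs(void (*fn)(void), int warmup, int runs, double *seconds)
{
    assert(fn != NULL && warmup >= 0 && runs > 0);
    for (int i = 0; i < warmup; i++)
        fn();                            /* discard warm-up iterations */
    for (int i = 0; i < runs; i++) {
        clock_t t0 = clock();
        fn();
        seconds[i] = (double)(clock() - t0) / CLOCKS_PER_SEC;
    }
}

/* Sample mean and unbiased sample variance (divides by n - 1). */
void summarize(const double *x, size_t n, double *mean, double *variance)
{
    assert(n >= 2);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    *mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++)
        sq += (x[i] - *mean) * (x[i] - *mean);
    *variance = sq / (double)(n - 1);
}
```

A typical use would call `time_runs` with one or more warm-up iterations and around ten timed runs, then pass the timings to `summarize` and report both values together with the run count.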

Common Mistakes

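| Mistake | Why It’s Wrong |
| --- | --- |
| Single run | Cannot be distinguished from statistical noise |
| Quoting peak performance | Rarely achieved under sustained, real workloads |
| Incomparable tests | Different configurations make the numbers meaningless side by side |
| Cherry-picking | Selecting only favorable results biases the conclusion |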

Reporting Results

What to Include

System Configuration:
- CPU: Intel Core i7-12700K @ 4.9 GHz
- RAM: 32 GB DDR5-5600
- OS: Ubuntu 22.04
- Compiler: gcc 12.1 -O3

Results (mean ± std, n=10):
- Test A: 1234 ± 12 units
- Test B: 5678 ± 45 units

Statistical Validity

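$$ \text{CI} = \bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}} $$

Here $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the number of runs, and $t_{\alpha/2,\,n-1}$ the critical value of Student's t distribution with $n-1$ degrees of freedom. Report 95% confidence intervals ($\alpha = 0.05$) when possible.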
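
As a worked example, take the sample figures for Test A from the reporting template above (mean 1234, standard deviation 12, n = 10). For a 95% interval with n − 1 = 9 degrees of freedom, the critical value of Student's t distribution is about 2.262:

$$ \text{CI} = 1234 \pm 2.262 \cdot \frac{12}{\sqrt{10}} \approx 1234 \pm 8.6 $$

The interval is therefore roughly 1225.4 to 1242.6 units. It is narrower than the ± 12 standard deviation because it describes uncertainty in the mean, not the spread of individual runs.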

Benchmark Suites

SPEC Suites

| Suite | Application |
| --- | --- |
| SPEC CPU | Processor |
| SPEC Power | Energy efficiency |
| SPEC JBB | Java business |
| SPEC Cloud | Cloud computing |

TPC (Transaction Processing)

| Benchmark | Workload |
| --- | --- |
| TPC-C | OLTP |
| TPC-H | Decision support |
| TPC-DS | Big data analytics |

MLPerf

  • Training benchmarks
  • Inference benchmarks
  • Edge device benchmarks

Interpreting Results

Performance per Dollar

$$ \text{Value} = \frac{\text{Performance}}{\text{Price}} $$

Performance per Watt

$$ \text{Efficiency} = \frac{\text{Performance}}{\text{Power}} $$

Total Cost of Ownership

$$ \text{TCO} = \text{Acquisition} + \text{Operation} + \text{Maintenance} $$
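
To illustrate with entirely hypothetical figures, consider a server that costs $5,000 to buy, draws an average of 300 W around the clock for three years at $0.15 per kWh, and incurs $300 in maintenance. The operating cost, counting energy only, is:

$$ \text{Operation} = 0.3\ \text{kW} \times 26{,}280\ \text{h} \times 0.15\ \text{\$/kWh} = \$1{,}182.60 $$

$$ \text{TCO} = 5{,}000 + 1{,}182.60 + 300 = \$6{,}482.60 $$

A real estimate would usually also include cooling, rack space, and staff time under operation.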

Summary

| Benchmark Type | Best For |
| --- | --- |
| Synthetic | Component testing |
| Application | Real-world performance |
| Microbenchmark | Specific analysis |
| Standardized | Fair comparison |