Table of Contents
Overview#
Alpamayo is NVIDIA’s open-source autonomous vehicle AI platform, unveiled by Jensen Huang at CES 2026 in January 2026. Named after the Alpamayo peak in Peru, it represents what Huang called “the ChatGPT moment for physical AI.”
Unlike traditional AV systems that rely on hand-crafted rules or black-box neural networks, Alpamayo is a Vision-Language-Action (VLA) model that can reason about driving scenarios and explain its decisions in natural language. It is a three-component portfolio: a 10.5B parameter VLA model, a simulation framework, and the largest open driving dataset to date.
1. Why Alpamayo Matters#
Traditional autonomous driving pipelines face a fundamental challenge: the long tail of edge cases. No amount of hand-crafted rules can cover every possible scenario — construction zones, unusual pedestrian behavior, debris on the road, complex multi-vehicle interactions.
Traditional AV Pipeline:
Perception ──→ Prediction ──→ Planning ──→ Control
(separate) (separate) (separate) (separate)
Problem: Error accumulates across modules
Problem: No holistic understanding of the scene
Problem: Cannot reason about novel scenarios
Alpamayo Approach:
[Multi-camera images] + [Ego state] + [Command]
│
▼
┌────────────────────┐
│ Alpamayo VLA │
│ (End-to-End) │
│ │
│ Reasoning + Action│
└─────────┬──────────┘
│
┌────────┴────────┐
▼ ▼
Chain-of-Causation Trajectory
(explainable (6.4s future,
reasoning) 64 waypoints)The key difference: Alpamayo generates an explicit Chain-of-Causation (CoC) reasoning trace — a human-readable explanation of why it makes each driving decision.
2. Alpamayo 1: The VLA Model#
2.1 Architecture#
Alpamayo 1 (formally Alpamayo-R1-10B) is a 10.5 billion parameter VLA model:
┌───────────────────────────────────────────────────────┐
│ Alpamayo 1 (10.5B) │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Cosmos-Reason VLM Backbone (8.2B params) │ │
│ │ │ │
│ │ Input: 4 cameras × 4 frames (0.4s @ 10Hz) │ │
│ │ + ego motion (3D translation + 9D rot) │ │
│ │ + text command │ │
│ │ │ │
│ │ Output: Chain-of-Causation reasoning trace │ │
│ │ + latent context for action decoder │ │
│ └──────────────────────┬──────────────────────────┘ │
│ │ │
│ ┌──────────────────────┴──────────────────────────┐ │
│ │ Diffusion-Based Trajectory Decoder (2.3B) │ │
│ │ │ │
│ │ Output: 64 waypoints @ 10Hz (6.4s future) │ │
│ │ 3D position + 9D rotation matrix │ │
│ │ in ego-vehicle coordinates │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└───────────────────────────────────────────────────────┘2.2 Input Specification#
| Input | Format | Details |
|---|---|---|
| Cameras | 4 views | Front-wide, front-tele, cross-left, cross-right |
| Resolution | 1080×1920 → 320×576 | Downsampled for processing |
| Temporal | 4 frames per camera | 0.4s history at 10Hz |
| Ego motion | 12D | 3D translation (x,y,z) + 9D rotation matrix |
| Trajectory | 16 waypoints | Past trajectory at 10Hz with timestamps |
| Command | Text string | Natural language driving instruction |
2.3 Output Specification#
| Output | Format | Details |
|---|---|---|
| Reasoning | Natural language | Chain-of-Causation trace |
| Trajectory | 64 waypoints | 6.4s future at 10Hz |
| Coordinates | 12D per waypoint | 3D position + 9D rotation in ego frame |
| Internal | Unicycle model | Acceleration + curvature in BEV |
2.4 Chain-of-Causation (CoC) Reasoning#
This is Alpamayo’s most distinctive feature. Instead of a black-box decision, the model generates an explicit reasoning trace:
Scene: Approaching construction zone with lane narrowing
CoC Output:
┌─────────────────────────────────────────────────────────┐
│ OBSERVATION: Construction cones detected encroaching │
│ into the right side of the current lane. │
│ │
│ REASONING: The effective lane width is reduced. │
│ Maintaining current lateral position would bring the │
│ vehicle dangerously close to the cones. │
│ │
│ ACTION: Nudge to the left to increase clearance from │
│ construction cones while remaining within lane bounds. │
│ │
│ PREDICTION: Vehicle ahead is decelerating due to the │
│ same obstruction. Reduce speed to maintain safe │
│ following distance. │
└─────────────────────────────────────────────────────────┘This is critical for:
- Regulatory compliance: Auditable decision logic for Level 4 certification
- Debugging: Engineers can understand why the system made a mistake
- Trust: Passengers and fleet operators can verify the system’s reasoning
3. Training Data#
3.1 Scale#
| Metric | Value |
|---|---|
| Images | 1+ billion |
| Driving hours | 80,000 hours of multi-camera video |
| Trajectory data | 80,000 hours at 10Hz sampling |
| CoC annotations | 700,000+ reasoning traces |
| Text tokens | <1 billion |
3.2 The RoaD Algorithm#
A key innovation is the RoaD (Robust open-loop to closed-loop Distillation) algorithm that addresses a fundamental challenge in AV training:
The Problem: Covariate Shift
┌─────────────────────────────────────────────────┐
│ Training (Open-Loop): │
│ Model sees: human expert trajectories │
│ Model learns: imitate the expert │
│ │
│ Deployment (Closed-Loop): │
│ Model's own actions change future observations │
│ Small errors compound over time │
│ Model enters states never seen in training │
│ │
│ → Performance degrades significantly │
└─────────────────────────────────────────────────┘
RoaD Solution:
Concurrent training that mitigates covariate shift
while being more data-efficient than pure RL3.3 Hybrid Labeling#
Alpamayo uses a combination of labeling approaches:
Data Labeling Pipeline:
├── Automatic (sensor-derived)
│ └── Trajectories, ego-motion, LiDAR point clouds
├── VLM-generated (synthetic)
│ └── Chain-of-Causation traces generated by large VLMs
└── Human-verified
└── Quality assurance on critical labels4. AlpaSim: The Simulation Framework#
AlpaSim is a fully open-source AV simulation framework with a microservice architecture:
┌──────────────────────────────────────────────────┐
│ AlpaSim │
│ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Runtime │────→│ Driver │ │
│ │ (orchestr.) │ │ (inference)│ │
│ └─────┬──────┘ └────────────┘ │
│ │ │
│ ┌─────┴──────┐ ┌────────────┐ │
│ │ Renderer │ │ TrafficSim │ │
│ │ (Omniverse │ │ (dynamic │ │
│ │ NuRec / │ │ agents) │ │
│ │ 3DGUT) │ └────────────┘ │
│ └────────────┘ │
│ ┌────────────┐ │
│ ┌────────────┐ │ Physics │ │
│ │ Config │ │ (vehicle │ │
│ │ (Hydra │ │ dynamics) │ │
│ │ YAML) │ └────────────┘ │
│ └────────────┘ │
│ │
│ Communication: gRPC between all services │
│ Rendering: NVIDIA Omniverse NuRec (3DGUT) │
│ Key: Pipeline parallelism for GPU utilization │
└──────────────────────────────────────────────────┘Sim2Val: Simulation-Based Validation#
AlpaSim’s most powerful capability is Sim2Val — using simulation rollouts to validate models before real-world deployment:
Traditional Validation:
Train model ──→ Deploy on real car ──→ Drive thousands of miles ──→ Evaluate
(Expensive, slow, potentially dangerous)
Sim2Val:
Train model ──→ Run in AlpaSim ──→ Correlate with real metrics
(Reduces variance by up to 83%)AlpaSim rollouts are realistic enough to reduce variance in real-world metrics by up to 83%, enabling faster and more confident model validation.
5. Open Datasets#
Alpamayo includes the largest open driving dataset to date:
| Metric | Value |
|---|---|
| Total driving data | 1,727 hours |
| Countries | 25 |
| Cities | 2,500+ |
| Total clips | 310,895 (20 seconds each) |
| Camera coverage | 100% of clips |
| LiDAR coverage | 100% of clips |
| Radar coverage | 163,850 clips (53%) |
| Reconstructed scenes | 900 (for simulation) |
| Geographic scope | North America, Europe, Asia |
6. Benchmarks and Performance#
6.1 Evaluation Metrics#
| Metric | Score | Dataset |
|---|---|---|
| AlpaSim Score (closed-loop) | 0.72 | PhysicalAI-AV-NuRec |
| minADE_6 @ 6.4s (open-loop) | 0.85m | PhysicalAI-AV |
6.2 Hardware Requirements#
| Requirement | Specification |
|---|---|
| Minimum GPU | 1x GPU with 24GB+ VRAM (RTX 3090/4090, A5000) |
| Tested on | NVIDIA H100 |
| OS | Linux |
| Python | 3.12.x |
| PyTorch | 2.8+ |
7. Competitive Landscape#
vs. Tesla FSD#
| Aspect | Alpamayo | Tesla FSD |
|---|---|---|
| Approach | Open-source, reasoning VLA | Proprietary, end-to-end NN |
| Reasoning | Explicit CoC traces | Black-box |
| Data | 1,727 hrs (open) | 3B+ miles (~9M vehicles) |
| Autonomy | Targeting L4 | L2 (human supervision required) |
| Sensors | Camera + LiDAR + Radar | Vision-only |
| Transparency | Auditable logic | Not interpretable |
vs. Waymo#
| Aspect | Alpamayo | Waymo |
|---|---|---|
| Role | Platform for OEMs | Vertically integrated robotaxi |
| Autonomy | Targeting L4 | Operating L4 (4 cities) |
| Approach | Foundation model + CoC | Two-system + explicit rules |
| Hardware | Flexible sensor suite | LiDAR-dependent |
| Scale | Open for any manufacturer | Geofenced |
Strategic Position#
Alpamayo represents NVIDIA’s bet that:
- The next leap in autonomy comes from reasoning-based foundation models
- Safety validation requires interpretability (CoC reasoning)
- The industry will standardize around open tools rather than each company building from scratch
8. Industry Adoption#
Current Partners#
- Mercedes-Benz CLA: First production car with Alpamayo on NVIDIA DRIVE full-stack. AI-defined driving expected on U.S. roads in 2026.
- Lucid Group: Integrating Alpamayo for their next-generation vehicles
- Uber Technologies: Exploring Alpamayo for autonomous ride-hailing
- Jaguar Land Rover: Evaluating the platform
Open-Source Availability#
| Resource | Location |
|---|---|
| Model weights | HuggingFace: nvidia/Alpamayo-R1-10B |
| VLA code | GitHub: NVlabs/alpamayo |
| Simulator | GitHub: NVlabs/alpasim |
| Datasets | HuggingFace: nvidia/PhysicalAI-AV |
| Paper | arXiv: 2511.00088 |
9. Current Limitations and Future Roadmap#
v1.0 Limitations#
The current release explicitly excludes several features planned for future versions:
Alpamayo v1.0 ── Current
├── ✓ Chain-of-Causation reasoning
├── ✓ Multi-camera trajectory prediction
├── ✓ Open-source model + simulator + data
├── ✗ RL post-training (planned)
├── ✗ Route/navigation conditioning (planned)
├── ✗ Meta-actions (lane changes, turns) (planned)
└── ✗ General VQA capability (planned)Known Challenges#
- Data collection: Still requires extensive human-guided data collection
- Model biases: Vulnerable to biases in training data distribution
- Hallucination: VLM backbone may hallucinate objects or scenarios
- Public trust: Autonomous vehicle incidents (e.g., 2023 Cruise ban) have increased scrutiny
10. Summary#
Alpamayo Platform:
┌────────────────────────────────────────────────────┐
│ │
│ ┌──────────────┐ ┌─────────┐ ┌──────────────┐ │
│ │ Alpamayo 1 │ │ AlpaSim │ │ Open Datasets│ │
│ │ (VLA Model) │ │ (Sim) │ │ (1,727 hrs) │ │
│ │ │ │ │ │ │ │
│ │ 10.5B params │ │ NuRec │ │ 25 countries │ │
│ │ CoC reasoning│ │ gRPC │ │ 2,500 cities │ │
│ │ 6.4s traj. │ │ Sim2Val │ │ Camera+LiDAR │ │
│ └──────────────┘ └─────────┘ └──────────────┘ │
│ │
│ "The ChatGPT moment for physical AI" │
│ — Jensen Huang, CES 2026 │
└────────────────────────────────────────────────────┘Alpamayo represents a fundamental shift in autonomous driving development — from proprietary, black-box systems to open, interpretable, reasoning-based AI. By making the model, simulator, and data all open-source, NVIDIA is betting that the AV industry will rally around a shared foundation rather than fragmented, duplicated efforts. Whether this bet pays off depends on how well CoC reasoning translates to real-world safety gains — but the transparency alone may prove essential for regulatory approval of Level 4 autonomy.