Overview

This page provides a technical overview of the PyTorch codebase, focusing on its high-level architecture, major systems, and their relationships. The goal is to orient developers and contributors to the main subsystems, their code entry points, and how they interact to enable dynamic graph capture, optimization, and execution across multiple hardware backends.

PyTorch’s architecture is organized around a modular compilation pipeline, device backends, distributed training support, and advanced features such as export, quantization, and shape analysis. The most critical systems are the compilation pipeline (TorchDynamo, FX, TorchInductor), device backends (CUDA, MPS, CPU, XPU, MTIA), and deployment/export infrastructure.

For detailed information on the compilation pipeline, see page [2]. For device backends, see page [3]. For distributed training, see page [4].

High-Level System Architecture

Diagram: PyTorch Core System Overview

Key points:

  • User entrypoints are torch.compile and torch.export.
  • torch._dynamo traces Python code and produces FX graphs.
  • torch._inductor lowers FX graphs to device-specific kernels.
  • Device backends execute generated kernels.
  • Distributed backends enable multi-device/multi-node training.
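To make the flow above concrete, here is a toy, self-contained model of the trace-then-lower pipeline. It is purely illustrative: the `trace`, `lower`, and `Proxy` names are invented for this sketch and do not correspond to PyTorch's real internals, but the shape of the process (record ops into a graph via proxy objects, then generate an executable from the graph) mirrors what Dynamo and Inductor do.

```python
# Toy sketch of the torch.compile flow (illustrative only, not the real
# PyTorch internals): a tracer records ops into a simple "graph", and a
# lowering step turns that graph back into an executable callable.

def trace(fn, *example_args):
    """Record the operations fn performs into a simple op-list 'graph'."""
    graph = []

    class Proxy:
        def __init__(self, name):
            self.name = name
        def __mul__(self, other):
            out = Proxy(f"t{len(graph)}")
            graph.append(("mul", self.name, getattr(other, "name", other), out.name))
            return out
        def __add__(self, other):
            out = Proxy(f"t{len(graph)}")
            graph.append(("add", self.name, getattr(other, "name", other), out.name))
            return out

    args = [Proxy(f"arg{i}") for i in range(len(example_args))]
    result = fn(*args)
    return graph, result.name

def lower(graph, out_name):
    """'Codegen': turn the recorded graph into an executable function."""
    def compiled(*args):
        env = {f"arg{i}": a for i, a in enumerate(args)}
        for op, a, b, out in graph:
            x = env.get(a, a)
            y = env.get(b, b)
            env[out] = x * y if op == "mul" else x + y
        return env[out_name]
    return compiled

def f(x, y):
    return x * y + 2

graph, out = trace(f, 1.0, 1.0)   # capture: two ops recorded
compiled_f = lower(graph, out)    # lower: graph -> callable
```

Calling `compiled_f(3, 4)` replays the captured graph and returns the same result as `f(3, 4)`; in the real system the lowered artifact would be a fused device kernel rather than a Python closure.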

Sources:

torch/_dynamo/eval_frame.py:1-100, torch/_inductor/config.py:1-100, torch/_inductor/compile_fx.py:1-100, torch/_inductor/codecache.py:1-100, torch/_dynamo/guards.py:1-100, torch/_dynamo/utils.py:1-100

Major Systems and Code Entry Points

The following table maps major system names to their primary code entities and files.

Table: System-to-Code Mapping

System          | Main Code Entry Points / Files
----------------|-------------------------------------------------------
TorchDynamo     | torch/_dynamo/eval_frame.py, torch/_dynamo/guards.py
FX              | torch/fx/graph.py, torch/fx/symbolic_shapes.py
TorchInductor   | torch/_inductor/compile_fx.py, torch/_inductor/ir.py
Device Backends | torch/cuda/, torch/xpu/, torch/mps/, aten/src/ATen/
Code Cache      | torch/_inductor/codecache.py
Export          | torch/export/, torch/export/graph_signature.py
Distributed     | torch/distributed/

Sources: torch/_dynamo/eval_frame.py:1-100, torch/_dynamo/guards.py:1-100, torch/_inductor/compile_fx.py:1-100, torch/_inductor/ir.py:1-100, torch/_inductor/codecache.py:1-100, torch/export/graph_signature.py:1-100

Compilation and Execution Pipeline

The core of PyTorch’s system is the compilation pipeline, which transforms user code into optimized device code.

Diagram: Compilation and Execution Pipeline

  • torch._dynamo traces Python bytecode and builds FX graphs.
  • torch._inductor lowers FX graphs, schedules, fuses, and generates device code.
  • Device backends execute the generated code.
  • Caching and autotuning are used for repeated executions.
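The fusion step in the middle of this pipeline can be sketched in miniature. The example below is a hedged illustration, not Inductor's real IR or scheduler: it merges a linear chain of pointwise operations into a single callable, standing in for the way Inductor fuses adjacent pointwise ops into one generated kernel.

```python
# Toy illustration of pointwise fusion (not Inductor's real scheduler):
# a chain of (op, constant) steps is merged into one "kernel" function,
# so the whole chain executes in a single pass.

import operator

OPS = {"add": operator.add, "mul": operator.mul}

def fuse_pointwise(graph):
    """Fuse a linear chain of pointwise steps into one callable,
    standing in for generating a single fused device kernel."""
    def kernel(x):
        for op, c in graph:
            x = OPS[op](x, c)
        return x
    return kernel

# chain ((x + 1) * 3) + 2 becomes one fused "kernel"
graph = [("add", 1), ("mul", 3), ("add", 2)]
kernel = fuse_pointwise(graph)
```

In the real system the payoff of fusion is avoiding intermediate memory traffic between kernels; here the analogue is simply that the three steps run inside one function call.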

Sources:

torch/_dynamo/eval_frame.py:1-100, torch/_dynamo/guards.py:1-100, torch/_inductor/compile_fx.py:1-100, torch/_inductor/codecache.py:1-100

System Responsibilities and Relationships

Subsystem       | Responsibilities                                                        | Key Files / Classes / Functions
----------------|-------------------------------------------------------------------------|-------------------------------------------------
torch._dynamo   | Python bytecode tracing, guard management, FX graph extraction          | eval_frame.py, guards.py, symbolic_convert.py
torch.fx        | Graph IR, graph transformations, pattern matching                       | graph.py, symbolic_shapes.py
torch._inductor | FX graph lowering, IR, scheduling, codegen, kernel caching, autotuning  | compile_fx.py, ir.py, scheduler.py, codecache.py
Device Backends | Device management, memory allocation, kernel execution                  | torch/cuda/, torch/xpu/, torch/mps/
torch.export    | Model export, symbolic shape tracing, deployment workflows              | export/, graph_signature.py
Distributed     | Multi-device/multi-node training, collective communication              | ProcessGroupNCCL, ProcessGroupGloo

Sources:

torch/_dynamo/eval_frame.py:1-100, torch/_dynamo/guards.py:1-100, torch/_inductor/compile_fx.py:1-100, torch/_inductor/ir.py:1-100, torch/_inductor/scheduler.py:1-100, torch/_inductor/codecache.py:1-100

Guard and Cache Systems

The guard system and cache layers are critical for correctness and performance.

Diagram: Guard Management and Caching

  • Guards are created during tracing to track dependencies.
  • Guarded code cache ensures compiled code is only reused when valid.
  • If guards fail, recompilation is triggered.

Sources:

torch/_dynamo/guards.py:1-100, torch/_dynamo/eval_frame.py:1-100, torch/_inductor/codecache.py:1-100

Device Backends and Distributed Support

PyTorch supports multiple device backends and distributed training.

Diagram: Device Backend and Memory Management

  • Device backends provide APIs for device selection, memory management, and kernel execution.
  • Distributed communication is handled via NCCL and Gloo.
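A backend-selection mechanism like the one described can be sketched as a dispatch registry. This is an illustrative toy, not PyTorch's real dispatcher (which lives in C++ under aten/src/ATen/): each backend registers an implementation under a device key, and calls route to the registered kernel.

```python
# Hedged sketch of per-device dispatch (not the real ATen dispatcher):
# backends register kernels under a (device, op) key, and dispatch()
# routes each call to the matching implementation.

kernels = {}  # (device, op) -> implementation

def register(device, op):
    def deco(fn):
        kernels[(device, op)] = fn
        return fn
    return deco

@register("cpu", "add")
def add_cpu(a, b):
    return [x + y for x, y in zip(a, b)]

@register("cuda", "add")
def add_cuda(a, b):
    # stand-in: a real backend would launch a device kernel here
    return [x + y for x, y in zip(a, b)]

def dispatch(op, device, *args):
    return kernels[(device, op)](*args)
```

The same key-based routing idea is how one op name can back onto very different code paths per device, which is what lets Inductor emit Triton, Metal, or C++ wrappers from one graph.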

Sources:

torch/_inductor/codegen/mps.py:1-100, torch/_inductor/codegen/triton.py:1-100, torch/_inductor/codegen/cpp_wrapper_cpu.py:1-100, torch/_inductor/config.py:1-100

Configuration System

Both TorchDynamo and TorchInductor have extensive configuration options:

  • Runtime Behavior: Controls guard settings, specialization options
  • Debugging Options: Enables logging and development features
  • Performance Tuning: Adjusts optimization parameters
  • Feature Toggles: Enables/disables experimental features
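The configuration modules follow a flat, attribute-based pattern: options are plain module-level or class-level attributes grouped by purpose. The sketch below illustrates that pattern only; the option names (`debug`, `max_autotune`, `cache_size_limit`) are chosen for the example and should not be read as the real config surface.

```python
# Illustrative sketch of the attribute-based config pattern used by
# torch._dynamo.config and torch._inductor.config. Option names here
# are hypothetical, not the real options.

class Config:
    debug = False            # debugging option
    max_autotune = False     # performance tuning
    cache_size_limit = 8     # runtime behavior

def compile_with(config):
    """Pretend compile entry point that reads the config."""
    mode = "autotune" if config.max_autotune else "default"
    return {"mode": mode, "debug": config.debug}

cfg = Config()
cfg.max_autotune = True      # toggles are set by simple assignment
```

The appeal of this pattern is that toggling a flag is a one-line assignment before compilation, with no builder or settings object required.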

Sources:

torch/_dynamo/config.py:101-200, torch/_inductor/config.py:101-300

Integration with PyTorch Ecosystem

The compilation system integrates with other PyTorch components:

  • torch.export: Enables model serialization for deployment
  • Distributed Training: Supports optimized distributed operations
  • Quantization: Works with PyTorch’s quantization system
  • Autograd: Preserves automatic differentiation capabilities

Sources:

torch/_dynamo/output_graph.py:1-100, torch/__init__.py:1-100

Variable Tracking System

TorchDynamo uses a sophisticated variable tracking system to follow values through Python code:
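The core idea can be sketched as follows; this is a toy model, not Dynamo's real class hierarchy, though the class names echo the ones in torch/_dynamo/variables/. Each Python value met during tracing is wrapped in a tracker chosen by its runtime type, so the tracer knows how to handle it symbolically.

```python
# Toy version of Dynamo-style variable tracking (illustrative only):
# a builder wraps each value in a tracker class selected by type.

class VariableTracker:
    def __init__(self, value):
        self.value = value

class ConstantVariable(VariableTracker): pass
class ListVariable(VariableTracker): pass
class UserFunctionVariable(VariableTracker): pass

def wrap(value):
    """Mirror of the builder's role: pick a tracker by runtime type,
    recursing into containers so every element is tracked too."""
    if callable(value):
        return UserFunctionVariable(value)
    if isinstance(value, list):
        return ListVariable([wrap(v) for v in value])
    return ConstantVariable(value)
```

Dispatching on value type this way is what lets the tracer treat a constant, a list, and a user function differently when deciding what can be baked into the graph and what needs a guard.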

Sources: torch/_dynamo/variables/builder.py:1-100, torch/_dynamo/variables/functions.py:1-100, torch/_dynamo/variables/builtin.py:1-100, torch/_dynamo/variables/user_defined.py:1-100

Conclusion

PyTorch’s compilation system provides a powerful way to optimize PyTorch code for better performance. By capturing Python operations, transforming them into optimized code, and managing execution with a sophisticated guard system, it enables efficient execution across different hardware targets while maintaining the flexibility and ease of use of PyTorch.