Reproducer Guide

This guide provides comprehensive documentation for TritonParse's reproducer system, which generates standalone Python scripts to reproduce specific kernel executions from trace files.

📋 Overview

What is a Reproducer?

A reproducer is a self-contained Python script that:

  • Recreates the exact execution environment of a kernel launch
  • Reconstructs input tensors using various strategies
  • Can be run independently without the original codebase
  • Enables debugging, benchmarking, and sharing of kernel issues

Why Use Reproducers?

| Use Case | Description |
|---|---|
| Bug Isolation | Extract a single kernel execution to debug in isolation |
| Performance Analysis | Benchmark specific kernel configurations |
| Issue Sharing | Share reproducible test cases with collaborators |
| Regression Testing | Compare kernel behavior across versions |
| Documentation | Create executable examples of kernel usage |

Workflow Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Generate Trace │────▶│   Parse Trace   │────▶│   Reproduce     │
│  (with launch)  │     │  (unified_parse)│     │   (reproduce)   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
   *.ndjson logs          *.ndjson.gz files      repro_*.py scripts
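
The first two stages can be driven from Python; a minimal sketch, following the init and unified_parse usage from the main TritonParse docs (paths are illustrative):

import tritonparse.structured_logging
import tritonparse.utils

# Stage 1: trace compilations and launches while the workload runs
tritonparse.structured_logging.init("./logs/", enable_trace_launch=True)

# ... run your Triton kernels here ...

# Stage 2: parse raw NDJSON logs into compressed .ndjson.gz traces
tritonparse.utils.unified_parse(source="./logs/", out="./parsed_output")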

🚀 Quick Start

Command Line

# Generate reproducer for a specific launch event
tritonparseoss reproduce ./parsed_output/trace.ndjson.gz --line 1 --out-dir repro_output

# Using kernel name instead of line index
tritonparseoss reproduce ./trace.ndjson.gz --kernel matmul_kernel --out-dir repro_output

# With custom template
tritonparseoss reproduce ./trace.ndjson.gz --line 1 --template tritonbench --out-dir bench_output

Python API

from tritonparse.reproducer.orchestrator import reproduce

result = reproduce(
    input_path="./parsed_output/trace.ndjson.gz",
    line_index=1,                    # 0-based index (0 = compilation, 1+ = launches)
    out_dir="./repro_output",
    template="example",              # Built-in template
)

print(f"Script: {result['repro_script']}")
print(f"Context: {result['repro_context']}")

Generated Files

repro_output/<kernel_name>/
├── repro_<timestamp>.py              # Standalone executable script
├── repro_context_<timestamp>.json    # Kernel metadata, args, and launch info
└── <hash>.bin                        # Tensor blob files (if enabled during tracing)
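
The context JSON is plain JSON and can be inspected directly (a quick sketch; the filename is hypothetical and the top-level keys vary with the trace contents):

import json

with open("repro_output/my_kernel/repro_context_20251219.json") as f:
    ctx = json.load(f)
print(sorted(ctx.keys()))  # kernel metadata, args, and launch info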

🔑 Core Concepts

Launch Events and line_index

A trace file contains multiple events in chronological order:

| Line Index | Event Type | Description |
|---|---|---|
| 0 | compilation | Kernel compilation metadata and IR |
| 1 | launch | First kernel execution |
| 2 | launch | Second kernel execution |
| ... | ... | ... |
| N | launch_diff | Summary of launch variations |

When generating a reproducer:

  • line_index=0 targets the compilation event (rarely used for reproduction)
  • line_index=1 targets the first launch event (most common)
  • Higher indices target subsequent launches
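
If you are unsure which index to use, you can inspect the trace directly (a minimal sketch; it assumes each NDJSON line is a JSON object with an event_type field, which may differ across versions). The tritonparseoss info command described below gives the same overview without custom code.

import gzip
import json

# Print each event's 0-based line index and type to choose a line_index.
with gzip.open("./parsed_output/trace.ndjson.gz", "rt") as f:
    for i, line in enumerate(f):
        event = json.loads(line)
        print(i, event.get("event_type", "<unknown>"))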

ContextBundle Structure

Internally, the reproducer builds a ContextBundle containing all the information needed to regenerate the launch:

@dataclass
class ContextBundle:
    kernel_info: KernelInfo       # Function name, file path, source code
    compile: Dict[str, Any]       # num_warps, num_stages, arch, backend
    launch: Dict[str, Any]        # grid, kwargs
    args: Dict[str, Any]          # All arguments (scalars + tensors)
    tensor_args: Dict[str, Any]   # Tensor-specific information
    raw_launch_event: Dict        # Original launch event data
    raw_comp_event: Dict          # Original compilation event data

KernelImportMode

Controls how the kernel function is imported in the generated script:

| Mode | Description | When to Use |
|---|---|---|
| DEFAULT | Import from original source file | Standard case, original file accessible |
| COPY | Embed kernel source in reproducer | Share without original codebase |
| OVERRIDE_TTIR | Use TTIR with monkeypatch | Debug specific IR versions |

CLI Usage:

tritonparseoss reproduce trace.ndjson.gz --line 1 --kernel-import copy
tritonparseoss reproduce trace.ndjson.gz --line 1 --kernel-import override-ttir

Python Usage:

from tritonparse.reproducer.orchestrator import reproduce
from tritonparse.reproducer.types import KernelImportMode

result = reproduce(
    input_path="./trace.ndjson.gz",
    line_index=1,
    out_dir="./repro",
    kernel_import=KernelImportMode.COPY,
)

📊 Tensor Reconstruction Strategies

The reproducer supports three tensor reconstruction strategies, applied in priority order:

1. Blob Files (Highest Fidelity)

Exact tensor data saved during tracing.

Enable during tracing:

tritonparse.structured_logging.init(
    "./logs/",
    enable_trace_launch=True,
    enable_tensor_blob_storage=True,      # Save actual tensor data
    tensor_storage_quota=10 * 1024**3,    # 10GB limit
)

Or with environment variables:

export TRITONPARSE_SAVE_TENSOR_BLOBS=1
export TRITONPARSE_TENSOR_STORAGE_QUOTA=10737418240  # 10GB

Behavior:

  • Saves .bin files alongside traces
  • Reproducer loads exact values via load_tensor()
  • Best for numerical accuracy debugging
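
Conceptually, loading a blob reinterprets the raw bytes with the recorded dtype and shape (a sketch of the idea only; the generated script's load_tensor() helper and the on-disk blob format are implementation details, and the filename here is hypothetical):

import torch

raw = open("a1b2c3.bin", "rb").read()                      # hypothetical blob filename
t = torch.frombuffer(bytearray(raw), dtype=torch.float16)  # reinterpret raw bytes
t = t.reshape(128, 256)                                    # dtype/shape come from the context JSON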

2. Statistical Reconstruction (Good Approximation)

Uses saved statistics to generate similar data.

Enable during tracing:

tritonparse.structured_logging.init(
    "./logs/",
    enable_trace_launch=True,
    enable_more_tensor_information=True,  # Save statistics
)

Statistics saved:

  • mean - Average value
  • std - Standard deviation
  • min - Minimum value
  • max - Maximum value

Reconstruction logic:

# shape, device, mean, std, min_val, max_val, and target_dtype all come
# from the statistics recorded in the trace.

# For floating point tensors
tensor = torch.randn(shape, dtype=torch.float32, device=device) * std + mean
tensor = torch.clamp(tensor, min=min_val, max=max_val)
tensor = tensor.to(target_dtype)

# For integer tensors: sample as above, then round before casting
tensor = torch.round(tensor).to(target_dtype)

3. Random Data (Fallback)

Basic random generation when no statistics are available.

# Floating point: random values
torch.empty(shape, dtype=dtype, device=device).random_()

# Integer: random integers
torch.empty(shape, dtype=dtype, device=device).random_()

# Complex: random real + imaginary parts
real_part = torch.randn(shape, device=device)
imag_part = torch.randn(shape, device=device)
torch.complex(real_part, imag_part)

Stride and Storage Offset Handling

The reproducer preserves non-contiguous tensor layouts:

# If tensor has custom stride or storage_offset
strided_view = storage_tensor.as_strided(
    size=shape,
    stride=stride,
    storage_offset=storage_offset
)

This ensures:

  • Transposed tensors work correctly
  • Sliced tensors maintain their layout
  • Memory access patterns match original
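
For example, a transposed view can be rebuilt from flat storage (illustrative values):

import torch

# Rebuild a non-contiguous (transposed) view from flat storage.
storage = torch.arange(24, dtype=torch.float32)
t = storage.as_strided(size=(6, 4), stride=(1, 6), storage_offset=0)

# Same layout and values as transposing the contiguous tensor:
assert torch.equal(t, storage.view(4, 6).t())
assert not t.is_contiguous()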

📝 Template System

Built-in Templates

| Template | Description | Use Case |
|---|---|---|
| example | Basic standalone script | General debugging |
| tritonbench | TritonBench-compatible operator | Performance benchmarking |

Template Placeholders

Templates use placeholders that get replaced during generation:

| Placeholder | Description |
|---|---|
| {{KERNEL_IMPORT_PLACEHOLDER}} | Import statements for the kernel |
| {{KERNEL_INVOCATION_PLACEHOLDER}} | Kernel launch code with arguments |
| {{KERNEL_SYSPATH_PLACEHOLDER}} | sys.path setup for imports |
| {{JSON_FILE_NAME_PLACEHOLDER}} | Context JSON filename |
| {{UTILITY_FUNCTIONS_PLACEHOLDER}} | Helper functions (create_args_from_json_file, etc.) |
| {{IR_OVERRIDE_SETUP_PLACEHOLDER}} | TTIR override setup (for OVERRIDE_TTIR mode) |

Creating Custom Templates

Step 1: Create template file

# my_template.py
"""Custom reproducer template for my use case"""

import torch
import logging

logger = logging.getLogger(__name__)

# {{IR_OVERRIDE_SETUP_PLACEHOLDER}}

# {{KERNEL_SYSPATH_PLACEHOLDER}}

# {{KERNEL_IMPORT_PLACEHOLDER}}

# {{UTILITY_FUNCTIONS_PLACEHOLDER}}


def run_kernel():
    """Execute the reproduced kernel."""
    from pathlib import Path

    script_dir = Path(__file__).resolve().parent
    json_file = script_dir / "{{JSON_FILE_NAME_PLACEHOLDER}}"
    grid, args_dict = create_args_from_json_file(str(json_file))

    print("=" * 60)
    print("CUSTOM REPRODUCER")
    print("=" * 60)
    print(f"Grid: {grid}")
    for name, arg in args_dict.items():
        if torch.is_tensor(arg):
            print(f"  {name}: tensor {arg.shape} {arg.dtype}")
        else:
            print(f"  {name}: {arg}")

    # {{KERNEL_INVOCATION_PLACEHOLDER}}

    torch.cuda.synchronize()
    print("Execution complete!")


if __name__ == "__main__":
    run_kernel()

Step 2: Use the template

tritonparseoss reproduce trace.ndjson.gz --line 1 --template /path/to/my_template.py

Or from Python:

result = reproduce(
    input_path="./trace.ndjson.gz",
    line_index=1,
    template="/path/to/my_template.py",
    out_dir="./repro",
)

Example: TritonBench Template

The built-in tritonbench template generates a TritonBench-compatible operator:

tritonparseoss reproduce trace.ndjson.gz --line 1 --template tritonbench --out-dir bench

This creates a script compatible with:

python -m tritonbench --op <operator_name> --mode latency

🔧 Advanced Usage

Finding Kernels by Name

Instead of specifying line_index, you can find kernels by name:

from tritonparse.reproducer.orchestrator import reproduce

result = reproduce(
    input_path="./trace.ndjson.gz",
    kernel_name="matmul_kernel",    # Find by exact name
    launch_id=0,                     # Which launch instance (0 = first)
    out_dir="./repro",
)

CLI equivalent:

tritonparseoss reproduce trace.ndjson.gz --kernel matmul_kernel --launch-id 0

Custom PlaceholderReplacer

For advanced customization, implement a custom replacer:

from tritonparse.reproducer.placeholder_replacer import PlaceholderReplacer, DefaultPlaceholderReplacer
from tritonparse.reproducer.orchestrator import reproduce

class MyReplacer(DefaultPlaceholderReplacer):
    def replace(self, template_code, context_bundle, **kwargs):
        # Call parent for standard replacements
        code = super().replace(template_code, context_bundle, **kwargs)

        # Add custom modifications
        code = code.replace("{{MY_CUSTOM_PLACEHOLDER}}", "my_value")

        return code

result = reproduce(
    input_path="./trace.ndjson.gz",
    line_index=1,
    out_dir="./repro",
    replacer=MyReplacer(),
)

triton_kernels Custom Types

For projects using triton_kernels library:

# The reproducer automatically handles these types if triton_kernels is installed:
from triton_kernels.tensor import Tensor, Storage, StridedLayout

Supported types:

  • triton_kernels.tensor.Tensor
  • triton_kernels.tensor.Storage
  • triton_kernels.tensor.StridedLayout

If not installed, you'll see:

RuntimeError: Optional dependency 'triton_kernels.tensor' is not installed

Solution:

pip install triton_kernels

📁 CLI Reference

reproduce command

tritonparseoss reproduce <input_file> [options]

Arguments:

| Argument | Description |
|---|---|
| <input_file> | Path to trace file (.ndjson or .ndjson.gz) |

Options:

| Option | Description | Default |
|---|---|---|
| --line <N> | Line index (0-based) of launch event | 0 |
| --kernel <name> | Find kernel by exact name | - |
| --launch-id <N> | Launch instance when using --kernel | 0 |
| --out-dir <path> | Output directory | repro_output/<kernel>/ |
| --template <name\|path> | Template name or path to custom template | example |
| --kernel-import <mode> | Import mode: default, copy, override-ttir | default |

Examples:

# Basic usage
tritonparseoss reproduce ./trace.ndjson.gz --line 1 --out-dir ./repro

# Find by kernel name
tritonparseoss reproduce ./trace.ndjson.gz --kernel add_kernel --out-dir ./repro

# Use tritonbench template
tritonparseoss reproduce ./trace.ndjson.gz --line 1 --template tritonbench

# Embed kernel source
tritonparseoss reproduce ./trace.ndjson.gz --line 1 --kernel-import copy

# Multiple options
tritonparseoss reproduce ./trace.ndjson.gz \
    --kernel matmul_kernel \
    --launch-id 2 \
    --template /path/to/custom_template.py \
    --kernel-import copy \
    --out-dir ./my_repro

info command

Query kernel information from traces (useful before reproducing):

# List all kernels
tritonparseoss info ./trace.ndjson.gz

# Query specific kernel
tritonparseoss info ./trace.ndjson.gz --kernel matmul_kernel

# Show argument details
tritonparseoss info ./trace.ndjson.gz --kernel matmul_kernel --args-list

🎯 Use Cases

Bug Isolation

Extract a problematic kernel and debug in isolation:

# 1. Generate reproducer
tritonparseoss reproduce trace.ndjson.gz --line 42 --out-dir bug_repro

# 2. Run reproducer
cd bug_repro/<kernel_name>
python repro_*.py

# 3. Modify and debug
# Edit the generated script to add debugging

Performance Benchmarking

Create benchmarkable kernel scripts:

# Generate tritonbench-compatible reproducer
tritonparseoss reproduce trace.ndjson.gz --line 1 --template tritonbench --out-dir bench

# Run with tritonbench (the generated operator is addressed by name, as above)
python -m tritonbench --op <operator_name> --mode latency

Or add manual timing:

# In the generated script, add:
import time

torch.cuda.synchronize()          # drain pending GPU work before timing
start = time.perf_counter()
for _ in range(100):
    # {{KERNEL_INVOCATION_PLACEHOLDER}}  (the generated launch code goes here)
    pass
torch.cuda.synchronize()          # wait for all 100 launches to finish
elapsed = time.perf_counter() - start
print(f"Average time: {elapsed / 100 * 1000:.3f} ms")

Kernel Comparison

Compare behavior across versions:

# Generate reproducers for two traces
tritonparseoss reproduce trace_v1.ndjson.gz --line 1 --out-dir v1
tritonparseoss reproduce trace_v2.ndjson.gz --line 1 --out-dir v2

# Compare outputs
python v1/<kernel>/repro_*.py > v1_output.txt
python v2/<kernel>/repro_*.py > v2_output.txt
diff v1_output.txt v2_output.txt
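
Text diffs only catch printed differences; for a numeric check, save each version's output tensor and compare with a tolerance (a sketch; the torch.save calls are edits you would add to each generated script, and the filenames are hypothetical):

import torch

# Assumes each reproducer was edited to torch.save() its output tensor.
a = torch.load("v1_output.pt")
b = torch.load("v2_output.pt")
torch.testing.assert_close(a, b, rtol=1e-3, atol=1e-3)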

Integration with File Diff View

Combine reproducer workflow with File Diff for comprehensive analysis:

  1. Generate traces from both versions
  2. Use File Diff View to identify differing kernels
  3. Generate reproducers for specific kernels
  4. Debug/benchmark in isolation
  5. Re-trace after fixes and compare again

# After identifying kernel differences in File Diff View
result = reproduce(
    input_path="./trace_v1.ndjson.gz",
    kernel_name="problematic_kernel",
    out_dir="./debug_v1",
)

# Modify and test the reproducer
# ...

# After fix, re-trace and compare in File Diff View

Sharing Test Cases

Create portable test cases:

# Use COPY mode to embed kernel source
tritonparseoss reproduce trace.ndjson.gz \
    --line 1 \
    --kernel-import copy \
    --out-dir shareable_repro

# The generated script is self-contained
# Share the entire shareable_repro/<kernel>/ directory

❓ Troubleshooting

Common Issues

Q: "Event at index N is not a launch event"

The specified line_index points to a non-launch event (e.g., compilation or launch_diff).

Solution: Use --line 1 or higher for launch events. Use tritonparseoss info to list events.

Q: "Could not find compilation hash in launch event"

The trace may be incomplete or corrupted.

Solution: Re-generate the trace with proper initialization.

Q: "Optional dependency 'triton_kernels.tensor' is not installed"

The trace uses custom tensor types from triton_kernels.

Solution: pip install triton_kernels

Q: Reproducer runs but produces different results

Reconstructed tensor data may not perfectly match the original values.

Solution:

  • Enable enable_tensor_blob_storage=True during tracing for exact data
  • Or enable enable_more_tensor_information=True for better approximation

Q: "Could not resolve kernel file path"

The kernel source file path cannot be determined.

Solution: Use --kernel-import copy to embed the source directly.

Debug Mode

Enable verbose logging:

export TRITONPARSE_DEBUG=1
tritonparseoss reproduce trace.ndjson.gz --line 1

Or in Python:

import logging

logging.basicConfig()  # attach a handler so DEBUG messages are actually printed
logging.getLogger("tritonparse").setLevel(logging.DEBUG)
