# Checkpoint Loading

The PyTorch backend provides a flexible and extensible infrastructure for loading model checkpoints from different sources and formats, such as HuggingFace (HF) or custom formats. Support for a format is added by implementing the required components: the checkpoint's weight loader, weight mapper, and configuration parser.

## Table of Contents
1. [Overview](#overview)
2. [Core Components](#core-components)
3. [Built-in Checkpoint Formats](#built-in-checkpoint-formats)
4. [Using Checkpoint Loaders](#using-checkpoint-loaders)
5. [Creating Custom Checkpoint Loaders](#creating-custom-checkpoint-loaders)

## Overview

The checkpoint loading design is built around a plugin-like architecture that is separated into four distinct components:

- **Checkpoint Loaders**: Orchestrate the loading process for a specific format.
- **Config Loaders**: Handle model configuration parsing and validation.
- **Weight Loaders**: Manage the actual loading of model weights from storage into memory.
- **Weight Mappers**: Map and transform loaded weights to the TRTLLM model's definition.

This modular design allows for easy extension to support new checkpoint formats while maintaining backward compatibility and performance optimizations. Because checkpoint loading is separated into four subcomponents, users can leverage existing implementations and introduce only the custom, checkpoint-specific components they need.

To support a new checkpoint format, implement the four components listed above.
If the format shares components with an existing framework (such as HF), you only need to implement the components that differ.

## Core Components

### BaseCheckpointLoader

The `BaseCheckpointLoader` is the central interface for all checkpoint loading operations. It provides a unified API regardless of the underlying checkpoint format. This interface is responsible for holding and exposing all objects required for the loading and parsing process.

**Key Methods:**
- `load_config(checkpoint_dir, **kwargs)`: Loads and returns a `ModelConfig` object
- `load_weights(checkpoint_dir, **kwargs)`: Loads and returns a dictionary of weights
- `get_initialized_weight_mapper(model, config)`: Returns a weight mapper initialized at runtime for the model
- `cleanup()`: Releases resources and cleans up internal state

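For example, the built-in HF loader (described below) can be used directly through this interface. The following is a minimal sketch; the checkpoint path is a placeholder:

```python
from tensorrt_llm._torch.models.checkpoints.hf.checkpoint_loader import HfCheckpointLoader

checkpoint_dir = "llama-models-v2/llama-v2-13b-hf"  # placeholder path to a local HF checkpoint

loader = HfCheckpointLoader()
model_config = loader.load_config(checkpoint_dir)  # parsed into a TRTLLM ModelConfig
weights = loader.load_weights(checkpoint_dir)      # dict mapping parameter names to tensors
loader.cleanup()                                   # release internal state when done
```
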
### BaseConfigLoader

Loads model configurations from checkpoint directories and parses them into a TRTLLM `ModelConfig`:

```python
from tensorrt_llm._torch.model_config import ModelConfig
from tensorrt_llm._torch.models.checkpoints.base_config_loader import BaseConfigLoader

class CustomConfigLoader(BaseConfigLoader):
    def load(self, checkpoint_dir: str, **kwargs) -> ModelConfig:
        # Load and parse configuration from your custom format
        pretrained_config = self._get_pretrained_config(checkpoint_dir, **kwargs)

        return ModelConfig(pretrained_config=pretrained_config,
                           ...)

    def _get_pretrained_config(self, checkpoint_dir, **kwargs):
        ...
```

### BaseWeightLoader

Handles the loading of model weights from storage:

```python
from typing import Any

from tensorrt_llm._torch.models.checkpoints.base_weight_loader import BaseWeightLoader

class CustomWeightLoader(BaseWeightLoader):
    def load_weights(self, checkpoint_dir: str) -> dict[str, Any]:
        # Load weights from your custom format
        # Return a dictionary mapping parameter names to tensors
        return weights_dict
```

### BaseWeightMapper

Transforms weights between different naming conventions and applies model-specific transformations to the TRTLLM model object.

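Conceptually, a mapper renames checkpoint parameters and fuses or splits tensors to match the TRTLLM module layout. A minimal sketch of the idea, with hypothetical parameter names and shapes (real mappers subclass `BaseWeightMapper`, as shown in Step 4 below):

```python
import torch

# Hypothetical HF-style weights for one attention layer.
hf_weights = {
    "model.layers.0.self_attn.q_proj.weight": torch.randn(128, 128),
    "model.layers.0.self_attn.k_proj.weight": torch.randn(128, 128),
    "model.layers.0.self_attn.v_proj.weight": torch.randn(128, 128),
}

# Fuse the separate q/k/v projections into the single tensor that a fused
# qkv_proj module would expect.
fused_qkv = torch.cat(
    [hf_weights[f"model.layers.0.self_attn.{p}_proj.weight"] for p in ("q", "k", "v")],
    dim=0,
)  # shape: (384, 128)
```
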
## Built-in Checkpoint Formats

### HuggingFace Format

Currently, the HF checkpoint loader is the primary built-in format and supports:

- **Weights loading** (`.safetensors`, `.bin`, `.pth`): Load HF-compatible weights from disk
- **Configuration parsing**: Parse configuration information stored by HF into a TRTLLM `ModelConfig` object
- **Weights mapping**: Convert HF weights into a TRTLLM-compatible representation

## Using Checkpoint Loaders

### Basic Usage

There are two main approaches to using the checkpoint loading objects.

The first approach is through the LLM API, as shown in the following example:

```python
from tensorrt_llm import LLM

hf_model_dir = "llama-models-v2/llama-v2-13b-hf"

llm = LLM(model=hf_model_dir)
```

In this example, the `HfCheckpointLoader` is selected by default.

To explicitly set the checkpoint loader, specify the required checkpoint-specific loader:

```python
from tensorrt_llm import LLM
from tensorrt_llm._torch.models.checkpoints.hf.checkpoint_loader import HfCheckpointLoader

hf_model_dir = "llama-models-v2/llama-v2-13b-hf"

llm = LLM(model=hf_model_dir,
          checkpoint_loader=HfCheckpointLoader())
```

Similarly, to use an existing checkpoint loader with a custom subcomponent, pass the desired subcomponent to it:

```python
from tensorrt_llm import LLM
from tensorrt_llm._torch.models.checkpoints.hf.checkpoint_loader import HfCheckpointLoader

hf_model_dir = "llama-models-v2/llama-v2-13b-hf"

# MyCustomWeightLoader is a user-defined BaseWeightLoader subclass
llm = LLM(model=hf_model_dir,
          checkpoint_loader=HfCheckpointLoader(weight_loader=MyCustomWeightLoader()))
```

In the second approach, you can directly use the individual checkpoint loading components:

```python
from tensorrt_llm._torch.models.checkpoints.hf.gemma3_weight_mapper import \
    Gemma3HfWeightMapper
from tensorrt_llm._torch.models.modeling_gemma3 import Gemma3ForCausalLM

# model_config is the TRTLLM ModelConfig for Gemma3; hf_gemma3 is the
# corresponding HuggingFace model whose state_dict provides the source weights.
gemma3 = Gemma3ForCausalLM(model_config)
weight_mapper = Gemma3HfWeightMapper()
weight_mapper.init_model_and_config(gemma3, model_config)
gemma3.load_weights(hf_gemma3.state_dict(), weight_mapper)
```

## Creating Custom Checkpoint Loaders

To support a new checkpoint format, implement all four components. This section provides minimal templates for each.

### When to Create Custom Components

- **Complete New Format**: Implement all four components to support a new checkpoint format
- **Custom Weight Storage**: Implement only a custom weight loader if you have a unique weight storage format (such as a custom binary format or database storage)
- **Custom Configuration**: Implement only a custom config loader if your configuration format cannot be parsed by existing loaders
- **Custom Weight Mapping**: Implement only a custom weight mapper if your model has unique weight naming or transformation requirements that are checkpoint-specific

### Step 1: Create the Checkpoint Loader

```python
from typing import Optional
from tensorrt_llm._torch.models.checkpoints.base_checkpoint_loader import BaseCheckpointLoader
from tensorrt_llm._torch.models.checkpoints.base_config_loader import BaseConfigLoader
from tensorrt_llm._torch.models.checkpoints.base_weight_loader import BaseWeightLoader
from tensorrt_llm._torch.models.checkpoints.base_weight_mapper import BaseWeightMapper
from tensorrt_llm._torch.models.modeling_utils import register_checkpoint_loader

@register_checkpoint_loader("CUSTOM_FORMAT")
class CustomCheckpointLoader(BaseCheckpointLoader):
    def __init__(self,
                 *,
                 weight_loader: Optional[BaseWeightLoader] = None,
                 weight_mapper: Optional[BaseWeightMapper] = None,
                 config_loader: Optional[BaseConfigLoader] = None):
        self._weight_loader = weight_loader or self.get_default_weight_loader()
        self._config_loader = config_loader or self.get_default_config_loader()
        self._weight_mapper = weight_mapper
        self._checkpoint_format = "CUSTOM_FORMAT"  # Set the checkpoint format name

    def get_default_weight_loader(self) -> BaseWeightLoader:
        return CustomWeightLoader()

    def get_default_config_loader(self) -> BaseConfigLoader:
        return CustomConfigLoader()
```

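Once defined, the custom loader can be passed to the `LLM` constructor in the same way as the built-in `HfCheckpointLoader` shown earlier. A minimal sketch (the checkpoint path is a placeholder):

```python
from tensorrt_llm import LLM

custom_model_dir = "/path/to/custom/checkpoint"  # placeholder

llm = LLM(model=custom_model_dir,
          checkpoint_loader=CustomCheckpointLoader())
```
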
### Step 2: Create the Checkpoint Weight Loader

```python
from typing import Any
from tensorrt_llm._torch.models.checkpoints.base_weight_loader import BaseWeightLoader
from tensorrt_llm._torch.models.modeling_utils import register_checkpoint_weight_loader

@register_checkpoint_weight_loader("CUSTOM_FORMAT")
class CustomWeightLoader(BaseWeightLoader):
    def load_weights(self, checkpoint_dir: str, **kwargs) -> dict[str, Any]:
        """
        Load weights from your custom format.

        Args:
            checkpoint_dir: Directory containing checkpoint files
            **kwargs: Additional loading parameters

        Returns:
            Dictionary mapping parameter names to tensors
        """
        weights = {}  # Implement your custom weight loading logic here

        # Examples:
        # - Load from custom binary files
        # - Load from databases
        # - Load from compressed archives
        # - Apply custom preprocessing

        return weights
```

| 211 | +### Step 3: Create the Checkpoint Config Loader |
| 212 | + |
| 213 | +```python |
| 214 | +from tensorrt_llm._torch.model_config import ModelConfig |
| 215 | +from tensorrt_llm._torch.models.checkpoints.base_config_loader import BaseConfigLoader |
| 216 | +from tensorrt_llm._torch.models.modeling_utils import register_config_loader |
| 217 | + |
| 218 | +@register_config_loader("CUSTOM_FORMAT") |
| 219 | +class CustomConfigLoader(BaseConfigLoader): |
| 220 | + def load(self, checkpoint_dir: str, **kwargs) -> ModelConfig: |
| 221 | + """ |
| 222 | + Load and parse configuration from your custom format. |
| 223 | +
|
| 224 | + Args: |
| 225 | + checkpoint_dir: Directory containing configuration files |
| 226 | + **kwargs: Additional loading parameters |
| 227 | +
|
| 228 | + Returns: |
| 229 | + ModelConfig object containing parsed configuration |
| 230 | + """ |
| 231 | + # Load your custom configuration format here |
| 232 | + # Examples: |
| 233 | + # - Parse YAML/TOML files |
| 234 | + # - Convert from proprietary formats |
| 235 | + |
| 236 | + pretrained_config = self._load_pretrained_config(checkpoint_dir, **kwargs) |
| 237 | + |
| 238 | + return ModelConfig( |
| 239 | + pretrained_config=pretrained_config, |
| 240 | + # Add other ModelConfig parameters as needed |
| 241 | + ) |
| 242 | + |
| 243 | + def _load_pretrained_config(self, checkpoint_dir: str, **kwargs): |
| 244 | + """Load the raw configuration from your custom format.""" |
| 245 | + # Implement as needed |
| 246 | + pass |
| 247 | +``` |
| 248 | + |
| 249 | +### Step 4: Create the Checkpoint Weight Mapper |
| 250 | + |
| 251 | +```python |
| 252 | +from torch import nn |
| 253 | +from tensorrt_llm._torch.models.checkpoints.base_weight_mapper import BaseWeightMapper |
| 254 | +from tensorrt_llm._torch.models.modeling_utils import register_mapper |
| 255 | + |
| 256 | +@register_mapper("CUSTOM_FORMAT") |
| 257 | +class CustomWeightMapper(BaseWeightMapper): |
| 258 | + def __init__(self): |
| 259 | + super().__init__() |
| 260 | + # Define any weight transformation callbacks |
| 261 | + self._callbacks = [ |
| 262 | + # Add your custom weight transformation functions |
| 263 | + # self._custom_transform_function, |
| 264 | + ] |
| 265 | + |
| 266 | + def map_weights(self) -> None: |
| 267 | + """ |
| 268 | + Define mappings between source and target weight names. |
| 269 | + """ |
| 270 | + self.mapping.update({ |
| 271 | + # Map source names to target names |
| 272 | + # 'target_module_name': ['source_param1', 'source_param2'], |
| 273 | + # For example: 'qkv_proj': ['q_proj', 'k_proj', 'v_proj'] |
| 274 | + }) |
| 275 | + |
| 276 | + def apply_callbacks(self, module: nn.Module, module_name: str, |
| 277 | + module_names_breakdown: list[str], |
| 278 | + weights: dict) -> list[dict]: |
| 279 | + """ |
| 280 | + Apply weight transformations for modules that require special handling. |
| 281 | +
|
| 282 | + Args: |
| 283 | + module: The target module |
| 284 | + module_name: The specific module name being processed |
| 285 | + module_names_breakdown: Module path components |
| 286 | + weights: Source weights dictionary |
| 287 | +
|
| 288 | + Returns: |
| 289 | + List of transformed weight dictionaries |
| 290 | + """ |
| 291 | + module_weights = [] |
| 292 | + |
| 293 | + for new_name in self._mapping[module_name]: |
| 294 | + # Filter weights for this specific parameter |
| 295 | + fw = self.filter_weights( |
| 296 | + '.'.join(module_names_breakdown + [new_name]), weights) |
| 297 | + |
| 298 | + # Apply transformation callbacks |
| 299 | + for callback in self._callbacks: |
| 300 | + fw = callback(module, new_name, fw) |
| 301 | + |
| 302 | + module_weights.append(fw) |
| 303 | + |
| 304 | + return module_weights |
| 305 | + |
| 306 | + def should_skip_module(self, module_name: str) -> bool: |
| 307 | + """ |
| 308 | + Define which modules should be skipped during loading. |
| 309 | + """ |
| 310 | + # Add logic to skip specific modules based on your requirements |
| 311 | + # Examples: |
| 312 | + # - Skip LoRA-specific modules |
| 313 | + # - Skip temporary/auxiliary modules |
| 314 | + |
| 315 | + return super().should_skip_module(module_name) |
| 316 | +``` |
| 317 | + |
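Once registered, the custom mapper can be wired in the same way as the `Gemma3HfWeightMapper` example in the "Using Checkpoint Loaders" section. A sketch (`custom_model`, `model_config`, and `source_weights` are placeholders):

```python
# Initialize the mapper against the target model, then drive weight loading with it.
weight_mapper = CustomWeightMapper()
weight_mapper.init_model_and_config(custom_model, model_config)
custom_model.load_weights(source_weights, weight_mapper)
```
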
Note: When creating a custom mapper, you can register either a checkpoint-format-specific mapper, for example:

```python
@register_mapper("CUSTOM_FORMAT")
class CustomWeightMapper(BaseWeightMapper):
    ...
```

or a mapper that is specific to both a checkpoint format and a model, for example:

```python
@register_mapper("CUSTOM_FORMAT", "Gemma3ForCausalLM")
class CustomWeightMapper(BaseWeightMapper):
    ...
```

By setting the model name, the registered mapper is associated with that specific model.