
Commit 4a2ae1d

Add: save_pretrained readme
Signed-off-by: Rahul Tuli <[email protected]>
1 parent cfa2abc commit 4a2ae1d

File tree

2 files changed: +110 −0 lines changed

docs/save_pretrained.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
# Enhanced `save_pretrained` Arguments

The `llmcompressor` library extends Hugging Face's `save_pretrained` method with additional arguments that support model compression. This document explains these extra arguments and how to use them effectively.

## How It Works

`llmcompressor` wraps a model's original `save_pretrained` method with an enhanced version that supports compression. This happens in one of two ways:

1. **Direct modification**: when you call `modify_save_pretrained(model)` directly, as in the sketch below
2. **Automatic wrapping**: when you call `oneshot(...)`, which wraps `save_pretrained` under the hood

This means that after applying compression with `oneshot`, your model's `save_pretrained` method is already enhanced with compression capabilities, and you can use the additional arguments described below.
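The effect of the first path can be observed directly: the wrapper replaces the bound method on the model instance. A minimal sketch, assuming the placeholder model name used in the examples below:

```python
from transformers import AutoModelForCausalLM

from llmcompressor.transformers.sparsification import modify_save_pretrained

model = AutoModelForCausalLM.from_pretrained("your-model")
original = model.save_pretrained

# Wrap save_pretrained in place; subsequent calls accept the extra arguments
modify_save_pretrained(model)

# The instance method has been replaced by the compression-aware wrapper
assert model.save_pretrained is not original
```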
## Additional Arguments

When saving your compressed models, you can pass the following extra arguments to `save_pretrained`:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sparsity_config` | `Optional[SparsityCompressionConfig]` | `None` | Optional configuration for sparsity compression. If `None` and `skip_sparsity_compression_stats` is `False`, the configuration is inferred from the model. |
| `quantization_format` | `Optional[str]` | `None` | Optional format string for quantization. If not provided, it is inferred from the model. |
| `save_compressed` | `bool` | `True` | Whether to save the model in a compressed format. Set to `False` to save in the original dense format. |
| `skip_sparsity_compression_stats` | `bool` | `True` | Whether to skip calculating sparsity statistics (e.g., global sparsity and structure) when saving. Set to `False` to compute and include these statistics. |
| `disable_sparse_compression` | `bool` | `False` | When `True`, skips any sparse compression during save, even if the model has been previously compressed. |
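These arguments can be combined. As a hedged sketch (the directory name is a placeholder), saving a dense copy while skipping sparse compression and the sparsity statistics:

```python
# Save a dense (uncompressed) copy; skip sparse compression and the
# global sparsity statistics entirely. Directory name is illustrative.
model.save_pretrained(
    "your-model-dense",
    save_compressed=False,
    disable_sparse_compression=True,
    skip_sparsity_compression_stats=True,
)
```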
## Examples

### Applying Compression with oneshot

The simplest approach is to use `oneshot`, which handles both compression and wrapping `save_pretrained`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-model")
tokenizer = AutoTokenizer.from_pretrained("your-model")

# Apply compression - this also wraps save_pretrained
oneshot(
    model=model,
    recipe=[GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])],
    # Other oneshot parameters...
)

# Now you can use the enhanced save_pretrained
SAVE_DIR = "your-model-W8A8-compressed"
model.save_pretrained(
    SAVE_DIR,
    save_compressed=True,  # Use the enhanced functionality
)
tokenizer.save_pretrained(SAVE_DIR)
```
### Manual Approach (Without oneshot)

If you need more control, you can wrap `save_pretrained` manually:

```python
from transformers import AutoModelForCausalLM

from llmcompressor.transformers.sparsification import modify_save_pretrained

# Load model
model = AutoModelForCausalLM.from_pretrained("your-model")

# Manually wrap save_pretrained
modify_save_pretrained(model)

# Now you can use the enhanced save_pretrained
model.save_pretrained(
    "your-model-path",
    save_compressed=True,
    skip_sparsity_compression_stats=False,  # compute stats so the sparsity config can be inferred
)
```
### Saving with Custom Sparsity Configuration

```python
from compressed_tensors import SparsityCompressionConfig

# Create a custom sparsity config: 2:4 structured sparsity stored
# in the sparse-24-bitmask compressed format
custom_config = SparsityCompressionConfig(
    format="sparse-24-bitmask",
    sparsity_structure="2:4",
)

# Save with the custom config
model.save_pretrained(
    "your-model-custom-sparse",
    sparsity_config=custom_config,
)
```
## Notes

- When loading compressed models with `from_pretrained`, the compression format is automatically detected (a sketch follows this list).
- To use compressed models with vLLM, simply load them as you would any model:

  ```python
  from vllm import LLM

  model = LLM("./your-model-compressed")
  ```

- Compression configurations are saved in the model's config file and are automatically applied when loading.
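For instance, a minimal sketch reloading the directory saved in the first example above; no compression-specific arguments should be needed:

```python
from transformers import AutoModelForCausalLM

# The compression format is detected from the saved config automatically
model = AutoModelForCausalLM.from_pretrained("your-model-W8A8-compressed")
```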
For more information about compression algorithms and formats, please refer to the documentation and examples in the llmcompressor repository.

src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py

Lines changed: 3 additions & 0 deletions
@@ -43,6 +43,9 @@ def modify_save_pretrained(model: PreTrainedModel) -> None:
     2. Saves the recipe, appending any current recipes to existing recipe files
     3. Copies any necessary python files from the model cache

+    For more information on the compression parameters and model saving in
+    llmcompressor, refer to docs/save_pretrained.md
+
     :param model: The model whose save_pretrained method will be modified
     """
     original = model.save_pretrained
