Commit efb4633

Merge branch 'main' into issue-1927-type-hints
2 parents: 57255e6 + 175d562

2 files changed: 6 additions & 6 deletions


.gitignore

Lines changed: 1 addition & 1 deletion
```diff
@@ -126,7 +126,7 @@ venv.bak/
 
 # mkdocs documentation
 /site
-docs/.cache/
+docs/.cache/*
 
 # mypy
 .mypy_cache/
```

docs/index.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -20,7 +20,7 @@ Review the [LLM Compressor v0.8.0 release notes](https://github.com/vllm-project
 !!! info "Support for multiple modifiers in oneshot compression runs"
     LLM Compressor now supports using multiple modifiers in oneshot compression runs such as applying both AWQ and GPTQ in a single model.
 
-    Using multiple modifiers is an advanced usage of LLM Compressor and an active area of research. See [Non-uniform Quantization](examples/quantization_non_uniform/) for more detail and example usage.
+    Using multiple modifiers is an advanced usage of LLM Compressor and an active area of research. See [Non-uniform Quantization](/examples/quantization_non_uniform/) for more detail and example usage.
 
 !!! info "Quantization and calibration support for Qwen3 models"
     Quantization and calibration support for Qwen3 Next models has been added to LLM Compressor.
@@ -39,16 +39,16 @@ Review the [LLM Compressor v0.8.0 release notes](https://github.com/vllm-project
 ## Recent Updates
 
 !!! info "QuIP and SpinQuant-style Transforms"
-    The newly added [`QuIPModifier`](../examples/transform/quip_example.py) and [`SpinQuantModifier`](../examples/transform/spinquant_example.py) allow you to quantize models after injecting hadamard weights into the computation graph, reducing quantization error and greatly improving accuracy recovery for low bit-weight and activation quantization.
+    The newly added [`QuIPModifier` and `SpinQuantModifier`](/examples/transform) transforms allow you to quantize models after injecting hadamard weights into the computation graph, reducing quantization error and greatly improving accuracy recovery for low bit-weight and activation quantization.
 
 !!! info "DeepSeekV3-style Block Quantization Support"
-    Allows for more efficient compression of large language models without needing a calibration dataset. Quantize a Qwen3 model to [W8A8](../examples/quantization_w8a8_fp8/fp8_block_example.py).
+    Allows for more efficient compression of large language models without needing a calibration dataset. Quantize a Qwen3 model to [W8A8](/examples/quantization_w8a8_fp8/).
 
 !!! info "FP4 Quantization - now with MoE and non-uniform support"
-    Quantize weights and activations to FP4 and seamlessly run the compressed model in vLLM. Model weights and activations are quantized following the [NVFP4 configuration](https://github.com/neuralmagic/compressed-tensors/blob/f5dbfc336b9c9c361b9fe7ae085d5cb0673e56eb/src/compressed_tensors/quantization/quant_scheme.py#L104). See examples of [FP4 activation support](../examples/quantization_w4a4_fp4/llama3_example.py), [MoE support](../examples/quantization_w4a4_fp4/qwen_30b_a3b.py), and [Non-uniform quantization support](../examples/quantization_non_uniform/README.md) where some layers are selectively quantized to FP8 for better recovery. You can also mix other quantization schemes, such as INT8 and INT4.
+    Quantize weights and activations to FP4 and seamlessly run the compressed model in vLLM. Model weights and activations are quantized following the [NVFP4 configuration](https://github.com/neuralmagic/compressed-tensors/blob/f5dbfc336b9c9c361b9fe7ae085d5cb0673e56eb/src/compressed_tensors/quantization/quant_scheme.py#L104). See examples of [FP4 activation support](/examples/quantization_w4a4_fp4/), [MoE support](/examples/quantization_w4a4_fp4/), and [Non-uniform quantization support](/examples/quantization_non_uniform/) where some layers are selectively quantized to FP8 for better recovery. You can also mix other quantization schemes, such as INT8 and INT4.
 
 !!! info "Llama4 Quantization Support"
-    Quantize a Llama4 model to [W4A16](../examples/quantization_w4a16) or [NVFP4](../examples/quantization_w4a4_fp4/llama4_example.py). The checkpoint produced can seamlessly run in vLLM.
+    Quantize a Llama4 model to [W4A16](/examples/quantization_w4a16/) or [NVFP4](/examples/quantization_w4a4_fp4/). The checkpoint produced can seamlessly run in vLLM.
 
 For more information, check out the [latest release on GitHub](https://github.com/vllm-project/llm-compressor/releases/latest).
```
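The docs/index.md hunk above highlights multi-modifier oneshot runs (for example, AWQ and GPTQ applied to different parts of one model). The sketch below shows roughly what such a run looks like with the llmcompressor API. It is not part of this commit, and the model ID, calibration dataset, layer targets, and schemes are illustrative assumptions; the linked quantization_non_uniform example is the authoritative reference.

```python
# Hedged sketch of a multi-modifier oneshot run (not from this commit).
# Assumptions: model ID, dataset name, regex layer targets, and schemes are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A oneshot recipe can hold more than one modifier; here AWQ covers the attention
# projections and GPTQ covers the MLP projections (targets are illustrative only).
recipe = [
    AWQModifier(
        targets=["re:.*self_attn.*proj"],
        scheme="W4A16",
        ignore=["lm_head"],
    ),
    GPTQModifier(
        targets=["re:.*mlp.*proj"],
        scheme="W8A8",
        ignore=["lm_head"],
    ),
]

oneshot(
    model=model,
    dataset="open_platypus",          # small calibration dataset (assumed choice)
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# save_compressed writes a compressed-tensors checkpoint that vLLM can load.
model.save_pretrained("Meta-Llama-3-8B-Instruct-awq-gptq", save_compressed=True)
tokenizer.save_pretrained("Meta-Llama-3-8B-Instruct-awq-gptq")
```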
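The FP4 item in the same hunk points at the NVFP4 preset in compressed-tensors. A minimal, hedged sketch of such a run follows; the QuantizationModifier import and the "NVFP4" scheme name follow the linked examples, while the model, dataset, and calibration settings are assumptions.

```python
# Hedged sketch of NVFP4 (FP4 weight and activation) quantization.
# Assumptions: model ID, dataset, and calibration settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# NVFP4 quantizes Linear weights and activations to FP4; activation scales are
# calibrated, so a small calibration dataset is passed to oneshot.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("Meta-Llama-3-8B-Instruct-NVFP4", save_compressed=True)
tokenizer.save_pretrained("Meta-Llama-3-8B-Instruct-NVFP4")
```

The resulting directory can then be served with vLLM in the usual way, as the updated documentation describes.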
