[MoE] MoE Calibration with `calibrate_all_experts` #1760

kylesayrs · 2025-08-19T20:27:44Z

Coauthored with @dichn!

Purpose

Add support for the `calibrate_all_experts` option, which sends all tokens to all experts during calibration while still producing the same outputs as if the tokens had been gated.

Changes

Modify the model definitions so that, when `calibrate_all_experts=True`, token gating occurs after tokens are passed to the experts rather than before.

# `calibrate_all_experts=True` by default
model = replace_modules_for_calibration(model, calibrate_all_experts=True)
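
The idea of gating after the expert pass can be illustrated with a small, framework-free sketch. Everything in it (`CountingExpert`, `route_top_k`, `moe_forward`) is a hypothetical toy written for this illustration and is not the code in the PR's model definitions; it only assumes that each token's output is a weighted sum over its top-k experts.

```python
from typing import Dict, List, Sequence


class CountingExpert:
    """Toy expert (hypothetical): scales a token and counts the tokens it sees."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.tokens_seen = 0

    def __call__(self, token: float) -> float:
        self.tokens_seen += 1
        return self.scale * token


def route_top_k(token: float, num_experts: int, top_k: int) -> Dict[int, float]:
    """Toy router (hypothetical): map each selected expert index to a normalized weight."""
    scores = [((i + 1) * token) % 1.0 + 0.1 for i in range(num_experts)]
    chosen = sorted(range(num_experts), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def moe_forward(
    tokens: Sequence[float],
    experts: Sequence[CountingExpert],
    top_k: int,
    calibrate_all_experts: bool = True,
) -> List[float]:
    """Combine expert outputs per token; only the top-k experts contribute."""
    outputs = []
    for token in tokens:
        weights = route_top_k(token, len(experts), top_k)
        combined = 0.0
        for index, expert in enumerate(experts):
            if calibrate_all_experts:
                # Calibration mode: run the expert first, gate afterwards.
                expert_out = expert(token)
                if index in weights:
                    combined += weights[index] * expert_out
            elif index in weights:
                # Normal mode: gate first, so unselected experts never run.
                combined += weights[index] * expert(token)
        outputs.append(combined)
    return outputs


tokens = [0.25, 0.5, 0.75, 1.3]
gated_experts = [CountingExpert(s) for s in (1.0, 2.0, 3.0, 4.0)]
calib_experts = [CountingExpert(s) for s in (1.0, 2.0, 3.0, 4.0)]

gated_out = moe_forward(tokens, gated_experts, top_k=2, calibrate_all_experts=False)
calib_out = moe_forward(tokens, calib_experts, top_k=2, calibrate_all_experts=True)
```

In this toy, both modes perform the same weighted additions in the same order, so the outputs match, while only the calibration mode lets every expert observe every token (which is what calibration statistics would need).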

Testing

Added correctness tests for the new model definitions, which check that outputs are exactly the same
Added hook tests to make sure all experts are sent tokens

Change Purpose:
- Add `calibrate_all_experts` option to improve MoE calibration

Change Details:
- Add `calibrate_all_experts` flag to MoE layers
- Update `replace_modules_for_calibration` and `moe_calibration_context` to propagate the flag into modules
- Modify expert forward passes:
  * Normal mode (default): compute output only for tokens routed to top-k experts, and combine their weighted results in the final output
  * Calibration mode (`calibrate_all_experts=True`): compute output for all tokens on every expert, but still apply the top-k gating to decide which token outputs contribute to the final result

Testing:
- Add unit test to verify all experts are triggered during MoE calibration

Signed-off-by: Kyle Sayers <[email protected]>

github-actions · 2025-08-19T20:27:51Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

src/llmcompressor/modeling/llama4.py

src/llmcompressor/modeling/llama4.py

fynnsu

Left a comment below. I also agree with @brian-dellabetta's point that this could maybe be simplified by patching `self.top_k` temporarily.
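
For illustration, temporarily patching a `top_k` attribute could look roughly like the sketch below. `ToyMoEBlock` and `route_to_all_experts` are hypothetical names invented here, and the sketch assumes the module exposes `top_k` and `num_experts` attributes. Note that raising `top_k` on its own would likely also change which experts contribute to the combined output, so the original top-k weights would still have to be applied separately to preserve the gated result.

```python
from contextlib import contextmanager
from typing import Iterator


class ToyMoEBlock:
    """Hypothetical stand-in for an MoE module exposing `top_k` and `num_experts`."""

    def __init__(self, num_experts: int, top_k: int) -> None:
        self.num_experts = num_experts
        self.top_k = top_k


@contextmanager
def route_to_all_experts(block: ToyMoEBlock) -> Iterator[None]:
    """Temporarily set `top_k` to the number of experts, then restore it."""
    original_top_k = block.top_k
    block.top_k = block.num_experts
    try:
        yield
    finally:
        # Restore even if the calibration forward pass raises.
        block.top_k = original_top_k


block = ToyMoEBlock(num_experts=8, top_k=2)
with route_to_all_experts(block):
    top_k_during_calibration = block.top_k
top_k_after_calibration = block.top_k
```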

src/llmcompressor/modeling/deepseek_v3.py

dsikka

I have not had a chance to run through these yet, but it would be good to run NVFP4 for Llama4 and Qwen3 and validate performance on the B200 before landing this, if anybody has bandwidth to run these.

kylesayrs · 2025-09-09T14:19:45Z

Running those examples now

src/llmcompressor/utils/helpers.py

dsikka · 2025-09-12T22:11:35Z

FYI - Produced the following Qwen Model: nm-testing/Qwen3-30B-A3B-NVFP4-0912

With the following set-up:

lm_eval \
  --model vllm \
  --model_args pretrained="Qwen/Qwen3-30B-A3B",dtype=auto,max_model_len=4096,add_bos_token=True \
  --tasks gsm8k \
  --batch_size auto

NVFP4:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8734|±  |0.0092|
|     |       |strict-match    |     5|exact_match|↑  |0.8719|±  |0.0092|

Dense:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8544|±  |0.0097|
|     |       |strict-match    |     5|exact_match|↑  |0.8916|±  |0.0086|

The only thing I noticed was the speed, but that may just be because Beaker is slow.
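
As a rough reading aid for the two gsm8k tables above, the gap between the NVFP4 and dense scores can be compared against their reported standard errors. This is only a back-of-the-envelope sketch: it treats the two runs as independent, which is an assumption (both are evaluated on the same test set), and `z_score` is a helper name made up for this example.

```python
import math


def z_score(value_a: float, stderr_a: float, value_b: float, stderr_b: float) -> float:
    """Difference of two scores in units of their combined standard error."""
    combined_stderr = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (value_a - value_b) / combined_stderr


# Values copied from the tables above: (NVFP4 value, stderr, dense value, stderr).
flexible_z = z_score(0.8734, 0.0092, 0.8544, 0.0097)
strict_z = z_score(0.8719, 0.0092, 0.8916, 0.0086)

# Under the independence assumption, |z| below about 2 means the gap is
# within ordinary evaluation noise.
within_noise = abs(flexible_z) < 2 and abs(strict_z) < 2
```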

dsikka

Similarly, for Llama4 NVFP4

lm_eval \
  --model vllm \
  --model_args pretrained="nm-testing/Llama-4-Scout-17B-16E-Instruct-NVFP4-0913",dtype=auto,max_model_len=4096,tensor_parallel_size=2,enable_chunked_prefill=True,enforce_eager=True \
  --tasks gsm8k_llama \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --batch_size auto

dsikka

LGTM.
Two small comments

tests/testing_utils.py

src/llmcompressor/utils/helpers.py

kylesayrs · 2025-09-15T15:09:01Z

Thanks a ton @dsikka

dichn and others added 2 commits August 17, 2025 17:17

changes, qwen still doesn't work

20f1ed2

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the title ~~[Calibrat] Llama4 and More tests~~ [MoE] Llama4 and More tests Aug 19, 2025

dsikka reviewed Aug 19, 2025

src/llmcompressor/modeling/llama4.py (outdated)

reduce precision expectations

53fbdb7

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the title ~~[MoE] Llama4 and More tests~~ [MoE] MoE Calibration with calibrate_all_experts Aug 28, 2025

kylesayrs added 3 commits August 28, 2025 16:56

add note

c685b51

Signed-off-by: Kyle Sayers <[email protected]>

remove config

95d3402

Signed-off-by: Kyle Sayers <[email protected]>

default to true

81c2e1a

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs marked this pull request as ready for review August 28, 2025 21:00

kylesayrs added 2 commits August 28, 2025 17:04

Merge remote-tracking branch 'origin' into kylesayrs/calib

d2df4eb

remove unneeded imports

dd2d9e5

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs requested review from dsikka and shanjiaz August 28, 2025 21:05

brian-dellabetta reviewed Aug 28, 2025

src/llmcompressor/modeling/llama4.py

fynnsu reviewed Aug 29, 2025

src/llmcompressor/modeling/deepseek_v3.py

update test

dc62205

Signed-off-by: Kyle Sayers <[email protected]>

dsikka reviewed Sep 2, 2025

Merge branch 'main' into kylesayrs/calib

3003c83

kylesayrs marked this pull request as draft September 9, 2025 11:49

brian-dellabetta previously approved these changes Sep 9, 2025

src/llmcompressor/utils/helpers.py

Merge branch 'main' into kylesayrs/calib

3fdeee8

dsikka added the ready label Sep 13, 2025

dsikka reviewed Sep 13, 2025

tests/testing_utils.py

src/llmcompressor/utils/helpers.py

kylesayrs marked this pull request as ready for review September 15, 2025 14:26

add kv cache disable tests

153c4dc

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs dismissed brian-dellabetta’s stale review via 153c4dc September 15, 2025 15:24

dsikka approved these changes Sep 15, 2025

brian-dellabetta approved these changes Sep 15, 2025

dsikka merged commit cf149b8 into main Sep 15, 2025
7 of 8 checks passed

dsikka deleted the kylesayrs/calib branch September 15, 2025 17:46
