enable llama4 int8 quantization baseline #522

WeiweiZhang1 · 2025-04-16T10:18:55Z

Workaround for producing INT8 llama4 models for SGLang on Xeon.

Usage: pull this PR's code, then run:
pip install torch torchvision
cd auto-round/
pip install -e .[cpu]
sh run_llama4_quant.sh model_path save_path
sh run_qwen3.sh model_path save_path
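For context, an INT8 checkpoint stores each weight as an 8-bit integer plus a floating-point scale. Below is a minimal sketch, assuming symmetric per-output-channel scales (a common layout for INT8 linear kernels). The PR does not show the exact layout that `translate_2_sglang_int8` emits, and AutoRound tunes the rounding instead of using the plain round-to-nearest shown here, so treat this only as an illustration of the storage format. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_per_channel_int8(
    weight: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Hypothetical helper: symmetric round-to-nearest INT8, one scale per output row."""
    qweight: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Map [-max_abs, max_abs] onto [-127, 127]; use scale 1.0 for all-zero rows
        # to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Python's round() is round-half-to-even; the clamp guards against
        # floating-point overshoot past the symmetric range.
        q = [max(-127, min(127, round(w / scale))) for w in row]
        qweight.append(q)
        scales.append(scale)
    return qweight, scales


def dequantize_int8(qweight: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Hypothetical helper: recover approximate float weights as q * scale per row."""
    return [[q * s for q in row] for row, s in zip(qweight, scales)]
```

With PyTorch, the same per-row scale would be `weight.abs().amax(dim=1) / 127`. Per-channel scales are a typical choice because a single per-tensor scale lets one outlier row inflate the quantization error of every other row.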

Signed-off-by: Zhang, Weiwei1 <[email protected]>

wenhuach21 · 2025-04-18T05:35:53Z

auto_round/utils.py

+        state_dict = translate_2_sglang_int8(model)
+    max_shard_size = 40 * 1024**3  # 40GB
+    shards = {}
+    current_shard = {}


This part should be refined and supported in the main branch.
Please follow the original code style where possible.
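The sharding logic quoted above (a byte budget, a `shards` container, and a `current_shard` being filled) can be sketched in a self-contained way as a greedy packer. This is a minimal sketch, not the PR's code: `shard_state_dict` and `build_weight_map` are hypothetical names, sizes come from a caller-supplied `size_fn`, and the file names follow the Hugging Face `model-0000i-of-0000N.safetensors` pattern as an assumption about the desired output. Note that `40 * 1024**3` is 40 GiB, not 40 GB as the diff comment says.

```python
from typing import Callable, Dict, List, Mapping, TypeVar

T = TypeVar("T")

MAX_SHARD_SIZE = 40 * 1024**3  # 40 GiB, the budget used in the quoted diff


def shard_state_dict(
    state_dict: Mapping[str, T],
    size_fn: Callable[[T], int],
    max_shard_size: int = MAX_SHARD_SIZE,
) -> List[Dict[str, T]]:
    """Hypothetical helper: pack tensors, in insertion order, into shards of at
    most max_shard_size bytes. A single oversized tensor gets a shard of its own."""
    shards: List[Dict[str, T]] = []
    current_shard: Dict[str, T] = {}
    current_size = 0
    for name, tensor in state_dict.items():
        size = size_fn(tensor)
        # Close the current shard if adding this tensor would exceed the budget.
        if current_shard and current_size + size > max_shard_size:
            shards.append(current_shard)
            current_shard, current_size = {}, 0
        current_shard[name] = tensor
        current_size += size
    if current_shard:
        shards.append(current_shard)
    return shards


def build_weight_map(
    shards: List[Dict[str, T]], prefix: str = "model", ext: str = "safetensors"
) -> Dict[str, str]:
    """Hypothetical helper: map each tensor name to the file its shard is saved as."""
    total = len(shards)
    weight_map: Dict[str, str] = {}
    for i, shard in enumerate(shards, start=1):
        filename = f"{prefix}-{i:05d}-of-{total:05d}.{ext}"
        for name in shard:
            weight_map[name] = filename
    return weight_map
```

With PyTorch tensors, `size_fn=lambda t: t.numel() * t.element_size()` gives the byte size. Each shard could then be written with `safetensors.torch.save_file(shard, filename)`, and the weight map stored under `"weight_map"` in a `model.safetensors.index.json` file, which is the layout `transformers` reads for sharded checkpoints.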


enable llama4 int8 quantization baseline

456b4d6

Signed-off-by: Zhang, Weiwei1 <[email protected]>

WeiweiZhang1 added the wontfix and draft labels on Apr 16, 2025

WeiweiZhang1 added 3 commits April 16, 2025 23:00

add save config

9337aa7

Signed-off-by: Zhang, Weiwei1 <[email protected]>

fixtypo

e268987

Signed-off-by: Zhang, Weiwei1 <[email protected]>

refine script

470927a

Signed-off-by: Zhang, Weiwei1 <[email protected]>

wenhuach21 reviewed Apr 18, 2025


WeiweiZhang1 added 5 commits April 18, 2025 13:58

typofix

dd84a4d

Signed-off-by: Zhang, Weiwei1 <[email protected]>

refine shell

87e1615

Signed-off-by: Zhang, Weiwei1 <[email protected]>

enable qwen3 sglang int8 quantize

8a464a1

Signed-off-by: Zhang, Weiwei1 <[email protected]>

add qwen3 shell script

1b5b5e2

Signed-off-by: Zhang, Weiwei1 <[email protected]>

fix llama4 export issue

b058fc0

Signed-off-by: Zhang, Weiwei1 <[email protected]>
