[main] Fix AddRMSNormW8A8Quant init bug #2440
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for the Gemma3 model on Ascend NPUs, including performance optimizations for the GemmaRMSNorm operator and a fix for an initialization bug in AddRMSNormW8A8Quant. The changes are well-structured, adding the necessary model definitions, registration, and a test case.
However, I've found a critical issue in the new AscendGemma3DecoderLayer implementation where an incorrect layer is used for quantization fusion in the MLP block, likely due to a copy-paste error. This would lead to incorrect model execution. A fix is suggested in the detailed comments.
vllm_ascend/models/gemma3.py
Outdated
                AscendW8A8LinearMethod):
            self.pre_feedforward_layernorm = AddRMSNormW8A8Quant(
                config.hidden_size,
                layer=self.self_attn.qkv_proj,
There appears to be a copy-paste error. The pre_feedforward_layernorm is applied just before the MLP block. For the AddRMSNormW8A8Quant fusion to work correctly, it needs to be linked with the subsequent linear layer, which is self.mlp.gate_up_proj, not self.self_attn.qkv_proj.
-                layer=self.self_attn.qkv_proj,
+                layer=self.mlp.gate_up_proj,
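For context, a hedged sketch of what the corrected conditional setup might look like inside the decoder layer's __init__, extrapolated from the diff above; the quant_method.quant_method attribute path, the eps keyword, and the isinstance guards are assumptions rather than code taken from this PR:

if isinstance(self.self_attn.qkv_proj.quant_method.quant_method,
              AscendW8A8LinearMethod):
    # The norm feeding attention fuses with the QKV projection's quant params.
    self.input_layernorm = AddRMSNormW8A8Quant(
        config.hidden_size,
        layer=self.self_attn.qkv_proj,
        eps=config.rms_norm_eps)
if isinstance(self.mlp.gate_up_proj.quant_method.quant_method,
              AscendW8A8LinearMethod):
    # The norm feeding the MLP must fuse with the gate/up projection instead.
    self.pre_feedforward_layernorm = AddRMSNormW8A8Quant(
        config.hidden_size,
        layer=self.mlp.gate_up_proj,
        eps=config.rms_norm_eps)

Each fused norm presumably folds in the input quantization parameters of the layer it feeds, which is why pointing pre_feedforward_layernorm at qkv_proj would quantize the MLP input with the wrong scale.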
fixed
Please briefly describe the optimization principles, solutions, and the optimization results.
Before optimizing, the rmsnorm time in one decoding step was 531.5 µs; after optimizing, it is 105 µs.
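For readers looking for the principle behind the numbers: GemmaRMSNorm differs from plain RMSNorm by its (1 + weight) scaling, and in eager mode it runs as a chain of elementwise ops per token. The optimization presumably collapses this chain into a single fused NPU kernel call; the exact fused op is not shown in this thread. A reference (unfused) sketch of the computation being replaced:

import torch

def gemma_rms_norm_ref(x: torch.Tensor, weight: torch.Tensor,
                       eps: float = 1e-6) -> torch.Tensor:
    # Unfused reference: square, mean, rsqrt, two multiplies, plus dtype casts,
    # each launched as a separate op when run eagerly.
    orig_dtype = x.dtype
    x = x.float()
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    x = x * torch.rsqrt(variance + eps)
    # Gemma-specific detail: scale by (1 + weight) rather than weight.
    return (x * (1.0 + weight.float())).to(orig_dtype)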
vllm_ascend/ops/layernorm.py
Outdated
                 dtype: Optional[torch.dtype] = None,
             ) -> None:
-        super().__init__(hidden_size, eps, var_hidden_size, has_weight, dtype)
+        super().__init__(hidden_size=hidden_size, eps=eps, var_hidden_size=var_hidden_size, has_weight=has_weight, dtype=dtype)
I suggest splitting this PR into two PRs: one focused on the bugfix, and the other focused on the new model.
OK, I deleted the new model part.
Maybe we should modify wrapper_rmsnorm_init instead:

 def wrapper_rmsnorm_init(func):
-    def init(self, hidden_size: int, **extra_args) -> None:
-        func(self, hidden_size, **extra_args)
+    def init(self, hidden_size: int, *args, **kwargs) -> None:
+        func(self, hidden_size, *args, **kwargs)
         self.ignore_anti = True
         self.bias = torch.nn.Parameter(torch.zeros(hidden_size),
                                        requires_grad=False)
     return init
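For context, a minimal, self-contained reproduction of why the original wrapper breaks and why either fix works; the Base class and names here are simplified stand-ins, not the actual vllm-ascend classes:

import torch

def wrapper_rmsnorm_init(func):
    # Original form: only hidden_size may be passed positionally; every other
    # argument must arrive as a keyword to be captured by **extra_args.
    def init(self, hidden_size: int, **extra_args) -> None:
        func(self, hidden_size, **extra_args)
        self.ignore_anti = True
        self.bias = torch.nn.Parameter(torch.zeros(hidden_size),
                                       requires_grad=False)
    return init

class Base:
    def __init__(self, hidden_size, eps=1e-6, var_hidden_size=None,
                 has_weight=True, dtype=None):
        self.hidden_size = hidden_size

class Wrapped(Base):
    __init__ = wrapper_rmsnorm_init(Base.__init__)

# Positional call, as the GemmaRMSNorm subclass did before this PR:
#   Wrapped(4096, 1e-6, None, True, None)
# -> TypeError: init() takes 2 positional arguments but 6 were given
# Either passing keywords (the layernorm.py change above) or widening the
# wrapper signature to *args, **kwargs (the suggestion above) avoids it:
w = Wrapped(4096, eps=1e-6, var_hidden_size=None, has_weight=True, dtype=None)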
Codecov Report
❌ Patch coverage is 50.00%. Your patch check has failed because the patch coverage (50.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2440      +/-   ##
==========================================
+ Coverage   76.18%   77.71%    +1.52%
==========================================
  Files         120      132       +12
  Lines       13532    17520     +3988
==========================================
+ Hits        10310    13615     +3305
- Misses       3222     3905      +683
This pull request has conflicts, please resolve those before we can evaluate the pull request.
…gemmarmsnorm operator of the gemma3 model on NPU
What this PR does / why we need it?
Fix the AddRMSNormW8A8Quant init bug and optimize the performance of the GemmaRMSNorm operator of the gemma3 model on NPU.
Before the fix, it raised "TypeError: wrapper_rmsnorm_init.<locals>.init() takes 2 positional arguments but 6 were given". After the fix, it runs smoothly.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested by running the gemma3 model.
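As a concrete example of such a smoke test, a hedged sketch using the standard vLLM offline API (the model name, length limit, and sampling settings are illustrative, not taken from this PR):

from vllm import LLM, SamplingParams

# Load a Gemma3 checkpoint on the NPU-backed vLLM build and generate a few tokens.
llm = LLM(model="google/gemma-3-4b-it", max_model_len=4096)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)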