
Conversation

nvchenghaoz

@coderabbitai summary

Several updates

  1. Enable accuracy testing for nemotron-h; add tests for MMLU and GSM8K.

  2. Fix two unit tests.

@nvchenghaoz nvchenghaoz requested a review from suyoggupta October 1, 2025 18:50
# For generate-only (s == 1), caches must carry prior state.
if num_prefill > 0 and slot_idx_decode.numel() > 0:
    zero_rows = torch.zeros_like(conv_state_cache.index_select(0, slot_idx_decode))
    conv_state_cache.index_copy_(0, slot_idx_decode, zero_rows)
Collaborator
why would we reset cache state for decode?
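To illustrate the reviewer's point, here is a minimal hedged sketch (shapes and slot assignments are hypothetical, not taken from the PR): decode steps read and update the prior conv state, so only freshly assigned prefill slots should be zeroed, never the decode slots.

```python
import torch

# Illustrative layout: [max_slots, dim, width] conv-state cache.
max_slots, dim, width = 4, 3, 2
conv_state_cache = torch.randn(max_slots, dim, width)

slot_idx_prefill = torch.tensor([0, 1], dtype=torch.long)  # newly assigned sequences
slot_idx_decode = torch.tensor([2, 3], dtype=torch.long)   # continuing sequences

decode_state_before = conv_state_cache[slot_idx_decode].clone()

# Reset only the prefill slots; decode slots must keep their history
# so the next single-token step sees the prior convolution window.
conv_state_cache.index_fill_(0, slot_idx_prefill, 0.0)

assert torch.all(conv_state_cache[slot_idx_prefill] == 0)
assert torch.equal(conv_state_cache[slot_idx_decode], decode_state_before)
```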

Comment on lines 189 to 197
slot_idx_decode = slot_idx[num_prefill:].to(torch.long)
y_dec = causal_conv1d_update(
    x_decode,  # [batch, dim]
    conv_state_cache,
    w2d,
    bias,
    activation=None,
    cache_seqlens=None,
-   conv_state_indices=slot_idx[num_prefill:].to(torch.int32),
+   conv_state_indices=slot_idx_decode.to(torch.int32),
Collaborator
there are two type casts here?
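A minimal sketch of the point (toy values; in the PR the indices come from the scheduler): the snippet casts `slot_idx[num_prefill:]` to `long` and then again to `int32`, while a single cast in the dtype the update kernel expects would do.

```python
import torch

slot_idx = torch.tensor([5, 2, 7, 1])
num_prefill = 2

# One cast, directly to the dtype consumed downstream (int32 per the
# conv_state_indices argument in the snippet above).
slot_idx_decode = slot_idx[num_prefill:].to(torch.int32)

assert slot_idx_decode.dtype == torch.int32
assert slot_idx_decode.tolist() == [7, 1]
```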

Comment on lines +245 to +246
# Initialize to zeros so brand-new sequences start from a clean state.
return torch.zeros(
Collaborator

nit: not needed when we correctly index caches
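A hedged sketch of the nit, with hypothetical shapes: if every cache slot is fully written during prefill before decode ever reads it, the allocation does not need a zero-fill at all.

```python
import torch

max_slots, dim, width = 8, 16, 3

def allocate_conv_state_cache(dtype=torch.float32, device="cpu"):
    # torch.empty skips the zero-fill pass; slot contents become defined
    # on the first prefill write (e.g. index_copy_ into the slot).
    return torch.empty(max_slots, dim, width, dtype=dtype, device=device)

cache = allocate_conv_state_cache()
assert cache.shape == (max_slots, dim, width)
```

The trade-off is that a bug in slot indexing then reads garbage instead of zeros, which is why the zero-init can still be useful as a debugging aid.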

@lucaslie
Collaborator

lucaslie commented Oct 3, 2025

see NVIDIA#8133 for the accuracy test

lucaslie and others added 22 commits October 3, 2025 10:09
* [None][auto_deploy] Bamba

Signed-off-by: William Zhang <[email protected]>

* debugging export accuracy diff for bamba

Signed-off-by: Lucas Liebenwein <[email protected]>

---------

Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: William Zhang <[email protected]>
Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the bamba unit test

Signed-off-by: Chenghao Zhang <[email protected]>

* none: Add triton backend for ssm_transform and cuda backend for conv

Signed-off-by: Chenghao Zhang <[email protected]>

* Fully Use the TRT LLM kernels

Signed-off-by: Chenghao Zhang <[email protected]>

* Add fake version for ssm transform op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the datatype error in fake op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the conv test error

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the triton ssm error

Signed-off-by: Chenghao Zhang <[email protected]>

---------

Signed-off-by: Chenghao Zhang <[email protected]>
…es with better reset/sizing (#140)

Signed-off-by: Lucas Liebenwein <[email protected]>
* Fix the bamba unit test

Signed-off-by: Chenghao Zhang <[email protected]>

* none: Add triton backend for ssm_transform and cuda backend for conv

Signed-off-by: Chenghao Zhang <[email protected]>

* Fully Use the TRT LLM kernels

Signed-off-by: Chenghao Zhang <[email protected]>

* Add fake version for ssm transform op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the datatype error in fake op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the conv test error

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the triton ssm error

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the DemoLLM sampler mismatch

Signed-off-by: Chenghao Zhang <[email protected]>

* Update the implementation for triton/cuda kernels

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the d2d memcpy for decode

Signed-off-by: Chenghao Zhang <[email protected]>

* Revert the generator and remove the redundant code

Signed-off-by: Chenghao Zhang <[email protected]>

---------

Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Co-authored-by: Suyog Gupta <[email protected]>
* [None][feat] Add patches for NemotronH

Signed-off-by: William Zhang <[email protected]>

* [None][test] unittest for nemotron_h

Signed-off-by: William Zhang <[email protected]>

* nemotron-h support finished

Signed-off-by: Lucas Liebenwein <[email protected]>

* added anticipated path for new models on llm_models trt-llm CI

Signed-off-by: Lucas Liebenwein <[email protected]>

---------

Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: William Zhang <[email protected]>
Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz nvchenghaoz force-pushed the chenghao/fix-causal-conv branch from 1199afe to fe22f1c Compare October 3, 2025 17:35