
Conversation

nvchenghaoz

@coderabbitai summary

Several updates

  1. Enable accuracy testing for nemotron-h; add tests for MMLU and GSM8K.

  2. Fix two unit tests.

@nvchenghaoz nvchenghaoz requested a review from suyoggupta October 1, 2025 18:50
# For generate-only (s == 1), caches must carry prior state.
if num_prefill > 0 and slot_idx_decode.numel() > 0:
    zero_rows = torch.zeros_like(conv_state_cache.index_select(0, slot_idx_decode))
    conv_state_cache.index_copy_(0, slot_idx_decode, zero_rows)
Collaborator
why would we reset cache state for decode?
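To illustrate the reviewer's point, here is a minimal hedged sketch (shapes and slot assignments are hypothetical, not taken from the PR): decode steps read and update the prior conv state, so only freshly assigned prefill slots should be zeroed, never the decode slots.

```python
import torch

# Illustrative layout: [max_slots, dim, width] conv-state cache.
max_slots, dim, width = 4, 3, 2
conv_state_cache = torch.randn(max_slots, dim, width)

slot_idx_prefill = torch.tensor([0, 1], dtype=torch.long)  # newly assigned sequences
slot_idx_decode = torch.tensor([2, 3], dtype=torch.long)   # continuing sequences

decode_state_before = conv_state_cache[slot_idx_decode].clone()

# Reset only the prefill slots; decode slots must keep their history
# so the next single-token step sees the prior convolution window.
conv_state_cache.index_fill_(0, slot_idx_prefill, 0.0)

assert torch.all(conv_state_cache[slot_idx_prefill] == 0)
assert torch.equal(conv_state_cache[slot_idx_decode], decode_state_before)
```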

Comment on lines 189 to 197
slot_idx_decode = slot_idx[num_prefill:].to(torch.long)
y_dec = causal_conv1d_update(
    x_decode,  # [batch, dim]
    conv_state_cache,
    w2d,
    bias,
    activation=None,
    cache_seqlens=None,
-   conv_state_indices=slot_idx[num_prefill:].to(torch.int32),
+   conv_state_indices=slot_idx_decode.to(torch.int32),
Collaborator
there are two type casts here?
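A minimal sketch of the point (toy values; in the PR the indices come from the scheduler): the snippet casts `slot_idx[num_prefill:]` to `long` and then again to `int32`, while a single cast in the dtype the update kernel expects would do.

```python
import torch

slot_idx = torch.tensor([5, 2, 7, 1])
num_prefill = 2

# One cast, directly to the dtype consumed downstream (int32 per the
# conv_state_indices argument in the snippet above).
slot_idx_decode = slot_idx[num_prefill:].to(torch.int32)

assert slot_idx_decode.dtype == torch.int32
assert slot_idx_decode.tolist() == [7, 1]
```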

Comment on lines +245 to +246
# Initialize to zeros so brand-new sequences start from a clean state.
return torch.zeros(
Collaborator

nit: not needed when we correctly index caches
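A hedged sketch of the nit, with hypothetical shapes: if every cache slot is fully written during prefill before decode ever reads it, the allocation does not need a zero-fill at all.

```python
import torch

max_slots, dim, width = 8, 16, 3

def allocate_conv_state_cache(dtype=torch.float32, device="cpu"):
    # torch.empty skips the zero-fill pass; slot contents become defined
    # on the first prefill write (e.g. index_copy_ into the slot).
    return torch.empty(max_slots, dim, width, dtype=dtype, device=device)

cache = allocate_conv_state_cache()
assert cache.shape == (max_slots, dim, width)
```

The trade-off is that a bug in slot indexing then reads garbage instead of zeros, which is why the zero-init can still be useful as a debugging aid.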

@lucaslie
Collaborator

lucaslie commented Oct 3, 2025

see NVIDIA#8133 for the accuracy test

lucaslie and others added 22 commits October 3, 2025 10:09
* [None][auto_deploy] Bamba

Signed-off-by: William Zhang <[email protected]>

* debugging export accuracy diff for bamba

Signed-off-by: Lucas Liebenwein <[email protected]>

---------

Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: William Zhang <[email protected]>
Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the bamba unit test

Signed-off-by: Chenghao Zhang <[email protected]>

* none: Add triton backend for ssm_transform and cuda backend for conv

Signed-off-by: Chenghao Zhang <[email protected]>

* Fully Use the TRT LLM kernels

Signed-off-by: Chenghao Zhang <[email protected]>

* Add fake version for ssm transform op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the datatype error in fake op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the conv test error

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the triton ssm error

Signed-off-by: Chenghao Zhang <[email protected]>

---------

Signed-off-by: Chenghao Zhang <[email protected]>
…es with better reset/sizing (#140)

Signed-off-by: Lucas Liebenwein <[email protected]>
* Fix the bamba unit test

Signed-off-by: Chenghao Zhang <[email protected]>

* none: Add triton backend for ssm_transform and cuda backend for conv

Signed-off-by: Chenghao Zhang <[email protected]>

* Fully Use the TRT LLM kernels

Signed-off-by: Chenghao Zhang <[email protected]>

* Add fake version for ssm transform op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the datatype error in fake op

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the conv test error

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the triton ssm error

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the DemoLLM sampler mismatch

Signed-off-by: Chenghao Zhang <[email protected]>

* Update the implementation for triton/cuda kernels

Signed-off-by: Chenghao Zhang <[email protected]>

* Fix the d2d memcpy for decode

Signed-off-by: Chenghao Zhang <[email protected]>

* Revert the generator and remove the redundant code

Signed-off-by: Chenghao Zhang <[email protected]>

---------

Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Co-authored-by: Suyog Gupta <[email protected]>
* [None][feat] Add patches for NemotronH

Signed-off-by: William Zhang <[email protected]>

* [None][test] unittest for nemotron_h

Signed-off-by: William Zhang <[email protected]>

* nemotron-h support finished

Signed-off-by: Lucas Liebenwein <[email protected]>

* added anticipated path for new models on llm_models trt-llm CI

Signed-off-by: Lucas Liebenwein <[email protected]>

---------

Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: William Zhang <[email protected]>
Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz nvchenghaoz force-pushed the chenghao/fix-causal-conv branch from 1199afe to fe22f1c Compare October 3, 2025 17:35