Fix the unit test errors / enable accuracy tests #150
base: main
Conversation
# For generate-only (s == 1), caches must carry prior state.
if num_prefill > 0 and slot_idx_decode.numel() > 0:
    zero_rows = torch.zeros_like(conv_state_cache.index_select(0, slot_idx_decode))
    conv_state_cache.index_copy_(0, slot_idx_decode, zero_rows)
why would we reset cache state for decode?
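For context on the concern: decode (s == 1) steps depend on state written into the cache by earlier steps, so zeroing decode-owned rows would discard exactly the history decode needs. A minimal plain-Python sketch (a hypothetical rolling-window cache, not the actual kernel) shows the effect:

```python
# Minimal sketch of a slot-indexed conv state cache (hypothetical layout).
# Each active sequence owns one cache row; a decode step must read the
# state left behind by the previous step, so zeroing decode rows loses it.

def step(cache, slot, x):
    """Shift x into the slot's rolling window and return the window sum."""
    cache[slot] = cache[slot][1:] + [x]
    return sum(cache[slot])

cache = {0: [0, 0, 0]}         # slot 0 starts from a clean (zero) state
step(cache, 0, 1.0)            # earlier step writes state into the slot
y_keep = step(cache, 0, 2.0)   # decode with state kept -> still sees the 1.0

cache[0] = [0, 0, 0]           # what the questioned reset would do
y_reset = step(cache, 0, 2.0)  # decode after reset -> history is gone

print(y_keep, y_reset)  # 3.0 2.0
```

With the state kept, the window sum includes the earlier input (3.0); after the reset, the same decode step sees only its own input (2.0), which is the behavior the question is pointing at.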
slot_idx_decode = slot_idx[num_prefill:].to(torch.long)
y_dec = causal_conv1d_update(
    x_decode,  # [batch, dim]
    conv_state_cache,
    w2d,
    bias,
    activation=None,
    cache_seqlens=None,
-   conv_state_indices=slot_idx[num_prefill:].to(torch.int32),
+   conv_state_indices=slot_idx_decode.to(torch.int32),
there are two type casts here?
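One way to address this (a sketch, not the actual patch): compute the decode slot indices once, directly in the integer dtype the update kernel expects, so no second cast is needed at the call site.

```python
import torch

# Hypothetical values standing in for the real batch bookkeeping.
slot_idx = torch.tensor([7, 2, 5, 0], dtype=torch.long)
num_prefill = 2

# Cast once, straight to int32 (the dtype conv_state_indices expects),
# instead of .to(torch.long) followed by .to(torch.int32) at the call site.
slot_idx_decode = slot_idx[num_prefill:].to(torch.int32)

print(slot_idx_decode.dtype)  # torch.int32
```

If the long-dtype view is still needed elsewhere (e.g. for `index_copy_`, which requires int64 indices), the two casts may be intentional; otherwise the single cast avoids one small device-side copy per step.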
# Initialize to zeros so brand-new sequences start from a clean state.
return torch.zeros(
nit: not needed when we correctly index caches
see NVIDIA#8133 for the accuracy test
* [None][auto_deploy] Bamba
* debugging export accuracy diff for bamba
---------
Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: William Zhang <[email protected]>
…es with better reset/sizing (#140) Signed-off-by: Lucas Liebenwein <[email protected]>
* Fix the bamba unit test
* none: Add triton backend for ssm_transform and cuda backend for conv
* Fully Use the TRT LLM kernels
* Add fake version for ssm transform op
* Fix the datatype error in fake op
* Fix the conv test error
* Fix the triton ssm error
* Fix the DemoLLM sampler mismatch
* Update the implementation for triton/cuda kernels
* Fix the d2d memcpy for decode
* Revert the generator and remove the redundant code
---------
Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Co-authored-by: Suyog Gupta <[email protected]>
* [None][feat] Add patches for NemotronH
* [None][test] unittest for nemotron_h
* nemotron-h support finished
* added anticipated path for new models on llm_models trt-llm CI
---------
Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: William Zhang <[email protected]>
This reverts commit 67ee3d8.
1199afe to fe22f1c
Several updates:
Enable accuracy testing for nemotron-h; add tests for MMLU and GSM8K.
Fix two unit tests.