[TRTLLM-4789] Support logit softcapping during the graph import and optimization #65
Description
This PR enables full support for logit softcapping. The main changes include:
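As background on the term, and not taken from this PR's diff: logit softcapping is commonly defined as squashing a score `x` to `cap * tanh(x / cap)`, which keeps values near zero almost unchanged while bounding large ones to the interval (-cap, cap). A minimal pure-Python sketch of that formula follows. The function name `softcap` and the cap value `50.0` are illustrative assumptions, not names or settings from this change.

```python
import math


def softcap(scores, cap):
    """Smoothly bound each score to the open interval (-cap, cap).

    Hypothetical helper for illustration only; not the PR's API.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    # tanh is close to the identity near 0 and saturates at +/-1,
    # so small scores pass through and large ones are limited.
    return [cap * math.tanh(s / cap) for s in scores]


# Illustrative inputs; 50.0 is an assumed cap value.
scores = [-200.0, -1.0, 0.0, 1.0, 200.0]
capped = softcap(scores, cap=50.0)
```

Because `tanh` is monotonic, the capped scores keep the ordering of the original ones; only their magnitude is limited.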
Test Coverage
FlashInfer Output
2025-06-24 22:08:52,873 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_256_head_dim_vo_256_posenc_0_use_swa_False_use_logits_cap_True_f16qk_False
2025-06-24 22:08:52,901 - INFO - flashinfer.jit: Finished loading JIT ops: page
2025-06-24 22:08:52,934 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_256_head_dim_vo_256_posenc_0_use_swa_False_use_logits_cap_True_f16qk_False
2025-06-24 22:09:12,674 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_256_head_dim_vo_256_posenc_0_use_swa_False_use_logits_cap_True_f16qk_False
2025-06-24 22:09:12,726 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_256_head_dim_vo_256_posenc_0_use_swa_False_use_logits_cap_True_f16qk_False
Processed requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:55<00:00, 27.74s/it]
[06/24/2025-22:09:16] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 0] How big is the universe? :
We don't know!
Here's why:
Triton Output
Processed requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:31<00:00, 15.82s/it]
[06/25/2025-13:28:59] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 0] How big is the universe? :
There's no easy answer to this, because we can't directly observe the edge of the universe. Here's what we do know:
[06/25/2025-13:28:59] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 1] In simple words and in a single sentence, explain the concept of gravity: :
Gravity is the force that pulls objects towards each other.
You can also elaborate on that statement and explain different aspects:
Gravity's pull gets weaker the further away an object is from a massive object, like Earth. That's why the Moon orbits the Earth, and why a feather falls to the ground while a bowling ball stays put.
Here are some potential titles for a paragraph explaining gravity: