[TRTLLM-4880, TRTLLM-4595] Add soft logit capping in custom kernel and flashinfer #62
Description
As the title says, this PR adds support for the soft logit capping introduced in the Gemma 2 paper (https://arxiv.org/pdf/2408.00118). Soft logit capping is applied to the attention scores before the softmax op:

`LOGIT_CAP * tanh(attn / LOGIT_CAP)`
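For illustration, here is a minimal PyTorch sketch of where the capping fits in a reference attention computation. The function and parameter names (`attention_with_logit_cap`, `logit_cap`) are illustrative, not the actual kernel's API:

```python
import torch

def attention_with_logit_cap(q, k, v, logit_cap: float):
    # q, k, v: [batch, heads, seq_len, head_dim]
    scale = q.shape[-1] ** -0.5
    attn = torch.matmul(q, k.transpose(-2, -1)) * scale
    # Soft logit capping: squashes the scores into (-logit_cap, logit_cap)
    # before the softmax, as described in the Gemma 2 paper.
    attn = logit_cap * torch.tanh(attn / logit_cap)
    probs = torch.softmax(attn, dim=-1)
    return torch.matmul(probs, v)
```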
Test Coverage
Added two new tests covering soft logit capping: `test_gqa_op_with_logit_cap` and `test_flashinfer_attention_op_context_with_logit_cap`.