nvchenghaoz

Description

As the title says, this PR adds support for the soft logit capping introduced in the Gemma 2 paper (https://arxiv.org/pdf/2408.00118). Soft logit capping is applied to the attention scores right before the softmax op:
`LOGIT_CAP * tanh(attn / LOGIT_CAP)`
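
For reference, here is a minimal PyTorch sketch of the capping step (the function name and tensor shapes are illustrative, not the actual kernel code in this PR):

```python
import torch

def soft_cap_logits(attn: torch.Tensor, logit_cap: float) -> torch.Tensor:
    """Soft logit capping (Gemma 2): smoothly squashes attention scores
    into (-logit_cap, logit_cap) before the softmax."""
    return logit_cap * torch.tanh(attn / logit_cap)

# Capped scores feed into the softmax as usual.
scores = torch.randn(2, 8, 16, 16)  # (batch, heads, q_len, kv_len), hypothetical shape
capped = soft_cap_logits(scores, logit_cap=50.0)
probs = torch.softmax(capped, dim=-1)
```

Because `tanh` is bounded, the capped scores can never exceed `logit_cap` in magnitude, which keeps the softmax numerically stable for large attention scores while leaving small scores nearly unchanged.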

Test Coverage

Added two new tests covering soft logit capping: `test_gqa_op_with_logit_cap` and `test_flashinfer_attention_op_context_with_logit_cap`.

@nvchenghaoz nvchenghaoz self-assigned this Jun 16, 2025
@nvchenghaoz nvchenghaoz changed the base branch from main to feat/ad-2025-06-24 June 24, 2025 23:41
@nvchenghaoz nvchenghaoz enabled auto-merge (squash) June 27, 2025 20:47