Conversation


@remi-or remi-or commented Oct 15, 2025

This PR fixes the test test_flash_attn_2_fp32_ln for several models:

  • bark was failing the test because it calls _flash_attention_forward directly without checking the queries' dtype, so the test could fail when the dtype was torch.float32. To fix this we refactored a code block out into a function get_target_dtype that takes care of inferring whether to cast the fp32 tensor to fp16 or bf16, and added a call to it before the call to FA
  • same for stablelm
  • mllama was failing the test because MllamaTextSelfAttention lacks the is_causal attribute, which was added and set to True (it's a text attention so it's causal, as discussed in Mllama fixes #39182)
  • same for kosmos2, but the test still fails there for many other reasons
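
The dtype-inference step described in the first bullet can be sketched roughly as follows. This is a hypothetical illustration, not the actual transformers implementation: the function name get_target_dtype matches the PR description, but the parameters and precedence order shown here are assumptions about how such a helper typically decides between fp16 and bf16.

```python
# Hypothetical sketch of the dtype-inference logic described above.
# FlashAttention only supports fp16 and bf16, so an fp32 query tensor
# must be downcast before the kernel is invoked. Dtypes are represented
# as plain strings here to keep the sketch self-contained.

def get_target_dtype(input_dtype, config_torch_dtype=None, autocast_dtype=None):
    """Return the dtype FlashAttention should run in.

    input_dtype:        dtype of the query tensor (e.g. "float32")
    config_torch_dtype: dtype the model was configured/loaded with, if any
    autocast_dtype:     active autocast dtype, if autocast is enabled
    """
    if input_dtype != "float32":
        # Already fp16 or bf16: no cast needed.
        return input_dtype
    if autocast_dtype is not None:
        # Prefer the active autocast dtype when one is set.
        return autocast_dtype
    if config_torch_dtype is not None and config_torch_dtype != "float32":
        # Otherwise fall back to the dtype the model was configured with.
        return config_torch_dtype
    # Last resort: downcast fp32 to fp16 so the FA kernel can run.
    return "float16"
```

Calling this helper before _flash_attention_forward (instead of duplicating the cast logic in each model) is what lets the same fix cover bark, stablelm, and kosmos2.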

The list of fixed tests:

FAILED tests/models/bark/test_modeling_bark.py::BarkSemanticModelTest::test_flash_attn_2_fp32_ln - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/bark/test_modeling_bark.py::BarkCoarseModelTest::test_flash_attn_2_fp32_ln - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/mllama/test_modeling_mllama.py::MllamaForCausalLMModelTest::test_flash_attn_2_fp32_ln - AttributeError: 'MllamaTextSelfAttention' object has no attribute 'is_causal'
FAILED tests/models/mllama/test_modeling_mllama.py::MllamaForConditionalGenerationModelTest::test_flash_attn_2_fp32_ln - AttributeError: 'MllamaTextSelfAttention' object has no attribute 'is_causal'
FAILED tests/models/stablelm/test_modeling_stablelm.py::StableLmModelTest::test_flash_attn_2_fp32_ln - RuntimeError: FlashAttention only support fp16 and bf16 data type
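
The two mllama AttributeError failures above come down to a missing attribute on the attention module. A minimal sketch of that fix, with an illustrative stand-in class rather than the real transformers module:

```python
# Hypothetical minimal sketch of the mllama fix: the attention module just
# needs an is_causal attribute so code paths that read it (such as the
# FlashAttention test) no longer raise AttributeError. The class below is a
# stand-in; the real MllamaTextSelfAttention has many more members.

class MllamaTextSelfAttention:
    def __init__(self):
        # Text self-attention is autoregressive, hence causal,
        # per the discussion in Mllama fixes #39182.
        self.is_causal = True

attn = MllamaTextSelfAttention()
# Before the fix, accessing attn.is_causal raised:
#   AttributeError: 'MllamaTextSelfAttention' object has no attribute 'is_causal'
assert attn.is_causal is True
```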

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: bark, blt, kosmos2, mllama, stablelm

@remi-or remi-or requested a review from ArthurZucker October 15, 2025 14:29

@ArthurZucker ArthurZucker left a comment


LGTM thanks

@remi-or remi-or merged commit 2935a1b into main Oct 16, 2025
23 checks passed
@remi-or remi-or deleted the fix-fp32-ln branch October 16, 2025 12:18
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* Add is_causal to KosmosTextAttention

* Move get target_dtype to be imported elsewhere

* Fix fp32 flash attention bug in bark

* Fix is_causal in mllama

* Fix fp32 issue on StableLM

* Fix repo-consistency
