Fix fp32_ln for various models #41605
Merged
This PR fixes the test `test_flash_attn_2_fp32_ln` for several models:

- `bark` was failing the test because it calls `_flash_attention_forward` directly without checking the queries' dtype, so the test could fail when the dtype was `torch.float32`. To fix this, we refactored a code block into a function `get_target_dtype` that takes care of inferring whether to cast the fp32 tensor to fp16 or bf16, and added a call to it before the call to FA (a sketch of the idea is shown after this list).
- `stablelm`
- `mllama` was failing the test because `MllamaTextSelfAttention` lacks the `is_causal` attribute, which was added and set to `True` (it's a text attention, so it's causal, as discussed in Mllama fixes #39182).
- `kosmos2`, but the test still fails for many other reasons.
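For illustration, here is a minimal sketch of what a `get_target_dtype`-style helper could look like. The exact signature, the autocast/quantization checks, and the fallback to the module's own weight dtype are assumptions for this sketch, not necessarily the code merged in this PR.

```python
import torch


def get_target_dtype(query: torch.Tensor, module: torch.nn.Module) -> torch.dtype:
    """Infer the dtype flash attention should run in when the queries are fp32.

    Flash attention only supports fp16/bf16, so fp32 activations (e.g. when
    layer norms are upcast) need to be cast down before the kernel call.
    Hypothetical helper: names and priority order are assumptions.
    """
    if query.dtype != torch.float32:
        # Already a dtype flash attention can handle.
        return query.dtype
    if torch.is_autocast_enabled():
        # Respect an active autocast context.
        return torch.get_autocast_gpu_dtype()
    config = getattr(module, "config", None)
    if config is not None and hasattr(config, "_pre_quantization_dtype"):
        # Quantized models keep the original (pre-quantization) dtype here.
        return config._pre_quantization_dtype
    # Fall back to the dtype of the module's own weights.
    return next(module.parameters()).dtype


# Usage sketch before calling _flash_attention_forward (variable names assumed):
# target_dtype = get_target_dtype(query_states, self)
# if query_states.dtype == torch.float32:
#     query_states = query_states.to(target_dtype)
#     key_states = key_states.to(target_dtype)
#     value_states = value_states.to(target_dtype)
```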
The list of fixed tests is here: