
Conversation

@remi-or (Collaborator) commented Jul 2, 2025

This PR adds the is_causal attribute to some Attention modules in mllama and disables FA2 for MllamaVisionModel.
Some tests used to fail when is_causal was missing (MllamaForCausalLMModelTest::test_flash_attn_2_fp32_ln, MllamaForConditionalGenerationModelTest::test_eager_matches_fa2_generate, ...), and once it was added, FA2 failed for these tests on both MI355 and A100. This is because the vision model uses a 4D attention mask, which is not supported by FA2.
I'm not an expert in VLMs, so could you please check that is_causal is right, @zucchini-nlp?

I also added Expectations for AMD MI355. After these changes, we go from 15 failed, 185 passed, 94 skipped to 196 passed, 98 skipped on AMD MI355.
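
For context, here is a minimal sketch of what the is_causal attribute looks like on an attention module. The class names and layer layout are illustrative, not the exact Mllama code; the point is that the attention backend (eager/SDPA/FA2) reads self.is_causal instead of guessing whether to apply a causal mask.

```python
import torch.nn as nn

# Illustrative sketch: the text decoder's self-attention is causal, while
# vision (and cross-) attention is bidirectional, so each module carries an
# explicit flag that the attention backend can consult.
class CausalSelfAttentionSketch(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)
        self.is_causal = True  # mask future tokens in the text decoder


class BidirectionalAttentionSketch(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)
        self.is_causal = False  # vision attention attends to all patches
```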

@remi-or requested a review from zucchini-nlp on July 2, 2025 at 16:33
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions bot (Contributor) commented Jul 2, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: mllama

class MllamaVisionModel(MllamaPreTrainedModel):
    config_class = MllamaVisionConfig
    base_model_prefix = "vision_model"
    _supports_flash_attn_2 = False  # the vision model always adds a 4D attn mask which is not supported by FA2
Member commented:

Nice catch! IMO we can still run FA2 with the vision module, but we need to prepare the mask correctly. In text models, for FA2 we usually keep the 2D mask and don't expand it to 4D.

We can do a similar thing in Mllama and skip the "reshape to 2D and create 4D attention mask" part in the FA2 case (see the sketch below). We might need to check the cross-attention as well, which also uses a 4D mask.
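
A rough sketch of that suggestion, assuming a hypothetical helper (the name and signature are not the actual Mllama code): keep the 2D padding mask when the config selects FA2, and only build the additive 4D mask for the eager/SDPA paths. The cross-attention mask would need the same treatment, since it is also built as a 4D mask.

```python
import torch

def prepare_vision_attention_mask(
    attention_mask: torch.Tensor,  # (batch, seq_len), 1 = keep, 0 = padding
    attn_implementation: str,
    dtype: torch.dtype,
) -> torch.Tensor:
    if attn_implementation == "flash_attention_2":
        # FA2 consumes the 2D padding mask directly (it un-pads internally),
        # so we skip the 4D expansion entirely.
        return attention_mask
    # Eager / SDPA expect a broadcastable additive mask of shape
    # (batch, 1, query_len, key_len) with a large negative value on padded keys.
    inverted = 1.0 - attention_mask[:, None, None, :].to(dtype)
    return inverted * torch.finfo(dtype).min
```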

class MllamaModel(MllamaPreTrainedModel):
    _checkpoint_conversion_mapping = {"language_model.model": "language_model"}
    _supports_quantized_cache = False  # quant cache not supported in encoder-decoder setting
    _supports_flash_attn_2 = False  # the vision model does not support FA2
Member commented:

_supports_flash_attn_2 = False should be defined only on the module that doesn't support it, i.e. on MllamaVisionModel. I guess the issue with the tests is that they don't check all submodules for _supports_flash_attn_2.

@remi-or (Collaborator, Author) commented:

Ok, I was wondering about that. I will check it out and revert the change.
