Gemma3 with flash-attention2 outputs warning #40723

@tolgadur

Description

System Info

When training with the GRPO trainer using flash-attention2 (or any attention implementation other than eager), you get a warning that can be quite confusing. The warning comes from here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/modeling_gemma3.py#L657-L661

However, looking at the code and the related discussions in #31997 (comment) and https://huggingface.co/google/gemma-2-9b-it/discussions/9#66923856e338220687e4fe1e, it looks like this issue no longer exists for Gemma3 with the latest flash-attention package. Could we remove this warning?
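
For context, here is a minimal sketch of how the warning surfaces even outside of TRL. This assumes flash-attn is installed and a CUDA device is available; `google/gemma-3-1b-it` is just an example checkpoint, and the warning appears to be gated on training mode at the linked lines:

```python
# Minimal sketch: the warning fires on a training-mode forward pass with any
# non-eager attention implementation (the checkpoint id is an example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

model.train()  # the warning seems to be emitted only in training mode
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model(**inputs)  # modeling_gemma3.py logs the warning here
```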

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the TRL GRPO trainer with any Gemma3 model using flash-attention2; a sketch follows below.
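
A rough reproduction sketch, assuming trl and flash-attn are installed; the dataset, reward function, and checkpoint id below are placeholders, not the exact setup used:

```python
# Sketch of the reproduction with TRL's GRPOTrainer (dataset, reward
# function, and checkpoint id are placeholder assumptions).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

train_dataset = Dataset.from_dict(
    {"prompt": ["Write a haiku about autumn.", "Explain GRPO in one sentence."]}
)

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions.
    return [-float(len(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="gemma3-grpo-repro",
    model_init_kwargs={"attn_implementation": "flash_attention_2"},
)
trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",  # any Gemma3 checkpoint
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()  # the confusing warning appears during training
```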

Expected behavior

No warning should be emitted when training with flash-attention2.
