System Info
When training with the GRPO trainer using flash-attention2 (or any attention implementation other than eager), you get a warning that can be quite confusing. The warning comes from here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/modeling_gemma3.py#L657-L661
However, looking at the code and related issues (#31997 (comment), https://huggingface.co/google/gemma-2-9b-it/discussions/9#66923856e338220687e4fe1e), it looks like this issue no longer exists for Gemma3 with the latest flash-attention package. Could we remove this warning?
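For context, here is a minimal sketch that reproduces the warning outside of TRL. The checkpoint name is just a placeholder, and the exact trigger condition (training mode plus a non-eager attention implementation) is my reading of the linked lines:

```python
# Minimal sketch (independent of TRL): the warning fires on a plain training
# forward pass whenever the attention implementation is not "eager".
# The checkpoint name is a placeholder; any Gemma3 model should do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # anything other than "eager"
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.train()  # the warning appears to be tied to training mode
inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
outputs = model(**inputs, labels=inputs["input_ids"])
# -> warns that Gemma3 should be trained with the `eager` attention implementation
```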
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the TRL GRPO trainer with any Gemma3 model using flash-attention2; a minimal sketch follows.
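A minimal sketch of that repro, assuming trl's GRPOTrainer/GRPOConfig API; the checkpoint, dummy dataset, and length-based reward below are placeholders, not part of the original report:

```python
# Minimal repro sketch with trl's GRPOTrainer; checkpoint, dataset, and reward
# are placeholders chosen only to make the trainer run.
from datasets import Dataset
from transformers import AutoModelForCausalLM
from trl import GRPOConfig, GRPOTrainer

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",                   # any Gemma3 checkpoint
    attn_implementation="flash_attention_2",  # anything other than "eager"
    torch_dtype="bfloat16",
)

# Tiny prompt-only dataset, just enough for the trainer to run a step.
train_dataset = Dataset.from_dict({"prompt": ["Say hello.", "Count to three."] * 8})

def reward_len(completions, **kwargs):
    # Dummy reward: longer completions score higher.
    return [float(len(c)) for c in completions]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="gemma3-grpo-repro", max_steps=1),
    train_dataset=train_dataset,
)
trainer.train()  # the confusing "use eager attention" warning shows up here
```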
Expected behavior
No confusing warning should be emitted when training with flash-attention2.