Skip to content

Conversation

SamuelBarryCS
Copy link
Contributor

@SamuelBarryCS SamuelBarryCS commented Sep 14, 2025

What

  • Fixes sft_gemma3 example doesn't work #3957 by checking if there is image and adapting the call to the processor accordingly
  • Fixes VLM config tests that were failing (mismatch in between config.text_config.num_hidden_layers and layer_types was causing the tests to fail)

How to review

  • Read diff

Test performed

  • Issue reproduced with a small script (pushed, but will be deleted before merging). Before the fix, the script yielded [rank0]: KeyError: 'images' like in sft_gemma3 example doesn't work #3957 but after the fix it is working just fine
  • No major logic change so existing tests still passing

@SamuelBarryCS SamuelBarryCS changed the title [WIP] Fix usage of VLM using text only Fix usage of VLM using text only Sep 14, 2025
@SamuelBarryCS SamuelBarryCS marked this pull request as ready for review September 14, 2025 06:22
@@ -0,0 +1,31 @@
"""Test for issue #3957 - VLM KeyError fix"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be deleted before merging

@SamuelBarryCS
Copy link
Contributor Author

SamuelBarryCS commented Sep 14, 2025

cc @qgallouedec for review when you get the time.
Merci!

Copy link
Member

@sergiopaniego sergiopaniego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!
We could add a modified version of the script as a test in test_sft_trainer.py and remove the script

@SamuelBarryCS
Copy link
Contributor Author

SamuelBarryCS commented Sep 17, 2025

Thanks for the fix! We could add a modified version of the script as a test in test_sft_trainer.py and remove the script

Thanks for your quick reply @sergiopaniego !
Done in 6465594 :), and the test is passing fine:
image

Good to merge on my side (but it looks like we still need approval for the workflows to run)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SamuelBarryCS
Copy link
Contributor Author

@sergiopaniego do you understand why the tests are failing ? It doesn't look related to my changes ... :x
Otherwise ready to merge on my side (once this test issue is solved).

@qgallouedec
Copy link
Member

Hi, thanks for your contribution. I opted for a different approach: when the dataset does not contain images, it can be pre-processed, and there is no point in doing it on the fly.

@qgallouedec qgallouedec changed the title Fix usage of VLM using text only 👩‍🦯 Fix usage of VLM using text only Sep 23, 2025
@qgallouedec qgallouedec merged commit 9e5e60c into huggingface:main Sep 23, 2025
4 of 10 checks passed
@SamuelBarryCS
Copy link
Contributor Author

Hi, thanks for your contribution. I opted for a different approach: when the dataset does not contain images, it can be pre-processed, and there is no point in doing it on the fly.

This works as well! Thanks for the edit @qgallouedec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sft_gemma3 example doesn't work
4 participants