@aditi-dsi

This PR enables dynamic loading of different VLMs based on the --model argument.

  • Gemma models use the native Gemma3ForConditionalGeneration class.
  • Non-Gemma VLMs fall back to appropriate Hugging Face model classes, selected dynamically.
  • To support testing other VLMs, more models and their model classes can be added to model_class_map as needed.
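Below is a minimal sketch of how a model_class_map lookup could drive loading. The PR's actual map and fallback classes are not shown on this page. The sketch makes two assumptions: the map is keyed by a prefix of the model repo name, and AutoModelForImageTextToText serves as the generic fallback. resolve_model_class_name and load_model are hypothetical helper names. Gemma3ForConditionalGeneration and AutoModelForImageTextToText are real transformers classes. They are referenced by name and resolved only at load time, so the mapping logic can be checked without importing transformers or downloading weights.

```python
import importlib

# Hypothetical map: prefix of the repo name -> transformers class name.
# Matching on a prefix of the last path component (not a bare substring)
# keeps e.g. "paligemma-3b-..." from being caught by the Gemma 3 entry.
MODEL_CLASS_MAP = {
    "gemma-3-": "Gemma3ForConditionalGeneration",
}

# Assumed fallback for any VLM without an entry in MODEL_CLASS_MAP.
DEFAULT_CLASS_NAME = "AutoModelForImageTextToText"


def resolve_model_class_name(model_id: str) -> str:
    """Return the transformers class name to use for a --model value."""
    repo_name = model_id.lower().rsplit("/", 1)[-1]
    for prefix, class_name in MODEL_CLASS_MAP.items():
        if repo_name.startswith(prefix):
            return class_name
    return DEFAULT_CLASS_NAME


def load_model(model_id: str, **kwargs):
    """Import transformers lazily and load the model with the resolved class."""
    transformers = importlib.import_module("transformers")
    model_class = getattr(transformers, resolve_model_class_name(model_id))
    return model_class.from_pretrained(model_id, **kwargs)
```

With this layout, supporting another VLM means adding one entry to MODEL_CLASS_MAP. A call such as load_model(args.model) then picks the class without further changes to the training script.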

Additionally, this PR includes a minor fix in train.py:

  • project_name now falls back to None when no project name is given; previously, a missing project name interrupted training with an error.
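The relevant train.py code is not shown on this page. The sketch below assumes, hypothetically, that the project name arrives through an argparse flag called --project_name. It shows one way to make that value fall back to None instead of raising an error. parse_args and the --model flag here are illustrative stand-ins, not the script's confirmed interface.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical training arguments; only --project_name matters here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    # If the flag is omitted, argparse uses this default instead of failing.
    parser.add_argument("--project_name", default=None)
    args = parser.parse_args(argv)
    # Treat an explicitly empty value the same as "not given".
    args.project_name = args.project_name or None
    return args
```

Downstream code can then check `if args.project_name is None` rather than assuming a name is always present.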

@aditi-dsi
Author

@sergiopaniego this PR is ready for review.

@ariG23498
Owner

I think AutoModelForImageTextToText should cover all of these models. Could you give it a try?

@aditi-dsi
Author

Sure, I'll take a look and let you know.

