Conversation

manueldeprada
Contributor

@manueldeprada manueldeprada commented Aug 18, 2025

Coming from #39799, adding Ministral model to support its interleaved attention.
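The "interleaved attention" referenced here alternates sliding-window and full attention across decoder layers. A minimal sketch of that layout, assuming illustrative names (`build_layer_types`, the two string values) that are not necessarily the exact attributes the PR introduces:

```python
# Hypothetical sketch of interleaved attention: layers alternate between
# sliding-window and full attention. Names are illustrative only.

def build_layer_types(num_layers: int) -> list[str]:
    """Alternate sliding-window and full attention across layers."""
    return [
        "sliding_attention" if i % 2 == 0 else "full_attention"
        for i in range(num_layers)
    ]

print(build_layer_types(4))
# ['sliding_attention', 'full_attention', 'sliding_attention', 'full_attention']
```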

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@manueldeprada
Contributor Author

manueldeprada commented Aug 20, 2025

Still have to check the slow tests, but the rest is ready @ArthurZucker

@manueldeprada manueldeprada marked this pull request as ready for review August 20, 2025 17:16
Collaborator

@ArthurZucker ArthurZucker left a comment


Nice!

@manueldeprada manueldeprada removed the request for review from Rocketknight1 August 21, 2025 13:27
@huggingface huggingface deleted a comment from github-actions bot Aug 21, 2025
@manueldeprada
Contributor Author

run-slow: ministral

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/ministral']
quantizations: [] ...

@manueldeprada
Contributor Author

run-slow: ministral

@huggingface huggingface deleted a comment from github-actions bot Aug 22, 2025
@manueldeprada
Contributor Author

@ArthurZucker tests are ready. One note: after this merges, if I open a PR on mistralai/Ministral-8B-Instruct-2410 to change the architecture to MinistralForCausalLM, older transformers versions won't load the model. Are we OK with this? Is there a preferred way of handling these situations? Maybe a warning in the README?

Collaborator

@ArthurZucker ArthurZucker left a comment


Okay!
Regarding BC, we are kind of obliged to break here; a proper ⚠️ 🔴 on the PR title should be alright.

@manueldeprada manueldeprada changed the title Add ministral model ⚠️ 🔴 Add ministral model Aug 25, 2025
@manueldeprada
Contributor Author

run-slow: auto, ministral

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/auto', 'models/ministral']
quantizations: [] ...

@ArthurZucker
Collaborator

maybe a warning in the readme?

Yep we can have that!
Older versions will load Mistral instead, but that should be fine; it's not breaking

@manueldeprada
Contributor Author

manueldeprada commented Aug 25, 2025

Yep we can have that! Older versions will load Mistral instead, but that should be fine; it's not breaking

It is breaking: once we change mistralai/Ministral-8B-Instruct-2410/config.json to be

{
  "architectures": [
    "MinistralForCausalLM"
  ], ...

old versions will not recognize that architecture. That's why I was worried 😅

@ArthurZucker
Collaborator

No, but you don't have to change them online IMO

@ArthurZucker
Collaborator

What I meant by "not breaking" is that the old Mistral does not support sliding / non-sliding attention, so we are fixing something. The best way IMO is not to update the old model, but to have some kind of filter for the models on the Hub if possible!

@ArthurZucker
Collaborator

Adding a layer-type attribute will not break older versions, and we can check: if the model type is mistral but it has layer_type -> we map to ministral

@ArthurZucker
Collaborator

instead of changing the model type
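The dispatch idea described above can be sketched as follows. This is a hedged illustration only: the function name, the `layer_types` key, and the returned strings are assumptions, not the merged implementation.

```python
# Sketch of config-based dispatch: keep model_type as "mistral" in existing
# configs, but route to the Ministral class when the config also carries a
# layer-type attribute. Names here are illustrative assumptions.

def resolve_architecture(config: dict) -> str:
    """Pick the model family for a config dict."""
    if config.get("model_type") == "mistral" and "layer_types" in config:
        return "ministral"
    return config.get("model_type", "unknown")

print(resolve_architecture({"model_type": "mistral", "layer_types": ["sliding_attention"]}))
# ministral
print(resolve_architecture({"model_type": "mistral"}))
# mistral
```

This keeps old checkpoints loadable by old library versions (they still see `model_type: mistral`), while newer versions can pick the more capable class.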

@manueldeprada
Contributor Author

Great, I get it now! 🙏 Will ping you back

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ministral, mistral

@manueldeprada
Contributor Author

@ArthurZucker should be ready now! Let me know if this is the correct place to map mistral -> ministral.

Also, if the user explicitly instantiates a Mistral class, I added a warning telling them that they might want to use the Ministral class instead.
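The warning mentioned above might look roughly like this. A minimal sketch, assuming hypothetical names (`check_config`, the `layer_types` key, the message text); the actual PR code may differ.

```python
# Sketch of the warning: if a user explicitly instantiates the Mistral class
# with a config that carries the Ministral layer-type attribute, emit a hint.
# Function name, config key, and message are illustrative assumptions.
import warnings


def check_config(model_class_name: str, config: dict) -> None:
    """Warn when a Ministral-style config is loaded with the Mistral class."""
    if model_class_name == "MistralForCausalLM" and "layer_types" in config:
        warnings.warn(
            "This checkpoint defines layer_types; you may want to load it "
            "with MinistralForCausalLM instead."
        )


check_config("MistralForCausalLM", {"layer_types": ["sliding_attention"]})
```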
