Conversation

@omkar-334 commented Dec 9, 2025

Description of the change

Added presets for Devstral Small 1.1

Reference

GitHub issue: #2333
HF model: https://huggingface.co/mistralai/Devstral-Small-2507
The Devstral HF repo contains only tekken.json, but we need the tokenizer.json format. There are two solutions for this:

  1. Use the conversion script from huggingface/transformers - this converts the tekken.json and loads it with AutoTokenizer.
  2. Use the tokenizer from mistralai/Mistral-Small-24B-Base-2501, referencing this issue from Unsloth. Since Devstral is just a finetune, we can use the earlier model to obtain the tokenizer (yes, that model ships a tokenizer.json).

I've gone ahead with Option 2 and implemented it in the code, updating the presets in mistral_presets.py, convert_mistral.py, and convert_mistral_checkpoints.py. A minimal sketch of the substitution follows.
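For reviewers, here is a self-contained sketch of the Option 2 substitution. The `TOKENIZER_FALLBACKS` dict and `load_hf_tokenizer` helper are illustrative names, not the exact code in the PR:

```python
from transformers import AutoTokenizer

# Devstral-Small-2507 ships only tekken.json, which AutoTokenizer cannot load
# directly, so we borrow the tokenizer from the base model it was finetuned from.
TOKENIZER_FALLBACKS = {
    "devstral_small_1_1": "mistralai/Mistral-Small-24B-Base-2501",
}


def load_hf_tokenizer(preset, hf_preset):
    # Fall back to the compatible base model's repo when the preset's own
    # repo lacks a tokenizer.json.
    repo = TOKENIZER_FALLBACKS.get(preset, hf_preset)
    return AutoTokenizer.from_pretrained(repo)
```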

Colab Notebook

I could not load it in Colab since the model is 24B and the runtime keeps crashing, but I will try it on Modal/Lambda and attach the results here.

Doubts

  1. This is the Kaggle link for the model - https://www.kaggle.com/models/mistral-ai/devstral-small-2507 - but I'm unsure of the kaggle_handle format.
  2. cc @sachinprasadhs, let me know if there are any changes to make.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @omkar-334, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Devstral Small 1.1 model into the system by adding its configuration and adapting the tokenizer conversion process. The primary challenge addressed was the absence of a tokenizer.json file for the Devstral model, which was resolved by leveraging the tokenizer from a compatible Mistral base model. This ensures the new model can be properly utilized within the existing framework.

Highlights

  • New Model Integration: Presets for the Devstral Small 1.1 model have been added; it is a 24B-parameter model finetuned from Mistral-Small-3.1.
  • Tokenizer Resolution: A specific solution was implemented to use the tokenizer from "mistralai/Mistral-Small-24B-Base-2501" for Devstral Small 1.1, addressing the issue where the original Devstral HF model lacked a tokenizer.json file.
  • Configuration Updates: Relevant configuration files, including mistral_presets.py, convert_mistral.py, and convert_mistral_checkpoints.py, were updated to support the new model and its unique tokenizer loading requirements.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request adds presets for the Devstral Small 1.1 model. The changes correctly handle the tokenizer issue by using the tokenizer from a compatible base model. However, there's a minor typo in the model description within the preset file. Additionally, the code in convert_mistral.py and convert_mistral_checkpoints.py for handling the special case of the 'devstral' model can be improved by using a more robust check and avoiding hardcoded strings to enhance maintainability and readability. I've provided suggestions to address these points.

omkar-334 and others added 4 commits December 9, 2025 17:54
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sachinprasadhs sachinprasadhs self-requested a review December 9, 2025 19:04
@sachinprasadhs (Collaborator) left a comment

Thanks for the PR. Please attach screenshots showing numerics matching, parameter count, tokenizer matching, and output matching.

Comment on lines 224 to 230

```python
if preset == "devstral_small_1_1":
    hf_tokenizer = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-Small-24B-Base-2501"
    )
else:
    hf_tokenizer = AutoTokenizer.from_pretrained(hf_preset)
```
@sachinprasadhs (Collaborator):

Can't we use tekken.json, since they have mentioned "Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size"?

@omkar-334 (Author):

We would need to add a dependency on https://github.com/mistralai/mistral-common, since transformers' AutoTokenizer does not support tekken.json.
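A hedged sketch of what that dependency would look like, assuming mistral-common's `MistralTokenizer.from_file` API (the path is illustrative, and the inner attribute chain is my reading of the library, so it may differ between versions):

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the Tekken tokenizer directly from the file Devstral ships.
tok = MistralTokenizer.from_file("tekken.json")

# The inner Tekken tokenizer exposes plain-text encoding.
ids = tok.instruct_tokenizer.tokenizer.encode(
    "def hello():", bos=True, eos=False
)
```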

@sachinprasadhs (Collaborator):

Got it. They have mentioned that going forward they will only use tekken.json. What is the difference between the base model's tokenizer.json and Devstral's tekken.json?

As I observed, they also included tokenizer.json in today's release of the Devstral 2 model.

@sachinprasadhs (Collaborator):

If, looking at Mistral's other models, they also only ship tekken.json like this model, then we can think about adding the dependency.

@omkar-334 (Author) commented Dec 10, 2025:

> Got it. They have mentioned that going forward they will only use tekken.json. What is the difference between the base model's tokenizer.json and Devstral's tekken.json?
>
> As I observed, they also included tokenizer.json in today's release of the Devstral 2 model.

I think they are including tokenizer.json so that people can continue using it until frameworks support tekken.json.

This is the current state of their tokenizer formats for newer models:

  1. mistralai/Devstral-Small-2507 - tekken.json (Add Devstral Small 1.1 #2333)
  2. mistralai/Devstral-Small-2-24B-Instruct-2512 - tekken.json, tokenizer.json
  3. mistralai/Mistral-Small-24B-Base-2501 - tekken.json, tokenizer.json
  4. mistralai/Mistral-Small-3.1-24B-Base-2503 - tekken.json, tokenizer.json (Add Mistral-Small-3.1 #2334)
  5. mistralai/Ministral-3-8B-Base-2512 - tekken.json, tokenizer.json
  6. mistralai/Magistral-Small-2509 - tekken.json (Add Magistral to Keras-Hub #2314)
  7. mistralai/Voxtral-Mini-3B-2507 - tekken.json (Add Voxtral #2349)

Older models:

  1. All of the Mistral and Mixtral models implemented in keras-hub include tokenizer.model and tokenizer.json.
  2. Hence, the keras-hub implementation loads the tokenizer using the tokenizer.model file format.

My earlier changes do not work, since we don't use the tokenizer.json format. Going forward, we need to use tekken.json.

transformers has started supporting the Tekken tokenizer, using mistral-common as the backend for Mistral models (https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/tokenization_mistral_common.py).
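If that file is what it looks like, loading Devstral's tekken.json through transformers would reduce to something like the following. The `tokenizer_type="mistral"` kwarg is my understanding of how the mistral-common backend is selected, so treat it as an assumption:

```python
from transformers import AutoTokenizer

# Routes to transformers' mistral-common-backed tokenizer, which reads
# tekken.json directly (assumed kwarg; check the transformers docs).
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Devstral-Small-2507", tokenizer_type="mistral"
)
ids = tokenizer("def hello():").input_ids
```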

@sachinprasadhs (Collaborator):

Great, thanks for putting all the details here. If AutoTokenizer supports tekken.json loading, we can handle it with an if/else condition.

@omkar-334 (Author):

Yes, we can do that for hf_tokenizer, but how do we support it in Keras-Hub? Should we write a NewMistralTokenizer for tekken.json-based models?

@sachinprasadhs (Collaborator):

I think we can add the underlying TikTokenizer implementation to Keras Hub here (https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/tokenizers), unless NewMistralTokenizer is different from the TikTokenizer.
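For concreteness, here is a sketch of what normalizing a tekken.json into a `tiktoken.Encoding` could look like, based on my reading of the tekken.json layout in mistral-common. The field names and the special-token placeholders are assumptions, not a verified keras-hub implementation:

```python
import base64
import json

import tiktoken


def tekken_to_encoding(path):
    # Sketch only: field names follow the tekken.json layout as I understand
    # it from mistral-common; real files may also carry more vocab entries
    # than the configured vocab size, which a real implementation would trim.
    with open(path) as f:
        data = json.load(f)
    config = data["config"]
    num_special = config["default_num_special_tokens"]
    # Tekken stores byte sequences base64-encoded; token ids are offset by
    # the block of reserved special-token ids at the front of the vocab.
    ranks = {
        base64.b64decode(item["token_bytes"]): item["rank"] + num_special
        for item in data["vocab"]
    }
    # Placeholder names; a real implementation would read the special-token
    # strings from the file instead.
    specials = {f"<special_{i}>": i for i in range(num_special)}
    return tiktoken.Encoding(
        name="tekken",
        pat_str=config["pattern"],
        mergeable_ranks=ranks,
        special_tokens=specials,
    )
```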

@sachinprasadhs (Collaborator):

And an option to use the base TikTokenizer in the Mistral model here: https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/models/mistral/mistral_tokenizer.py.
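That is, something along the lines of this dispatch. This is purely hypothetical: `TiktokenTokenizer` is the proposed new class (not an existing import), and keras-hub's actual preset-loading hooks will differ:

```python
from keras_hub.models import MistralTokenizer


def load_mistral_tokenizer(asset_paths):
    # Hypothetical helper: prefer tekken.json when the preset ships one,
    # else fall back to the SentencePiece tokenizer.model that older
    # Mistral presets use today.
    if "tekken.json" in asset_paths:
        return TiktokenTokenizer(proto=asset_paths["tekken.json"])
    return MistralTokenizer(proto=asset_paths["tokenizer.model"])
```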

@omkar-334 (Author):

Okay, I'll work on that and update this. Until then, I'll mark this PR as a draft. Thanks!

@omkar-334 omkar-334 marked this pull request as draft December 11, 2025 05:03
@omkar-334 (Author):

/gemini review

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request adds presets for the Devstral Small 1.1 model. A significant part of the change is the introduction of a new TiktokenTokenizer to handle tekken.json based tokenizers. However, there seems to be a contradiction between the implementation and the stated goal. The checkpoint conversion script is configured to use a tokenizer.model from a different model (mistralai/Mistral-Small-24B-Base-2501) for devstral_small_1_1, which means the newly added TiktokenTokenizer and related logic for handling tekken.json will not be used for this preset. Could you please clarify if the TiktokenTokenizer is intended for future use, or if the conversion script should be updated to use it?

Beyond this high-level point, I've found a few issues, including a critical bug in the tokenizer conversion logic and several style guide violations in the new tiktoken_tokenizer.py file. Please see the detailed comments below.

Comment on lines +277 to +278

```python
# undo this in final commit
if tf is None:
```
@gemini-code-assist (Contributor):

critical

This change appears to be a temporary workaround, as indicated by the comment # undo this in final commit. This must be reverted before merging to ensure all required dependencies are checked correctly.

Suggested change:

```diff
-# undo this in final commit
-if tf is None:
+if tf_text is None or tf is None:
```

Comment on lines 64 to 71

```python
    """
    Tekken-based tokenizer for Mistral models.

    Responsibilities:
    • Add required Mistral special tokens (<s>, </s>, pad)
    • Delegate tekken.json parsing to TiktokenTokenizer
    • Use Tiktoken backend via TiktokenTokenizer normalisation
    """
```
@gemini-code-assist (Contributor):

medium

The docstring for NewMistralTokenizer is missing the Args and Examples sections, which are required by the style guide (lines 168-184, 367-371). Please add them to improve documentation and usability.

References
  1. Docstrings should include comprehensive examples and document all parameters. (link)

Comment on lines +135 to +150

```python
    """
    Format-agnostic tiktoken tokenizer with Tekken support.

    The tokenizer can consume:
    • A normalized proto dict with keys:
        {
            "pattern": str,
            "mergeable_ranks": Dict[base64|bytes → rank],
            "special_tokens": list[{token_str, rank}] (optional),
            "special_lookup": Dict[token_str → rank] (optional),
            "num_special_tokens": int,
            "vocab_size": int,
            "raw_json": original JSON (optional),
        }
    • A Tekken JSON (path/bytes/dict). It will be normalized internally.
    """
```
@gemini-code-assist (Contributor):

medium

The docstring for TiktokenTokenizer is missing the Args and Examples sections, which are required by the style guide (lines 529-530). Please add them to document the initializer parameters and show usage.

References
  1. Document thoroughly including call behavior through a class level docstring. Include usage examples. (link)
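For instance, an Examples section along these lines would satisfy the guide. The usage is inferred from the proto dict documented above and not verified against the PR code:

```python
# Hypothetical usage, inferred from the docstring's proto description.
tokenizer = TiktokenTokenizer(proto="tekken.json")
token_ids = tokenizer("def add(a, b): return a + b")
text = tokenizer.detokenize(token_ids)
```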
