Skip to content

Conversation

@juliendenize
Copy link
Contributor

What does this PR do?

Right now the convert_tekken_tokenizer does not add bos_tokens, eos_token to the special tokens via the add_special_tokens method.

This prevents the chat templates that expect eos_token and bos_token to work properly.

Previously this was working as when saving the tokenizer a special_tokens_map.json was created which is no longer the case. Unknown to me why but I'd assume this is due to the V5 refactoring ?

This PR fixes that by adding explicitly these tokens to the tokenizer and when saving they're now stored in tokenizer_config.json.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

Comment on lines +118 to +119
if special_token in all_special:
tokenizer.add_special_tokens({special_key: special_token})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for backward and forward compatibility

@Rocketknight1
Copy link
Member

cc @itazap

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: ministral3, mistral3

@itazap
Copy link
Collaborator

itazap commented Dec 8, 2025

run-slow: ministral3, mistral3

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2025

This comment contains run-slow, running the specified jobs:

models: ["models/ministral3", "models/mistral3"]
quantizations: []

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2025

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@itazap
Copy link
Collaborator

itazap commented Dec 8, 2025

Hey! Thanks for the PR, can you please share a short reproducer of the problem (you mentioned in chat templates)? perhaps we'll need to add a test !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants