Checkpoint conversion utility: gpt-oss orbax scan to hf, many-to-one transform, inhomogeneous scan block #2647
Conversation
fb84898 → 86e7885 (force-push)
🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
📋 Review Summary
This pull request introduces significant enhancements to the checkpoint conversion utility to support gpt-oss models. The addition of many-to-one parameter mapping and support for inhomogeneous scan blocks are well-implemented and make the tool more flexible for future models. The overall code quality is high, with clear logic and necessary updates to configurations and mappings.
🔍 General Feedback
- The refactoring in `to_huggingface.py` and `utils.py` to support the new mapping features is excellent and improves robustness.
- The use of `NotImplementedError` for features planned as future work is a good practice.
- I've identified one potential bug where the logic for detecting unscanned MoE layers might fail with the new tuple-based keys for many-to-one mappings. Please see the inline comment.
RissyRan left a comment
Thanks for the great work! LGTM at high level, just a few comments.
84590a4 → 09bd80d (force-push)
🤖 Hi @shuningjin, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
📋 Review Summary
This Pull Request introduces support for GPT-OSS models in the checkpoint conversion utility, addressing complex scenarios like many-to-one parameter mappings and inhomogeneous scan blocks. The changes are well-structured and include robust validation for parameter keys.
🔍 General Feedback
- The new `_check_param_map_keys` function significantly improves the robustness of parameter mapping validation.
- The `process_maxtext_param` function (formerly `process_leaf_param`) has been refactored effectively to handle various mapping complexities.
- Comprehensive docstrings and comments enhance the maintainability of the new and modified code.
- Consider adding tracking issues or TODOs for the `NotImplementedError` cases to ensure future support for unscanned layers and reverse conversions.
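For context, here is a minimal sketch of what such key validation might involve. The helper name comes from the review above, but the body and signature below are assumptions for illustration, not the actual `utils.py` implementation.

```python
def check_param_map_keys_sketch(param_map, maxtext_param_names):
  """Hedged sketch of param-map key validation (real logic lives in utils.py).

  Assumed invariants: every key is a str or a tuple of str, no MaxText param
  appears under two keys, and every MaxText param is covered by some key.
  """
  covered = set()
  for key in param_map:
    parts = key if isinstance(key, tuple) else (key,)
    for part in parts:
      if not isinstance(part, str):
        raise TypeError(f"param_map key {key!r} must be a str or a tuple of str")
      if part in covered:
        raise ValueError(f"duplicate MaxText param in mapping: {part}")
      covered.add(part)
  missing = set(maxtext_param_names) - covered
  if missing:
    raise ValueError(f"MaxText params not covered by mapping: {sorted(missing)}")
```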
09bd80d → 154cf5b (force-push)
52a1246 → 2fdb878 (force-push)
2fdb878 → 8e086db (force-push)
Description
The goal is to support gpt-oss in the checkpoint conversion utility. However, two features were previously missing from the utility and needed to be implemented first.
1 Support many-to-one transform
gpt-oss has two MaxText params mapped to one HuggingFace key.
To implement this many-to-one mapping:
- `param_mapping.py`: previously the key was a single name (`mt: hf`); now a tuple can also be a key (`(mt1, mt2): hf`)
- `to_huggingface.py`: loop over the param_map instead of the leaves (with a pre-check for coverage); when the key is a tuple, collect the weights into a list
- `utils.py` - `process_maxtext_param`: previously handled a str key with a single array weight; now also handles a tuple key with a list of weights
- other models can follow the same structure; see the sketch below
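To make the shape of the change concrete, below is a minimal, self-contained sketch of the tuple-keyed mapping and the tuple-aware conversion loop. All names here (the parameter keys, `EXAMPLE_PARAM_MAP`, and the simplified `process_maxtext_param` signature) are hypothetical placeholders, not the actual gpt-oss keys or the real utility code.

```python
import numpy as np

# Hypothetical parameter store; the real keys live in param_mapping.py and
# differ from these placeholders.
maxtext_params = {
    "decoder.norm.scale": np.ones((8,)),
    "decoder.moe.wi_0": np.ones((8, 16)),
    "decoder.moe.wi_1": np.ones((8, 16)),
}

EXAMPLE_PARAM_MAP = {
    # one-to-one: a single MaxText key maps to a single HF key
    "decoder.norm.scale": "model.norm.weight",
    # many-to-one: a tuple of MaxText keys maps to one HF key
    ("decoder.moe.wi_0", "decoder.moe.wi_1"): "model.mlp.experts.gate_up_proj",
}


def process_maxtext_param(key, weight, hf_key):
  """Simplified stand-in for the real process_maxtext_param in utils.py.

  Handles a str key with a single array, and a tuple key with a list of
  arrays (many-to-one); the fuse here is a concatenation purely as an
  example, since the actual hook function may transform differently.
  """
  if isinstance(key, tuple):
    return {hf_key: np.concatenate(weight, axis=-1)}
  return {hf_key: weight}


hf_params = {}
# Loop over the param map rather than the leaves, so coverage can be
# pre-checked and tuple keys can gather multiple source arrays.
for mt_key, hf_key in EXAMPLE_PARAM_MAP.items():
  if isinstance(mt_key, tuple):
    weight = [maxtext_params[k] for k in mt_key]
  else:
    weight = maxtext_params[mt_key]
  hf_params.update(process_maxtext_param(mt_key, weight, hf_key))

print({k: v.shape for k, v in hf_params.items()})
# -> {'model.norm.weight': (8,), 'model.mlp.experts.gate_up_proj': (8, 32)}
```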
2 Support inhomogeneous scan block
gpt-oss has an inhomogeneous cycle with interval = 2 for both 20b and 120b: sliding attention → full attention.
- `param_mapping.py`: add a `layer_cycle_interval` argument to all `MAXTEXT_TO_HF_PARAM_MAPPING` and `MAXTEXT_TO_HF_PARAM_HOOK_FN` entries
- add `GPT_OSS_MAXTEXT_TO_HF_PARAM_MAPPING` and `GPT_OSS_TO_HF_PARAM_HOOK_FN`
- other models can follow the same structure; see the sketch below
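As a rough illustration of the idea (signature and key names below are assumptions, not the real `param_mapping.py` code): a mapping builder that takes `layer_cycle_interval` can expand each member of the repeating cycle into the HuggingFace layers it unrolls to.

```python
def example_mapping(num_layers, layer_cycle_interval=1):
  """Hedged sketch: map each scanned cycle member to its unrolled HF layers.

  For gpt-oss, layer_cycle_interval=2 (sliding attention, then full
  attention), so layers_0 unrolls to HF layers 0, 2, 4, ... and layers_1 to
  HF layers 1, 3, 5, ...
  """
  mapping = {}
  for idx in range(layer_cycle_interval):
    # Placeholder key names for illustration only.
    mt_key = f"params-decoder-layers_{idx}-self_attention-query-kernel"
    mapping[mt_key] = [
        f"model.layers.{layer}.self_attn.q_proj.weight"
        for layer in range(idx, num_layers, layer_cycle_interval)
    ]
  return mapping


print(example_mapping(num_layers=4, layer_cycle_interval=2))
# layers_0 -> HF layers 0 and 2; layers_1 -> HF layers 1 and 3
```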
3 Add gpt-oss: orbax (scan) to hf
Future work
Tests
Run the orbax (scan) → hf conversion, followed by a forward logit check.
gpt-oss-20b
https://paste.googleplex.com/6054915302621184
/home/shuningjin/gpt-oss-20b/gpt-oss-20b-hf-2025-11-13-08-11-24
https://paste.googleplex.com/4806878806802432
gpt-oss-120b
https://paste.googleplex.com/6024552484306944
/home/shuningjin/gpt-oss-120b/gpt-oss-120b-hf-2025-11-10-11-55-23
https://paste.googleplex.com/5431988781711360
Also checked other models, just in case:
qwen3-4b
https://paste.googleplex.com/4788178485641216
https://paste.googleplex.com/5246501760663552
gemma3-4b
https://paste.googleplex.com/5830899908345856
https://paste.googleplex.com/6449070222737408
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.