-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Drop the deprecated binary format. #11307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Remove. Basic model test. Cleanup. cli. adaptive test.
@hcho3 Could you please help re-upload the test models to s3, with binary models removed? Also, the RMM build is failing ;-( |
@trivialfis Should we also remove RDS model files from xgboost_r_model_compatibility_test.zip ? It's probably using the legacy binary format. |
Please do. @hcho3 . |
Done |
Thank you! |
@hcho3 Could you please share the latest download link? |
Let me generate some new models using 3.0 instead |
Models generated by the new script using 3.0.2 and 2.1.4: |
Are we dropping compatibility for JSON models from XGBoost 1.x? As for the download link, I uploaded the zip file to the same s3 bucket as before. Were you not be able to access it? |
Let me work on tests for some older models. |
Tests on CRAN are also failing https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-gcc/xgboost-00check.html . |
My bad, let me change the access setting. |
I uploaded new models. After this PR, Python and R will test with the same set of models. The old models are still in S3, just in case we need them. Also, we can't remove the R test models until the CRAN package is updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR removes support for the deprecated binary model format and fully transitions to JSON and UBJSON serialization. It also adds compatibility tests for categorical features and AFT survival, and standardizes the set of test models used in both Python and R.
- Remove legacy Load/Save implementations for the old binary format across C++ code.
- Update CLI and C API to recognize
.ubj
and.json
extensions exclusively. - Refactor Python and R tests to cover categorical features, AFT survival, and use a unified model set.
Reviewed Changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/learner.cc | Dropped old binary serialization code and updated deprecation warnings. |
src/cli_main.cc | Updated CLI to save/load .ubj and .json formats only. |
tests/python/test_model_compatibility.py | Refactored compatibility test harness and download logic. |
tests/python/generate_models.py | Extended model generator with categorical and AFT survival cases. |
Comments suppressed due to low confidence (2)
tests/python/test_model_compatibility.py:13
- The test uses
xgboost.Booster
and otherxgboost
APIs but does not import thexgboost
module. Addimport xgboost
(or alias) at the top of the file to avoid NameError during test execution.
from xgboost import testing as tm
src/cli_main.cc:347
- [nitpick] This line is misaligned in the
else if (ext == "ubj")
block. Adjust the indentation to match the surrounding 6-space indent for consistency with project style.
learner->LoadModel(in);
@hcho3 Could you please help take a look when you are available? |
I need to let the R check skip the compatibility check when the link is down to make CRAN tests more resilient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some questions, but otherwise looks good
def write_versions(): | ||
versions = {'numpy': np.__version__, | ||
'xgboost': version} | ||
with open(os.path.join(target_dir, 'version'), 'w') as fd: | ||
fd.write(str(versions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we dropping write_versions()
? Is it because the file name for each model artifact already contains the version number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Also the models we test are from multiple versions.(1-3)
auto ext = common::FileExtension(path); | ||
auto read_file = [&]() { | ||
auto str = common::LoadSequentialFile(path); | ||
CHECK_GE(str.size(), 3); // "{}\0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does common::LoadSequentialFile
return a string with \0
? I thought the terminating \0
won't be part of the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns a vector of char as it consumes binary inputs. I removed the requirement for \0
and added a small test for the JSON parser.
Close #7547 .
todos:
Models
xgboost_model_compatibility_tests-3.0.2.zip