-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Reduce Python and Nuget GPU package size #26002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The size of python wheel for Linux is 304 MB with "80-real;86-real;90a-real;90a-virtual" CUDA architecture enabled, which is still slightly over 300 MB size limit. SM80 (A100), SM86 (A10), SM90 (H100) seems to be the main GPUs our customers are using, and we can't remove them from ORT support list. |
Discussed offline that we might also want to reduce some heaviest cuda kernels, i.e. beam_search_topk. if (k <= 8) {
TopKLauncher(8)
} else {
ORT_THROW("K>8 is not supported for beam search");
} For Linux, Update: We don't need to modify beam_search_topk as removing FPA_INTB_GEMM can give us space back. |
### Description The package size limit for PyPI and Nuget are: - python package size under 300MB - Nuget package size under 250MB To meet the size limit, this PR firstly removes some old GPU arch support in CMAKE_CUDA_ARCHITECTURE. Secondly, it removes the FPA_INTB_GEMM support in Linux Python wheel. #### Python wheel | OS | cmake_cuda_architecture | CUDA kernel removal |Package size | Under 300MB| |---------|--------------------------------------------------------|-|-------------|---| | Linux | 60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual | |341 MB |No (original)| | Linux | 70-real;75-real;80-real;86-real;90a-real;90a-virtual | | 329 MB |No| | Linux | 75-real;80-real;86-real;90a-real;90a-virtual | |319 MB |No| | Linux | 80-real;86-real;90a-real;90a-virtual | |304 MB |No| | Linux | 60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual. | FPA_INTB_GEMM|287 MB |Yes| | Windows | 52-real;61-real;75-real;86-real;89-real;90a-virtual | | 272 MB |Yes (original)| #### Nuget | OS | cmake_cuda_architecture | CUDA kernel removal |Package size |Under 250MB| |---------|--------------------------------------------------------|---|--------------|---| | Linux | 60-real;70-real;75-real;80-real;90a-real;90a-virtual | |276 MB |No (original)| | Linux | 75-real;80-real;90a-real;90a-virtual | |253 MB |No| | Linux | 60-real;70-real;75-real;80-real;90a-real;90a-virtual |FPA_INTB_GEMM| 230 MB |Yes| | Windows | 52-real;61-real;75-real;86-real;89-real;90a-virtual || 264 MB |No (original)| | Windows | 61-real;75-real;86-real;89-real;90a-virtual || 254 MB |No| | Windows | 75-real;86-real;89-real;90a-virtual || 242 MB |Yes| ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
This PR has been cherry-picked into the |
Description
The package size limit for PyPI and Nuget are:
To meet the size limit,
this PR firstly removes some old GPU arch support in CMAKE_CUDA_ARCHITECTURE.
Secondly, it removes the FPA_INTB_GEMM support in Linux Python wheel.
Python wheel
Nuget
Motivation and Context