feat(quant_grouped_gemm): add bquant with preshuffleB support to grouped_gemm example #3115
Conversation
Co-authored-by: Copilot <[email protected]>
Calculate `has_hot_loop`, `num_loop`, and `tail_number` on the device side for each GEMM problem instead of using default values. This fixes incorrect results when different problems in the group have different K dimensions.
Pull Request Overview
This PR adds support for double shared memory buffer (DoubleSmemBuffer) with weight pre-shuffling (PreshuffleB) for BQuantGrouped quantization mode in grouped GEMM operations.
- Implements a new overload of the pipeline operator to support double shared memory buffers with TailNumber parameter
- Adds conditional logic to allocate and use two shared memory buffers when both DoubleSmemBuffer and BQuantGrouped are enabled
- Introduces a new configuration `GemmConfigPreshuffleB_Bquant_prefill` with PreshuffleB and DoubleSmemBuffer enabled
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| include/ck_tile/ops/gemm_quant/pipeline/gemm_wp_bquant_pipeline_ag_bg_cr_v2.hpp | Adds new operator overload accepting TailNumber parameter for double buffer support |
| include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp | Implements conditional double shared memory buffer allocation and adds RunGemmWithPipelineSelection2LDS method |
| include/ck_tile/ops/gemm_quant/kernel/gemm_quant_kernel.hpp | Minor formatting changes (blank lines added) |
| include/ck_tile/ops/gemm/pipeline/wp_pipeline_agmem_bgmem_creg_v2.hpp | Changes BlockHasHotloop from host-only to host-device callable |
| example/ck_tile/17_grouped_gemm/quant_run_grouped_gemm_example.inc | Adds support for BQuantGrouped stride configuration, pre-shuffling B tensor, and initialization method parameter |
| example/ck_tile/17_grouped_gemm/quant_grouped_gemm.hpp | Adds new configuration and removes pipeline-related macros, adds get_k_from_preshuffled_warp_tile function |
| example/ck_tile/17_grouped_gemm/quant_grouped_gemm.cpp | Updates pipeline selection logic and switches main to use new PreshuffleB configuration |
    bq_dev_buf.push_back(
        std::make_unique<ck_tile::DeviceMem>(bq_tensors[i].get_element_space_size_in_bytes()));

    if constexpr(GemmConfig::PreshuffleB && QuantMode == ck_tile::QuantType::BQuantGrouped)
If PreshuffleB is enabled, B should still be shuffled even when the quant mode is not BQuant.
Based on my quick testing, the gemm_quant_basic example does not support PreshuffleB == true when quant_mode is rowcol or tensor.
Hence, in quant_grouped_gemm, we treat PreshuffleB as unsupported for the other quant modes.
Closed, as #3119 has been merged into develop.
Proposed changes
This PR adds bquant support to the grouped_gemm example (example/ck_tile/17_grouped_gemm/quant_grouped_gemm.cpp) and kernel (include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp).

Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- clang-format run on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.