Conversation

@AviralGoelAMD
Collaborator

Proposed changes

This PR adds bquant support to grouped_gemm examples (example/ck_tile/17_grouped_gemm/quant_grouped_gemm.cpp) and kernel (include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp).

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

kyle-256 and others added 30 commits October 16, 2025 02:23
Calculate has_hot_loop, num_loop, and tail_number on device side for each
GEMM problem instead of using default values. This fixes incorrect results
when different problems in the group have different K dimensions.
Contributor

Copilot AI left a comment
Pull Request Overview

This PR adds support for double shared memory buffer (DoubleSmemBuffer) with weight pre-shuffling (PreshuffleB) for BQuantGrouped quantization mode in grouped GEMM operations.

  • Implements a new overload of the pipeline operator to support double shared memory buffers with TailNumber parameter
  • Adds conditional logic to allocate and use two shared memory buffers when both DoubleSmemBuffer and BQuantGrouped are enabled
  • Introduces a new configuration GemmConfigPreshuffleB_Bquant_prefill with PreshuffleB and DoubleSmemBuffer enabled

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:

  • include/ck_tile/ops/gemm_quant/pipeline/gemm_wp_bquant_pipeline_ag_bg_cr_v2.hpp — Adds a new operator overload accepting a TailNumber parameter for double-buffer support
  • include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp — Implements conditional double shared memory buffer allocation and adds the RunGemmWithPipelineSelection2LDS method
  • include/ck_tile/ops/gemm_quant/kernel/gemm_quant_kernel.hpp — Minor formatting changes (blank lines added)
  • include/ck_tile/ops/gemm/pipeline/wp_pipeline_agmem_bgmem_creg_v2.hpp — Changes BlockHasHotloop from host-only to host-device callable
  • example/ck_tile/17_grouped_gemm/quant_run_grouped_gemm_example.inc — Adds support for BQuantGrouped stride configuration, pre-shuffling the B tensor, and an initialization-method parameter
  • example/ck_tile/17_grouped_gemm/quant_grouped_gemm.hpp — Adds a new configuration, removes pipeline-related macros, and adds the get_k_from_preshuffled_warp_tile function
  • example/ck_tile/17_grouped_gemm/quant_grouped_gemm.cpp — Updates pipeline selection logic and switches main to use the new PreshuffleB configuration


    bq_dev_buf.push_back(
        std::make_unique<ck_tile::DeviceMem>(bq_tensors[i].get_element_space_size_in_bytes()));

    if constexpr(GemmConfig::PreshuffleB && QuantMode == ck_tile::QuantType::BQuantGrouped)
Contributor
If PreshuffleB is enabled, we should still shuffle B even when the mode is not BQuant.

Collaborator Author
Based on my quick testing, the gemm_quant_basic example does not support PreshuffleB == true when quant_mode is rowcol or tensor.

Hence, in quant_grouped_gemm, we consider it unsupported for the other quant_mode(s).

@ThomasNing
Contributor

Closing, as #3119 has been merged to develop.

@ThomasNing ThomasNing closed this Oct 31, 2025