Conversation

@AviralGoelAMD
Collaborator

Proposed changes

This PR adds bquant support to grouped_gemm examples (example/ck_tile/17_grouped_gemm/quant_grouped_gemm.cpp) and kernel (include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp).

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

kyle-256 and others added 30 commits October 16, 2025 02:23
Calculate has_hot_loop, num_loop, and tail_number on device side for each
GEMM problem instead of using default values. This fixes incorrect results
when different problems in the group have different K dimensions.
Contributor

Copilot AI left a comment
Pull Request Overview

This PR adds support for double shared memory buffer (DoubleSmemBuffer) with weight pre-shuffling (PreshuffleB) for BQuantGrouped quantization mode in grouped GEMM operations.

  • Implements a new overload of the pipeline operator to support double shared memory buffers with TailNumber parameter
  • Adds conditional logic to allocate and use two shared memory buffers when both DoubleSmemBuffer and BQuantGrouped are enabled
  • Introduces a new configuration GemmConfigPreshuffleB_Bquant_prefill with PreshuffleB and DoubleSmemBuffer enabled

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:

  • include/ck_tile/ops/gemm_quant/pipeline/gemm_wp_bquant_pipeline_ag_bg_cr_v2.hpp — Adds a new operator overload accepting a TailNumber parameter for double-buffer support
  • include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp — Implements conditional double shared memory buffer allocation and adds the RunGemmWithPipelineSelection2LDS method
  • include/ck_tile/ops/gemm_quant/kernel/gemm_quant_kernel.hpp — Minor formatting changes (blank lines added)
  • include/ck_tile/ops/gemm/pipeline/wp_pipeline_agmem_bgmem_creg_v2.hpp — Changes BlockHasHotloop from host-only to host-device callable
  • example/ck_tile/17_grouped_gemm/quant_run_grouped_gemm_example.inc — Adds support for BQuantGrouped stride configuration, pre-shuffling the B tensor, and an initialization-method parameter
  • example/ck_tile/17_grouped_gemm/quant_grouped_gemm.hpp — Adds a new configuration, removes pipeline-related macros, and adds the get_k_from_preshuffled_warp_tile function
  • example/ck_tile/17_grouped_gemm/quant_grouped_gemm.cpp — Updates pipeline selection logic and switches main to use the new PreshuffleB configuration


    bq_dev_buf.push_back(
        std::make_unique<ck_tile::DeviceMem>(bq_tensors[i].get_element_space_size_in_bytes()));

    if constexpr(GemmConfig::PreshuffleB && QuantMode == ck_tile::QuantType::BQuantGrouped)
Contributor
If PreshuffleB is enabled, we should still shuffle B even when the mode is not BQuant.

Collaborator Author
Based on my quick testing, the gemm_quant_basic example does not support PreshuffleB == true when quant_mode is rowcol or tensor.

Hence, in quant_grouped_gemm, we consider it unsupported for the other quant_mode(s).

@ThomasNing
Contributor

Closing, as #3119 has been merged to develop.

@ThomasNing ThomasNing closed this Oct 31, 2025