Commit 8ac7dec

[None][fix] Fix W4A8 MoE kernel issue (NVIDIA#7072)
Signed-off-by: yuhyao <[email protected]>
1 parent f84dd64 commit 8ac7dec

1 file changed: 5 additions, 0 deletions

cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp

Lines changed: 5 additions & 0 deletions
@@ -1524,6 +1524,11 @@ struct CollectiveMmaArrayMixedInput<
     CUTLASS_DEVICE void tensormaps_cp_fence_release(
         TensorMapStorage& shared_tensormaps, cute::tuple<TMs...> const& input_tensormaps)
     {
+        if (cute::elect_one_sync())
+        {
+            cute::tma_desc_commit_group();
+            cute::tma_desc_wait_group();
+        }
         // Entire warp must do this (i.e. it's aligned)
         tma_descriptor_cp_fence_release(get<0>(input_tensormaps), shared_tensormaps.smem_tensormap_A);
         tma_descriptor_cp_fence_release(get<1>(input_tensormaps), shared_tensormaps.smem_tensormap_B);
