example/ck_tile/17_grouped_gemm/README.md (2 additions, 2 deletions)
@@ -4,7 +4,7 @@ The `Grouped GEMM` operators are versions of GEMM that run multiple GEMM operati
### Preshuffle and Persistence
-The grouped GEMM examples include two advanced optimization features:
+The grouped GEMM examples include the following advanced optimization features:
#### Weight Preshuffle
Weight preshuffle is an optimization technique that reorganizes the B matrix (weights) in memory to improve data access patterns and reduce memory bandwidth requirements. This is particularly beneficial for inference workloads where the same weights are reused across multiple batches.
@@ -26,7 +26,7 @@ Multi-D operations extend the standard GEMM operation by supporting additional e
-**Implementation**: Available in `grouped_gemm_multi_d.cpp`
-**Operation**: E = C × D₀ × D₁ (where C = A × B is the standard GEMM result)
-**Configuration**: Uses `GemmConfigV3`, `GemmConfigV4`, `GemmConfigMemory` template configuration with 2 D tensors
--**Data Types**: Supports fp16
+-**Data Types**: Supports fp16, fp8
-**Benefits**: Enables complex operations like scaling, activation functions, or other elementwise transformations in a single kernel call
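
For readers checking the **Operation** line, the following is a minimal host-side sketch of what E = C × D₀ × D₁ could compute for a single group. It assumes that the × between C and the D tensors denotes elementwise multiplication, that all matrices are row-major, and that `float` stands in for fp16/fp8. The function `reference_gemm_multi_d` is a hypothetical name used only for illustration; it is not part of the CK Tile API and is not the kernel in `grouped_gemm_multi_d.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not a CK Tile function).
// Computes E = (A x B) * D0 * D1 for one group, where "x" is matrix
// multiplication and "*" is assumed to be elementwise multiplication.
// All matrices are stored row-major in flat vectors.
std::vector<float> reference_gemm_multi_d(const std::vector<float>& a,  // M x K
                                          const std::vector<float>& b,  // K x N
                                          const std::vector<float>& d0, // M x N
                                          const std::vector<float>& d1, // M x N
                                          std::size_t M,
                                          std::size_t N,
                                          std::size_t K)
{
    assert(a.size() == M * K && b.size() == K * N);
    assert(d0.size() == M * N && d1.size() == M * N);

    std::vector<float> e(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            // C[m][n] = sum over k of A[m][k] * B[k][n]
            float c = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
            {
                c += a[m * K + k] * b[k * N + n];
            }
            // Elementwise epilogue applied in the same pass,
            // so C is never written out as a separate tensor.
            e[m * N + n] = c * d0[m * N + n] * d1[m * N + n];
        }
    }
    return e;
}
```

A grouped run would call such a reference once per group, each with its own M, N, K, and compare the result against the device output. Applying the D tensors inside the same loop nest mirrors the stated benefit of a single kernel call: no intermediate C has to be stored and reloaded.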