Commit ad15dfc

ThomasNing authored and vidyasagar-amd committed
Modify the ck_tile gemm config
1 parent b22b840 commit ad15dfc

2 files changed: 2 additions, 11 deletions

example/ck_tile/17_grouped_gemm/README.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ The `Grouped GEMM` operators are versions of GEMM that run multiple GEMM operati

 ### Preshuffle and Persistence

-The grouped GEMM examples include two advanced optimization features:
+The grouped GEMM examples include the following advanced optimization features:

 #### Weight Preshuffle
 Weight preshuffle is an optimization technique that reorganizes the B matrix (weights) in memory to improve data access patterns and reduce memory bandwidth requirements. This is particularly beneficial for inference workloads where the same weights are reused across multiple batches.
@@ -26,7 +26,7 @@ Multi-D operations extend the standard GEMM operation by supporting additional e
 - **Implementation**: Available in `grouped_gemm_multi_d.cpp`
 - **Operation**: E = C × D₀ × D₁ (where C = A × B is the standard GEMM result)
 - **Configuration**: Uses `GemmConfigV3`, `GemmConfigV4`, `GemmConfigMemory` template configuration with 2 D tensors
-- **Data Types**: Supports fp16
+- **Data Types**: Supports fp16, fp8
 - **Benefits**: Enables complex operations like scaling, activation functions, or other elementwise transformations in a single kernel call
 - **Build Target**: `make tile_example_grouped_gemm_multi_d -j`
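
For readers of this hunk, the Multi-D operation E = C × D₀ × D₁ can be sketched as a plain CPU reference. This is a minimal illustration, not the ck_tile implementation. It assumes the products with D₀ and D₁ are elementwise, and it uses row-major `float` storage instead of fp16 or fp8. The names `Matrix`, `GemmGroup`, `gemm_multi_d`, and `grouped_gemm_multi_d` are hypothetical and do not appear in the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix used only for this sketch.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// One GEMM problem of the group: A (M x K), B (K x N), D0 and D1 (M x N).
struct GemmGroup {
    Matrix a;
    Matrix b;
    Matrix d0;
    Matrix d1;
};

// Computes E = (A x B) * D0 * D1, where "*" is assumed to be elementwise.
// The elementwise step is fused into the GEMM loop, so C is never stored.
Matrix gemm_multi_d(const GemmGroup& g) {
    assert(g.a.cols == g.b.rows);
    assert(g.d0.rows == g.a.rows && g.d0.cols == g.b.cols);
    assert(g.d1.rows == g.a.rows && g.d1.cols == g.b.cols);

    Matrix e(g.a.rows, g.b.cols);
    for (std::size_t i = 0; i < g.a.rows; ++i) {
        for (std::size_t j = 0; j < g.b.cols; ++j) {
            float acc = 0.0f;  // C(i, j) = sum over k of A(i, k) * B(k, j)
            for (std::size_t k = 0; k < g.a.cols; ++k) {
                acc += g.a.at(i, k) * g.b.at(k, j);
            }
            e.at(i, j) = acc * g.d0.at(i, j) * g.d1.at(i, j);
        }
    }
    return e;
}

// Runs every problem of the group in turn; shapes may differ per group.
std::vector<Matrix> grouped_gemm_multi_d(const std::vector<GemmGroup>& groups) {
    std::vector<Matrix> results;
    results.reserve(groups.size());
    for (const GemmGroup& g : groups) {
        results.push_back(gemm_multi_d(g));
    }
    return results;
}
```

Calling `grouped_gemm_multi_d` with a vector of `GemmGroup` values returns one E matrix per group. Applying D₀ and D₁ inside the accumulation loop mirrors the "single kernel call" benefit listed above: the intermediate C is never written out and read back.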

example/ck_tile/18_flatmm/README.md

Lines changed: 0 additions & 9 deletions
@@ -18,15 +18,6 @@ $$

 - **FLATMM**: An alternative solution as the Preshuffled GEMM in /03_gemm

----
-
-## Tile Programming Model
-
-- **Tiles**: Each thread block processes a tile of $C$ for a given batch.
-- **Pipeline**: Modular, supports different memory/computation pipelines and flat/padded layouts.
-
----
-

 ---
