XeGPU RFC update: Add matrix_desc and operations for share local memory access #1092


Open
wants to merge 13 commits into base: main
Conversation

Jianhui-Li
Contributor

Please review these guidelines to help with the review process:

  • Have you provided a meaningful PR description?
  • Have you added a test, a reproducer, or a reference to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • If this PR is a work in progress, are you filing the PR as a draft?
  • Have you organized your commits logically and ensured each can be built by itself?

@Jianhui-Li Jianhui-Li changed the title Add matrix_desc and operations XeGPU RFC update: Add matrix_desc and operations for share local memory Jul 8, 2025
@Jianhui-Li Jianhui-Li changed the title XeGPU RFC update: Add matrix_desc and operations for share local memory XeGPU RFC update: Add matrix_desc and operations for share local memory access Jul 8, 2025
@@ -329,6 +329,20 @@ Attribute `Memory_kind` describes the memory kind. "global" means the global mem

`nbarrier` and `fence` operations lower to uniform instructions, so there is no need to specify the `sg_map`.

## XeGPU operations to access shared local memory
Users must create a `matrix_desc` to hold a matrix in shared local memory. The matrix must be row-major. The matrix can attach an attribute for its memory layout, for example, a blocked layout or the original non-blocked row-major layout (a.k.a. linear layout).
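
As an illustrative, non-normative sketch (the creation op name `xegpu.create_matrix_desc` and its `%slm` operand are assumptions for this sketch; only the `matrix_desc` type and the `@block` attribute syntax are taken from the examples in this RFC):

// Hypothetical creation op: carve a 32x256 f16 row-major matrix out of
// shared local memory, optionally attaching a blocked memory layout attribute.
%m  = xegpu.create_matrix_desc %slm : matrix_desc<32x256xf16>
%mb = xegpu.create_matrix_desc %slm : matrix_desc<32x256xf16, @block=[16, 16]>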
Users can take a subview of an existing `matrix_desc` to obtain a new `matrix_desc`, potentially with a stride. They can then use `load_matrix` and `store_matrix` to move matrix data between shared local memory and vectors (registers). The matrix is typically 2D but can be multi-dimensional. XeGPU's `load_matrix` and `store_matrix` work at the workgroup level only; they use `xegpu.layout` to describe how the matrix is decomposed into data fragments and mapped to work items. The workgroup-level operation loads the entire matrix into a vector.
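
A minimal sketch of the workgroup-level flow described above, assuming `%m` is a `matrix_desc<32x256xf16>` in shared local memory and `#wg_layout` names an `xegpu.layout` attribute (the exact attribute spelling and result mapping are illustrative, not defined by this RFC text):

// Workgroup-level load: the entire matrix is read into a vector; #wg_layout
// describes how the data is decomposed into fragments and mapped to work items.
%v = xegpu.load_matrix %m {layout = #wg_layout}
  : matrix_desc<32x256xf16> -> vector<32x256xf16>

// ... workgroup-level computation on %v ...

// Workgroup-level store: write the vector back to shared local memory.
xegpu.store_matrix %v, %m {layout = #wg_layout}
  : vector<32x256xf16>, matrix_desc<32x256xf16>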
Contributor

Since we're talking about WG-level here, I think #1033 should be merged before this one


// Subview for DPAS tile shape
%ma = xegpu.matrix_desc_subview %m
: matrix_desc<32x256xf16> -> matrix_desc<256x32xf16, @block=[16, 16], #dpas_t_inst>
Contributor

A <256x32> subview of <32x256>? Is it a "transposed view"? If so, shouldn't there be strides for the subview result?
