XeGPU RFC update: Add matrix_desc and operations for share local memory access #1092
docs/rfcs/XeGPU.md
Outdated
@@ -329,6 +329,20 @@ Attribute `Memory_kind` describes the memory kind. "global" means the global mem

`nbarrier` and `fence` operations lower to uniform instructions, so there is no need to specify the `sg_map`.

## XeGPU operations to access shared local memory
Users must create a `matrix_desc` to hold a matrix in shared local memory. The matrix must be row-major. The matrix can carry an attribute describing its memory layout, for example a blocked layout or the original non-blocked row-major layout (a.k.a. linear layout).
Users can take a subview of an existing `matrix_desc` to get a new `matrix_desc`, potentially with a stride. Users can then use `load_matrix` and `store_matrix` to move matrix data between shared local memory and vectors (registers). The matrix is typically 2D but can be multi-dimensional. XeGPU's `load_matrix` and `store_matrix` work at the workgroup level only. They use `xegpu.layout` to describe how the matrix is decomposed into data fragments and mapped to work items. The workgroup-level operation loads the entire matrix into a vector.
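
To make the difference between the linear and blocked layouts concrete, here is a minimal Python sketch of how an element's offset could be computed under each. This is not XeGPU syntax. The helper names are hypothetical, and the blocked ordering is an assumption (blocks stored contiguously, row-major across blocks and row-major inside each block); the RFC text quoted above does not spell out the exact ordering.

```python
def linear_offset(row, col, num_cols):
    """Element offset in a plain row-major (linear) layout."""
    return row * num_cols + col


def blocked_offset(row, col, num_cols, block=(16, 16)):
    """Element offset in an ASSUMED blocked layout.

    Assumption: each block is stored contiguously, blocks are ordered
    row-major across the matrix, and elements are row-major in a block.
    """
    block_rows, block_cols = block
    if num_cols % block_cols != 0:
        raise ValueError("num_cols must be a multiple of the block width")
    blocks_per_row = num_cols // block_cols
    # Which block the element falls in, counted row-major over blocks.
    block_index = (row // block_rows) * blocks_per_row + (col // block_cols)
    # Position of the element inside its block.
    in_block = (row % block_rows) * block_cols + (col % block_cols)
    return block_index * (block_rows * block_cols) + in_block
```

Under these assumptions, element (0, 16) of a 32x256 matrix sits at offset 16 in the linear layout but at offset 256 in the blocked layout, because it is the first element of the second 16x16 block.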
Since we're talking about WG-level here, I think #1033 should be merged before this one.
docs/rfcs/XeGPU.md
Outdated
// Subview for DPAS tile shape
%ma = xegpu.matrix_desc_subview %m
: matrix_desc<32x256xf16> -> matrix_desc<256x32xf16, @block=[16, 16], #dpas_t_inst>
A <256x32> subview of <32x256>? Is it a "transposed view"? If so, shouldn't there be strides for subview result?
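
For reference, a minimal Python sketch of why a transposed view would need explicit strides. It is plain index arithmetic, not XeGPU syntax, and the helper names are hypothetical. It assumes the source `<32x256>` matrix is contiguous row-major and that the view shares the same storage.

```python
def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major array."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transposed_view(shape, strides):
    """A transposed view reverses both shape and strides; no data moves."""
    return tuple(reversed(shape)), tuple(reversed(strides))


def element_offset(index, strides):
    """Offset of an element from its multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


src_shape = (32, 256)
src_strides = row_major_strides(src_shape)   # row stride 256, column stride 1
view_shape, view_strides = transposed_view(src_shape, src_strides)
```

Here `view_shape` is `(256, 32)` and `view_strides` is `(1, 256)`, so the result is no longer row-major contiguous, and element `(5, 3)` of the view aliases element `(3, 5)` of the source.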