copybara-service[bot]
[NFC] Refactor the collective matmul

  1. Keep out_smem allocated for the whole duration of the kernel.
    The previous code did not wait for all uses of out_smem to finish
    before its scoped allocation expired, which is undefined behavior (UB).
  2. Deduplicate the bodies of all N loop steps. The first step is peeled
    off because it is the only one that does communication.
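
The two points above can be illustrated with a minimal plain-Python sketch. It is not the kernel code from this change: `matmul`, `step_body`, `peeled_matmul`, and `start_comms` are hypothetical names, and an ordinary list stands in for `out_smem`. It only shows the shape of the refactor, assuming at least one input block: the output buffer is allocated once and outlives every step that writes to it, and the first step is peeled off so that the remaining steps share a single body.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def step_body(out_buffer, step, lhs_block, rhs):
    """Shared body used by every loop step: compute one block and store it."""
    out_buffer[step] = matmul(lhs_block, rhs)


def peeled_matmul(lhs_blocks, rhs, start_comms):
    """Multiply each LHS block by rhs; only the first step triggers comms.

    Assumes lhs_blocks is non-empty. start_comms is a hypothetical callback
    standing in for whatever communication the first step performs.
    """
    # Allocated once, before any step runs, and read only after all steps
    # have finished writing to it (stand-in for a kernel-lifetime buffer).
    out_buffer = [None] * len(lhs_blocks)

    # Peeled first step: the only one that does communication.
    start_comms(lhs_blocks[0])
    step_body(out_buffer, 0, lhs_blocks[0], rhs)

    # Remaining steps: identical body, no communication.
    for step in range(1, len(lhs_blocks)):
        step_body(out_buffer, step, lhs_blocks[step], rhs)

    # Flatten the per-step blocks into one list of output rows.
    return [row for block in out_buffer for row in block]
```

Passing `list.append` of an empty list as `start_comms` makes it easy to check that communication is triggered exactly once, regardless of how many steps run.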

PiperOrigin-RevId: 801825966
@copybara-service copybara-service bot merged commit a5f2a26 into main Sep 1, 2025
1 check was pending
@copybara-service copybara-service bot deleted the test_799974615 branch September 1, 2025 16:04