Skip to content

Conversation

@ThruptiRajLakshmanaGowda
Copy link
Contributor

Proposed changes

Currently CK Tile Engine GEMM Multi D operator had a design which was taking longer time to build, with this new design the build time will be reduced.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@AviralGoelAMD AviralGoelAMD requested a review from Copilot October 29, 2025 20:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the GEMM Multi D kernel generation system from a monolithic build approach to an individual kernel compilation model with parallel generation support. Key improvements include:

  • Renaming output tensor from "E" to "C" for consistency
  • Replacing batch dispatcher with individual kernel benchmarking
  • Adding parallel kernel generation using Python's ProcessPoolExecutor
  • New validation utilities for tile configurations
  • JSON-based benchmark results with comprehensive metrics

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
gemm_multi_d_profiler.hpp Updated to benchmark single kernels; renamed E to C tensor references
gemm_multi_d_instance_builder.py Complete rewrite for individual kernel generation with parallel processing
gemm_multi_d_benchmark.py New Python-based benchmark orchestration tool with JSON output
gemm_multi_d_common.hpp New file consolidating common types and helper functions
commons/validation_utils.py New comprehensive validation utilities for tile configurations
gemm_multi_d_benchmark_single.cpp New single-kernel benchmark executable
CMakeLists.txt Refactored to build individual kernel targets instead of monolithic libraries
configs/*.json Updated with new persistent field and k_block_per_cu parameter
Jenkinsfile Simplified CI to use Python benchmark orchestrator
Comments suppressed due to low confidence (2)

tile_engine/ops/gemm_multi_d/gemm_multi_d_instance_builder.py:1

  • Debug print statement left in production code. This should be removed or converted to proper logging.
#!/usr/bin/env python

tile_engine/ops/gemm_multi_d/gemm_multi_d_benchmark.hpp:68

  • [nitpick] Unnecessary concatenation of two string literals. The second << \"\\n\" can be combined with the previous string literal as << \"\\n\".

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@ThomasNing ThomasNing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other part LGTM overall

@ThomasNing ThomasNing merged commit a33d98f into develop Oct 31, 2025
48 checks passed
@ThomasNing ThomasNing deleted the ck-tile-engine-gemmd branch October 31, 2025 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants