[CK TILE ENGINE] GEMM Multi D Restructure #3121
Pull Request Overview
This PR refactors the GEMM Multi D kernel generation system from a monolithic build approach to an individual kernel compilation model with parallel generation support. Key improvements include:
- Renaming output tensor from "E" to "C" for consistency
- Replacing batch dispatcher with individual kernel benchmarking
- Adding parallel kernel generation using Python's ProcessPoolExecutor
- New validation utilities for tile configurations
- JSON-based benchmark results with comprehensive metrics
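The parallel generation step can be illustrated with a minimal sketch. This is not the code from `gemm_multi_d_instance_builder.py`; the function names (`expand_configs`, `write_kernel_header`, `generate_all`), the config keys, and the file-naming scheme are all hypothetical. It only shows the general pattern of writing one file per kernel configuration through `ProcessPoolExecutor`.

```python
import json
from concurrent.futures import ProcessPoolExecutor, as_completed
from itertools import product
from pathlib import Path


def expand_configs(tile_m, tile_n, tile_k):
    """Build one config dict per combination of tile sizes (hypothetical keys)."""
    return [
        {"tile_m": m, "tile_n": n, "tile_k": k}
        for m, n, k in product(tile_m, tile_n, tile_k)
    ]


def kernel_name(cfg):
    """Hypothetical naming scheme: one unique name per configuration."""
    return "gemm_multi_d_{tile_m}x{tile_n}x{tile_k}".format(**cfg)


def write_kernel_header(cfg, out_dir):
    """Write a placeholder header for a single kernel and return its path.

    Defined at module level so it can be pickled and sent to worker processes.
    """
    path = Path(out_dir) / f"{kernel_name(cfg)}.hpp"
    path.write_text(
        "// placeholder for a generated kernel instance\n"
        f"// config: {json.dumps(cfg, sort_keys=True)}\n"
    )
    return str(path)


def generate_all(configs, out_dir, max_workers=None,
                 executor_cls=ProcessPoolExecutor):
    """Generate every kernel file in parallel and return the sorted paths.

    executor_cls is a parameter so a ThreadPoolExecutor can be swapped in,
    for example in tests or on platforms where spawning processes is costly.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with executor_cls(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_kernel_header, cfg, out_dir) for cfg in configs
        ]
        # result() re-raises any exception from the worker, so a failed
        # kernel aborts the run instead of being silently skipped.
        return sorted(f.result() for f in as_completed(futures))
```

Because each configuration produces an independent file, a build system can then compile each kernel as its own target, which is the property the PR relies on to cut build time.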
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| gemm_multi_d_profiler.hpp | Updated to benchmark single kernels; renamed E to C tensor references |
| gemm_multi_d_instance_builder.py | Complete rewrite for individual kernel generation with parallel processing |
| gemm_multi_d_benchmark.py | New Python-based benchmark orchestration tool with JSON output |
| gemm_multi_d_common.hpp | New file consolidating common types and helper functions |
| commons/validation_utils.py | New comprehensive validation utilities for tile configurations |
| gemm_multi_d_benchmark_single.cpp | New single-kernel benchmark executable |
| CMakeLists.txt | Refactored to build individual kernel targets instead of monolithic libraries |
| configs/*.json | Updated with new persistent field and k_block_per_cu parameter |
| Jenkinsfile | Simplified CI to use Python benchmark orchestrator |
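As a rough illustration of what tile-configuration validation might involve, the sketch below checks a single rule. The rule itself (each block tile dimension must be a multiple of warp count times warp tile size) and the names `validate_tile_config`, `tile`, `warp`, and `warp_tile` are assumptions made for this example; they are not taken from `commons/validation_utils.py`, which may check different or additional constraints.

```python
def validate_tile_config(tile, warp, warp_tile):
    """Return a list of problems; an empty list means the config passed.

    Each argument is a dict with integer entries for "m", "n" and "k"
    (a hypothetical layout chosen for this sketch).
    """
    problems = []
    groups = (("tile", tile), ("warp", warp), ("warp_tile", warp_tile))

    # First pass: every dimension must be a positive integer.
    for axis in ("m", "n", "k"):
        for name, group in groups:
            value = group.get(axis)
            if not isinstance(value, int) or value <= 0:
                problems.append(
                    f"{name}_{axis} must be a positive integer, got {value!r}"
                )
    if problems:
        return problems

    # Second pass (assumed rule): the block tile must be evenly covered
    # by the warps and their per-warp tiles along each axis.
    for axis in ("m", "n", "k"):
        span = warp[axis] * warp_tile[axis]
        if tile[axis] % span != 0:
            problems.append(
                f"tile_{axis}={tile[axis]} is not a multiple of "
                f"warp_{axis} * warp_tile_{axis} = {span}"
            )
    return problems
```

Returning a list of messages rather than raising on the first failure lets a generator report every rejected configuration in one pass and skip it before any compilation is attempted.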
Comments suppressed due to low confidence (2)
tile_engine/ops/gemm_multi_d/gemm_multi_d_instance_builder.py:1
- Debug print statement left in production code. This should be removed or converted to proper logging.
#!/usr/bin/env python
tile_engine/ops/gemm_multi_d/gemm_multi_d_benchmark.hpp:68
- [nitpick] Unnecessary concatenation of two string literals. The second `<< "\n"` can be combined with the previous string literal as `<< "\n"`.
Other part LGTM overall
Proposed changes
Currently, the CK Tile Engine GEMM Multi D operator has a design that takes a long time to build. With this new design, the build time will be reduced.
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- `clang-format` on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.