count zero #28
Conversation
Walkthrough

This update introduces CUDA/HIP C++20 convenience wrappers and documents a new configurable warp size macro. It adds device- and host-side zeroing functions, and extends the …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Host
    participant ManagedMem
    participant GPU
    participant Kernel
    Host->>ManagedMem: Allocate and initialize mem0, mem1
    Host->>GPU: Launch kernel (calls mkn::gpu::zero on mem0, mem1)
    GPU->>Kernel: Execute zeroing function (parallel zeroing)
    Kernel-->>GPU: Zeroed arrays
    GPU-->>Host: Return control
    Host->>ManagedMem: Validate all elements are zeroed
```
Actionable comments posted: 5
🧹 Nitpick comments (2)
inc/mkn/gpu/def.hpp (1)

21-24: Introduced warp size macro and constant.

Adding `_MKN_GPU_WARP_SIZE_` and the `warp_size` constexpr provides a unified compile-time warp size constant that matches the updated documentation.

Consider, as an optional refactor, switching to an inline variable under C++20:

```diff
- static constexpr std::size_t warp_size = _MKN_GPU_WARP_SIZE_;
+ inline constexpr std::size_t warp_size = _MKN_GPU_WARP_SIZE_;
```

Additionally, note that macro names beginning with an underscore followed by an uppercase letter are reserved by the C++ standard; you may wish to rename `_MKN_GPU_WARP_SIZE_` to `MKN_GPU_WARP_SIZE_` to avoid reserved-identifier pitfalls.

inc/mkn/gpu/devfunc.hpp (1)

62-62: Consider using an explicit template parameter instead of `auto`.

While `auto&` works as an abbreviated template parameter, an explicit template parameter might be clearer and more maintainable:

```diff
-void inline _prinfo(auto& devProp) {
+template <typename DeviceProp>
+void inline _prinfo(DeviceProp& devProp) {
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (11)

- README.md (1 hunks)
- README.noformat (2 hunks)
- inc/mkn/gpu/cpu.hpp (1 hunks)
- inc/mkn/gpu/cuda.hpp (1 hunks)
- inc/mkn/gpu/def.hpp (1 hunks)
- inc/mkn/gpu/devfunc.hpp (1 hunks)
- inc/mkn/gpu/launchers.hpp (1 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (2 hunks)
- mkn.yaml (1 hunks)
- test/any/managed.cpp (1 hunks)
🧰 Additional context used

🧬 Code Graph Analysis (2)

test/any/managed.cpp (4)

- inc/mkn/gpu/launchers.hpp (2): DLauncher (72-72), DLauncher (73-73)
- inc/mkn/gpu/devfunc.hpp (2): zero (41-49), zero (41-41)
- inc/mkn/gpu/cpu.hpp (2): zero (284-286), zero (284-284)
- test/any/async_streaming.cpp (2): test (23-54), test (23-23)

inc/mkn/gpu/devfunc.hpp (4)

- inc/mkn/gpu/cuda.hpp (9): void (301-303), void (306-308), s (288-292), s (288-288), idx (61-61), sync (238-238), sync (238-238), sync (239-239), sync (239-239)
- inc/mkn/gpu/rocm.hpp (7): void (304-306), void (309-311), idx (60-60), sync (247-247), sync (247-247), sync (248-248), sync (248-248)
- inc/mkn/gpu/cpu.hpp (7): t (134-134), idx (295-297), idx (295-295), sync (220-220), sync (220-220), zero (284-286), zero (284-284)
- inc/mkn/gpu/multi_launch.hpp (22): s (81-81), s (134-136), i (94-94), i (94-94), i (105-109), i (105-105), i (119-119), i (119-119), i (169-179), i (169-169), i (192-201), i (192-192), i (221-224), i (221-221), i (287-287), i (287-287), i (312-321), i (312-312), idx (190-190), idx (190-190), idx (438-438), idx (438-438)
🔇 Additional comments (13)
inc/mkn/gpu/multi_launch.hpp (1)

34-36: Consistent include ordering.

Moving project-specific headers (`"mkn/gpu.hpp"`, `"mkn/kul/log.hpp"`, `"mkn/kul/time.hpp"`) above the standard library includes enhances dependency clarity and ensures foundational utilities are available before their consumers.

README.md (1)

3-5: Added project focus statement.

The new tagline "CUDA/HIP C++20 convenience wrappers" succinctly communicates the library's purpose right under the main heading.
mkn.yaml (1)

13-14: Verify ROCm test configuration.

By commenting out the `test` entries under the `rocm` profile, ROCm-specific tests will no longer run. Please confirm that ROCm tests have been intentionally disabled or migrated elsewhere; otherwise, re-enable or update these patterns to ensure ROCm code paths remain covered.
3-4: Added concise project descriptor.

Including "CUDA/HIP C++20 convenience wrappers" at the top aligns with the README.md update and clearly states the target APIs and language standard.

30-34: Documented the new warp size configuration key.

The `_MKN_GPU_WARP_SIZE_` entry, along with its type and default value, has been properly added. Ensure this stays in sync with `inc/mkn/gpu/def.hpp`.

inc/mkn/gpu/cpu.hpp (1)
283-286: LGTM! Clean CPU-side memory zeroing implementation.

The implementation correctly uses `std::fill` to zero out memory on the CPU side. The template design allows it to work with any data type, and the function signature is consistent with other utility functions in the codebase.

inc/mkn/gpu/launchers.hpp (1)
73-73: LGTM! Excellent use of perfect forwarding.

The new forwarding constructor enhances flexibility by allowing `DLauncher` to be constructed with any combination of arguments that the base `Launcher` class accepts. The use of `auto&&...` provides perfect forwarding, ensuring efficient argument passing.

inc/mkn/gpu/rocm.hpp (3)
35-35: Good refactoring: commenting out unused include.

Commenting out the include aligns with the refactoring to centralize functionality in separate headers.

314-314: LGTM! Centralizing device functions.

Including `devfunc.hpp` provides access to centralized device utility functions, improving code organization and reducing duplication across CUDA/ROCm implementations.

319-319: Good use of helper function for device info printing.

Delegating to `_prinfo(devProp)` reduces code duplication and centralizes device property formatting logic.

test/any/managed.cpp (2)
86-103: LGTM! Well-designed test for GPU memory zeroing.

The test correctly exercises the GPU `zero` function:

- Creates managed vectors with known values (1 and 2)
- Launches a kernel with appropriate dimensions (`dim3{1}, dim3{mkn::gpu::warp_size}`)
- The device `zero` function is designed for warp-level parallelism, so calling it with exactly `warp_size` threads is correct
- Verification logic properly checks that all elements are zeroed by ensuring they sum to 0

The test design aligns well with the device function's intended usage pattern, where all threads in a warp cooperatively zero the array.
107-107: Good integration of new test into test suite.

Adding `test_zero()` to the aggregated return value ensures the new functionality is properly tested as part of the overall test suite.

inc/mkn/gpu/cuda.hpp (1)
316-316: LGTM - clean delegation to centralized function.

The refactoring to delegate device property printing to `_prinfo(devProp)` is a good improvement that centralizes the implementation and reduces code duplication.
Summary by CodeRabbit

- New Features
- Documentation
- Tests
- Bug Fixes
- Chores