feat(rdma): add parallel memory region registration support #855
- Add `parallel_reg_mr` config option with environment variable control
- Implement parallel registration/unregistration using std::async
- Maintain backward compatibility with sequential mode

Signed-off-by: staryxchen <[email protected]>
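To illustrate the approach the commit message describes, here is a minimal sketch of fanning a per-device registration call out with `std::async`, with a sequential fallback. It is not the code from this PR: `RegisterFn`, `register_on_all_devices`, and the error-code convention are hypothetical names chosen for the example, and the per-device callable stands in for whatever actually registers the memory region on one NIC.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for a per-NIC registration call.
// Returns 0 on success and a non-zero error code on failure.
using RegisterFn = std::function<int(std::size_t device_index)>;

// Calls `register_one` once per device. With `parallel_reg_mr` set, each call
// runs in its own std::async task; otherwise the calls run one after another.
// Returns 0 if every call succeeded, else the first non-zero code observed.
int register_on_all_devices(std::size_t device_count,
                            const RegisterFn& register_one,
                            bool parallel_reg_mr) {
    assert(register_one && "register_one must be callable");

    if (!parallel_reg_mr) {
        // Sequential mode: stop at the first failure.
        for (std::size_t i = 0; i < device_count; ++i) {
            int rc = register_one(i);
            if (rc != 0) return rc;
        }
        return 0;
    }

    // Parallel mode: launch one task per device.
    std::vector<std::future<int>> tasks;
    tasks.reserve(device_count);
    for (std::size_t i = 0; i < device_count; ++i) {
        tasks.push_back(std::async(std::launch::async, register_one, i));
    }

    // Wait for every task, even after a failure, so none is left running.
    int first_error = 0;
    for (auto& task : tasks) {
        int rc = task.get();
        if (rc != 0 && first_error == 0) first_error = rc;
    }
    return first_error;
}
```

One design point worth noting in a sketch like this: the parallel branch cannot stop early on failure the way the sequential branch does, so a caller may need to unregister the regions that did succeed.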
Related issue: #848
Signed-off-by: staryxchen <[email protected]>
The zip file you provided seems to be empty?
Sorry, I've updated the file. Please try again.
…sabled
- Default value of `parallel_reg_mr` changed from true to false
- Environment variable switched from `MC_DISABLE_PARALLEL_REG_MR` to `MC_ENABLE_PARALLEL_REG_MR`

Signed-off-by: staryxchen <[email protected]>
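As a rough sketch of what an opt-in flag with a default of false can look like, the snippet below reads `MC_ENABLE_PARALLEL_REG_MR` with `std::getenv`. The helper names and the accepted spellings ("1"/"true" and "0"/"false") are assumptions made for illustration; the PR text does not say how the project parses the value.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: interprets a raw environment value as a boolean.
// Assumption: "1"/"true" enable, "0"/"false" disable, and anything else
// (including an unset variable) falls back to `default_value`.
bool parse_enable_flag(const char* raw, bool default_value) {
    if (raw == nullptr) return default_value;
    const std::string value(raw);
    if (value == "1" || value == "true") return true;
    if (value == "0" || value == "false") return false;
    return default_value;
}

// Parallel registration stays off unless the variable is explicitly set.
bool parallel_reg_mr_enabled() {
    return parse_enable_flag(std::getenv("MC_ENABLE_PARALLEL_REG_MR"), false);
}
```

Keeping the parsing separate from the `std::getenv` call makes the default-off behaviour easy to check without touching the process environment.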
Hi @xiaguan
Sure, I'll give it a try. I'll share the results later.
In the simple dual-NIC test setup, registration speed doesn't seem to improve with or without pre-allocation; there's actually a bit of regression. Not sure how it performs on 8 NICs yet. I'll test it once I get access to such a machine. In the meantime, feel free to keep this PR open.
What is the size of the registered memory in your test?
(40GB, 4GB)
I think the size is not enough to show the improvements of this patch. Could we try a larger capacity, like 400GB?
8 NICs, 200GB (result figures not preserved):
- Without pre-allocation, then with this PR
- With pre-allocation, then with this PR
Summary
This PR introduces a configurable parallel memory region registration feature with significant performance improvements for pre-allocated memory scenarios, while maintaining backward compatibility.
I conducted several tests to validate performance (test code is also attached).
perf.tar.gz
Test Configuration
- `MC_DISABLE_PARALLEL_REG_MR` not set (parallel registration enabled)
- `MC_DISABLE_PARALLEL_REG_MR=1` (sequential registration)

Performance Results
Pre-allocated Memory Scenario
Non-pre-allocated Memory Scenario
I had the AI summarize and analyze the test results. Below is the AI's output:
Key Performance Findings
Pre-allocated Memory (500GB)
Non-pre-allocated Memory (500GB)
Analysis
Pre-allocated memory benefits from parallel registration because:
Non-pre-allocated memory performs better with sequential registration because:
Conclusion
The parallel memory registration optimization provides significant performance benefits for pre-allocated memory scenarios, with up to 8x improvement in unregistration performance. However, for large non-pre-allocated memory allocations, sequential registration performs better due to reduced resource contention and kernel overhead.
The `MC_PARALLEL_REG_MR` configuration option provides the flexibility to choose the optimal strategy based on the specific use case and memory allocation patterns of the application.