feat: Enable fast pinned memory alloc in KVBM #4184
base: main
Conversation
Signed-off-by: jthomson04 <[email protected]>
Walkthrough

Enhances pinned memory allocation in the CUDA block manager with NUMA awareness and huge-pages support. Replaces the single-path allocation with conditional logic (a 2x2 match on the numa_aware and use_huge_pages flags), where each path offers a different allocation strategy with corresponding error handling and fallbacks.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
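The 2x2 match described above can be sketched as a pure strategy-selection function. This is an illustrative model only: the variant names and the helper below are hypothetical, not the actual KVBM symbols, and the real code performs the allocations inline rather than returning a strategy value.

```rust
// Hypothetical model of the 2x2 (numa_aware, use_huge_pages) dispatch
// described in the walkthrough. Variant names are illustrative.
#[derive(Debug, PartialEq)]
enum AllocStrategy {
    /// Original single path: cuMemHostAlloc via cudarc.
    CudaHostAlloc,
    /// mmap anonymous huge pages, then cudaHostRegister.
    HugePageMmap,
    /// NUMA-aware allocation, then pinned.
    NumaAware,
    /// NUMA-aware allocation backed by huge pages.
    NumaAwareHugePages,
}

fn select_strategy(numa_aware: bool, use_huge_pages: bool) -> AllocStrategy {
    match (numa_aware, use_huge_pages) {
        (false, false) => AllocStrategy::CudaHostAlloc,
        (false, true) => AllocStrategy::HugePageMmap,
        (true, false) => AllocStrategy::NumaAware,
        (true, true) => AllocStrategy::NumaAwareHugePages,
    }
}

fn main() {
    // Each flag combination maps to exactly one allocation path.
    assert_eq!(select_strategy(false, false), AllocStrategy::CudaHostAlloc);
    assert_eq!(select_strategy(false, true), AllocStrategy::HugePageMmap);
    assert_eq!(select_strategy(true, false), AllocStrategy::NumaAware);
    assert_eq!(select_strategy(true, true), AllocStrategy::NumaAwareHugePages);
}
```

Keeping the dispatch exhaustive over both flags means the compiler rejects any new flag combination that lacks an allocation path.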
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/llm/src/block_manager/storage/cuda.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/block/transfer/cuda.rs:416-420
Timestamp: 2025-09-18T21:49:28.906Z
Learning: Pinned memory (page-locked host memory) being accessible by GPU via DMA doesn't change the fact that host and device pointers exist in separate virtual address spaces, so overlap checks between host and device pointers are still invalid regardless of whether the host memory is pinned.
Applied to files:
lib/llm/src/block_manager/storage/cuda.rs
🧬 Code graph analysis (1)
lib/llm/src/block_manager/storage/cuda.rs (2)
  lib/llm/src/block_manager/numa_allocator.rs (2): is_numa_enabled (15-19), std (56-56)
  lib/llm/src/block_manager/numa_allocator/worker_pool.rs (1): global (311-314)
if cudaHostRegister(ptr, size, 0) != cudaError::cudaSuccess {
    return Err(StorageError::AllocationFailed(
        "Failed to register memory".into(),
    ));
}

ptr as *mut u8
}
Wrong teardown for huge-page allocations

In the huge-page branch we mmap anonymous pages and register them with cudaHostRegister, but Drop still calls cudarc::driver::result::free_host, which is only valid for pointers returned by cuMemHostAlloc. Calling it on an mmap'd region is undefined behaviour and leaks the huge-page reservation. We also skip cudaHostUnregister / munmap on both the success case and the error path where cudaHostRegister fails, so the mapping lives forever.
Please track whether the allocation came from the huge-page path, call cudaHostUnregister + munmap on drop, and unwind the mapping whenever registration fails. Something along these lines should address it:
@@
-use cudarc::runtime::sys::{cudaError, cudaHostRegister};
+use cudarc::runtime::sys::{cudaError, cudaHostRegister, cudaHostUnregister};
@@
pub struct PinnedStorage {
ptr: u64,
size: usize,
handles: RegistrationHandles,
ctx: Arc<CudaContext>,
+ huge_page_alloc: bool,
}
@@
- let ptr = match (numa_aware, use_huge_pages) {
+ let mut huge_page_alloc = false;
+ let ptr = match (numa_aware, use_huge_pages) {
@@
(false, true) => {
let ptr = nix::libc::mmap(
@@
if ptr == MAP_FAILED {
return Err(StorageError::AllocationFailed(
"Failed to allocate pinned memory".into(),
));
}
- if cudaHostRegister(ptr, size, 0) != cudaError::cudaSuccess {
- return Err(StorageError::AllocationFailed(
- "Failed to register memory".into(),
- ));
+ if cudaHostRegister(ptr, size, 0) != cudaError::cudaSuccess {
+ unsafe { nix::libc::munmap(ptr, size) };
+ return Err(StorageError::AllocationFailed(
+ "Failed to register memory".into(),
+ ));
}
+ huge_page_alloc = true;
ptr as *mut u8
}
@@
Ok(Self {
ptr,
size,
handles: RegistrationHandles::new(),
ctx: ctx.clone(),
+ huge_page_alloc,
})
@@
fn drop(&mut self) {
self.handles.release();
- unsafe { cudarc::driver::result::free_host(self.ptr as _) }.unwrap();
+ unsafe {
+ if self.huge_page_alloc {
+ cudaHostUnregister(self.ptr as _);
+ nix::libc::munmap(self.ptr as *mut _, self.size);
+ } else {
+ cudarc::driver::result::free_host(self.ptr as _).unwrap();
+ }
+ }
}

Also applies to: 259-261
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/storage/cuda.rs around lines 228-235 (and similarly
lines 259-261), the huge-page allocation branch mmaps anonymous memory and
registers it with cudaHostRegister but the Drop implementation and error paths
still call cudarc::driver::result::free_host (valid only for cuMemHostAlloc),
and they omit cudaHostUnregister/munmap, causing UB and leaks; change the
allocation tracking to record whether the pointer came from the huge-page (mmap)
path, and on Drop call cudaHostUnregister then munmap for huge-page allocations
(instead of free_host), ensure that if cudaHostRegister fails you immediately
call munmap to unwind the mapping before returning Err, and keep using free_host
only for cuMemHostAlloc-backed allocations so both success and error paths
properly clean up.
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit

DYN_KVBM_USE_HUGE_PAGES environment variable for memory optimization tuning.
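A possible way to exercise the new flag, assuming it is read as a boolean-style environment variable; the accepted values are not shown in this excerpt, so "1" is an assumption:

```shell
# Hypothetical usage: opt in to huge-page-backed pinned allocations.
# The value "1" is assumed; this excerpt does not show the parser.
export DYN_KVBM_USE_HUGE_PAGES=1

# Note: huge pages must be reserved by the OS beforehand, e.g. (as root):
#   echo 1024 > /proc/sys/vm/nr_hugepages
echo "DYN_KVBM_USE_HUGE_PAGES=${DYN_KVBM_USE_HUGE_PAGES}"
```

If the kernel has no huge pages reserved, the mmap path in the diff above fails and the code falls back through its error handling, so reserving pages is part of enabling the feature.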