Conversation

@oandreeva-nv (Contributor) commented Sep 17, 2025

Overview:

Fixes DIS-507

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Configure memory layout per device, host, and disk via Python and builder APIs.
    • Inter-layout transfers supported (contiguous ↔ layer-separated) with validation and debug assertions.
    • Layout verification utilities and reports, including address/size checks and optional checksums.
    • Pre-transfer compatibility checks and a new integrity verification API for post-transfer validation.
  • Bug Fixes

    • Corrected block view sizing for fully contiguous layouts with outer dimensions.
  • Tests

    • Extensive coverage for inter-layout copies (CPU/GPU), alignment scenarios, and GDS-compatible disk operations.

oandreeva-nv and others added 4 commits September 15, 2025 10:08

copy-pr-bot bot commented Sep 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Olga Andreeva <[email protected]>
@oandreeva-nv marked this pull request as ready for review September 17, 2025 17:36
@oandreeva-nv requested a review from a team as a code owner September 17, 2025 17:36

coderabbitai bot commented Sep 17, 2025

Walkthrough

Introduces per-component layout configuration (device/host/disk) across Python bindings, vLLM/TRTLLM connectors, and worker/config. Adds inter-layout transfer support (FC↔LS) for CUDA and NIXL paths, pre-transfer compatibility checks, and post-transfer integrity verification. Expands layout module with verification utilities and adjusts memory region sizing. Updates tests for GDS alignment and cross-layout cases.
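
For orientation, the offset arithmetic that separates the two layouts can be sketched in a few lines. This is illustrative only: the parameter names mirror the config fields discussed below, but the real implementation lives in lib/llm/src/block_manager/layout.rs and may differ in detail.

    // FullyContiguous (FC): every (layer, outer) region of a block is packed
    // back to back, ordered by (layer, outer).
    fn fc_region_offset(
        block: usize, layer: usize, outer: usize,
        num_layers: usize, outer_dim: usize, region_bytes: usize,
    ) -> usize {
        ((block * num_layers + layer) * outer_dim + outer) * region_bytes
    }

    // LayerSeparate (LS): each (layer, outer) pair owns its own allocation, so a
    // region's offset inside that allocation depends only on the block index.
    fn ls_region_offset(block: usize, region_bytes: usize) -> usize {
        block * region_bytes
    }

    fn main() {
        let (num_layers, outer_dim, region_bytes) = (32, 2, 4096);
        // In FC, block 0's second region sits right after its first.
        assert_eq!(fc_region_offset(0, 0, 1, num_layers, outer_dim, region_bytes), 4096);
        // In LS, block 3 sits at the same offset in every layer's allocation.
        assert_eq!(ls_region_offset(3, region_bytes), 3 * 4096);
    }

An FC↔LS transfer is then a per-(layer, outer) copy loop that walks one side contiguously, which is what the CUDA and NIXL paths reviewed below implement.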

Changes

Cohort / File(s) / Summary
Python bindings: layout args + enum
lib/bindings/python/rust/llm/block_manager/distributed/worker.rs
Adds PyLayoutType enum; maps to internal LayoutType; extends KvbmWorker::new with optional device/host/disk layout args; builder wiring with defaults to FullyContiguous (see the sketch after this table).
vLLM/TRTLLM connectors: builder updates
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs
Replaces boolean is_fully_contiguous_layout with device/host/disk layout setters; sets FullyContiguous in TRTLLM connector; vLLM connector sets LS on device, FC on host/disk.
Worker/config: per-edge layout inference
lib/llm/src/block_manager/distributed/worker.rs
Removes boolean flag; adds device_layout_type, host_layout_type, disk_layout_type; infers device layout via match on type; allocates host/disk per specified types.
Block view sizing (FC path)
lib/llm/src/block_manager/block/data/local.rs
For FC blocks, multiplies size by num_outer_dims in BlockView/BlockViewMut creation.
CUDA transfers: inter-layout copy
lib/llm/src/block_manager/block/transfer/cuda.rs
Implements FC→LS and LS→FC copy loops with per-layer/outer iteration and assertions; keeps FC→FC memcpy; adds layout transfer tests and debug assertions.
NIXL transfers: inter-layout mapping and sizing
lib/llm/src/block_manager/block/transfer/nixl.rs
Adds FC→LS descriptor mapping with early return; adds early return for FC→FC; in general path uses min(src,dst) per-layer copy_size with warning.
Distributed transfer: validation & integrity API
lib/llm/src/block_manager/distributed/transfer.rs
Adds pre-transfer layout compatibility check; introduces public verify_transfer_integrity with byte-wise compare and optional pattern checks; adds logging.
Layout module: APIs, sizing, utils exposure
lib/llm/src/block_manager/layout.rs
Removes pub mod distributed; adds pub mod utils; changes FC memory_region_size to outer_dim stride; adds verification methods (verify_memory_regions, expected_memory_address, verify_memory_region) for FC and LS; adds LS alignment verification; introduces crate-private LayerSeparateConfig.
Layout utils: worker verification utilities
lib/llm/src/block_manager/layout/utils.rs
Adds validate_power_of_2 returning validator::ValidationError; introduces public worker_verification module with WorkerLayoutVerifier, RegionVerificationResult, LayoutVerificationStats, and verify_layout_compatibility.
Offload tests: GDS and integrity
lib/llm/src/block_manager/offload.rs
Adjusts existing assertions; adds GDS-focused tests, helpers for mixed layouts, pattern population/verification, file lifecycle, and constrained scenarios.
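
To make the bindings cohort above concrete, here is a minimal sketch of the enum mapping and the default-to-FullyContiguous behavior it describes. Plain Rust with PyO3 attributes omitted; the actual types live in the worker.rs files listed above and may differ.

    #[derive(Clone, Copy, Debug, PartialEq)]
    enum PyLayoutType { FullyContiguous, LayerSeparate }

    #[derive(Clone, Copy, Debug, PartialEq)]
    enum LayoutType { FullyContiguous, LayerSeparate }

    impl From<PyLayoutType> for LayoutType {
        fn from(py: PyLayoutType) -> Self {
            match py {
                PyLayoutType::FullyContiguous => LayoutType::FullyContiguous,
                PyLayoutType::LayerSeparate => LayoutType::LayerSeparate,
            }
        }
    }

    // Each optional per-component argument falls back to FullyContiguous.
    fn resolve(arg: Option<PyLayoutType>) -> LayoutType {
        arg.map(Into::into).unwrap_or(LayoutType::FullyContiguous)
    }

    fn main() {
        assert_eq!(resolve(None), LayoutType::FullyContiguous);
        assert_eq!(resolve(Some(PyLayoutType::LayerSeparate)), LayoutType::LayerSeparate);
    }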

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Py as Python
  participant Pyo3 as PyO3 Binding
  participant Cfg as KvbmWorkerConfig Builder
  participant W as KvbmWorker
  participant Dev as Device Layout
  participant Host as Host Layout
  participant Disk as Disk Layout

  Py->>Pyo3: KvbmWorker.new(..., device_layout_type?, host_layout_type?, disk_layout_type?)
  Pyo3->>Cfg: set device/host/disk layout types (default FC)
  Cfg-->>Pyo3: built config
  Pyo3->>W: KvbmWorker::new(config)
  W->>W: infer device layout (match LayoutType)
  W->>Dev: create device layout
  W->>Host: create host layout (from config)
  W->>Disk: create disk layout (from config)
  W-->>Py: ready
  note over W,Dev: Layout-specific dims resolved (layers/outer/inner)

sequenceDiagram
  autonumber
  participant BTH as BlockTransferHandler
  participant Src as Source Blocks
  participant Dst as Target Blocks
  participant IO as write_to(..)

  BTH->>BTH: begin_transfer(sources, targets)
  BTH->>BTH: validate_transfer_compatibility
  alt compatible
    BTH->>IO: write_to(sources -> targets)
    IO-->>BTH: Ok(())
  else mismatch
    BTH-->>BTH: Err(LayoutMismatch)
  end

  opt Post-transfer (caller)
    BTH->>BTH: verify_transfer_integrity(sources, targets, patterns?)
    BTH-->>BTH: Ok / Err (data mismatch)
  end
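
In code, the diagram's flow reduces to validate, transfer, then optionally verify. A stand-in sketch: the types and signatures here are placeholders, not the real API in lib/llm/src/block_manager/distributed/transfer.rs.

    struct BlockTransferHandler;

    #[derive(Debug)]
    enum TransferError { LayoutMismatch, DataMismatch }

    impl BlockTransferHandler {
        // Stand-in: the real check compares layout configs before any bytes move.
        fn validate_transfer_compatibility(&self, src: &[u8], dst: &[u8]) -> Result<(), TransferError> {
            if src.len() == dst.len() { Ok(()) } else { Err(TransferError::LayoutMismatch) }
        }
        // Stand-in for the CUDA/NIXL transfer path.
        fn write_to(&self, src: &[u8], dst: &mut [u8]) -> Result<(), TransferError> {
            dst.copy_from_slice(src);
            Ok(())
        }
        // Optional post-transfer check, as in the opt block above.
        fn verify_transfer_integrity(&self, src: &[u8], dst: &[u8]) -> Result<(), TransferError> {
            if src == dst { Ok(()) } else { Err(TransferError::DataMismatch) }
        }
    }

    fn main() -> Result<(), TransferError> {
        let handler = BlockTransferHandler;
        let src = vec![42u8; 16];
        let mut dst = vec![0u8; 16];
        handler.validate_transfer_compatibility(&src, &dst)?; // fail fast on mismatch
        handler.write_to(&src, &mut dst)?;
        handler.verify_transfer_integrity(&src, &dst)?;
        Ok(())
    }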

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

In tunnels of memory where pages align,
I hop between layers—contiguous, fine.
Device, host, and disk now dance in three,
FC to LS, we copy with glee.
I nibble on checksums; mismatches I fear—
Verified, validated—carrots are clear! 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)

• Description Check: ⚠️ Warning
  Explanation: The provided PR description is incomplete relative to the repository template: the Overview contains only "Fixes DIS-507", the Details and "Where should the reviewer start?" sections are empty placeholders, and the Related Issues entry uses the placeholder "#xxx", so there is no substantive summary of changes, per-file highlights, testing instructions, or a real issue reference. Because these required elements are missing, reviewers lack the context needed to evaluate scope, risk, and where to focus their review. The description therefore does not meet the repository's template requirements.
  Resolution: Please update the PR description to follow the template: provide a concise Overview stating the intent and high-level impact, populate Details with the concrete changes (files modified, key behavioral/API changes, and test coverage), and fill "Where should the reviewer start?" with specific files or lines to inspect; replace the placeholder "closes #xxx" with the actual issue number and include testing steps, migration notes, and any backward-compatibility concerns so reviewers can proceed. Once updated, re-run the pre-merge checks.
✅ Passed checks (2 passed)

• Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
• Title Check: ✅ Passed. The title accurately names a concrete, important change in the diff—host and disk layout defaults/configuration are moved to FullyContiguous—and is specific and non-generic; however, the PR also contains broader layout work (per-edge LayoutType support, Python bindings, layout-compatibility transfer/convert paths, verification APIs, and tests) that the title does not enumerate.


@coderabbitai bot left a comment (Contributor)


Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
lib/llm/src/block_manager/block/data/local.rs (1)

123-130: Guard against overflow when computing fully-contiguous block size

Multiplying mr.size() by num_layers() and num_outer_dims() can overflow usize and won’t be caught. Use checked_mul and surface a BlockError instead of silently wrapping.

Apply this diff:

-            let size = mr.size() * self.num_layers() * self.num_outer_dims();
+            let per_region = mr.size();
+            let size = per_region
+                .checked_mul(self.num_layers())
+                .and_then(|v| v.checked_mul(self.num_outer_dims()))
+                .ok_or_else(|| BlockError::InvalidState("block size overflow".to_string()))?;
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)

353-356: Don’t panic in runtime hot path when a slot is missing

Panic will take down the process. Log and continue (or convert to an error return higher up) to keep the worker resilient.

Apply this diff:

-            } else {
-                // made this condition more strict slot existence checks were added as a prerequesite
-                // to be added to the maybe_finished_offloading set.
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished offloading set");
-            }
+            } else {
+                // stricter existence checks are in place, but avoid crashing the process
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_offloading"
+                );
+                continue;
+            }

382-383: Avoid panic in onboarding check as well

Mirror the non-panicking handling here.

Apply this diff:

-            } else {
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished onboarding set");
-            }
+            } else {
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_onboarding"
+                );
+                continue;
+            }
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (2)

311-323: Avoid panic on offloading slot miss

Same resilience concern as the vLLM connector; don’t panic in production paths.

Apply this diff:

-            } else {
-                // made this condition more strict slot existence checks were added as a prerequesite
-                // to be added to the maybe_finished_offloading set.
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished offloading set");
-            }
+            } else {
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_offloading"
+                );
+                continue;
+            }

349-350: Avoid panic on onboarding slot miss

Mirror non-panicking handling here.

Apply this diff:

-            } else {
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished onboarding set");
-            }
+            } else {
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_onboarding"
+                );
+                continue;
+            }
lib/llm/src/block_manager/distributed/transfer.rs (1)

104-115: Bounds-check indices from the request before indexing into pools.

source_pool_list[idx] and target_pool_list[idx] may panic if a request carries out-of-range indices. Validate and error out.

+        // Validate indices before indexing
+        let src_len = source_pool_list.len();
+        let tgt_len = target_pool_list.len();
+        for (from, to) in request.blocks() {
+            if *from >= src_len || *to >= tgt_len {
+                return Err(anyhow::anyhow!(
+                    "Transfer index out of bounds: from={}, to={}, src_len={}, tgt_len={}",
+                    from, to, src_len, tgt_len
+                ));
+            }
+        }
lib/llm/src/block_manager/distributed/worker.rs (1)

581-589: Avoid unwraps in production path (stream creation).

Propagate errors instead of panicking:

-        let transfer_context = Arc::new(TransferContext::new(
-            Arc::new(Some(agent)),
-            DeviceAllocator::new(config.device_id)
-                .unwrap()
-                .ctx()
-                .new_stream()
-                .unwrap(),
+        let stream = DeviceAllocator::new(config.device_id)?
+            .ctx()
+            .new_stream()?;
+        let transfer_context = Arc::new(TransferContext::new(
+            Arc::new(Some(agent)),
+            stream,
             Handle::current(),
         ));
🧹 Nitpick comments (17)
lib/llm/src/block_manager/block/transfer/nixl.rs (1)

55-56: Early return is fine, but keep assertions in debug

The explicit Ok(()) is fine; ensure any preconditions remain covered by debug assertions elsewhere.

lib/llm/src/block_manager/block/transfer/cuda.rs (3)

246-253: Incorrect debug message for overlap check

H2D path message says “D2D copy”; adjust to avoid confusion or remove overlap check entirely for H2D.

Apply this diff:

-        "Source and destination device memory regions must not overlap for D2D copy"
+        "Source and destination memory regions must not overlap"

270-276: Same nit: overlap message mentions D2D in D2H path

Mirror the wording fix here.

Apply this diff:

-        "Source and destination device memory regions must not overlap for D2D copy"
+        "Source and destination memory regions must not overlap"

309-739: Large CUDA tests: keep but ensure they don’t run by default

The tests are cfg(test, feature = "testing-cuda"); that’s good. Consider marking the perf test with #[ignore] to avoid accidental long runs in CI tiers without the feature.
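
For reference, the marker could look like this (test name hypothetical); ignored tests still run on demand via cargo test -- --ignored:

    #[test]
    #[ignore = "long-running perf test; run explicitly with `cargo test -- --ignored`"]
    fn inter_layout_copy_perf() {
        // existing test body unchanged
    }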

lib/llm/src/block_manager/distributed/transfer.rs (1)

21-23: Remove unused import.

layout::BlockLayoutConfig isn’t used here. Drop it to keep warnings clean.

lib/llm/src/block_manager/distributed/worker.rs (1)

217-219: Use checked math for bytes_per_block to avoid usize overflow.

Guard against large shapes:

-        let bytes_per_block =
-            num_layers * outer_dim * config.page_size * inner_dim * config.dtype_width_bytes;
+        let bytes_per_block = num_layers
+            .checked_mul(outer_dim).and_then(|v| v.checked_mul(config.page_size))
+            .and_then(|v| v.checked_mul(inner_dim)).and_then(|v| v.checked_mul(config.dtype_width_bytes))
+            .ok_or_else(|| anyhow::anyhow!("bytes_per_block overflow"))?;
lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (1)

171-181: Minor: simplify Option mapping and let rustfmt reflow.

Use map(Into::into) to reduce verbosity.

-            .device_layout_type(device_layout_type.map(|py_layout| py_layout.into()).unwrap_or(LayoutType::FullyContiguous))
+            .device_layout_type(device_layout_type.map(Into::into).unwrap_or(LayoutType::FullyContiguous))
             .host_layout_type(
-                host_layout_type
-                    .map(|py_layout| py_layout.into())
-                    .unwrap_or(LayoutType::FullyContiguous)
+                host_layout_type.map(Into::into).unwrap_or(LayoutType::FullyContiguous)
             )
             .disk_layout_type(
-                disk_layout_type
-                    .map(|py_layout| py_layout.into())
-                    .unwrap_or(LayoutType::FullyContiguous)
+                disk_layout_type.map(Into::into).unwrap_or(LayoutType::FullyContiguous)
             )
lib/llm/src/block_manager/layout.rs (3)

338-345: Confirm semantics: memory_region_size now equals one page.

Setting memory_region_size = outer_dim_stride_in_bytes aligns LocalMemoryRegion::size to a single [page_size × inner_dim × dtype] region. This is consistent with region addressing used elsewhere. Ensure all callers (e.g., transfer validators) expect per-page size, not per-layer size.
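
For concreteness, one region now covers exactly one page, and block-level totals are derived from it. A sketch with made-up dimensions (field names follow the layout config):

    fn main() {
        let (page_size, inner_dim, dtype_width_bytes) = (16usize, 128, 2);
        let region_size = page_size * inner_dim * dtype_width_bytes; // one page = 4096 bytes
        let (num_layers, outer_dim) = (32usize, 2);
        // What the FC block view computes from per-page regions (see the local.rs change):
        let block_size = region_size * num_layers * outer_dim;
        assert_eq!(block_size, 262_144);
    }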


861-943: LayerSeparate verification and alignment checks are solid.

Good coverage: region math, alignment, and capacity checks per-layer. Minor: consider early-returning on first mismatch to speed failing tests.


1621-1904: Comprehensive tests are valuable; minor nits.

The no-overlap/alignment/stride tests add strong regression guarantees. Consider parameterizing constants to reduce duplication across FC/LS tests.

lib/llm/src/block_manager/layout/utils.rs (7)

19-25: align_up: guard against overflow and enforce precondition in release builds

value + alignment - 1 can overflow; debug_assert! disappears in release. Use a checked add and a hard assert (or return Result) to make it safe.

 #[inline(always)]
 pub fn align_up(value: usize, alignment: usize) -> usize {
-    debug_assert!(alignment.is_power_of_two(), "Alignment must be a power of 2");
-    (value + alignment - 1) & !(alignment - 1)
+    assert!(alignment.is_power_of_two() && alignment > 0, "Alignment must be a power of 2");
+    let add = alignment - 1;
+    let sum = value.checked_add(add).expect("overflow in align_up");
+    sum & !add
 }

27-36: Don’t leak validator::ValidationError from low-level utils

This module otherwise uses LayoutError. Exposing validator::ValidationError here couples layers unnecessarily.

-/// Validates that the given value is a power of 2.
-pub fn validate_power_of_2(alignment: usize) -> Result<(), validator::ValidationError> {
-    if alignment.is_power_of_two() {
-        Ok(())
-    } else {
-        Err(validator::ValidationError::new(
-            "Alignment must be a power of 2",
-        ))
-    }
-}
+/// Validates that the given value is a power of 2.
+pub fn validate_power_of_2(alignment: usize) -> Result<(), LayoutError> {
+    if alignment.is_power_of_two() {
+        Ok(())
+    } else {
+        Err(LayoutError::InvalidConfig("Alignment must be a power of 2".into()))
+    }
+}

If external callers truly need validator::ValidationError, add a thin adapter outside this module.
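
Such an adapter could stay a one-liner at the boundary. Sketch only: it assumes the LayoutError-returning validate_power_of_2 proposed above and the validator crate's ValidationError::new.

    // Hypothetical adapter, kept outside the low-level utils module.
    fn validate_power_of_2_for_validator(
        alignment: usize,
    ) -> Result<(), validator::ValidationError> {
        validate_power_of_2(alignment)
            .map_err(|_| validator::ValidationError::new("Alignment must be a power of 2"))
    }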


167-190: Pre‑allocate results to avoid reallocations

We know total regions up front.

-            let mut results = Vec::new();
+            let mut results = Vec::with_capacity(layout.num_blocks() * layout.num_layers() * layout.outer_dim());

266-289: Stats should ignore “unknown” address state if you adopt tri‑state addr_matches

If you switch to Option<bool>, update counters to only include Some(false) as mismatches and Some(true) in successes.

-            if !result.addr_matches {
+            if matches!(result.addr_matches, Some(false)) {
                 self.stats.addr_mismatches += 1;
             }
 ...
-            if result.addr_matches && result.size_matches {
+            if result.size_matches && matches!(result.addr_matches, Some(true) | None) {
                 self.stats.successful_verifications += 1;
             }

297-300: Critical mismatches should include checksum when verify_data=true

Currently only size mismatches are considered critical. If data verification is opted in, checksum mismatches should elevate to critical as well.

-        pub fn has_critical_mismatches(&self) -> bool {
-            // Only check size mismatches since address verification is layout-specific
-            self.stats.size_mismatches > 0
-        }
+        pub fn has_critical_mismatches(&self) -> bool {
+            self.stats.size_mismatches > 0 || self.stats.checksum_mismatches > 0
+        }

303-328: Avoid intermediate allocations in report building

Use string literals with push_str.

-            report.push_str(&format!("Layout Verification Report\n"));
-            report.push_str(&format!("========================\n"));
+            report.push_str("Layout Verification Report\n");
+            report.push_str("========================\n");

(Apply similarly to the other lines.)


331-396: Compatibility should also check alignment; consider returning a rich report

  • Missing alignment check can green‑light incompatible GDS paths.
  • Returning bool hides the reason; a structured report improves UX.

Add alignment check:

         if source_config.dtype_width_bytes != dest_config.dtype_width_bytes {
             tracing::error!("Data type width mismatch: {} vs {}",
                 source_config.dtype_width_bytes, dest_config.dtype_width_bytes);
             return Ok(false);
         }
+
+        if source_config.alignment != dest_config.alignment {
+            tracing::error!("Alignment mismatch: {} vs {}",
+                source_config.alignment, dest_config.alignment);
+            return Ok(false);
+        }

Follow‑up (non‑blocking): return Result<(), Incompatibility> collecting all mismatches instead of the first-fail boolean.
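
One possible shape for that richer result, collecting every mismatch rather than stopping at the first (illustrative only; none of these names are from the PR):

    #[derive(Debug, PartialEq)]
    enum Incompatibility {
        NumLayers { src: usize, dst: usize },
        DtypeWidth { src: usize, dst: usize },
        Alignment { src: usize, dst: usize },
    }

    struct Config { num_layers: usize, dtype_width_bytes: usize, alignment: usize }

    fn check_compat(src: &Config, dst: &Config) -> Result<(), Vec<Incompatibility>> {
        let mut problems = Vec::new();
        if src.num_layers != dst.num_layers {
            problems.push(Incompatibility::NumLayers { src: src.num_layers, dst: dst.num_layers });
        }
        if src.dtype_width_bytes != dst.dtype_width_bytes {
            problems.push(Incompatibility::DtypeWidth { src: src.dtype_width_bytes, dst: dst.dtype_width_bytes });
        }
        if src.alignment != dst.alignment {
            problems.push(Incompatibility::Alignment { src: src.alignment, dst: dst.alignment });
        }
        if problems.is_empty() { Ok(()) } else { Err(problems) }
    }

    fn main() {
        let a = Config { num_layers: 32, dtype_width_bytes: 2, alignment: 4096 };
        let b = Config { num_layers: 32, dtype_width_bytes: 2, alignment: 512 };
        assert_eq!(check_compat(&a, &b).unwrap_err().len(), 1);
    }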

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67ff181 and c5e2be2.

📒 Files selected for processing (11)
  • lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (4 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (2 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2 hunks)
  • lib/llm/src/block_manager/block/data/local.rs (2 hunks)
  • lib/llm/src/block_manager/block/transfer/cuda.rs (2 hunks)
  • lib/llm/src/block_manager/block/transfer/nixl.rs (2 hunks)
  • lib/llm/src/block_manager/distributed/transfer.rs (3 hunks)
  • lib/llm/src/block_manager/distributed/worker.rs (4 hunks)
  • lib/llm/src/block_manager/layout.rs (6 hunks)
  • lib/llm/src/block_manager/layout/utils.rs (3 hunks)
  • lib/llm/src/block_manager/offload.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
lib/llm/src/block_manager/offload.rs (4)
lib/llm/src/block_manager/layout/utils.rs (1)
  • verify_layout_compatibility (332-395)
lib/llm/src/block_manager/storage.rs (8)
  • storage_type (171-171)
  • storage_type (389-391)
  • storage_type (469-471)
  • storage_type (510-512)
  • size (177-177)
  • size (397-399)
  • size (477-479)
  • size (518-520)
lib/llm/src/block_manager/block/data.rs (4)
  • storage_type (26-26)
  • layer_view (49-58)
  • num_layers (32-32)
  • block_view (73-78)
lib/llm/src/block_manager/storage/disk.rs (3)
  • storage_type (123-125)
  • fd (90-92)
  • size (131-133)
lib/llm/src/block_manager/block/transfer/nixl.rs (3)
lib/llm/src/block_manager/layout.rs (1)
  • num_layers (235-237)
lib/llm/src/block_manager/block.rs (2)
  • num_layers (377-379)
  • size (1049-1051)
lib/llm/src/block_manager/storage/nixl.rs (7)
  • size (114-116)
  • size (281-283)
  • size (299-301)
  • size (323-325)
  • size (348-350)
  • size (373-375)
  • size (412-414)
lib/llm/src/block_manager/block/transfer/cuda.rs (4)
lib/llm/src/block_manager/block/data/local.rs (1)
  • num_layers (74-76)
lib/llm/src/block_manager/layout.rs (9)
  • num_layers (235-237)
  • storage (194-194)
  • storage (505-507)
  • storage (842-844)
  • config (206-206)
  • config (519-521)
  • config (804-806)
  • allocate (470-495)
  • allocate (764-796)
lib/llm/src/block_manager/block/data.rs (1)
  • num_layers (32-32)
lib/llm/src/block_manager/storage/cuda.rs (9)
  • size (226-228)
  • size (388-390)
  • default (284-288)
  • default (445-449)
  • ctx (460-462)
  • allocate (301-303)
  • allocate (466-468)
  • addr (222-224)
  • addr (384-386)
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (1)
lib/llm/src/block_manager/block.rs (1)
  • layout (1242-1244)
lib/llm/src/block_manager/distributed/transfer.rs (3)
lib/llm/src/block_manager/block.rs (3)
  • layout (1242-1244)
  • num_layers (377-379)
  • size (1049-1051)
lib/llm/src/block_manager/block/data/local.rs (1)
  • num_layers (74-76)
lib/llm/src/block_manager/block/data.rs (1)
  • num_layers (32-32)
lib/llm/src/block_manager/distributed/worker.rs (1)
lib/llm/src/block_manager/layout.rs (11)
  • builder (296-298)
  • layout_type (191-191)
  • layout_type (501-503)
  • layout_type (836-840)
  • num_layers (235-237)
  • outer_dim (243-245)
  • inner_dim (253-255)
  • config (206-206)
  • config (519-521)
  • config (804-806)
  • page_size (248-250)
lib/llm/src/block_manager/layout.rs (2)
lib/llm/src/block_manager/storage.rs (8)
  • addr (174-174)
  • addr (393-395)
  • addr (473-475)
  • addr (514-516)
  • size (177-177)
  • size (397-399)
  • size (477-479)
  • size (518-520)
lib/llm/src/block_manager/storage/nixl.rs (8)
  • addr (277-279)
  • size (114-116)
  • size (281-283)
  • size (299-301)
  • size (323-325)
  • size (348-350)
  • size (373-375)
  • size (412-414)
lib/llm/src/block_manager/layout/utils.rs (2)
lib/llm/src/block_manager/block.rs (1)
  • layout (1242-1244)
lib/llm/src/block_manager/layout.rs (1)
  • outer_dim (243-245)
🪛 GitHub Actions: Rust pre-merge checks
lib/bindings/python/rust/llm/block_manager/distributed/worker.rs

[error] 10-10: Rustfmt: formatting issue detected by cargo fmt -- --check. Run 'cargo fmt' to fix.


[error] 28-28: Rustfmt: multi-line struct initialization formatting changed. Run 'cargo fmt'.


[error] 168-168: Rustfmt: break device_layout_type call into multi-line layout.


[error] 175-175: Rustfmt: adjust formatting around unwrap_or in device layout chain.


[error] 179-179: Rustfmt: adjust formatting in device layout chain (map/unwrap).

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs

[error] 20-20: Rustfmt: import LayoutType inserted/moved; reformat to match style.


[error] 169-169: Rustfmt: break device_layout_type call into multi-line layout.


[error] 175-175: Rustfmt: trailing comma added and unwrap formatting adjusted.


[error] 179-179: Rustfmt: unwrap/map formatting adjusted in device layout chain.


[error] 26-26: Rustfmt: import LayoutType removed or reordered to satisfy formatting.

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs

[error] 19-19: Rustfmt: import LayoutType inserted/moved; reformat to match style.


[error] 25-25: Rustfmt: import LayoutType adjusted to satisfy formatting.

🪛 GitHub Actions: NVIDIA Dynamo Github Validation
lib/llm/src/block_manager/block/transfer/nixl.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

lib/llm/src/block_manager/block/transfer/cuda.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

lib/llm/src/block_manager/distributed/transfer.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

lib/llm/src/block_manager/layout/utils.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

🔇 Additional comments (15)
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)

26-26: Fix rustfmt import ordering

Sandbox formatting check failed (cargo fmt error: "Could not locate working directory: no /proc/self/exe available"); run cargo fmt locally and commit the resulting changes or reorder imports in lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (around line 26) to satisfy rustfmt.


169-176: Format chained builder calls (rustfmt)

Per-edge layout config is correct; format the chained builder calls to satisfy rustfmt/Clippy.
File: lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (lines 169–176)
Run locally: cargo fmt && cargo clippy -q --no-deps

lib/llm/src/block_manager/block/transfer/nixl.rs (1)

1-999: Run cargo fmt (repo-wide)

Formatter failures are blocking CI on this file; run cargo fmt --all and commit the changes. Automated verification failed in the sandbox with: "Could not locate working directory.: no /proc/self/exe available" — run locally or in CI and re-request verification.

lib/llm/src/block_manager/block/transfer/cuda.rs (1)

1-999: Run cargo fmt (repo-wide)

Formatter failures are blocking CI on this file — run cargo fmt --all and commit the resulting changes. Automated verification here failed (error: "Could not locate working directory: no /proc/self/exe available"); run locally or in CI to confirm formatting is fixed.

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (1)

25-25: Fix rustfmt import ordering — run cargo fmt

rustfmt is failing for the added LayoutType import in lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (around line 25). Run cargo fmt locally and commit the formatted changes. (Automated attempt failed with: "Could not locate working directory: no /proc/self/exe available".)

lib/llm/src/block_manager/distributed/transfer.rs (1)

129-151: Run cargo fmt to fix formatting.

The pipeline is failing on rustfmt. Please run cargo fmt and commit the diff.

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (2)

26-36: Enum mapping LGTM.

PyLayoutType → LayoutType conversion is correct and clear.


1-1: Run cargo fmt.

Rustfmt errors are blocking. Please format this file.

lib/llm/src/block_manager/offload.rs (1)

1404-1408: Nice: index-based verification after bulk onboard.

The revised assertions compare each disk→device pair by index and check contents; this is more robust than counting blocks.

lib/llm/src/block_manager/layout.rs (2)

113-118: Public utils module exposure LGTM.

Making utils public and reusing it across verifiers is consistent with added verification APIs.


558-610: Verification APIs for FullyContiguous look good.

verify_memory_regions, expected_memory_address, and verify_memory_region are clean, use the shared verifier, and provide actionable logs.

lib/llm/src/block_manager/layout/utils.rs (4)

83-104: Bounds checks look good

Indexes validated against config with clear, specific errors. LGTM.


403-447: Nice: cross‑layout compatibility test

Good coverage of FC↔LS size compatibility without touching device memory.


448-476: Incompatible layouts test exercises the right surface

The differing blocks case is a good negative check.


105-477: Formatting check couldn't be executed in sandbox — run rustfmt locally and re-run formatting check

Sandbox error returned: "Unable to proceed. Could not locate working directory.: no /proc/self/exe available. Is /proc mounted?"

Run locally and re-run CI:
cargo fmt && cargo fmt -- --check

Comment on lines 139 to 144

             let mr = self.layout.memory_region(self.block_idx, 0, 0)?;
             let offset = mr.addr();
-            let size = mr.size() * self.num_layers();
+            let size = mr.size() * self.num_layers() * self.num_outer_dims();
             let storage_type = mr.storage_type();
             unsafe { view::BlockViewMut::new(self, offset, size, storage_type) }
         } else {

⚠️ Potential issue

Repeat overflow checks in mutable block view path

Same computation and risk as above; mirror the overflow-safe math.

Apply this diff:

-            let size = mr.size() * self.num_layers() * self.num_outer_dims();
+            let per_region = mr.size();
+            let size = per_region
+                .checked_mul(self.num_layers())
+                .and_then(|v| v.checked_mul(self.num_outer_dims()))
+                .ok_or_else(|| BlockError::InvalidState("block size overflow".to_string()))?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            let mr = self.layout.memory_region(self.block_idx, 0, 0)?;
-            let offset = mr.addr();
-            let size = mr.size() * self.num_layers() * self.num_outer_dims();
-            let storage_type = mr.storage_type();
-            unsafe { view::BlockViewMut::new(self, offset, size, storage_type) }
-        } else {
+            let mr = self.layout.memory_region(self.block_idx, 0, 0)?;
+            let offset = mr.addr();
+            let per_region = mr.size();
+            let size = per_region
+                .checked_mul(self.num_layers())
+                .and_then(|v| v.checked_mul(self.num_outer_dims()))
+                .ok_or_else(|| BlockError::InvalidState("block size overflow".to_string()))?;
+            let storage_type = mr.storage_type();
+            unsafe { view::BlockViewMut::new(self, offset, size, storage_type) }
+        } else {
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/data/local.rs around lines 139 to 144, the
mutable block-view branch repeats unchecked arithmetic that can overflow; mirror
the overflow-safe math used in the immutable path by computing size and final
offset with checked operations (checked_mul and checked_add), return an error
(propagate with ?) if any checked operation returns None, and only call unsafe
view::BlockViewMut::new with the validated offset and size. Ensure you reuse the
same error type/creation used by the immutable branch so behavior matches
exactly.

Comment on lines 78 to 115

    } else if !src_data.is_fully_contiguous() && dst_data.is_fully_contiguous() {
        // Special case: non-contiguous source (LayerSeparate) → contiguous destination (FullyContiguous)
        // We need to copy each layer sequentially to create a contiguous layout

        assert_eq!(src_data.num_layers(), dst_data.num_layers());

        // Get the full contiguous destination view
        let mut dst_block_view = dst_data.block_view_mut()?;
        let dst_base_ptr = unsafe { dst_block_view.as_mut_ptr() };

        let mut dst_offset = 0;

        for layer_idx in 0..src_data.num_layers() {
            for outer_idx in 0..src_data.num_outer_dims() {
                // Get the source layer view (this has the correct LayerSeparate layout)
                let src_view = src_data.layer_view(layer_idx, outer_idx)?;
                let layer_size = src_view.size();

                // Copy to sequential position in contiguous destination
                unsafe {
                    memcpy_fn(
                        src_view.as_ptr(),
                        dst_base_ptr.add(dst_offset),
                        layer_size,
                        stream,
                    )?;
                }

                dst_offset += layer_size;
            }
        }

        // Verify we filled the entire destination block
        debug_assert_eq!(dst_offset, dst_block_view.size(),
            "Destination offset mismatch: filled {} bytes but destination has {} bytes",
            dst_offset, dst_block_view.size());

    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {

⚠️ Potential issue

NC→FC: assert outer-dim parity and bound-check per-chunk writes

Add an outer_dim parity assert and ensure dst_offset + layer_size never exceeds dst_block_view.size() before copying.

Apply this diff:

-        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_outer_dims(), dst_data.num_outer_dims());
@@
-                // Copy to sequential position in contiguous destination
+                // Copy to sequential position in contiguous destination
+                let next = dst_offset.checked_add(layer_size).ok_or_else(|| {
+                    TransferError::ExecutionError("dst_offset overflow".into())
+                })?;
+                debug_assert!(next <= dst_block_view.size(),
+                    "NC→FC: chunk exceeds destination block: next={} total={}",
+                    next, dst_block_view.size());
                 unsafe {
                     memcpy_fn(
                         src_view.as_ptr(),
-                        dst_base_ptr.add(dst_offset),
+                        dst_base_ptr.add(dst_offset),
                         layer_size,
                         stream,
                     )?;
                 }
-
-                dst_offset += layer_size;
+                dst_offset = next;
📝 Committable suggestion


Suggested change
-    } else if !src_data.is_fully_contiguous() && dst_data.is_fully_contiguous() {
-        // Special case: non-contiguous source (LayerSeparate) → contiguous destination (FullyContiguous)
-        // We need to copy each layer sequentially to create a contiguous layout
-        assert_eq!(src_data.num_layers(), dst_data.num_layers());
-        // Get the full contiguous destination view
-        let mut dst_block_view = dst_data.block_view_mut()?;
-        let dst_base_ptr = unsafe { dst_block_view.as_mut_ptr() };
-        let mut dst_offset = 0;
-        for layer_idx in 0..src_data.num_layers() {
-            for outer_idx in 0..src_data.num_outer_dims() {
-                // Get the source layer view (this has the correct LayerSeparate layout)
-                let src_view = src_data.layer_view(layer_idx, outer_idx)?;
-                let layer_size = src_view.size();
-                // Copy to sequential position in contiguous destination
-                unsafe {
-                    memcpy_fn(
-                        src_view.as_ptr(),
-                        dst_base_ptr.add(dst_offset),
-                        layer_size,
-                        stream,
-                    )?;
-                }
-                dst_offset += layer_size;
-            }
-        }
-        // Verify we filled the entire destination block
-        debug_assert_eq!(dst_offset, dst_block_view.size(),
-            "Destination offset mismatch: filled {} bytes but destination has {} bytes",
-            dst_offset, dst_block_view.size());
-    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
+    } else if !src_data.is_fully_contiguous() && dst_data.is_fully_contiguous() {
+        // Special case: non-contiguous source (LayerSeparate) → contiguous destination (FullyContiguous)
+        // We need to copy each layer sequentially to create a contiguous layout
+        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_outer_dims(), dst_data.num_outer_dims());
+        // Get the full contiguous destination view
+        let mut dst_block_view = dst_data.block_view_mut()?;
+        let dst_base_ptr = unsafe { dst_block_view.as_mut_ptr() };
+        let mut dst_offset = 0;
+        for layer_idx in 0..src_data.num_layers() {
+            for outer_idx in 0..src_data.num_outer_dims() {
+                // Get the source layer view (this has the correct LayerSeparate layout)
+                let src_view = src_data.layer_view(layer_idx, outer_idx)?;
+                let layer_size = src_view.size();
+                // Copy to sequential position in contiguous destination
+                let next = dst_offset.checked_add(layer_size).ok_or_else(|| {
+                    TransferError::ExecutionError("dst_offset overflow".into())
+                })?;
+                debug_assert!(next <= dst_block_view.size(),
+                    "NC→FC: chunk exceeds destination block: next={} total={}",
+                    next, dst_block_view.size());
+                unsafe {
+                    memcpy_fn(
+                        src_view.as_ptr(),
+                        dst_base_ptr.add(dst_offset),
+                        layer_size,
+                        stream,
+                    )?;
+                }
+                dst_offset = next;
+            }
+        }
+        // Verify we filled the entire destination block
+        debug_assert_eq!(dst_offset, dst_block_view.size(),
+            "Destination offset mismatch: filled {} bytes but destination has {} bytes",
+            dst_offset, dst_block_view.size());
+    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer/cuda.rs around lines 78 to 115, add
an outer-dim parity assertion and a per-chunk bounds check before each copy:
assert that src_data.num_outer_dims() is compatible with
dst_data.num_outer_dims() (e.g. src_data.num_outer_dims() %
dst_data.num_outer_dims() == 0 or the stronger equality if required by your
layout invariants) and, before calling memcpy_fn, ensure dst_offset + layer_size
<= dst_block_view.size() and return an error or panic if it would overflow; keep
the existing dst_offset increment and preserve the final debug_assert_eq
verifying the full fill.

Comment on lines 115 to 151

    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
        // Special case: contiguous source (FullyContiguous) → non-contiguous destination (LayerSeparate)
        // We need to map the contiguous source data to the layered destination structure

        assert_eq!(src_data.num_layers(), dst_data.num_layers());

        // Get the full contiguous source view
        let src_block_view = src_data.block_view()?;
        let src_base_ptr = unsafe { src_block_view.as_ptr() };

        let mut src_offset = 0;

        for layer_idx in 0..src_data.num_layers() {
            for outer_idx in 0..src_data.num_outer_dims() {
                // Get the destination layer view (this has the correct LayerSeparate layout)
                let mut dst_view = dst_data.layer_view_mut(layer_idx, outer_idx)?;
                let layer_size = dst_view.size();

                // Copy from sequential position in contiguous source
                unsafe {
                    memcpy_fn(
                        src_base_ptr.add(src_offset),
                        dst_view.as_mut_ptr(),
                        layer_size,
                        stream,
                    )?;
                }

                src_offset += layer_size;
            }
        }

        // Verify we consumed the entire source block
        debug_assert_eq!(src_offset, src_block_view.size(),
            "Source offset mismatch: consumed {} bytes but source has {} bytes",
            src_offset, src_block_view.size());


⚠️ Potential issue

FC→NC: assert outer-dim parity and bound-check per-chunk reads

Same safety concerns in the reverse path; guard src_offset growth before memcpy.

Apply this diff:

-        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_outer_dims(), dst_data.num_outer_dims());
@@
-                // Copy from sequential position in contiguous source
+                // Copy from sequential position in contiguous source
+                let next = src_offset.checked_add(layer_size).ok_or_else(|| {
+                    TransferError::ExecutionError("src_offset overflow".into())
+                })?;
+                debug_assert!(next <= src_block_view.size(),
+                    "FC→NC: chunk exceeds source block: next={} total={}",
+                    next, src_block_view.size());
                 unsafe {
                     memcpy_fn(
-                        src_base_ptr.add(src_offset),
+                        src_base_ptr.add(src_offset),
                         dst_view.as_mut_ptr(),
                         layer_size,
                         stream,
                     )?;
                 }
-
-                src_offset += layer_size;
+                src_offset = next;
📝 Committable suggestion


Suggested change
-    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
-        // Special case: contiguous source (FullyContiguous) → non-contiguous destination (LayerSeparate)
-        // We need to map the contiguous source data to the layered destination structure
-        assert_eq!(src_data.num_layers(), dst_data.num_layers());
-        // Get the full contiguous source view
-        let src_block_view = src_data.block_view()?;
-        let src_base_ptr = unsafe { src_block_view.as_ptr() };
-        let mut src_offset = 0;
-        for layer_idx in 0..src_data.num_layers() {
-            for outer_idx in 0..src_data.num_outer_dims() {
-                // Get the destination layer view (this has the correct LayerSeparate layout)
-                let mut dst_view = dst_data.layer_view_mut(layer_idx, outer_idx)?;
-                let layer_size = dst_view.size();
-                // Copy from sequential position in contiguous source
-                unsafe {
-                    memcpy_fn(
-                        src_base_ptr.add(src_offset),
-                        dst_view.as_mut_ptr(),
-                        layer_size,
-                        stream,
-                    )?;
-                }
-                src_offset += layer_size;
-            }
-        }
-        // Verify we consumed the entire source block
-        debug_assert_eq!(src_offset, src_block_view.size(),
-            "Source offset mismatch: consumed {} bytes but source has {} bytes",
-            src_offset, src_block_view.size());
-    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
+    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
+        // Special case: contiguous source (FullyContiguous) → non-contiguous destination (LayerSeparate)
+        // We need to map the contiguous source data to the layered destination structure
+        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_outer_dims(), dst_data.num_outer_dims());
+        // Get the full contiguous source view
+        let src_block_view = src_data.block_view()?;
+        let src_base_ptr = unsafe { src_block_view.as_ptr() };
+        let mut src_offset = 0;
+        for layer_idx in 0..src_data.num_layers() {
+            for outer_idx in 0..src_data.num_outer_dims() {
+                // Get the destination layer view (this has the correct LayerSeparate layout)
+                let mut dst_view = dst_data.layer_view_mut(layer_idx, outer_idx)?;
+                let layer_size = dst_view.size();
+                // Copy from sequential position in contiguous source
+                let next = src_offset.checked_add(layer_size).ok_or_else(|| {
+                    TransferError::ExecutionError("src_offset overflow".into())
+                })?;
+                debug_assert!(next <= src_block_view.size(),
+                    "FC→NC: chunk exceeds source block: next={} total={}",
+                    next, src_block_view.size());
+                unsafe {
+                    memcpy_fn(
+                        src_base_ptr.add(src_offset),
+                        dst_view.as_mut_ptr(),
+                        layer_size,
+                        stream,
+                    )?;
+                }
+                src_offset = next;
+            }
+        }
+        // Verify we consumed the entire source block
+        debug_assert_eq!(src_offset, src_block_view.size(),
+            "Source offset mismatch: consumed {} bytes but source has {} bytes",
+            src_offset, src_block_view.size());

Comment on lines 56 to 103

    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
        // Special case: contiguous source → non-contiguous destination
        // We need to map the contiguous source data to the layered destination structure

        assert_eq!(src_data.num_layers(), dst_data.num_layers());

        // Get the full contiguous source view
        let src_block_view = src_data.block_view()?;
        let src_base_ptr = unsafe { src_block_view.as_ptr() } as usize;
        let src_device_id = src_block_view.device_id();

        let mut src_offset = 0;

        for layer_idx in 0..src_data.num_layers() {
            for outer_idx in 0..src_data.num_outer_dims() {
                // Get the destination layer view (this has the correct layout)
                let mut dst_view = dst_data.layer_view_mut(layer_idx, outer_idx)?;
                let layer_size = dst_view.size();

                // Create source descriptor with calculated offset from contiguous block
                unsafe {
                    src_dl.add_desc(
                        src_base_ptr + src_offset,
                        layer_size,
                        src_device_id,
                    )?;
                }

                // Create destination descriptor
                let dst_desc = dst_view.as_nixl_descriptor_mut();
                unsafe {
                    dst_dl.add_desc(
                        dst_desc.as_ptr() as usize,
                        dst_desc.size(),
                        dst_desc.device_id(),
                    )?;
                }

                debug_assert_eq!(layer_size, dst_desc.size());
                src_offset += layer_size;
            }
        }

        // Verify we consumed the entire source block
        debug_assert_eq!(src_offset, src_block_view.size(),
            "Source offset mismatch: consumed {} bytes but source has {} bytes",
            src_offset, src_block_view.size());


⚠️ Potential issue

FC → non‑contiguous: add outer-dim parity check and guard offset arithmetic

You assert layer parity but not outer_dim parity; also src_offset arithmetic can overflow or exceed the source size before the final debug_assert. Add checks per chunk.

Apply this diff:

-        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_outer_dims(), dst_data.num_outer_dims());
@@
-                // Create source descriptor with calculated offset from contiguous block
+                // Create source descriptor with calculated offset from contiguous block
+                let next = src_offset.checked_add(layer_size).ok_or_else(|| {
+                    anyhow::anyhow!("src_offset overflow while building NIXL xfer list")
+                })?;
+                debug_assert!(next <= src_block_view.size(),
+                    "FC→NC: chunk exceeds source block: next={} total={}",
+                    next, src_block_view.size());
                 unsafe {
                     src_dl.add_desc(
-                        src_base_ptr + src_offset,
+                        src_base_ptr + src_offset,
                         layer_size,
                         src_device_id,
                     )?;
                 }
@@
-                let dst_desc = dst_view.as_nixl_descriptor_mut();
+                let dst_desc = dst_view.as_nixl_descriptor_mut();
                 unsafe {
                     dst_dl.add_desc(
                         dst_desc.as_ptr() as usize,
-                        dst_desc.size(),
+                        layer_size,
                         dst_desc.device_id(),
                     )?;
                 }
@@
-                src_offset += layer_size;
+                src_offset = next;
📝 Committable suggestion


Suggested change
-    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
-        // Special case: contiguous source → non-contiguous destination
-        // We need to map the contiguous source data to the layered destination structure
-        assert_eq!(src_data.num_layers(), dst_data.num_layers());
-        // Get the full contiguous source view
-        let src_block_view = src_data.block_view()?;
-        let src_base_ptr = unsafe { src_block_view.as_ptr() } as usize;
-        let src_device_id = src_block_view.device_id();
-        let mut src_offset = 0;
-        for layer_idx in 0..src_data.num_layers() {
-            for outer_idx in 0..src_data.num_outer_dims() {
-                // Get the destination layer view (this has the correct layout)
-                let mut dst_view = dst_data.layer_view_mut(layer_idx, outer_idx)?;
-                let layer_size = dst_view.size();
-                // Create source descriptor with calculated offset from contiguous block
-                unsafe {
-                    src_dl.add_desc(
-                        src_base_ptr + src_offset,
-                        layer_size,
-                        src_device_id,
-                    )?;
-                }
-                // Create destination descriptor
-                let dst_desc = dst_view.as_nixl_descriptor_mut();
-                unsafe {
-                    dst_dl.add_desc(
-                        dst_desc.as_ptr() as usize,
-                        dst_desc.size(),
-                        dst_desc.device_id(),
-                    )?;
-                }
-                debug_assert_eq!(layer_size, dst_desc.size());
-                src_offset += layer_size;
-            }
-        }
-        // Verify we consumed the entire source block
-        debug_assert_eq!(src_offset, src_block_view.size(),
-            "Source offset mismatch: consumed {} bytes but source has {} bytes",
-            src_offset, src_block_view.size());
+    } else if src_data.is_fully_contiguous() && !dst_data.is_fully_contiguous() {
+        // Special case: contiguous source → non-contiguous destination
+        // We need to map the contiguous source data to the layered destination structure
+        assert_eq!(src_data.num_layers(), dst_data.num_layers());
+        assert_eq!(src_data.num_outer_dims(), dst_data.num_outer_dims());
+        // Get the full contiguous source view
+        let src_block_view = src_data.block_view()?;
+        let src_base_ptr = unsafe { src_block_view.as_ptr() } as usize;
+        let src_device_id = src_block_view.device_id();
+        let mut src_offset = 0;
+        for layer_idx in 0..src_data.num_layers() {
+            for outer_idx in 0..src_data.num_outer_dims() {
+                // Get the destination layer view (this has the correct layout)
+                let mut dst_view = dst_data.layer_view_mut(layer_idx, outer_idx)?;
+                let layer_size = dst_view.size();
+                // Create source descriptor with calculated offset from contiguous block
+                let next = src_offset.checked_add(layer_size).ok_or_else(|| {
+                    anyhow::anyhow!("src_offset overflow while building NIXL xfer list")
+                })?;
+                debug_assert!(next <= src_block_view.size(),
+                    "FC→NC: chunk exceeds source block: next={} total={}",
+                    next, src_block_view.size());
+                unsafe {
+                    src_dl.add_desc(
+                        src_base_ptr + src_offset,
+                        layer_size,
+                        src_device_id,
+                    )?;
+                }
+                // Create destination descriptor
+                let dst_desc = dst_view.as_nixl_descriptor_mut();
+                unsafe {
+                    dst_dl.add_desc(
+                        dst_desc.as_ptr() as usize,
+                        layer_size,
+                        dst_desc.device_id(),
+                    )?;
+                }
+                debug_assert_eq!(layer_size, dst_desc.size());
+                src_offset = next;
+            }
+        }
+        // Verify we consumed the entire source block
+        debug_assert_eq!(src_offset, src_block_view.size(),
+            "Source offset mismatch: consumed {} bytes but source has {} bytes",
+            src_offset, src_block_view.size());

Comment on lines +112 to 141

                // Handle potential size mismatches between layouts
                let src_size = src_view.size();
                let dst_size = dst_view.size();
                let copy_size = std::cmp::min(src_size, dst_size);

                // Log a warning if sizes don't match (this indicates a layout issue)
                if src_size != dst_size {
                    tracing::warn!(
                        "Size mismatch in NIXL layer copy: src_size={}, dst_size={}, using copy_size={}. \
                         This may indicate a layout configuration issue.",
                        src_size, dst_size, copy_size
                    );
                }

                let src_desc = src_view.as_nixl_descriptor();
                let dst_desc = dst_view.as_nixl_descriptor_mut();

                unsafe {
                    src_dl.add_desc(
                        src_desc.as_ptr() as usize,
-                       src_desc.size(),
+                       copy_size, // Use the safe copy size
                        src_desc.device_id(),
                    )?;

                    dst_dl.add_desc(
                        dst_desc.as_ptr() as usize,
-                       dst_desc.size(),
+                       copy_size, // Use the safe copy size
                        dst_desc.device_id(),
                    )?;
                }

⚠️ Potential issue

Do not silently truncate on size mismatch; fail fast instead

Using min(src_size, dst_size) risks silent data loss/corruption. Unless explicitly feature-gated, this should error out with actionable context.

Apply this diff:

-                // Handle potential size mismatches between layouts
-                let src_size = src_view.size();
-                let dst_size = dst_view.size();
-                let copy_size = std::cmp::min(src_size, dst_size);
-
-                // Log a warning if sizes don't match (this indicates a layout issue)
-                if src_size != dst_size {
-                    tracing::warn!(
-                        "Size mismatch in NIXL layer copy: src_size={}, dst_size={}, using copy_size={}. \
-                         This may indicate a layout configuration issue.",
-                        src_size, dst_size, copy_size
-                    );
-                }
-
-                let src_desc = src_view.as_nixl_descriptor();
-                let dst_desc = dst_view.as_nixl_descriptor_mut();
-
-                unsafe {
-                    src_dl.add_desc(
-                        src_desc.as_ptr() as usize,
-                        copy_size,  // Use the safe copy size
-                        src_desc.device_id(),
-                    )?;
-
-                    dst_dl.add_desc(
-                        dst_desc.as_ptr() as usize,
-                        copy_size,  // Use the safe copy size
-                        dst_desc.device_id(),
-                    )?;
-                }
+                let src_desc = src_view.as_nixl_descriptor();
+                let dst_desc = dst_view.as_nixl_descriptor_mut();
+                if src_desc.size() != dst_desc.size() {
+                    anyhow::bail!(
+                        "NIXL layer size mismatch at layer={}, outer={}: src={} dst={}",
+                        layer_idx,
+                        outer_idx,
+                        src_desc.size(),
+                        dst_desc.size()
+                    );
+                }
+                unsafe {
+                    src_dl.add_desc(
+                        src_desc.as_ptr() as usize,
+                        src_desc.size(),
+                        src_desc.device_id(),
+                    )?;
+                    dst_dl.add_desc(
+                        dst_desc.as_ptr() as usize,
+                        dst_desc.size(),
+                        dst_desc.device_id(),
+                    )?;
+                }
📝 Committable suggestion


Suggested change
-                // Handle potential size mismatches between layouts
-                let src_size = src_view.size();
-                let dst_size = dst_view.size();
-                let copy_size = std::cmp::min(src_size, dst_size);
-                // Log a warning if sizes don't match (this indicates a layout issue)
-                if src_size != dst_size {
-                    tracing::warn!(
-                        "Size mismatch in NIXL layer copy: src_size={}, dst_size={}, using copy_size={}. \
-                         This may indicate a layout configuration issue.",
-                        src_size, dst_size, copy_size
-                    );
-                }
-                let src_desc = src_view.as_nixl_descriptor();
-                let dst_desc = dst_view.as_nixl_descriptor_mut();
-                unsafe {
-                    src_dl.add_desc(
-                        src_desc.as_ptr() as usize,
-                        copy_size, // Use the safe copy size
-                        src_desc.device_id(),
-                    )?;
-                    dst_dl.add_desc(
-                        dst_desc.as_ptr() as usize,
-                        copy_size, // Use the safe copy size
-                        dst_desc.device_id(),
-                    )?;
-                }
+                let src_desc = src_view.as_nixl_descriptor();
+                let dst_desc = dst_view.as_nixl_descriptor_mut();
+                if src_desc.size() != dst_desc.size() {
+                    anyhow::bail!(
+                        "NIXL layer size mismatch at layer={}, outer={}: src={} dst={}",
+                        layer_idx,
+                        outer_idx,
+                        src_desc.size(),
+                        dst_desc.size()
+                    );
+                }
+                unsafe {
+                    src_dl.add_desc(
+                        src_desc.as_ptr() as usize,
+                        src_desc.size(),
+                        src_desc.device_id(),
+                    )?;
+                    dst_dl.add_desc(
+                        dst_desc.as_ptr() as usize,
+                        dst_desc.size(),
+                        dst_desc.device_id(),
+                    )?;
+                }
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer/nixl.rs around lines 112 to 141, the
code currently silently truncates on layout size mismatches by using
min(src_size, dst_size) and only logging a warning; change this to fail fast:
check if src_size != dst_size and return an Err with actionable context (include
src_size, dst_size, layer identifiers if available) instead of proceeding, and
only call add_desc with the full expected size when they match; preserve the
unsafe block but ensure you propagate a clear error before any unsafe operations
when sizes differ.

Comment on lines +225 to +236
Ok(RegionVerificationResult {
    block_idx,
    layer_idx,
    outer_idx,
    expected_addr,
    actual_addr: region.addr,
    expected_size,
    actual_size: region.size,
    addr_matches: true, // Always true since we accept the layout's calculation
    size_matches: expected_size == region.size,
    checksum,
})

⚠️ Potential issue

addr_matches is always true — misleading verification signal

expected_addr is set to region.addr, so addr_matches can never fail. Either compute an independent expected address or mark address verification as “not performed.”

If you can’t compute expected addresses, change the API to express tri‑state:

-        /// Whether the addresses match
-        pub addr_matches: bool,
+        /// Address verification: None when not performed; Some(true/false) when verified
+        pub addr_matches: Option<bool>,

And set:

-                addr_matches: true, // Always true since we accept the layout's calculation
+                addr_matches: None, // Address verification not performed

Update stats/reporting accordingly to ignore None.
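For example, a minimal sketch of the downstream handling, with a simplified stand-in for RegionVerificationResult (field names mirror the comment above, not the full API):

struct RegionResult {
    /// None = verification not performed; Some(b) = performed with outcome b.
    addr_matches: Option<bool>,
    size_matches: bool,
}

/// Count only real address failures; None entries are ignored.
fn addr_failures(results: &[RegionResult]) -> usize {
    results
        .iter()
        .filter(|r| r.addr_matches == Some(false))
        .count()
}

fn main() {
    let results = vec![
        RegionResult { addr_matches: None, size_matches: true },
        RegionResult { addr_matches: Some(false), size_matches: true },
    ];
    assert_eq!(addr_failures(&results), 1); // the None entry is not counted as a failure
}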


Comment on lines +242 to +253
fn calculate_memory_checksum(&self, region: &LocalMemoryRegion) -> Result<u64, LayoutError> {
    unsafe {
        let slice = std::slice::from_raw_parts(region.addr as *const u8, region.size);
        let mut checksum = 0u64;

        for (i, &byte) in slice.iter().enumerate() {
            checksum = checksum.wrapping_add((byte as u64).wrapping_mul(i as u64 + 1));
        }

        Ok(checksum)
    }
}

⚠️ Potential issue

Unsafe read can touch device memory and crash

from_raw_parts(region.addr as *const u8, region.size) assumes host‑accessible memory. On CUDA/NIXL, this can segfault or corrupt. Guard and/or route through a safe storage API.

Minimal hardening:

         fn calculate_memory_checksum(&self, region: &LocalMemoryRegion) -> Result<u64, LayoutError> {
-            unsafe {
+            if region.size == 0 {
+                return Ok(0);
+            }
+            if region.addr == 0 {
+                return Err(LayoutError::InvalidConfig("region.addr is null; cannot checksum".into()));
+            }
+            // TODO: gate on host accessibility once exposed by storage/layout.
+            unsafe {
                 let slice = std::slice::from_raw_parts(region.addr as *const u8, region.size);
                 let mut checksum = 0u64;
                 for (i, &byte) in slice.iter().enumerate() {
                     checksum = checksum.wrapping_add((byte as u64).wrapping_mul(i as u64 + 1));
                 }
                 Ok(checksum)
             }
         }

Longer‑term: expose a host‑access check or read API on Storage/GenericBlockLayout and avoid raw pointer reads here.
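As a sketch of that longer-term shape (the enum and types below are illustrative stand-ins, not the crate's actual Storage API; the checksum mirrors the position-weighted formula above):

enum MemKind {
    Host,
    Device,
}

struct Region {
    kind: MemKind,
    bytes: Vec<u8>, // stands in for host-visible contents
}

/// Checksum only host-accessible regions; refuse device memory outright.
fn checksum_if_host(region: &Region) -> Result<u64, String> {
    match region.kind {
        MemKind::Device => Err("region is device memory; stage it to host first".into()),
        MemKind::Host => Ok(region.bytes.iter().enumerate().fold(0u64, |acc, (i, &b)| {
            acc.wrapping_add((b as u64).wrapping_mul(i as u64 + 1))
        })),
    }
}

fn main() {
    let host = Region { kind: MemKind::Host, bytes: vec![1, 2, 3] };
    assert_eq!(checksum_if_host(&host).unwrap(), 14); // 1*1 + 2*2 + 3*3
    let dev = Region { kind: MemKind::Device, bytes: vec![] };
    assert!(checksum_if_host(&dev).is_err());
}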


Comment on lines +1431 to +1441
// GDS requires 4KB alignment for optimal performance
const GDS_ALIGNMENT: usize = 4096;

let (offload_manager, device_pool, host_pool, disk_pool) = build_pools_with_layout(
    4,
    Some(4),
    Some(4),
    Some(GDS_ALIGNMENT), // Use GDS-friendly alignment
    layout_type,
    BlockRegistrationDuplicationSetting::Disabled,
)?;

⚠️ Potential issue

GDS alignment argument is being passed to inner_dim; the tests never set LayoutConfig.alignment

Confirmed: build_pools_with_layout's 4th parameter is inner_dim: Option<usize> and LayoutConfig.alignment is hardcoded to 1, so passing Some(GDS_ALIGNMENT) sets inner_dim instead of alignment. Location: lib/llm/src/block_manager/offload.rs (call at ~1434–1441; function/signature around ~687–709).

Fix options:

  • Option A (minimal): add an overload build_pools_with_layout_and_alignment(..., alignment: Option<usize>) and call it from the GDS tests.
  • Option B (preferred): extend build_pools_with_layout to accept alignment: Option<usize> and plumb it into LayoutConfig.

Example (Option B):

-    pub fn build_pools_with_layout(
+    pub fn build_pools_with_layout(
         device_blocks: usize,
         host_blocks: Option<usize>,
         disk_blocks: Option<usize>,
-        inner_dim: Option<usize>,
+        inner_dim: Option<usize>,
+        alignment: Option<usize>,
         layout_type: LayoutType,
         duplication_setting: BlockRegistrationDuplicationSetting,
     ) -> Result<(... )> {
         let mut config = LayoutConfig {
             ...
-            alignment: 1,
+            alignment: alignment.unwrap_or(1),
             dtype_width_bytes: 2,
         };
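A self-contained sketch of the Option B plumbing, with LayoutConfig trimmed to the two relevant fields for illustration:

struct LayoutConfig {
    alignment: usize,
    dtype_width_bytes: usize,
}

fn make_config(alignment: Option<usize>) -> LayoutConfig {
    LayoutConfig {
        alignment: alignment.unwrap_or(1), // default stays 1; GDS tests pass Some(4096)
        dtype_width_bytes: 2,
    }
}

fn main() {
    const GDS_ALIGNMENT: usize = 4096;
    let cfg = make_config(Some(GDS_ALIGNMENT));
    assert_eq!(cfg.alignment % GDS_ALIGNMENT, 0); // now actually exercises 4KB alignment
    assert_eq!(cfg.dtype_width_bytes, 2);
}

With that in place, the test above would pass None for inner_dim and Some(GDS_ALIGNMENT) for alignment.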


Comment on lines +1568 to +1576
let expected_size = BLOCK_SIZE * NUM_LAYERS * 2 * 13 * 4; // From test constants
assert!(metadata.len() >= expected_size as u64,
    "Disk file size {} is smaller than expected {}",
    metadata.len(), expected_size);

// Verify file is properly aligned for GDS operations
assert_eq!(metadata.len() % 4096, 0,
    "Disk file size {} is not 4KB aligned for GDS", metadata.len());
}

⚠️ Potential issue

Fix expected_size computation; dtype width and layout dims are hardcoded.

expected_size uses magic numbers (2 * 13 * 4) that don’t match the LayoutConfig used here (dtype_width_bytes = 2, outer_dim = 1 in many setups). Compute from the block’s actual view or layout config to avoid flakiness:

-                    let expected_size = BLOCK_SIZE * NUM_LAYERS * 2 * 13 * 4; // From test constants
+                    let bd = block_data;
+                    let region_size = bd.page_size() * bd.inner_dim() * 2; // dtype_width_bytes
+                    let expected_size = NUM_LAYERS * bd.outer_dim() * region_size;
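To make the arithmetic concrete, a standalone version of that formula (numbers are illustrative; real values come from the block's layout):

fn expected_file_size(
    num_layers: usize,
    outer_dim: usize,
    page_size: usize,
    inner_dim: usize,
    dtype_width_bytes: usize,
) -> usize {
    let region_size = page_size * inner_dim * dtype_width_bytes;
    num_layers * outer_dim * region_size
}

fn main() {
    // e.g. 4 layers, outer_dim 2, page_size 4, inner_dim 13, 2-byte dtype
    assert_eq!(expected_file_size(4, 2, 4, 13, 2), 832);
}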

Comment on lines +1780 to +1836
/// Verify layer-specific patterns are preserved across transfers
fn verify_layer_patterns<S1, L1, M1, S2, L2, M2>(
    source_block: &ImmutableBlock<S1, L1, M1>,
    dest_block: &ImmutableBlock<S2, L2, M2>,
) -> Result<()>
where
    S1: Storage, L1: LocalityProvider, M1: BlockMetadata,
    S2: Storage, L2: LocalityProvider, M2: BlockMetadata,
    ImmutableBlock<S1, L1, M1>: BlockDataProvider,
    ImmutableBlock<S2, L2, M2>: BlockDataProvider,
{
    let src_data = source_block.block_data();
    let dst_data = dest_block.block_data();

    assert_eq!(src_data.num_layers(), dst_data.num_layers());

    for layer_idx in 0..src_data.num_layers() {
        for outer_idx in 0..2 { // Assuming max 2 outer dimensions
            if let (Ok(src_layer), Ok(dst_layer)) = (
                src_data.layer_view(layer_idx, outer_idx),
                dst_data.layer_view(layer_idx, outer_idx),
            ) {
                assert_eq!(src_layer.size(), dst_layer.size());

                let expected_pattern = 0x10 + layer_idx as u8 + outer_idx as u8;

                unsafe {
                    let src_ptr = src_layer.as_ptr();
                    let dst_ptr = dst_layer.as_ptr();
                    let src_size = src_layer.size();
                    let dst_size = dst_layer.size();

                    // Safety checks
                    if src_ptr.is_null() || dst_ptr.is_null() {
                        return Err(anyhow::anyhow!("Layer view returned null pointer"));
                    }
                    if src_size == 0 || dst_size == 0 {
                        continue; // Skip empty layers
                    }

                    let src_slice = std::slice::from_raw_parts(src_ptr, src_size);
                    let dst_slice = std::slice::from_raw_parts(dst_ptr, dst_size);

                    // Verify source has expected pattern
                    assert!(src_slice.iter().all(|&b| b == expected_pattern),
                        "Source layer {} outer {} has incorrect pattern", layer_idx, outer_idx);

                    // Verify destination matches source
                    assert!(dst_slice.iter().all(|&b| b == expected_pattern),
                        "Destination layer {} outer {} has incorrect pattern", layer_idx, outer_idx);
                }
            }
        }
    }

    Ok(())
}

⚠️ Potential issue

Avoid unsafe CPU reads from device memory in helpers.

verify_layer_patterns creates CPU slices from raw pointers without staging device memory; this will break for StorageType::Device. Reuse check_block_contents (which already handles device vs host vs disk) for integrity, or add device→host copies.

  • Minimal: make this helper operate only on host/disk blocks; return Ok(()) or skip when device storage is detected.
  • Better: copy device layers to host before inspection, mirroring get_block_contents.

This will prevent spurious crashes in mixed-layout/device tests.
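A minimal sketch of the first option, guarding before any raw read (the enum is a local stand-in for the crate's storage classification, not the real StorageType):

enum StorageKind {
    Host,
    Disk,
    Device,
}

/// Skip pattern checks for device-backed blocks instead of dereferencing
/// device pointers on the CPU; a fuller version would stage device memory
/// to a host buffer first, as get_block_contents does.
fn verify_or_skip(kind: StorageKind, bytes: &[u8], expected: u8) -> anyhow::Result<()> {
    match kind {
        StorageKind::Device => Ok(()), // not CPU-addressable here; skip
        StorageKind::Host | StorageKind::Disk => {
            anyhow::ensure!(
                bytes.iter().all(|&b| b == expected),
                "pattern mismatch: expected 0x{expected:02x}"
            );
            Ok(())
        }
    }
}

fn main() -> anyhow::Result<()> {
    verify_or_skip(StorageKind::Host, &[0x11, 0x11], 0x11)?;
    verify_or_skip(StorageKind::Device, &[], 0x11)?; // skipped, not crashed
    Ok(())
}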

Signed-off-by: Olga Andreeva <[email protected]>
@oandreeva-nv oandreeva-nv changed the title Transition to FullyContiguous Host and Disk layouts feature: Transition to FullyContiguous Host and Disk layouts Sep 17, 2025
@oandreeva-nv oandreeva-nv changed the title feature: Transition to FullyContiguous Host and Disk layouts feat: Transition to FullyContiguous Host and Disk layouts Sep 17, 2025
@github-actions github-actions bot added the feat label Sep 17, 2025
// Handle potential size mismatches between layouts
let src_size = src_view.size();
let dst_size = dst_view.size();
let copy_size = std::cmp::min(src_size, dst_size);

Why don't we let it panic or return an error in this case? If this happens, it feels like there's a high chance the model will start generating garbage.


I actually thought this would be the case involving G1, where one side is contiguous and the other is not, so it may actually be expected; that's why I left my comment below :-)

Comment on lines +70 to +74
tracing::warn!(
    "Size mismatch in NIXL layer copy: src_size={}, dst_size={}, using copy_size={}. \
     This may indicate a layout configuration issue.",
    src_size, dst_size, copy_size
);

Would this warn message be printed a lot, once per each onboarding and offloading involving G1?

Also, is it really indicating a layout config issue, or is it expected?

}

/// Verify block data integrity after transfer
pub fn verify_transfer_integrity<Source, Target>(

Where is verify_transfer_integrity actually used?

LayoutType::FullyContiguous => {
    let num_layers = shape[1];
    let outer_dim = shape[2];
    let inner_dim = shape[3..].iter().product::<usize>() / config.page_size;

Just a comment: it's interesting that all the remaining dimensions are grouped into inner_dim.
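A worked example of that grouping, with an illustrative shape [num_blocks, num_layers, outer_dim, page, heads, head_dim]:

fn main() {
    let shape = [16usize, 4, 2, 8, 13, 4];
    let page_size = 8usize;

    let num_layers = shape[1];
    let outer_dim = shape[2];
    // Everything after outer_dim collapses into inner_dim, per page:
    let inner_dim = shape[3..].iter().product::<usize>() / page_size;

    assert_eq!((num_layers, outer_dim, inner_dim), (4, 2, 52)); // 8 * 13 * 4 / 8 = 52
}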

Comment on lines +187 to +191
let (outer_contiguous, outer_dim) = if shape[0] >= config.num_device_blocks {
    (false, shape[1])
} else if shape[1] >= config.num_device_blocks {
    (true, shape[0])
} else {

Could you explain why the branching logic here is structured this way, around shape[0]?
