@atchernych atchernych commented Sep 17, 2025

Overview:

DEP-357

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Introduced a worker selection pipeline that quickly determines the target worker for a request without running full inference, reducing latency for routing decisions.
    • Supports multiple routing strategies (e.g., random, round-robin, direct, key–value based).
    • Streams back the selected worker ID and token-related metadata for downstream use.
    • Added a helper to extract the chosen worker and token data from streaming responses.
    • Backwards-compatible: existing flows are unaffected unless the new pipeline is used.
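To make the routing strategies listed above concrete, here is a minimal, std-only sketch of how a selector might pick a worker under each mode. Everything in it is an assumption for illustration: the `Worker` and `Selector` types, the `Direct(i64)` payload, the xorshift stand-in for a random source, and the "largest cached prefix wins" rule for KV routing are not taken from this PR.

```rust
/// Hypothetical routing modes, mirroring the strategies named in the summary.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RouterMode {
    Random,
    RoundRobin,
    Direct(i64), // assumed payload: the worker ID requested by the caller
    KV,
}

/// Hypothetical worker view: an ID plus how many request tokens are already
/// covered by that worker's KV cache.
#[derive(Debug, Clone)]
pub struct Worker {
    pub id: i64,
    pub cached_prefix_tokens: usize,
}

/// Mutable selection state: a round-robin cursor and a xorshift seed.
pub struct Selector {
    next: usize,
    seed: u64,
}

impl Selector {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero
        Selector { next: 0, seed: seed.max(1) }
    }

    /// Returns the chosen worker ID, or None if no worker qualifies.
    pub fn select(&mut self, mode: RouterMode, workers: &[Worker]) -> Option<i64> {
        if workers.is_empty() {
            return None;
        }
        match mode {
            RouterMode::Random => {
                // xorshift64: a tiny PRNG stand-in, not production quality
                self.seed ^= self.seed << 13;
                self.seed ^= self.seed >> 7;
                self.seed ^= self.seed << 17;
                let idx = (self.seed % workers.len() as u64) as usize;
                Some(workers[idx].id)
            }
            RouterMode::RoundRobin => {
                let idx = self.next % workers.len();
                self.next = self.next.wrapping_add(1);
                Some(workers[idx].id)
            }
            RouterMode::Direct(id) => workers.iter().find(|w| w.id == id).map(|w| w.id),
            // Assumed KV rule: prefer the worker with the most cached prefix tokens.
            RouterMode::KV => workers
                .iter()
                .max_by_key(|w| w.cached_prefix_tokens)
                .map(|w| w.id),
        }
    }
}
```

The point of the sketch is that every mode reduces to "return a worker ID" without running inference, which is what lets the pipeline answer routing questions cheaply.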

Signed-off-by: Anna Tchernych <[email protected]>
@atchernych atchernych requested a review from a team as a code owner September 17, 2025 01:28
@atchernych atchernych marked this pull request as draft September 17, 2025 01:29
@github-actions github-actions bot added the feat label Sep 17, 2025
coderabbitai bot commented Sep 17, 2025

Walkthrough

Introduces a new worker selection pipeline for LLM requests, exposing it via a new module declaration and implementing pipeline construction, routing integration, and a helper to extract worker selection data from output streams.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| Module exposure: lib/llm/src/entrypoint/input.rs | Declares and exports pub mod worker_selection_pipeline; to expose the new pipeline module. |
| Worker selection pipeline implementation: lib/llm/src/entrypoint/input/worker_selection_pipeline.rs | Adds a forward-only pipeline (frontend → preprocessor → backend → migration → router) focused on routing decisions. Provides builders with and without preprocessor construction, plus a stream parser that extracts worker_instance_id and token_data. Integrates router modes (Random/RoundRobin/Direct/KV) and includes test scaffolding and doc comments. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Frontend
    participant Preprocessor
    participant Backend
    participant Migration
    participant Router

    Client->>Frontend: Request
    Frontend->>Preprocessor: Forward edge
    Preprocessor->>Backend: Prepared request
    Backend->>Migration: Routed candidate(s)
    Migration->>Router: Query routing decision
    Router-->>Migration: Annotations (worker_instance_id, token_data)
    Migration-->>Frontend: Streamed LLMEngineOutput (annotated)
    Frontend-->>Client: Stream (comments carry selection data)

    note over Router,Backend: RouterMode\n- Random/RoundRobin/Direct → ServiceBackend from Router\n- KV → KvPushRouter backend

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A hop through pipes, I trace the line,
From prepped prompts to routes divine.
No tokens spilled, just paths to see—
Which worker hums for you and me.
I twitch my nose, selection won,
Then thump: the journey’s neatly done. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The PR description contains the template headings but is largely placeholders: Overview only lists "DEP-357", Details and "Where should the reviewer start?" are empty, and Related Issues uses a placeholder "#xxx", so it does not provide a substantive summary of changes, motivations, files to review, testing steps, or impact notes. | Please replace placeholders with a concise Overview explaining motivation and expected behavior, populate Details with concrete changes and affected files, list specific files/tests to review under "Where should the reviewer start?", update Related Issues with the correct issue number (or remove the placeholder), and add testing steps and any runtime/migration impact notes. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The PR title "feat: Create worker selection pipeline" is concise and directly describes the primary change (adding a worker selection pipeline module and related functions), making the intent clear to reviewers; it avoids noise or vague wording. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

Actionable comments posted: 0

🧹 Nitpick comments (8)
lib/llm/src/entrypoint/input/worker_selection_pipeline.rs (8)

37-37: Remove unused import.

The top-level use serde_json; is redundant. In the Rust 2018 and later editions an external crate can be referenced by path (serde_json::from_str at line 178) without importing it. Consider removing the import.

-use serde_json;
-

62-79: Consider simplifying the complex trait bounds.

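The trait bounds for OpenAIPreprocessor as an Operator are repeated across multiple functions. Consider a helper trait with a blanket impl to reduce duplication and improve maintainability. Trait aliases are not yet stable, and a type alias for dyn Operator<...> names a type rather than a trait, so it cannot be used as a bound in a where clause.

// Add at module level (sketch; add any bounds Operator places on Req)
type EngineStream = Pin<Box<dyn AsyncEngineStream<Annotated<LLMEngineOutput>>>>;

trait PreprocessorOperator<Req>:
    Operator<Context<Req>, EngineStream, Context<PreprocessedRequest>, EngineStream>
{
}

impl<T, Req> PreprocessorOperator<Req> for T where
    T: Operator<Context<Req>, EngineStream, Context<PreprocessedRequest>, EngineStream>
{
}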

Then simplify the where clause to:

where
    Req: dynamo_runtime::engine::Data,
-   OpenAIPreprocessor: Operator<
-           Context<Req>,
-           Pin<Box<dyn AsyncEngineStream<Annotated<LLMEngineOutput>>>>,
-           Context<PreprocessedRequest>,
-           Pin<Box<dyn AsyncEngineStream<Annotated<LLMEngineOutput>>>>,
-       >,
+   OpenAIPreprocessor: PreprocessorOperator<Req>,

159-160: Consider using more descriptive default values.

Using 0 as the default worker_id and an empty vector for tokens might mask extraction failures. Consider using Option<i64> and Option<Vec<u32>> to distinguish between "not found" and actual values, or return an error when expected annotations are missing.

-    let mut worker_id = 0i64;
-    let mut tokens = Vec::<u32>::new();
+    let mut worker_id = None::<i64>;
+    let mut tokens = None::<Vec<u32>>;

     while let Some(response) = stream.next().await {
         if let Some(event) = &response.event {
             match event.as_str() {
                 "worker_instance_id" => {
-                    worker_id = response
+                    worker_id = Some(response
                         .comment
                         .as_ref()
                         .and_then(|comments| comments.first())
                         .and_then(|v| v.parse::<i64>().ok())
-                        .unwrap_or(0);
+                        .ok_or_else(|| anyhow::anyhow!("Failed to parse worker_instance_id"))?);
                 }
                 "token_data" => {
-                    tokens = response
+                    tokens = Some(response
                         .comment
                         .as_ref()
                         .and_then(|comments| comments.first())
                         .and_then(|v| serde_json::from_str::<Vec<u32>>(v).ok())
-                        .unwrap_or_default();
+                        .ok_or_else(|| anyhow::anyhow!("Failed to parse token_data"))?);
                 }
                 _ => {}
             }
         }
     }

-    Ok((worker_id, tokens))
+    match (worker_id, tokens) {
+        (Some(id), Some(t)) => Ok((id, t)),
+        _ => Err(anyhow::anyhow!("Missing required annotations in stream")),
+    }
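The suggestion above can be sketched as a self-contained, synchronous function. This is an illustration under stated assumptions, not the PR's code: `Annotation` is a hypothetical stand-in for the annotated stream item, a slice replaces the async stream, `String` replaces `anyhow::Error`, and `parse_token_array` is a std-only substitute for `serde_json::from_str::<Vec<u32>>`.

```rust
/// Hypothetical stand-in for one annotated stream item.
#[derive(Debug, Default)]
pub struct Annotation {
    pub event: Option<String>,
    pub comment: Option<Vec<String>>,
}

/// Parses a JSON-style array of unsigned integers such as "[1, 2, 3]".
/// Stand-in for serde_json so the sketch needs only std.
fn parse_token_array(s: &str) -> Result<Vec<u32>, String> {
    let inner = s
        .trim()
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .ok_or_else(|| format!("token_data is not an array: {s}"))?;
    if inner.trim().is_empty() {
        return Ok(Vec::new());
    }
    inner
        .split(',')
        .map(|t| t.trim().parse::<u32>().map_err(|e| format!("bad token {t:?}: {e}")))
        .collect()
}

/// Returns the first comment of an annotation, or an error naming the event.
fn first_comment<'a>(a: &'a Annotation, event: &str) -> Result<&'a str, String> {
    a.comment
        .as_ref()
        .and_then(|c| c.first())
        .map(String::as_str)
        .ok_or_else(|| format!("{event} annotation has no comment"))
}

/// Collects worker_instance_id and token_data, failing loudly if either
/// annotation is missing or malformed instead of defaulting to 0 / [].
pub fn extract_selection(items: &[Annotation]) -> Result<(i64, Vec<u32>), String> {
    let mut worker_id: Option<i64> = None;
    let mut tokens: Option<Vec<u32>> = None;
    for item in items {
        match item.event.as_deref() {
            Some("worker_instance_id") => {
                let raw = first_comment(item, "worker_instance_id")?;
                let id = raw
                    .parse::<i64>()
                    .map_err(|e| format!("bad worker_instance_id {raw:?}: {e}"))?;
                worker_id = Some(id);
            }
            Some("token_data") => {
                tokens = Some(parse_token_array(first_comment(item, "token_data")?)?);
            }
            _ => {}
        }
    }
    match (worker_id, tokens) {
        (Some(id), Some(t)) => Ok((id, t)),
        _ => Err("missing required annotations in stream".to_string()),
    }
}
```

With this shape, a worker ID of 0 in the result always means the router actually reported 0, and a missing or unparsable annotation surfaces as an error with a message that says which one failed.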

166-172: Improve error handling for parsing failures.

The current implementation silently falls back to 0 if parsing fails. Consider logging parsing errors or propagating them to help with debugging.

                 "worker_instance_id" => {
                     worker_id = response
                         .comment
                         .as_ref()
                         .and_then(|comments| comments.first())
-                        .and_then(|v| v.parse::<i64>().ok())
-                        .unwrap_or(0);
+                        .ok_or_else(|| anyhow::anyhow!("worker_instance_id annotation missing comment"))?
+                        .parse::<i64>()
+                        .map_err(|e| anyhow::anyhow!("Failed to parse worker_instance_id: {}", e))?;
                 }

194-214: Consider implementing the test or removing it.

The test is marked as #[ignore] with a comment about requiring a full distributed setup. Consider either:

  1. Implementing a unit test with mocked components
  2. Moving this to an integration test suite
  3. Removing the placeholder if it won't be implemented soon

Would you like me to help implement a unit test with mocked components that doesn't require a full distributed setup?
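One possible shape for option 1 is to put the routing decision behind a small trait and test against a mock. The sketch below is hypothetical throughout: the `Router` trait, `MockRouter`, `toy_tokenize`, and `select_worker` are invented for illustration and do not correspond to types in this PR, where the real pipeline is async and built from runtime components.

```rust
/// Hypothetical routing seam: anything that can pick a worker for a token list.
pub trait Router {
    fn route(&self, tokens: &[u32]) -> Result<i64, String>;
}

/// Mock that returns a fixed worker, or fails when none is configured.
pub struct MockRouter {
    pub worker_id: Option<i64>,
}

impl Router for MockRouter {
    fn route(&self, _tokens: &[u32]) -> Result<i64, String> {
        self.worker_id.ok_or_else(|| "no workers available".to_string())
    }
}

/// Toy tokenizer: one token per whitespace-separated word (its byte length).
fn toy_tokenize(prompt: &str) -> Vec<u32> {
    prompt.split_whitespace().map(|w| w.len() as u32).collect()
}

/// Selection-only path: tokenize, ask the router, return both results.
pub fn select_worker<R: Router>(router: &R, prompt: &str) -> Result<(i64, Vec<u32>), String> {
    let tokens = toy_tokenize(prompt);
    let worker_id = router.route(&tokens)?;
    Ok((worker_id, tokens))
}

#[test]
fn selects_worker_without_a_cluster() {
    let router = MockRouter { worker_id: Some(3) };
    assert_eq!(select_worker(&router, "hello there"), Ok((3, vec![5, 5])));
}
```

A test like this runs under plain cargo test with no distributed setup, while the #[ignore]d end-to-end test can move to an integration suite.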


217-335: Large commented-out code block should be removed or documented.

This commented-out block (lines 217-335, about 119 lines) contains a complete implementation of create_worker_selection_pipeline_from_c_params. If this functionality is planned for future use, consider:

  1. Moving it to a separate file with a clear TODO/ticket reference
  2. Creating a GitHub issue to track its implementation
  3. Removing it if it's no longer needed

Keeping large blocks of commented code reduces maintainability and can cause confusion.


280-289: Security concern: Unsafe C string handling.

The commented-out code shows unsafe C string conversions without null checks. While this code is not active, if it's uncommented in the future, ensure proper validation:

  • Check for null pointers before calling CStr::from_ptr
  • Validate string encoding
  • Consider using safer alternatives like CString with proper ownership
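The first two checks above can be sketched with the standard library alone. The function name `c_str_to_string` and the `CStrError` enum are hypothetical names chosen for this example; only `CStr::from_ptr` and `CStr::to_str` are real std APIs.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical error type for the conversion below.
#[derive(Debug, PartialEq)]
pub enum CStrError {
    Null,
    InvalidUtf8,
}

/// Converts a C string pointer into an owned Rust String.
///
/// # Safety
/// If non-null, `ptr` must point to a NUL-terminated string that stays
/// valid for the duration of the call.
pub unsafe fn c_str_to_string(ptr: *const c_char) -> Result<String, CStrError> {
    // Null check first: CStr::from_ptr on a null pointer is undefined behavior.
    if ptr.is_null() {
        return Err(CStrError::Null);
    }
    // SAFETY: non-null checked above; the caller guarantees NUL termination.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    // to_str validates UTF-8 instead of assuming it.
    c_str
        .to_str()
        .map(str::to_owned)
        .map_err(|_| CStrError::InvalidUtf8)
}
```

Copying into an owned String at the boundary also means the Rust side never holds a borrow into memory the C caller may free. For strings passed in the other direction, std::ffi::CString gives the same ownership guarantees.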

303-304: Incomplete code in commented section.

Line 303 appears to be an incomplete statement: the match expression that loads the ModelDeploymentCard is cut off. This suggests the commented-out code was never fully implemented or was partially edited.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0373b89 and fbf1199.

📒 Files selected for processing (2)
  • lib/llm/src/entrypoint/input.rs (1 hunks)
  • lib/llm/src/entrypoint/input/worker_selection_pipeline.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/llm/src/entrypoint/input/worker_selection_pipeline.rs (6)
lib/bindings/python/src/dynamo/_core.pyi (4)
  • KvPushRouter (1217-1318)
  • ModelDeploymentCard (458-463)
  • Client (244-285)
  • RouterMode (854-856)
components/backends/sglang/src/dynamo/sglang/protocol.py (1)
  • PreprocessedRequest (36-43)
lib/runtime/src/pipeline/network/egress/push_router.rs (1)
  • from_client_with_threshold (104-135)
lib/runtime/src/pipeline/nodes/sinks/pipeline.rs (1)
  • from_engine (8-13)
lib/llm/src/preprocessor.rs (1)
  • new_with_parts (160-182)
lib/bindings/python/rust/lib.rs (2)
  • event (1013-1015)
  • comments (1017-1019)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (3)
lib/llm/src/entrypoint/input/worker_selection_pipeline.rs (2)

85-91: Good use of the threshold-aware router construction.

The code correctly uses from_client_with_threshold to create a router that can monitor worker busyness when a threshold is provided.


93-104: Verify KV router initialization requirements.

The code correctly validates that a chooser is required for RouterMode::KV. This prevents runtime errors when KV routing is selected without the necessary components.
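As a rough illustration of that validation, the sketch below fails at construction time when KV routing is requested without a chooser. The `Chooser` and `Backend` types and the `build_backend` function are hypothetical stand-ins, not the PR's actual types; the mapping of modes to backends follows the note in the sequence diagram.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RouterMode {
    Random,
    RoundRobin,
    Direct,
    KV,
}

/// Hypothetical stand-in for the KV-aware worker chooser.
#[derive(Debug, Clone, PartialEq)]
pub struct Chooser {
    pub block_size: u32, // assumed field, for illustration only
}

/// Hypothetical backend variants: a plain service backend for the simple
/// modes, and a KV-push backend that owns its chooser.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Service,
    KvPush(Chooser),
}

/// Builds the backend for a mode, rejecting KV mode without a chooser
/// up front instead of failing later on the first request.
pub fn build_backend(mode: RouterMode, chooser: Option<Chooser>) -> Result<Backend, String> {
    match mode {
        RouterMode::Random | RouterMode::RoundRobin | RouterMode::Direct => Ok(Backend::Service),
        RouterMode::KV => chooser
            .map(Backend::KvPush)
            .ok_or_else(|| "RouterMode::KV requires a chooser".to_string()),
    }
}
```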

lib/llm/src/entrypoint/input.rs (1)

24-24: LGTM! Module declaration follows the existing pattern.

The new worker_selection_pipeline module is properly declared as public and follows the same pattern as other modules in this file.

Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>