Conversation


@ayushag-nv ayushag-nv commented Nov 10, 2025

Overview:

This PR migrates the llama4 multimodal disaggregated serving support from the examples directory into the vLLM components library.

Key Changes:

  • Added examples/backends/vllm/launch/disagg_multimodal_llama.sh, which adapts the already-migrated multimodal scripts and runs the disaggregated prefill/decode (PD) workers with inline image encoding.
  • Deleted the old disagg_llama.sh script.
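A hypothetical sketch of invoking the new script on head and non-head nodes; the flag names below are illustrative assumptions, not taken from the actual script:

```shell
# Head node: starts the processor and the encode+prefill worker
# (--head-node is a hypothetical flag for illustration)
bash examples/backends/vllm/launch/disagg_multimodal_llama.sh --head-node

# Other nodes: start decode workers that join the head node
# (--head-url is likewise hypothetical)
bash examples/backends/vllm/launch/disagg_multimodal_llama.sh --head-url "http://<head-ip>:<port>"
```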

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Documentation

    • Enhanced help message with detailed usage examples and descriptions for multimodal serving setup.
  • Refactor

    • Streamlined configuration workflow with unified command-line interface.
    • Consistent method for configuring encoding, prefilling, and decoding components across nodes.

@ayushag-nv ayushag-nv requested review from a team as code owners November 10, 2025 18:06

copy-pr-bot bot commented Nov 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Nov 10, 2025

Walkthrough

A shell script was modified to migrate from direct Python script invocations to a unified dynamo.vllm CLI-driven workflow for disaggregated multimodal serving, with corresponding flag updates and expanded help documentation.

Changes

Disaggregated Multimodal Serving Script Migration
examples/backends/vllm/launch/disagg_multimodal_llama.sh
  • Migrated from direct Python script invocations (components/processor.py, components/worker.py) to unified python -m dynamo.vllm CLI calls with subcommands (--multimodal-processor, --multimodal-encode-prefill-worker, --multimodal-decode-worker).
  • Updated corresponding flags and expanded the help message with usage examples.
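Based on this summary, the shape of the migration is roughly as follows; the elided arguments are placeholders, not the script's real flags:

```shell
# Before: each component launched via its own Python script
python components/processor.py ... &
python components/worker.py ... &

# After: one CLI entry point, selected by per-role subcommand flags
python -m dynamo.vllm --multimodal-processor ... &
python -m dynamo.vllm --multimodal-encode-prefill-worker ... &
python -m dynamo.vllm --multimodal-decode-worker ... &
wait
```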

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Verify flag mappings are correct between old script invocations and new CLI subcommands
  • Confirm all parameter names and values align with the new dynamo.vllm CLI interface
  • Review that the disaggregated node configuration (head vs. non-head) maintains equivalent behavior under the new workflow

Poem

🐰 The scripts have unified, hooray, hooray!
From scattered files to CLI's way,
With dynamo.vllm, the workflow flows clear,
Multimodal serving without a fear,
One interface to prefill and decode—
A smoother, cleaner, harmonious road! 🎯

Pre-merge checks

❌ Failed checks (1 inconclusive)
  • Description check — ❓ Inconclusive. The description provides an overview and mentions key changes, but the Details and "Where should the reviewer start" sections are empty, and the Related Issues section has a placeholder (#xxx) instead of an actual issue number. Resolution: fill in the Details section with specifics about the migration, complete the "Where should the reviewer start" section with file guidance, and replace #xxx with the actual GitHub issue number.

✅ Passed checks (2 passed)
  • Title check — ✅ Passed. The title accurately describes the main change: migration of llama4 multimodal disagg support, which aligns with the shell script updates replacing old components with new dynamo.vllm CLI invocations.
  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.


@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e4a339 and e04bc9f.

📒 Files selected for processing (1)
  • examples/backends/vllm/launch/disagg_multimodal_llama.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Applied to files:

  • examples/backends/vllm/launch/disagg_multimodal_llama.sh
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4213/merge) by ayushag-nv.
examples/backends/vllm/launch/disagg_multimodal_llama.sh

[error] 1-1: Command: pre-commit run --show-diff-on-failure --color=always --all-files failed. Trailing whitespace was detected and the hook modified the file: examples/backends/vllm/launch/disagg_multimodal_llama.sh. Rerun the pre-commit checks locally (e.g., 'pre-commit run --all-files') to verify all issues are resolved.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
examples/backends/vllm/launch/disagg_multimodal_llama.sh (3)

18-31: Improved help documentation.

The expanded help message with usage examples and clearer structure is good for user guidance. The examples clearly show how to use the script on head and worker nodes.


62-63: Wait strategy allows debugging flexibility.

The script uses wait at the end to allow all background processes to complete. This design (combined with the trap/kill cleanup handler) enables debugging by keeping processes running even if one fails, which is operationally useful in cluster environments. This aligns well with the migration goal and best practices for disaggregated serving.
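A minimal, self-contained sketch of the trap-plus-wait pattern described here, using sleep as a stand-in for the real workers:

```shell
#!/usr/bin/env bash
# Sketch of the cleanup pattern: background workers plus a trap that
# tears them down on exit or interrupt. `sleep` stands in for the
# processor / prefill / decode workers.

cleanup() {
    # kill any background workers that are still running
    kill $(jobs -p) 2>/dev/null || true
}
trap cleanup EXIT INT TERM

sleep 0.2 &   # stand-in for the encode+prefill worker
sleep 0.2 &   # stand-in for the decode worker

wait          # block until every background worker exits
WORKERS_DONE=1
```

Because the trap fires on EXIT as well as INT/TERM, the workers are cleaned up whether the script finishes normally or is interrupted.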


51-51: Verification complete: CLI migration to unified dynamo.vllm interface is correctly implemented.

All three vLLM subcommands (--multimodal-processor, --multimodal-encode-prefill-worker, --multimodal-decode-worker) are properly defined in components/src/dynamo/vllm/args.py, with correct flag mappings (--mm-prompt-template, --is-prefill-worker, --tensor-parallel-size, --max-model-len, --gpu-memory-utilization). The handler routing in main.py correctly dispatches each subcommand to its corresponding handler (ProcessorHandler, EncodeWorkerHandler, MultimodalDecodeWorkerHandler). The script usage at lines 51, 56, and 59 aligns with the implementation.
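Putting the verified subcommands and flags together, a single-node launch might look roughly like this; the --model flag, the placeholder model id and template, and the numeric values are assumptions for illustration:

```shell
MODEL="<model-id>"   # placeholder; the real value is not shown in this thread

python -m dynamo.vllm --multimodal-processor \
    --model "$MODEL" --mm-prompt-template "<template>" &

python -m dynamo.vllm --multimodal-encode-prefill-worker \
    --model "$MODEL" --is-prefill-worker \
    --tensor-parallel-size 8 --max-model-len 8192 --gpu-memory-utilization 0.9 &

python -m dynamo.vllm --multimodal-decode-worker \
    --model "$MODEL" --tensor-parallel-size 8 &

wait
```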

@ayushag-nv ayushag-nv requested a review from krishung5 November 10, 2025 18:23
Signed-off-by: ayushag <[email protected]>
@ayushag-nv ayushag-nv force-pushed the ayushag/llama4-disagg-mm branch from b28402a to e981930 on November 10, 2025 18:24
@ayushag-nv ayushag-nv enabled auto-merge (squash) November 13, 2025 05:16
@ayushag-nv
Copy link
Contributor Author

/ok to test 0aba0ef

@ayushag-nv ayushag-nv merged commit 4adab52 into main Nov 13, 2025
36 of 42 checks passed
@ayushag-nv ayushag-nv deleted the ayushag/llama4-disagg-mm branch November 13, 2025 17:38
daiyaanarfeen pushed a commit that referenced this pull request Nov 14, 2025
tangcy98 pushed a commit to tangcy98/dynamo that referenced this pull request Nov 14, 2025


3 participants