Skip to content

Conversation

tedzhouhk
Copy link
Contributor

@tedzhouhk tedzhouhk commented Sep 17, 2025

add a single-node engine (TEP8P+DEP8D) and multi-node engine (TEP16P+DEP16D).

closes: DEP-364

Summary by CodeRabbit

  • New Features

    • Added deployment recipes for DeepSeek‑R1 with disaggregated prefill/decode services for 8‑GPU and 16‑GPU setups.
    • Introduced a shared model cache volume to speed up model availability across services.
    • Included health checks for frontend and worker services to improve reliability.
  • Documentation

    • Updated DeepSeek‑R1 deployment status to available (deployment ✓; benchmark pending).

Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Copy link
Contributor

coderabbitai bot commented Sep 17, 2025

Walkthrough

Introduces DeepSeek-R1 disaggregated SGLang deployment manifests (8-way and 16-way TP/DP variants), adds a PersistentVolumeClaim for model cache, and updates the recipes README to mark DeepSeek-R1 as deployable.

Changes

Cohort / File(s) Summary of changes
Docs status update
recipes/README.md
Updated DeepSeek-R1 deployment status from 🚧 to ✓; benchmark remains 🚧.
Model cache storage
recipes/deepseek-r1/model_cache/model-cache.yaml
Added PVC manifest model-cache (RWX, 1000Gi, configurable storageClassName) for shared Hugging Face cache.
SGLang WideEP disagg deployments
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml, recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
Added DynamoGraphDeployment YAMLs defining frontend, prefill, and decode services with disaggregation; GPU topology (8/16 TP and DP), health probes, shared PVC mount at /root/.cache/huggingface, NIXL transfer backend, and bootstrap port 30001.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Frontend
    participant PrefillWorkers as Prefill Workers
    participant DecodeWorkers as Decode Workers
    participant ModelCache as PVC: model-cache

    Note over Frontend,DecodeWorkers: SGLang Disaggregated Inference (WideEP)

    Client->>Frontend: HTTP request (/generate)
    Frontend->>PrefillWorkers: Dispatch prefill (TP N, EP-size N)
    PrefillWorkers->>ModelCache: Read model weights/tokenizer cache
    PrefillWorkers-->>Frontend: Prefill outputs (KV, metadata)
    Frontend->>DecodeWorkers: Stream decode with KV (DP N, optional DP attention)
    DecodeWorkers->>ModelCache: Lazy-load weights as needed
    DecodeWorkers-->>Frontend: Token stream
    Frontend-->>Client: Streamed response

    Note right of PrefillWorkers: disaggregation-mode: prefill<br/>transfer: nixl<br/>bootstrap: 30001
    Note right of DecodeWorkers: disaggregation-mode: decode<br/>mem-fraction-static: 0.8 (16-way variant)
    rect rgba(230,245,255,0.5)
    Note over Frontend: Startup/health probes on 8000
    Note over PrefillWorkers,DecodeWorkers: Startup/health probes on 9090
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

Hop hop hurrah, the nodes align,
Prefill to decode in tidy design.
A cache we share, a PVC so fine,
Tokens stream like sparkling wine.
TP, DP—rabbity rhyme—
Deploy is ✓, right on time! 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description is minimal and does not follow the repository's required template: it only contains a one-line summary and "closes: DEP-364" but lacks the required "Overview", "Details", "Where should the reviewer start?", and "Related Issues" sections as well as key review information such as which files were added, configuration highlights, and validation/testing steps. Please update the PR description to use the repository template by adding the "Overview", "Details", "Where should the reviewer start?", and "Related Issues" sections; in Details describe the added YAMLs (single-node TEP8P+DEP8D, multi-node TEP16P+DEP16D, and the model-cache PVC), list the primary files to review (e.g., recipes/deepseek-r1/model_cache/model-cache.yaml, recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml, recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml, recipes/README.md), and include basic validation or testing steps and expected behavior. Replace or clarify "closes: DEP-364" to match the repo format (for example "closes GitHub issue: #") or state if DEP-364 is an external tracker.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "chore: add dsr1 k8s yaml" is concise and directly related to the main change (adding Kubernetes YAML for the DeepSeek‑R1/“dsr1” recipe files), so it communicates the primary intent of the changeset; it is short, single-sentence, and appropriate for a chore-level change. The abbreviation "dsr1" is slightly cryptic but likely understandable in-repo; overall the title accurately reflects the added k8s manifests.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (12)
recipes/README.md (1)

9-9: DeepSeek‑R1 marked deployable: confirm docs reflect both single‑node and multi‑node variants.

Row looks good. Consider clarifying the “Mode” to indicate both single‑node and multi‑node disagg variants are supported, and ensure prerequisites mention access to sglang runtime images (not just vLLM).

recipes/deepseek-r1/model_cache/model-cache.yaml (3)

8-13: PVC RWX + 1 TiB: verify storage class supports ReadWriteMany and quota.

Many clusters’ default storage classes don’t support RWX. Please confirm your chosen StorageClass provides RWX and adequate throughput for concurrent decode/prefill mounts. Adjust size if your model/variants exceed 1 TiB.


1-13: Add missing newline at EOF.

Fix YAML lint error.

   storageClassName: "your-storage-class-name"
+

5-7: Optional: set namespace explicitly.

If these manifests aren’t always applied with -n, add metadata.namespace to avoid surprises.

 metadata:
   name: model-cache
+  namespace: ${NAMESPACE}
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (4)

66-66: Bootstrap port likely to collide if both variants run concurrently.

Use distinct ports per deployment to avoid cross‑talk.

-              --disaggregation-bootstrap-port 30001
+              --disaggregation-bootstrap-port 30011

(Apply to both decode and prefill.)

Also applies to: 108-108


16-23: Probe timeouts are excessively high.

timeoutSeconds: 1800 makes each failed probe wait 30 minutes. Prefer shorter timeouts with higher failureThreshold to keep kubelet responsive during long inits.

-            periodSeconds: 10
-            timeoutSeconds: 1800
-            failureThreshold: 60
+            periodSeconds: 10
+            timeoutSeconds: 30
+            failureThreshold: 120

Also applies to: 41-47, 85-91


55-67: HF token handling: confirm secret injection.

README mentions “envFromSecret: hf-token-secret”, but this CR doesn’t reference it. If the operator doesn’t inject it globally, add the secret reference per service.

Also applies to: 99-109


109-109: Add missing newline at EOF.

Fix YAML lint error.

               --mem-fraction-static 0.8
+
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (4)

64-64: Bootstrap port likely to collide if both variants run concurrently.

Use a different port than the 16p deployment to avoid interference.

-              --disaggregation-bootstrap-port 30001
+              --disaggregation-bootstrap-port 30001  # keep 30001 for 8p if 16p uses 30011

Also applies to: 103-103


37-45: Probe timeouts are excessively high.

Shorten timeoutSeconds and raise failureThreshold instead.

-            periodSeconds: 10
-            timeoutSeconds: 1800
-            failureThreshold: 60
+            periodSeconds: 10
+            timeoutSeconds: 30
+            failureThreshold: 120

Also applies to: 79-86, 16-23


52-65: HF token handling: confirm secret injection.

CR doesn’t reference hf-token-secret. If required for model pulls, add or ensure it’s injected by the operator.

Also applies to: 93-103


103-103: Add missing newline at EOF.

Fix YAML lint error.

               --disaggregation-bootstrap-port 30001
+
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86ce03f and 125ab73.

📒 Files selected for processing (4)
  • recipes/README.md (1 hunks)
  • recipes/deepseek-r1/model_cache/model-cache.yaml (1 hunks)
  • recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (1 hunks)
  • recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

  • recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
  • recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
🪛 YAMLlint (1.37.1)
recipes/deepseek-r1/model_cache/model-cache.yaml

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml

[error] 103-103: no new line character at the end of file

(new-line-at-end-of-file)

recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml

[error] 109-109: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (3)
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (2)

31-33: GPU limits format looks correct for Dynamo CRDs.

Matches repo convention gpu: "8". No changes needed.

If requests are required by your scheduler policy, add matching requests.

Also applies to: 75-77


34-38: PVC mount is good; confirm PVC exists in same namespace.

Ensure the model-cache PVC is created in the namespace where this CR is applied.

Also applies to: 77-83

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (1)

31-36: PVC + SHM config looks reasonable.

Mount path and sharedMemory sizing are consistent. No changes needed.

Also applies to: 72-77

Signed-off-by: hongkuanz <[email protected]>
Copy link
Contributor

@biswapanda biswapanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments on docs/ux

LGTM otherwise.

@biswapanda biswapanda assigned biswapanda and tedzhouhk and unassigned biswapanda Sep 18, 2025
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
@biswapanda
Copy link
Contributor

Lgtm

@tedzhouhk tedzhouhk enabled auto-merge (squash) September 18, 2025 16:58
@tedzhouhk tedzhouhk merged commit 0006106 into main Sep 18, 2025
10 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/add-dsr1-recipe branch September 18, 2025 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants