chore: add dsr1 k8s yaml #3101

tedzhouhk · 2025-09-17T23:10:00Z

add a single-node engine (TEP8P+DEP8D) and multi-node engine (TEP16P+DEP16D).

closes: DEP-364

Summary by CodeRabbit

New Features
- Added deployment recipes for DeepSeek‑R1 with disaggregated prefill/decode services for 8‑GPU and 16‑GPU setups.
- Introduced a shared model cache volume to speed up model availability across services.
- Included health checks for frontend and worker services to improve reliability.
Documentation
- Updated DeepSeek‑R1 deployment status to available (deployment ✓; benchmark pending).

Signed-off-by: hongkuanz <[email protected]>

coderabbitai · 2025-09-17T23:13:16Z

Walkthrough

Introduces DeepSeek-R1 disaggregated SGLang deployment manifests (8-way and 16-way TP/DP variants), adds a PersistentVolumeClaim for model cache, and updates the recipes README to mark DeepSeek-R1 as deployable.

Changes

Cohort / File(s)	Summary of changes
Docs status update `recipes/README.md`	Updated DeepSeek-R1 deployment status from 🚧 to ✓; benchmark remains 🚧.
Model cache storage `recipes/deepseek-r1/model_cache/model-cache.yaml`	Added PVC manifest `model-cache` (RWX, 1000Gi, configurable storageClassName) for shared Hugging Face cache.
SGLang WideEP disagg deployments `recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml`, `recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml`	Added DynamoGraphDeployment YAMLs defining frontend, prefill, and decode services with disaggregation; GPU topology (8/16 TP and DP), health probes, shared PVC mount at `/root/.cache/huggingface`, NIXL transfer backend, and bootstrap port 30001.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Frontend
    participant PrefillWorkers as Prefill Workers
    participant DecodeWorkers as Decode Workers
    participant ModelCache as PVC: model-cache

    Note over Frontend,DecodeWorkers: SGLang Disaggregated Inference (WideEP)

    Client->>Frontend: HTTP request (/generate)
    Frontend->>PrefillWorkers: Dispatch prefill (TP N, EP-size N)
    PrefillWorkers->>ModelCache: Read model weights/tokenizer cache
    PrefillWorkers-->>Frontend: Prefill outputs (KV, metadata)
    Frontend->>DecodeWorkers: Stream decode with KV (DP N, optional DP attention)
    DecodeWorkers->>ModelCache: Lazy-load weights as needed
    DecodeWorkers-->>Frontend: Token stream
    Frontend-->>Client: Streamed response

    Note right of PrefillWorkers: disaggregation-mode: prefill<br/>transfer: nixl<br/>bootstrap: 30001
    Note right of DecodeWorkers: disaggregation-mode: decode<br/>mem-fraction-static: 0.8 (16-way variant)
    rect rgba(230,245,255,0.5)
    Note over Frontend: Startup/health probes on 8000
    Note over PrefillWorkers,DecodeWorkers: Startup/health probes on 9090
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: support sglang in sla planner #2421 — Adds/updates SGLang disaggregation deployments for DeepSeek-R1 with similar worker roles and PVC usage.
fix: fix examples #2459 — Adjusts dynamo.sglang worker commands and disaggregation flags akin to the decode/prefill args here.
feat(sglang): deepep sglang support in dynamo #1120 — Introduces SGLang/DeepSeek-R1 deployment artifacts using NIXL transfer and bootstrap coordination.

Poem

Hop hop hurrah, the nodes align,
Prefill to decode in tidy design.
A cache we share, a PVC so fine,
Tokens stream like sparkling wine.
TP, DP—rabbity rhyme—
Deploy is ✓, right on time! 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description is minimal and does not follow the repository's required template: it only contains a one-line summary and "closes: DEP-364" but lacks the required "Overview", "Details", "Where should the reviewer start?", and "Related Issues" sections as well as key review information such as which files were added, configuration highlights, and validation/testing steps.	Please update the PR description to use the repository template by adding the "Overview", "Details", "Where should the reviewer start?", and "Related Issues" sections; in Details describe the added YAMLs (single-node TEP8P+DEP8D, multi-node TEP16P+DEP16D, and the model-cache PVC), list the primary files to review (e.g., recipes/deepseek-r1/model_cache/model-cache.yaml, recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml, recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml, recipes/README.md), and include basic validation or testing steps and expected behavior. Replace or clarify "closes: DEP-364" to match the repo format (for example "closes GitHub issue: #") or state if DEP-364 is an external tracker.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "chore: add dsr1 k8s yaml" is concise and directly related to the main change (adding Kubernetes YAML for the DeepSeek‑R1/“dsr1” recipe files), so it communicates the primary intent of the changeset; it is short, single-sentence, and appropriate for a chore-level change. The abbreviation "dsr1" is slightly cryptic but likely understandable in-repo; overall the title accurately reflects the added k8s manifests.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (12)

recipes/README.md (1)

9-9: DeepSeek‑R1 marked deployable: confirm docs reflect both single‑node and multi‑node variants.

Row looks good. Consider clarifying the “Mode” to indicate both single‑node and multi‑node disagg variants are supported, and ensure prerequisites mention access to sglang runtime images (not just vLLM).
recipes/deepseek-r1/model_cache/model-cache.yaml (3)
8-13: PVC RWX + 1 TiB: verify storage class supports ReadWriteMany and quota.

Many clusters’ default storage classes don’t support RWX. Please confirm your chosen StorageClass provides RWX and adequate throughput for concurrent decode/prefill mounts. Adjust size if your model/variants exceed 1 TiB.

1-13: Add missing newline at EOF.

Fix YAML lint error.
   storageClassName: "your-storage-class-name"
+
5-7: Optional: set namespace explicitly.

If these manifests aren’t always applied with -n, add metadata.namespace to avoid surprises.
 metadata:
   name: model-cache
+  namespace: ${NAMESPACE}
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (4)
66-66: Bootstrap port likely to collide if both variants run concurrently.

Use distinct ports per deployment to avoid cross‑talk.
-              --disaggregation-bootstrap-port 30001
+              --disaggregation-bootstrap-port 30011
(Apply to both decode and prefill.)

Also applies to: 108-108

16-23: Probe timeouts are excessively high.

timeoutSeconds: 1800 makes each failed probe wait 30 minutes. Prefer shorter timeouts with higher failureThreshold to keep kubelet responsive during long inits.
-            periodSeconds: 10
-            timeoutSeconds: 1800
-            failureThreshold: 60
+            periodSeconds: 10
+            timeoutSeconds: 30
+            failureThreshold: 120
Also applies to: 41-47, 85-91

55-67: HF token handling: confirm secret injection.

README mentions “envFromSecret: hf-token-secret”, but this CR doesn’t reference it. If the operator doesn’t inject it globally, add the secret reference per service.

Also applies to: 99-109

109-109: Add missing newline at EOF.

Fix YAML lint error.
               --mem-fraction-static 0.8
+
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (4)
64-64: Bootstrap port likely to collide if both variants run concurrently.

Use a different port than the 16p deployment to avoid interference.
-              --disaggregation-bootstrap-port 30001
+              --disaggregation-bootstrap-port 30001  # keep 30001 for 8p if 16p uses 30011
Also applies to: 103-103

37-45: Probe timeouts are excessively high.

Shorten timeoutSeconds and raise failureThreshold instead.
-            periodSeconds: 10
-            timeoutSeconds: 1800
-            failureThreshold: 60
+            periodSeconds: 10
+            timeoutSeconds: 30
+            failureThreshold: 120
Also applies to: 79-86, 16-23

52-65: HF token handling: confirm secret injection.

CR doesn’t reference hf-token-secret. If required for model pulls, add or ensure it’s injected by the operator.

Also applies to: 93-103

103-103: Add missing newline at EOF.

Fix YAML lint error.
               --disaggregation-bootstrap-port 30001
+

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86ce03f and 125ab73.

📒 Files selected for processing (4)

recipes/README.md (1 hunks)
recipes/deepseek-r1/model_cache/model-cache.yaml (1 hunks)
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (1 hunks)
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-04T19:03:06.643Z

Learnt from: biswapanda
PR: ai-dynamo/dynamo#2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml

🪛 YAMLlint (1.37.1)

recipes/deepseek-r1/model_cache/model-cache.yaml

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml

[error] 103-103: no new line character at the end of file

(new-line-at-end-of-file)

recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml

[error] 109-109: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Build and Test - dynamo
GitHub Check: Build and Test - sglang
GitHub Check: Build and Test - vllm
GitHub Check: Mirror Repository to GitLab

🔇 Additional comments (3)

recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (2)

31-33: GPU limits format looks correct for Dynamo CRDs.

Matches repo convention gpu: "8". No changes needed.

If requests are required by your scheduler policy, add matching requests.

Also applies to: 75-77

34-38: PVC mount is good; confirm PVC exists in same namespace.

Ensure the model-cache PVC is created in the namespace where this CR is applied.

Also applies to: 77-83

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (1)

31-36: PVC + SHM config looks reasonable.

Mount path and sharedMemory sizing are consistent. No changes needed.

Also applies to: 72-77

recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml

Signed-off-by: hongkuanz <[email protected]>

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml

biswapanda

minor comments on docs/ux

LGTM otherwise.

Signed-off-by: hongkuanz <[email protected]>

biswapanda · 2025-09-18T16:34:36Z

Lgtm

chore: add dsr1 k8s yaml

2e5ec85

Signed-off-by: hongkuanz <[email protected]>

pull-request-size bot added the size/L label Sep 17, 2025

github-actions bot added the chore label Sep 17, 2025

pc

125ab73

Signed-off-by: hongkuanz <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 17, 2025 23:11 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 17, 2025 23:16 Inactive

coderabbitai bot reviewed Sep 17, 2025

View reviewed changes

use exec

8012edc

Signed-off-by: hongkuanz <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 18, 2025 02:12 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 18, 2025 02:13 Inactive

biswapanda reviewed Sep 18, 2025

View reviewed changes

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml Show resolved Hide resolved

biswapanda reviewed Sep 18, 2025

View reviewed changes

recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml Outdated Show resolved Hide resolved

biswapanda reviewed Sep 18, 2025

View reviewed changes

biswapanda assigned biswapanda and tedzhouhk and unassigned biswapanda Sep 18, 2025

update ns

bd725a6

Signed-off-by: hongkuanz <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 18, 2025 16:21 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 18, 2025 16:22 Inactive

add readme

55b3b5a

Signed-off-by: hongkuanz <[email protected]>

tedzhouhk enabled auto-merge (squash) September 18, 2025 16:58

biswapanda approved these changes Sep 18, 2025

View reviewed changes

tedzhouhk merged commit 0006106 into main Sep 18, 2025
10 checks passed

tedzhouhk deleted the hzhou/add-dsr1-recipe branch September 18, 2025 22:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: add dsr1 k8s yaml #3101

chore: add dsr1 k8s yaml #3101

Uh oh!

tedzhouhk commented Sep 17, 2025 •

edited by biswapanda

Loading

Uh oh!

coderabbitai bot commented Sep 17, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

biswapanda left a comment

Uh oh!

biswapanda commented Sep 18, 2025

Uh oh!

Uh oh!

Uh oh!

chore: add dsr1 k8s yaml #3101

chore: add dsr1 k8s yaml #3101

Uh oh!

Conversation

tedzhouhk commented Sep 17, 2025 • edited by biswapanda Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

biswapanda left a comment

Choose a reason for hiding this comment

Uh oh!

biswapanda commented Sep 18, 2025

Uh oh!

Uh oh!

Uh oh!

tedzhouhk commented Sep 17, 2025 •

edited by biswapanda

Loading

coderabbitai bot commented Sep 17, 2025 •

edited

Loading