chore: add dsr1 k8s yaml #3101
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Walkthrough
Introduces DeepSeek-R1 disaggregated SGLang deployment manifests (8-way and 16-way TP/DP variants), adds a PersistentVolumeClaim for model cache, and updates the recipes README to mark DeepSeek-R1 as deployable.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Frontend
participant PrefillWorkers as Prefill Workers
participant DecodeWorkers as Decode Workers
participant ModelCache as PVC: model-cache
Note over Frontend,DecodeWorkers: SGLang Disaggregated Inference (WideEP)
Client->>Frontend: HTTP request (/generate)
Frontend->>PrefillWorkers: Dispatch prefill (TP N, EP-size N)
PrefillWorkers->>ModelCache: Read model weights/tokenizer cache
PrefillWorkers-->>Frontend: Prefill outputs (KV, metadata)
Frontend->>DecodeWorkers: Stream decode with KV (DP N, optional DP attention)
DecodeWorkers->>ModelCache: Lazy-load weights as needed
DecodeWorkers-->>Frontend: Token stream
Frontend-->>Client: Streamed response
Note right of PrefillWorkers: disaggregation-mode: prefill<br/>transfer: nixl<br/>bootstrap: 30001
Note right of DecodeWorkers: disaggregation-mode: decode<br/>mem-fraction-static: 0.8 (16-way variant)
rect rgba(230,245,255,0.5)
Note over Frontend: Startup/health probes on 8000
Note over PrefillWorkers,DecodeWorkers: Startup/health probes on 9090
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (12)
recipes/README.md (1)
9-9: DeepSeek‑R1 marked deployable: confirm docs reflect both single‑node and multi‑node variants.
Row looks good. Consider clarifying the “Mode” to indicate that both single‑node and multi‑node disagg variants are supported, and ensure the prerequisites mention access to SGLang runtime images (not just vLLM).
recipes/deepseek-r1/model_cache/model-cache.yaml (3)
8-13: PVC RWX + 1 TiB: verify storage class supports ReadWriteMany and quota.
Many clusters’ default storage classes don’t support RWX. Please confirm your chosen StorageClass provides RWX and adequate throughput for concurrent decode/prefill mounts. Adjust size if your model/variants exceed 1 TiB.
1-13: Add missing newline at EOF.
Fix YAML lint error.
  storageClassName: "your-storage-class-name"
+
5-7: Optional: set namespace explicitly.
If these manifests aren’t always applied with -n, add metadata.namespace to avoid surprises.
  metadata:
    name: model-cache
+   namespace: ${NAMESPACE}
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (4)
66-66: Bootstrap port likely to collide if both variants run concurrently.
Use distinct ports per deployment to avoid cross‑talk.
- --disaggregation-bootstrap-port 30001
+ --disaggregation-bootstrap-port 30011
(Apply to both decode and prefill.)
Also applies to: 108-108
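One way to spot the collision this comment describes is to scan the manifests for the bootstrap flag before applying them. The sketch below is illustrative only: the `find_shared_ports` helper and the inline manifest snippets are hypothetical, and whether a shared port actually causes interference depends on how the pods are networked, which this check does not model.

```python
import re
from collections import defaultdict

# Matches the flag used in the manifests, e.g.
# "--disaggregation-bootstrap-port 30001" (space- or "="-separated).
PORT_FLAG = re.compile(r"--disaggregation-bootstrap-port[ =](\d+)")


def find_shared_ports(manifests: dict[str, str]) -> dict[int, list[str]]:
    """Return bootstrap ports that appear in more than one manifest.

    `manifests` maps a file name to its text. Hypothetical helper: it does
    not parse YAML, it only scans the raw text for the flag.
    """
    users = defaultdict(set)
    for name, text in manifests.items():
        for port in PORT_FLAG.findall(text):
            users[int(port)].add(name)
    return {p: sorted(names) for p, names in users.items() if len(names) > 1}


# Stand-in snippets, not the real manifest contents.
manifests = {
    "tep8p-dep8d-disagg.yaml": "--disaggregation-bootstrap-port 30001",
    "tep16p-dep16d-disagg.yaml": "--disaggregation-bootstrap-port 30001",
}
shared = find_shared_ports(manifests)
print(shared)
```

An empty result means no two manifests share a bootstrap port; a non-empty one lists the files to revisit.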
16-23: Probe timeouts are excessively high.
timeoutSeconds: 1800 lets a single hung probe attempt wait up to 30 minutes before it counts as a failure. Prefer shorter timeouts with a higher failureThreshold to keep the kubelet responsive during long inits.
- periodSeconds: 10
- timeoutSeconds: 1800
- failureThreshold: 60
+ periodSeconds: 10
+ timeoutSeconds: 30
+ failureThreshold: 120
Also applies to: 41-47, 85-91
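To see what the suggested values do to the overall startup budget, the arithmetic can be worked through. This is a rough model, not kubelet's exact behavior: it assumes probe attempts for a container run one at a time and ignores any initial delay. When failures return quickly, the window is about failureThreshold × periodSeconds; when every attempt hangs until it times out, each attempt costs about max(periodSeconds, timeoutSeconds). The function name is hypothetical.

```python
def startup_window_seconds(period: int, timeout: int, failures: int) -> tuple[int, int]:
    """Approximate time until a probe gives up, as (fast_fail, all_hang).

    fast_fail: every attempt fails immediately (e.g. connection refused).
    all_hang:  every attempt blocks until timeoutSeconds elapses.
    Simplified model; assumes attempts do not overlap.
    """
    fast_fail = failures * period
    all_hang = failures * max(period, timeout)
    return fast_fail, all_hang


# Values from the manifest under review.
current = startup_window_seconds(period=10, timeout=1800, failures=60)
# Values from the suggested diff.
suggested = startup_window_seconds(period=10, timeout=30, failures=120)

print(current)    # (600, 108000)
print(suggested)  # (1200, 3600)
```

Under this model the suggestion doubles the fast-failure window from 10 to 20 minutes while cutting the all-hang worst case from 30 hours to 1 hour.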
55-67: HF token handling: confirm secret injection.
README mentions “envFromSecret: hf-token-secret”, but this CR doesn’t reference it. If the operator doesn’t inject it globally, add the secret reference per service.
Also applies to: 99-109
109-109: Add missing newline at EOF.
Fix YAML lint error.
  --mem-fraction-static 0.8
+
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (4)
64-64: Bootstrap port likely to collide if both variants run concurrently.
Use a different port than the 16p deployment to avoid interference.
- --disaggregation-bootstrap-port 30001
+ --disaggregation-bootstrap-port 30001 # keep 30001 for 8p if 16p uses 30011
Also applies to: 103-103
37-45: Probe timeouts are excessively high.
Shorten timeoutSeconds and raise failureThreshold instead.
- periodSeconds: 10
- timeoutSeconds: 1800
- failureThreshold: 60
+ periodSeconds: 10
+ timeoutSeconds: 30
+ failureThreshold: 120
Also applies to: 79-86, 16-23
52-65: HF token handling: confirm secret injection.
The CR doesn’t reference hf-token-secret. If required for model pulls, add it or ensure it’s injected by the operator.
Also applies to: 93-103
103-103: Add missing newline at EOF.
Fix YAML lint error.
  --disaggregation-bootstrap-port 30001
+
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
recipes/README.md (1 hunks)
recipes/deepseek-r1/model_cache/model-cache.yaml (1 hunks)
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (1 hunks)
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
🪛 YAMLlint (1.37.1)
recipes/deepseek-r1/model_cache/model-cache.yaml
[error] 13-13: no new line character at the end of file
(new-line-at-end-of-file)
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
[error] 103-103: no new line character at the end of file
(new-line-at-end-of-file)
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
[error] 109-109: no new line character at the end of file
(new-line-at-end-of-file)
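The three `new-line-at-end-of-file` errors above share one mechanical fix. A minimal sketch of that fix follows, assuming it is acceptable to rewrite the files in place; `ensure_trailing_newline` is a hypothetical helper, and the temporary file stands in for a real manifest.

```python
import tempfile
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline if the file is non-empty and lacks one.

    Returns True when the file was modified. Hypothetical helper; it
    works on raw bytes so it leaves the rest of the content untouched.
    """
    data = path.read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    path.write_bytes(data + b"\n")
    return True


# Demonstrate on a throwaway file rather than a real manifest.
with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "model-cache.yaml"
    manifest.write_bytes(b'storageClassName: "your-storage-class-name"')
    first = ensure_trailing_newline(manifest)   # file is fixed here
    second = ensure_trailing_newline(manifest)  # already fixed, no change
    fixed = manifest.read_bytes()

print(first, second)  # True False
```

Running the helper twice shows it is idempotent, so it is safe to apply to all three files regardless of which ones have already been corrected.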
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Build and Test - sglang
- GitHub Check: Build and Test - vllm
- GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (3)
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml (2)
31-33: GPU limits format looks correct for Dynamo CRDs.
Matches repo convention gpu: "8". No changes needed.
If requests are required by your scheduler policy, add matching requests.
Also applies to: 75-77
34-38: PVC mount is good; confirm PVC exists in same namespace.
Ensure the model-cache PVC is created in the namespace where this CR is applied.
Also applies to: 77-83
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml (1)
31-36: PVC + SHM config looks reasonable.
Mount path and sharedMemory sizing are consistent. No changes needed.
Also applies to: 72-77
Signed-off-by: hongkuanz <[email protected]>
minor comments on docs/ux
LGTM otherwise.
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
LGTM
Add a single-node engine (TEP8P+DEP8D) and a multi-node engine (TEP16P+DEP16D).
closes: DEP-364
Summary by CodeRabbit
New Features
Documentation