[WIP] Release 0.4 related testing yamls #1371
@@ -0,0 +1,15 @@
# Dynamo Installation Instructions

We follow the instructions in [dynamo](https://github.com/ai-dynamo/dynamo) to deploy Dynamo Cloud on Kubernetes. The detailed steps are in Section 1, `1. Installing Dynamo Cloud from Published Artifacts`, of Dynamo's [quickstart guide](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_deploy/quickstart.md). We use the release images published by the Dynamo team at version 0.3.2, the most recent release at the time of writing.

### Model Deployment

We use the sample deployment YAMLs from the v0.3.2 release of the dynamo repo for PD (prefill/decode) disaggregation testing: https://github.com/ai-dynamo/dynamo/blob/v0.3.2/examples/llm/deploy/agg.yaml and https://github.com/ai-dynamo/dynamo/blob/v0.3.2/examples/llm/deploy/agg-router.yaml.

Review comment: There's a typo in "disaggration". It should be "disaggregation".

> Note: Some configuration differs from the upstream samples in how container images and models are downloaded, because our testing environment differs:
>
> 1. We pull container images from the VKE container registry aibrix-cn-beijing.cr.volces.com. The images are synced from Docker Hub and NVIDIA NGC.
> 2. We download models from VKE object storage (TOS); the models are synced from the Hugging Face model hub.
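
The download scripts in the manifests below pass TOS credentials to `tosutil config` as literal placeholders. The following is a minimal sketch of the Secret-based alternative suggested in review, not part of this PR. It assumes the Secret name `my-tos-secret`, the keys `accessKeyId`/`secretAccessKey`, and the env var names from the reviewer's example, and it assumes `extraPodSpec.mainContainer` accepts the standard Kubernetes container `env` field. The second document is a fragment of one service (VllmWorker in the `llm-disagg` graph), not a complete object, and the Secret must be created in the same namespace as the graph's pods.

```yaml
# Hypothetical Secret holding the TOS credentials consumed by tosutil.
apiVersion: v1
kind: Secret
metadata:
  name: my-tos-secret
type: Opaque
stringData:
  accessKeyId: "<YOUR_ACCESS_KEY_ID>"
  secretAccessKey: "<YOUR_SECRET_ACCESS_KEY>"
---
# Fragment only (not a standalone object): services.VllmWorker.extraPodSpec.mainContainer
# with the credentials injected from the Secret instead of written into the script.
mainContainer:
  image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
  workingDir: /workspace/examples/llm
  env:
    - name: YOUR_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: my-tos-secret
          key: accessKeyId
    - name: YOUR_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: my-tos-secret
          key: secretAccessKey
  command:
    - /bin/sh
    - -c
    - |
      apt update && apt install wget -y
      wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
      chmod +x tosutil
      # sh expands the env vars set from the Secret; no literal keys in the manifest.
      ./tosutil config -i "$YOUR_ACCESS_KEY_ID" -k "$YOUR_SECRET_ACCESS_KEY" -e tos-cn-beijing.ivolces.com -re cn-beijing
      ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32
      echo "model downloaded, start serving"
      dynamo serve graphs.disagg:VllmWorker --system-app-port 5000 --enable-system-app --use-default-health-checks --service-name VllmWorker
```

The Secret would be applied with `kubectl apply -f` before the graph itself; the same `env` block would need to be repeated for every service that runs the download steps (Processor, VllmWorker, PrefillWorker).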
@@ -0,0 +1,146 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: llm-disagg
spec:
  envs:
    - name: DYN_DEPLOYMENT_CONFIG
      value: '{"Common":{"model":"models/Qwen3-8B","block-size":64,"max-model-len":16384,"kv-transfer-config":"{\"kv_connector\":\"DynamoNixlConnector\"}"},"Frontend":{"served_model_name":"Qwen3-8B","endpoint":"dynamo.Processor.chat/completions","port":8000},"Processor":{"router":"round-robin","common-configs":["model","block-size"]},"VllmWorker":{"remote-prefill":true,"conditional-disagg":true,"max-local-prefill-length":10,"max-prefill-queue-size":2,"ServiceArgs":{"workers":1,"resources":{"gpu":"1"}},"common-configs":["model","block-size","max-model-len","kv-transfer-config"]},"PrefillWorker":{"max-num-batched-tokens":16384,"ServiceArgs":{"workers":1,"resources":{"gpu":"1"}},"common-configs":["model","block-size","max-model-len","kv-transfer-config"]},"Planner":{"environment":"kubernetes","no-operation":true}}'
  services:
    Frontend:
      dynamoNamespace: llm-disagg
      componentType: main
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        # nodeSelector:
        #   machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          args:
            - dynamo
            - serve
            - graphs.disagg:Frontend
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - Frontend
    Processor:
      dynamoNamespace: llm-disagg
      componentType: worker
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        # nodeSelector:
        #   machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              apt update && apt install wget -y
              wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              chmod +x tosutil
              ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing

Review comment: Storing secret placeholders like … For example, you can define environment variables in your container spec that pull from a secret:

    env:
      - name: YOUR_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: my-tos-secret
            key: accessKeyId
      - name: YOUR_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: my-tos-secret
            key: secretAccessKey

This issue is repeated for the …

              ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32
              echo "model downloaded, start serving"

              dynamo serve graphs.disagg:Processor --system-app-port 5000 --enable-system-app --use-default-health-checks --service-name Processor
    VllmWorker:
      #envFromSecret: hf-token-secret
      dynamoNamespace: llm-disagg
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        # nodeSelector:
        #   machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              apt update && apt install wget -y
              wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              chmod +x tosutil
              ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing
              ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32

              echo "model downloaded, start serving"

              dynamo serve graphs.disagg:VllmWorker --system-app-port 5000 --enable-system-app --use-default-health-checks --service-name VllmWorker
    PrefillWorker:
      # envFromSecret: hf-token-secret
      dynamoNamespace: llm-disagg
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        # nodeSelector:
        #   machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              apt update && apt install wget -y
              wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              chmod +x tosutil
              ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing
              ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32

              echo "model downloaded, start serving"

              dynamo serve graphs.disagg:PrefillWorker --system-app-port 5000 --enable-system-app --use-default-health-checks --service-name PrefillWorker
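
The `DYN_DEPLOYMENT_CONFIG` value in the `llm-disagg` manifest is a single-line JSON string, which is hard to review. Because JSON is a subset of YAML, the same settings can be laid out as YAML purely for reading; the environment variable itself should stay a JSON string. As we read Dynamo's v0.3.x conventions (an assumption, not verified against this release), each service's `common-configs` list names keys it inherits from `Common`, and with `conditional-disagg` enabled the VllmWorker keeps prefills of up to `max-local-prefill-length` tokens local and sends longer ones to PrefillWorker only while the prefill queue holds fewer than `max-prefill-queue-size` requests.

```yaml
# Readable view of the llm-disagg DYN_DEPLOYMENT_CONFIG (for review only).
Common:
  model: models/Qwen3-8B
  block-size: 64
  max-model-len: 16384
  kv-transfer-config: '{"kv_connector":"DynamoNixlConnector"}'
Frontend:
  served_model_name: Qwen3-8B
  endpoint: dynamo.Processor.chat/completions
  port: 8000
Processor:
  router: round-robin
  common-configs: [model, block-size]
VllmWorker:
  remote-prefill: true
  conditional-disagg: true
  max-local-prefill-length: 10   # assumed: local prefill threshold
  max-prefill-queue-size: 2      # assumed: cap on queued remote prefills
  ServiceArgs:
    workers: 1
    resources:
      gpu: "1"
  common-configs: [model, block-size, max-model-len, kv-transfer-config]
PrefillWorker:
  max-num-batched-tokens: 16384
  ServiceArgs:
    workers: 1
    resources:
      gpu: "1"
  common-configs: [model, block-size, max-model-len, kv-transfer-config]
Planner:
  environment: kubernetes
  no-operation: true
```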
@@ -0,0 +1,179 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: disagg-router
spec:
  envs:
    - name: DYN_DEPLOYMENT_CONFIG
      value: '{"Common":{"model":"models/Qwen3-8B","block-size":64,"max-model-len":16384,"router":"kv","kv-transfer-config":"{\"kv_connector\":\"DynamoNixlConnector\"}"},"Frontend":{"served_model_name":"Qwen3-8B","endpoint":"dynamo.Processor.chat/completions","port":8000},"Processor":{"common-configs":["model","block-size","max-model-len","router"]},"Router":{"min-workers":1,"common-configs":["model","block-size","router"]},"VllmWorker":{"max-num-batched-tokens":16384,"remote-prefill":true,"conditional-disagg":true,"max-local-prefill-length":10,"max-prefill-queue-size":2,"tensor-parallel-size":1,"enable-prefix-caching":true,"ServiceArgs":{"workers":1,"resources":{"gpu":"1"}},"common-configs":["model","block-size","max-model-len","router","kv-transfer-config"]},"PrefillWorker":{"max-num-batched-tokens":16384,"ServiceArgs":{"workers":1,"resources":{"gpu":"1"}},"common-configs":["model","block-size","max-model-len","kv-transfer-config"]},"Planner":{"environment":"kubernetes","no-operation":true}}'
  services:
    Frontend:
      dynamoNamespace: llm-disagg-router
      componentType: main
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        nodeSelector:
          machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          args:
            - dynamo
            - serve
            - graphs.disagg_router:Frontend
            - --system-app-port
            - "5000"
            - --enable-system-app
            - --use-default-health-checks
            - --service-name
            - Frontend
    Processor:
      dynamoNamespace: llm-disagg-router
      componentType: worker
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        nodeSelector:
          node.kubernetes.io/instance-type: ecs.g3il.xlarge
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              apt update && apt install wget -y
              wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              chmod +x tosutil
              ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing
              ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32

              echo "model downloaded, start serving"

              dynamo serve graphs.disagg_router:Processor --system-app-port "5000" --enable-system-app --use-default-health-checks --service-name Processor
    Router:
      dynamoNamespace: llm-disagg-router
      componentType: worker
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        nodeSelector:
          machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              # apt update && apt install wget -y
              # wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              # chmod +x tosutil
              # ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing
              # ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32

              # echo "model downloaded, start serving"

              dynamo serve graphs.disagg_router:Router --system-app-port "5000" --enable-system-app --use-default-health-checks --service-name Router
    VllmWorker:
      # envFromSecret: hf-token-secret
      dynamoNamespace: llm-disagg-router
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        nodeSelector:
          machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              apt update && apt install wget -y
              wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              chmod +x tosutil
              ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing
              ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32

              echo "model downloaded, start serving"

              dynamo serve graphs.disagg_router:VllmWorker --system-app-port 5000 --enable-system-app --use-default-health-checks --service-name VllmWorker

    PrefillWorker:
      # envFromSecret: hf-token-secret
      dynamoNamespace: llm-disagg-router
      replicas: 2
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        nodeSelector:
          machine.cluster.vke.volcengine.com/gpu-name: NVIDIA-L20
        mainContainer:
          image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/ai-dynamo/vllm-runtime:0.3.2
          workingDir: /workspace/examples/llm
          command:
            - /bin/sh
            - -c
            - |
              apt update && apt install wget -y
              wget https://tos-tools.tos-cn-beijing.volces.com/linux/amd64/tosutil
              chmod +x tosutil
              ./tosutil config -i <YOUR_ACCESS_KEY_ID> -k <YOUR_SECRET_ACCESS_KEY> -e tos-cn-beijing.ivolces.com -re cn-beijing
              ./tosutil cp tos://aibrix-artifact-testing/models/Qwen3-8B ./models -r -p 8 -j 32

              echo "model downloaded, start serving"

              dynamo serve graphs.disagg:PrefillWorker --system-app-port 5000 --enable-system-app --use-default-health-checks --service-name PrefillWorker
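
For comparison with the `llm-disagg` graph, here is a readable view of only those sections of the `disagg-router` `DYN_DEPLOYMENT_CONFIG` that differ, again laid out as YAML for review only (the env var stays a JSON string). `Frontend`, `PrefillWorker`, and `Planner` carry identical settings in both graphs and are omitted. The visible differences are KV-aware routing (`router: kv` in `Common` plus a dedicated `Router` service), a larger batch budget and prefix caching on the decode worker, and, outside this JSON, two PrefillWorker replicas instead of one. How `router` reaches Processor and Router through `common-configs` is our reading of Dynamo's conventions, not something this PR states.

```yaml
# Readable view of the disagg-router DYN_DEPLOYMENT_CONFIG (differing sections only).
Common:
  model: models/Qwen3-8B
  block-size: 64
  max-model-len: 16384
  router: kv                     # not present in llm-disagg
  kv-transfer-config: '{"kv_connector":"DynamoNixlConnector"}'
Processor:
  # No explicit round-robin router here; "router" is (assumed) inherited from Common.
  common-configs: [model, block-size, max-model-len, router]
Router:                          # service not present in llm-disagg
  min-workers: 1
  common-configs: [model, block-size, router]
VllmWorker:
  max-num-batched-tokens: 16384  # not set in llm-disagg
  remote-prefill: true
  conditional-disagg: true
  max-local-prefill-length: 10
  max-prefill-queue-size: 2
  tensor-parallel-size: 1        # not set in llm-disagg
  enable-prefix-caching: true    # not set in llm-disagg
  ServiceArgs:
    workers: 1
    resources:
      gpu: "1"
  common-configs: [model, block-size, max-model-len, router, kv-transfer-config]
```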
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
  name: qwen3-235b-service
  namespace: default
spec:
  selector:
    model.aibrix.ai/name: qwen3-235b
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
      nodePort: 30010
  type: NodePort
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
  name: qwen3-32b-service
  namespace: default
spec:
  selector:
    model.aibrix.ai/name: qwen3-32b
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
      nodePort: 30009
  type: NodePort

Review comment: There's a typo in "instrunction". It should be "instruction".