Commit ed6514d

Update guide to add steps to deploy healthcheck policy for gke

1 parent d77ad92 commit ed6514d

File tree

site-src/guides/index.md

1 file changed: 58 additions & 18 deletions
@@ -19,6 +19,9 @@ A cluster with:
 - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
   to run the model server deployment.
 
+Tooling:
+- [Helm](https://helm.sh/docs/intro/install/) installed
+
 ## **Steps**
 
 ### Deploy Sample Model Server
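
Before moving on, the new Helm prerequisite can be sanity-checked with a couple of commands; a minimal sketch (any recent Helm v3 release should satisfy the chart used below):

```bash
# Confirm Helm is installed and on the PATH
helm version --short

# Confirm the cluster is reachable and new enough for sidecar containers (v1.29+)
kubectl version
```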
@@ -80,6 +83,58 @@ A cluster with:
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
 ```
 
+### Deploy the InferencePool and Endpoint Picker Extension
+
+Install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000. The Helm install command automatically installs the endpoint picker and InferencePool, along with provider-specific resources.
+
+=== "GKE"
+
+    ```bash
+    export GATEWAY_PROVIDER=gke
+    helm install vllm-llama3-8b-instruct \
+      --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+      --set provider.name=$GATEWAY_PROVIDER \
+      --version v0.5.1 \
+      oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+    ```
+
+=== "Istio"
+
+    ```bash
+    export GATEWAY_PROVIDER=none
+    helm install vllm-llama3-8b-instruct \
+      --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+      --set provider.name=$GATEWAY_PROVIDER \
+      --version v0.5.1 \
+      oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+    ```
+
+=== "Kgateway"
+
+    ```bash
+    export GATEWAY_PROVIDER=none
+    helm install vllm-llama3-8b-instruct \
+      --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+      --set provider.name=$GATEWAY_PROVIDER \
+      --version v0.5.1 \
+      oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+    ```
+
+=== "Agentgateway"
+
+    ```bash
+    export GATEWAY_PROVIDER=none
+    helm install vllm-llama3-8b-instruct \
+      --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+      --set provider.name=$GATEWAY_PROVIDER \
+      --version v0.5.1 \
+      oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+    ```
+
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
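
Once the Helm install above completes, a quick verification sketch like the following can confirm the release and its resources landed; the `-epp` suffix on the endpoint-picker Deployment is an assumption about the chart's naming, so adjust to what `helm status` reports:

```bash
# The release installed from the OCI chart
helm list --filter vllm-llama3-8b-instruct

# The InferencePool created by the chart (CRD comes from the earlier manifests step)
kubectl get inferencepool vllm-llama3-8b-instruct

# Endpoint picker Deployment; the "-epp" naming is assumed here
kubectl get deployment vllm-llama3-8b-instruct-epp
```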
@@ -268,22 +323,6 @@ A cluster with:
 kubectl get httproute llm-route -o yaml
 ```
 
-
-### Deploy the InferencePool and Endpoint Picker Extension
-
-Install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with label app: vllm-llama3-8b-instruct and listening on port 8000, you can run the following command:
-
-```bash
-export GATEWAY_PROVIDER=none # See [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/charts/inferencepool/README.md#configuration) for valid configurations
-helm install vllm-llama3-8b-instruct \
-  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set provider.name=$GATEWAY_PROVIDER \
-  --version v0.5.1 \
-  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
-```
-
-The Helm install automatically installs the endpoint-picker, inferencepool along with provider specific resources.
-
 ### Deploy InferenceObjective (Optional)
 
 Deploy the sample InferenceObjective which allows you to specify priority of requests.
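
The sample manifest in the repo is authoritative for InferenceObjective, but as a rough sketch of the shape of such an object, it might look like the following; the apiVersion and the `priority` and `poolRef` fields are assumptions about the CRD schema:

```bash
# Hypothetical InferenceObjective sketch; field names are assumed —
# see config/manifests/inferenceobjective.yaml for the real sample.
kubectl apply -f - <<'EOF'
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1                      # assumed: higher-priority requests are favored under load
  poolRef:
    name: vllm-llama3-8b-instruct  # the InferencePool installed above
EOF
```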
@@ -302,7 +341,7 @@ A cluster with:
 PORT=80
 
 curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
-"model": "food-review-1",
+"model": "food-review",
 "prompt": "Write as if you were a critic: San Francisco",
 "max_tokens": 100,
 "temperature": 0
@@ -317,10 +356,11 @@ A cluster with:
 1. Uninstall the InferencePool, InferenceModel, and model server resources
 
    ```bash
-   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool-resources.yaml --ignore-not-found
+   helm uninstall vllm-llama3-8b-instruct
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml --ignore-not-found
+   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/sim-deployment.yaml --ignore-not-found
    kubectl delete secret hf-token --ignore-not-found
    ```
 
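
After the cleanup, a short check that the new `helm uninstall` step actually removed the release and its InferencePool:

```bash
# Should list no releases
helm list --filter vllm-llama3-8b-instruct

# Should print nothing once the chart's resources are removed
kubectl get inferencepool vllm-llama3-8b-instruct --ignore-not-found
```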