@@ -157,119 +157,4 @@ If you meet problems exposing external IPs, feel free to debug with following co
envoy-aibrix-system-aibrix-eg-903790dc LoadBalancer 10.96.239.246 101.18.0.4 80:32079/TCP 10d
envoy-gateway ClusterIP 10.96.166.226 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 10d

- Local Development with CPU-only vLLM
- ------------------------------------
-
- This section explains how to run vLLM in a local Kubernetes cluster using CPU-only environments (e.g., for macOS or Linux dev).
-
- Download model locally
- ~~~~~~~~~~~~~~~~~~~~~~
-
- Use Hugging Face CLI:
-
- .. code-block:: bash
-
-     huggingface-cli download facebook/opt-125m
-
- Start local cluster with kind
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- Edit ``kind-config.yaml`` to mount your model cache, then:
-
- .. code-block:: bash
-
-     kind create cluster --config=./development/vllm/kind-config.yaml
-
- For Dev & Testing Local Setup with Monitoring
- ---------------------------------------------
-
- .. code-block:: bash
-
-     make dev-install-in-kind
-     make dev-port-forward
-     make dev-stop-port-forward
-     make dev-uninstall-from-kind
-
-
- Build and load images
- ~~~~~~~~~~~~~~~~~~~~~
-
- .. code-block:: bash
-
-     make docker-build-all
-     kind load docker-image aibrix/runtime:nightly
-
- Load CPU environment image
- ~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- **For macOS:**
-
- .. code-block:: bash
-
-     docker pull aibrix/vllm-cpu-env:macos
-     kind load docker-image aibrix/vllm-cpu-env:macos
-
- **For Linux:**
-
- .. code-block:: bash
-
-     docker pull aibrix/vllm-cpu-env:linux-amd64
-     kind load docker-image aibrix/vllm-cpu-env:linux-amd64
-
- Deploy vLLM model in kind cluster
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- **For macOS:**
-
- .. code-block:: bash
-
-     kubectl create -k development/vllm/macos
-
- **For Linux:**
-
- .. code-block:: bash
-
-     kubectl create -k development/vllm/linux
-
- Access model endpoint
- ~~~~~~~~~~~~~~~~~~~~~
-
- .. code-block:: bash
-
-     kubectl port-forward svc/facebook-opt-125m 8000:8000 &
-
- Query locally:
-
- .. code-block:: bash
-
-     curl -v http://localhost:8000/v1/completions \
-       -H "Content-Type: application/json" \
-       -H "Authorization: Bearer test-key-1234567890" \
-       -d '{
-         "model": "facebook-opt-125m",
-         "prompt": "Say this is a test",
-         "temperature": 0.5,
-         "max_tokens": 512
-       }'
-
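For repeated testing, the query shown in the removed section can be wrapped in a small helper. This is a minimal sketch and not part of the AIBrix tooling: ``build_payload`` and ``query_model`` are hypothetical names, while the endpoint, model name, and API key are the ones used in the removed example. It assumes the port-forward to ``localhost:8000`` is already running and that the model name and prompt contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of AIBrix) for querying the local endpoint.

# Build the JSON request body.
# Assumption: model and prompt contain no characters that need JSON escaping.
build_payload() {
  local model="$1" prompt="$2" max_tokens="${3:-512}"
  printf '{"model": "%s", "prompt": "%s", "temperature": 0.5, "max_tokens": %s}\n' \
    "$model" "$prompt" "$max_tokens"
}

# Send the request. Requires the kubectl port-forward to be running.
query_model() {
  curl -sS http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer test-key-1234567890" \
    -d "$(build_payload "$@")"
}
```

With the functions sourced, ``query_model facebook-opt-125m "Say this is a test" 32`` sends the same request with a smaller ``max_tokens``, which keeps CPU-only runs short.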
- Practical Notes
- ~~~~~~~~~~~~~~~
-
- - ``vllm-cpu-env`` is ideal for development and debugging. Inference latency will be high due to CPU-only backend.
- - Be sure to mount your Hugging Face model cache directory, or the container will re-download it online.
- - Confirm both ``runtime`` and ``env`` images are loaded into kind.
- - Use ``kubectl logs`` or ``kubectl exec`` to debug model pod issues.
-
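The note about confirming that images are loaded can be checked from the host. kind's documentation describes listing the images on a node with ``docker exec <node-name> crictl images``; the sketch below filters such a listing. ``missing_images`` is a hypothetical helper, and the node name ``kind-control-plane`` in the usage line is an assumption (the default for a cluster named ``kind``), so adjust it if ``kind-config.yaml`` names the cluster differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a `crictl images` listing on stdin and prints
# every image repository passed as an argument that does not appear in it.
missing_images() {
  local listing image
  listing="$(cat)"
  for image in "$@"; do
    # -F: match the repository name as a fixed string, not a regex.
    printf '%s\n' "$listing" | grep -qF -- "$image" || printf '%s\n' "$image"
  done
}
```

For example, ``docker exec kind-control-plane crictl images | missing_images aibrix/runtime aibrix/vllm-cpu-env`` prints nothing when both images are present on the node.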
- Debugging Gateway IPs
- ~~~~~~~~~~~~~~~~~~~~~
-
- .. code-block:: bash
-
-     kubectl get svc -n envoy-gateway-system
-
- .. code-block::
-
-     NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
-     envoy-aibrix-system-aibrix-eg-903790dc LoadBalancer 10.96.239.246 101.18.0.4 80:32079/TCP 10d
-
- Please also follow `debugging guidelines <https://aibrix.readthedocs.io/latest/features/gateway-plugins.html#debugging-guidelines>`_.
+ For advanced development usage, please refer to the :ref:`development` section.