Commit 3491bb7

elizabetht (Elizabeth Thomas) authored
[Docs] Refactor local development from quick start (#1339)
* [Docs] Refactor local development from quick start

  Signed-off-by: Elizabeth Thomas <[email protected]>

* Update quickstart documentation

  Signed-off-by: Elizabeth Thomas <[email protected]>

---------

Signed-off-by: Elizabeth Thomas <[email protected]>
Co-authored-by: Elizabeth Thomas <[email protected]>
1 parent 17f871b commit 3491bb7

2 files changed: +107 -116 lines changed

docs/source/development/development.rst

Lines changed: 106 additions & 0 deletions
@@ -31,6 +31,112 @@ If you want to clean up everything and reinstall the latest code
     kubectl delete -k config/default
     kubectl delete -k config/dependency
 
+Local Development with CPU-only vLLM
+------------------------------------
+
+This section explains how to run vLLM in a local Kubernetes cluster using CPU-only environments (e.g., for macOS or Linux dev).
+
+Download model locally
+~~~~~~~~~~~~~~~~~~~~~~
+
+Use Hugging Face CLI:
+
+.. code-block:: bash
+
+    huggingface-cli download facebook/opt-125m
+
+Start local cluster with kind
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Edit ``kind-config.yaml`` to mount your model cache, then:
+
+.. code-block:: bash
+
+    kind create cluster --config=./development/vllm/kind-config.yaml
+
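Editor's sketch, not part of the commit: the mount mentioned in this step could look roughly like the config below. The output file name, the host cache path, and the ``containerPath`` are assumptions for illustration; the repository's own ``development/vllm/kind-config.yaml`` may use different values. ``extraMounts`` is a field of kind's ``v1alpha4`` cluster config.

```shell
# Hypothetical locations: adjust both to match your machine and the pod spec.
HF_CACHE="${HF_HOME:-$HOME/.cache/huggingface}"
CONFIG_FILE="${CONFIG_FILE:-${TMPDIR:-/tmp}/kind-config.example.yaml}"

# Write a minimal kind cluster config that exposes the host cache to the node.
cat > "$CONFIG_FILE" <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      # Host directory that already holds the downloaded model
      - hostPath: ${HF_CACHE}
        # Path inside the kind node; assumed here, not taken from the repo
        containerPath: /root/.cache/huggingface
EOF

echo "Wrote $CONFIG_FILE"
```

Passing the generated file to ``kind create cluster --config`` would create a node with the cache directory mounted; the model pod still needs a matching volume to see it.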
+Build and load images
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+    make docker-build-all
+    kind load docker-image aibrix/runtime:nightly
+
+Load CPU environment image
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**For macOS:**
+
+.. code-block:: bash
+
+    docker pull aibrix/vllm-cpu-env:macos
+    kind load docker-image aibrix/vllm-cpu-env:macos
+
+**For Linux:**
+
+.. code-block:: bash
+
+    docker pull aibrix/vllm-cpu-env:linux-amd64
+    kind load docker-image aibrix/vllm-cpu-env:linux-amd64
+
+Deploy vLLM model in kind cluster
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**For macOS:**
+
+.. code-block:: bash
+
+    kubectl create -k development/vllm/macos
+
+**For Linux:**
+
+.. code-block:: bash
+
+    kubectl create -k development/vllm/linux
+
+Access model endpoint
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+    kubectl port-forward svc/facebook-opt-125m 8000:8000 &
+
+Query locally:
+
+.. code-block:: bash
+
+    curl -v http://localhost:8000/v1/completions \
+      -H "Content-Type: application/json" \
+      -H "Authorization: Bearer test-key-1234567890" \
+      -d '{
+         "model": "facebook-opt-125m",
+         "prompt": "Say this is a test",
+         "temperature": 0.5,
+         "max_tokens": 512
+      }'
+
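Editor's sketch, not part of the commit: a request sent right after the port-forward may fail if the CPU backend is still loading the model. The retry loop below polls a URL until it answers. ``wait_for_endpoint`` and ``API_KEY`` are hypothetical names, and using ``/v1/models`` as the probe route is an assumption about the forwarded server rather than something this commit states.

```shell
# Poll a URL until it returns a successful HTTP status or attempts run out.
# Usage: wait_for_endpoint URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_endpoint() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors such as 401 or 503.
    if curl --silent --fail --max-time 5 --output /dev/null \
         -H "Authorization: Bearer ${API_KEY:-}" "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}

# Example against the forwarded port (route assumed, see the note above):
# API_KEY=test-key-1234567890 wait_for_endpoint http://localhost:8000/v1/models 30 5
```

The function returns non-zero on timeout, so it can gate the ``curl`` query in a script.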
+Practical Notes
+~~~~~~~~~~~~~~~
+
+- ``vllm-cpu-env`` is ideal for development and debugging. Inference latency will be high due to CPU-only backend.
+- Be sure to mount your Hugging Face model cache directory, or the container will re-download it online.
+- Confirm both ``runtime`` and ``env`` images are loaded into kind.
+- Use ``kubectl logs`` or ``kubectl exec`` to debug model pod issues.
+
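Editor's sketch, not part of the commit: one way to act on the third note is to list the images on the kind node and check for both names. ``check_images`` is a hypothetical helper, and the node name ``kind-control-plane`` assumes the default cluster name ``kind``.

```shell
# Read an image listing on stdin and report which expected names are absent.
# Returns 0 when every name given as an argument appears in the listing.
check_images() {
  local listing image missing=0
  listing="$(cat)"
  for image in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qF -- "$image"; then
      echo "missing: $image"
      missing=1
    fi
  done
  return "$missing"
}

# Example against a live cluster (node name assumed, see the note above):
# docker exec kind-control-plane crictl images \
#   | check_images aibrix/runtime aibrix/vllm-cpu-env
```

Any ``missing:`` line points at an image that still needs ``kind load docker-image``.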
+Debugging Gateway IPs
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+    kubectl get svc -n envoy-gateway-system
+
+.. code-block::
+
+    NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
+    envoy-aibrix-system-aibrix-eg-903790dc   LoadBalancer   10.96.239.246   101.18.0.4    80:32079/TCP   10d
+
+Please also follow `debugging guidelines <https://aibrix.readthedocs.io/latest/features/gateway-plugins.html#debugging-guidelines>`_.
+
 For Dev & Testing Local Setup with Monitoring
 ---------------------------------------------
 

docs/source/getting_started/quickstart.rst

Lines changed: 1 addition & 116 deletions
@@ -157,119 +157,4 @@ If you meet problems exposing external IPs, feel free to debug with following co
     envoy-aibrix-system-aibrix-eg-903790dc   LoadBalancer   10.96.239.246   101.18.0.4    80:32079/TCP                              10d
     envoy-gateway                            ClusterIP      10.96.166.226   <none>        18000/TCP,18001/TCP,18002/TCP,19001/TCP   10d
 
-Local Development with CPU-only vLLM
-------------------------------------
-
-This section explains how to run vLLM in a local Kubernetes cluster using CPU-only environments (e.g., for macOS or Linux dev).
-
-Download model locally
-~~~~~~~~~~~~~~~~~~~~~~
-
-Use Hugging Face CLI:
-
-.. code-block:: bash
-
-    huggingface-cli download facebook/opt-125m
-
-Start local cluster with kind
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Edit ``kind-config.yaml`` to mount your model cache, then:
-
-.. code-block:: bash
-
-    kind create cluster --config=./development/vllm/kind-config.yaml
-
-For Dev & Testing Local Setup with Monitoring
----------------------------------------------
-
-.. code-block:: bash
-
-    make dev-install-in-kind
-    make dev-port-forward
-    make dev-stop-port-forward
-    make dev-uninstall-from-kind
-
-
-Build and load images
-~~~~~~~~~~~~~~~~~~~~~
-
-.. code-block:: bash
-
-    make docker-build-all
-    kind load docker-image aibrix/runtime:nightly
-
-Load CPU environment image
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-**For macOS:**
-
-.. code-block:: bash
-
-    docker pull aibrix/vllm-cpu-env:macos
-    kind load docker-image aibrix/vllm-cpu-env:macos
-
-**For Linux:**
-
-.. code-block:: bash
-
-    docker pull aibrix/vllm-cpu-env:linux-amd64
-    kind load docker-image aibrix/vllm-cpu-env:linux-amd64
-
-Deploy vLLM model in kind cluster
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-**For macOS:**
-
-.. code-block:: bash
-
-    kubectl create -k development/vllm/macos
-
-**For Linux:**
-
-.. code-block:: bash
-
-    kubectl create -k development/vllm/linux
-
-Access model endpoint
-~~~~~~~~~~~~~~~~~~~~~
-
-.. code-block:: bash
-
-    kubectl port-forward svc/facebook-opt-125m 8000:8000 &
-
-Query locally:
-
-.. code-block:: bash
-
-    curl -v http://localhost:8000/v1/completions \
-      -H "Content-Type: application/json" \
-      -H "Authorization: Bearer test-key-1234567890" \
-      -d '{
-         "model": "facebook-opt-125m",
-         "prompt": "Say this is a test",
-         "temperature": 0.5,
-         "max_tokens": 512
-      }'
-
-Practical Notes
-~~~~~~~~~~~~~~~
-
-- ``vllm-cpu-env`` is ideal for development and debugging. Inference latency will be high due to CPU-only backend.
-- Be sure to mount your Hugging Face model cache directory, or the container will re-download it online.
-- Confirm both ``runtime`` and ``env`` images are loaded into kind.
-- Use ``kubectl logs`` or ``kubectl exec`` to debug model pod issues.
-
-Debugging Gateway IPs
-~~~~~~~~~~~~~~~~~~~~~
-
-.. code-block:: bash
-
-    kubectl get svc -n envoy-gateway-system
-
-.. code-block::
-
-    NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
-    envoy-aibrix-system-aibrix-eg-903790dc   LoadBalancer   10.96.239.246   101.18.0.4    80:32079/TCP   10d
-
-Please also follow `debugging guidelines <https://aibrix.readthedocs.io/latest/features/gateway-plugins.html#debugging-guidelines>`_.
+For advanced development usage, please refer to the :ref:`development` section.
