
Commit 9904495

Merge branch 'main' into ning/release-benchmarking
Signed-off-by: Ning <[email protected]>
2 parents 5953bee + 66f2899 commit 9904495


44 files changed (+2605 / -82 lines)


.github/workflows/release-build.yaml

Lines changed: 4 additions & 0 deletions
```diff
@@ -140,6 +140,10 @@ jobs:
       id-token: write
     name: publish
     steps:
+      - name: Free Disk Space
+        uses: jlumbroso/free-disk-space@main
+        with:
+          tool-cache: false
       - name: Check out source repository
         uses: actions/checkout@v4
       - name: Set up Python environment ${{ matrix.python-version }}
```
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@

# vLLM Remote Tokenizer Feature

This feature enables model-aware remote tokenizer support for vLLM inference engines in AIBrix gateway.

## Quick Start

Enable vLLM remote tokenizer with one command:

```bash
kubectl apply -k config/features/vllm-remote-tokenizer/
```

## Configuration

The following environment variables are configured:

| Variable | Default | Description |
|----------|---------|-------------|
| AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER | false | Enable remote tokenizer feature |
| AIBRIX_VLLM_TOKENIZER_ENDPOINT_TEMPLATE | http://%s:8000 | URL template for vLLM endpoints |
| AIBRIX_TOKENIZER_HEALTH_CHECK_PERIOD | 30s | Health check interval |
| AIBRIX_TOKENIZER_TTL | 5m | Tokenizer cache TTL |
| AIBRIX_MAX_TOKENIZERS_PER_POOL | 100 | Maximum tokenizers per pool |
| AIBRIX_TOKENIZER_REQUEST_TIMEOUT | 10s | Request timeout |
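
The endpoint template and the duration settings can be illustrated with a small bash sketch. It assumes, without confirmation from this commit, that the `%s` in `AIBRIX_VLLM_TOKENIZER_ENDPOINT_TEMPLATE` is filled with a vLLM pod's address and that the duration values are single-unit Go-style strings such as `30s` or `5m`. Both `tokenizer_endpoint` and `duration_to_seconds` are hypothetical helpers, not AIBrix code.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the remote tokenizer settings; not AIBrix code.

# Expand the endpoint template for one pod address.
# Assumption: the template's single %s placeholder receives the pod IP or hostname.
tokenizer_endpoint() {
  local pod_addr="$1"
  local template="${AIBRIX_VLLM_TOKENIZER_ENDPOINT_TEMPLATE:-http://%s:8000}"
  # The template is used as the printf format string on purpose.
  # shellcheck disable=SC2059
  printf "$template" "$pod_addr"
}

# Convert a single-unit duration (for example 10s, 5m, 1h) to whole seconds.
# Compound values such as 1m30s and unit-less numbers are rejected, not parsed.
duration_to_seconds() {
  local value="$1"
  local number="${value%[smh]}"
  local unit="${value: -1}"
  if [[ ! "$number" =~ ^[0-9]+$ ]]; then
    echo "unsupported duration: $value" >&2
    return 1
  fi
  case "$unit" in
    s) echo $(( 10#$number )) ;;
    m) echo $(( 10#$number * 60 )) ;;
    h) echo $(( 10#$number * 3600 )) ;;
    *) echo "unsupported duration: $value" >&2; return 1 ;;
  esac
}
```

With the defaults from the table, `tokenizer_endpoint 10.0.0.5` prints `http://10.0.0.5:8000`, and `duration_to_seconds 5m` prints `300`.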

## Customization

To use custom values, copy this directory and modify `gateway-plugins-env-patch.yaml`:

```bash
cp -r config/features/vllm-remote-tokenizer/ config/features/my-vllm-config/
# Edit config/features/my-vllm-config/gateway-plugins-env-patch.yaml
kubectl apply -k config/features/my-vllm-config/
```
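
Editing the copied `gateway-plugins-env-patch.yaml` can also be scripted. The bash sketch below defines a hypothetical `set_patch_env_value` helper; it assumes the layout of the patch in this commit, where each `- name:` line is immediately followed by its `value:` line, and it rewrites only the value that belongs to the named variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set one env var's value in a Deployment env patch file.
# Assumes each "- name: VAR" line is directly followed by its "value: ..." line.
# Values containing "&" or backslashes are not escaped in this sketch.
set_patch_env_value() {
  local file="$1" name="$2" value="$3"
  local tmp status=0
  tmp="$(mktemp)" || return 1
  awk -v name="$name" -v value="$value" '
    # The previous line named the target variable: rewrite this value line.
    pending && $1 == "value:" {
      sub(/value:.*/, "value: \"" value "\"")
      pending = 0
      print
      next
    }
    # Remember whether this line declares the target variable.
    { pending = ($1 == "-" && $2 == "name:" && $3 == name) }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_patch_env_value config/features/my-vllm-config/gateway-plugins-env-patch.yaml AIBRIX_TOKENIZER_TTL 10m` would change only the TTL entry before the `kubectl apply -k` step. Copying the temporary file back with `cat` rather than `mv` keeps the original file's permissions.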

## Enable the Feature

To enable vLLM remote tokenizer after installation:

```bash
kubectl set env deployment/gateway-plugins -n aibrix-system AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER=true
```

Or use a custom Kustomization overlay with the environment variable set to `true`.
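
As a sketch of the overlay route: the patch in this commit sets `AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER` to `"false"`, so an overlay has to override that value. The hypothetical bash helper below writes a `kustomization.yaml` that uses the feature directory as its base and adds a strategic-merge patch setting the variable to `"true"`. The helper name, the output directory, and the `enable-patch.yaml` file name are assumptions; the Deployment name `gateway-plugins` and container name `gateway-plugin` come from this commit's patch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Kustomize overlay that enables the remote tokenizer.
# Usage: write_enable_overlay <output-dir> [relative-path-to-feature-dir]
write_enable_overlay() {
  local out_dir="$1"
  local base="${2:-../vllm-remote-tokenizer}"
  mkdir -p "$out_dir" || return 1

  # Overlay that reuses the feature directory and layers one more patch on top.
  cat > "$out_dir/kustomization.yaml" <<EOF || return 1
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- ${base}

patches:
- path: enable-patch.yaml
  target:
    kind: Deployment
    name: gateway-plugins
EOF

  # Strategic-merge patch: containers and env entries merge by name,
  # so only the enable flag changes and the other variables are kept.
  cat > "$out_dir/enable-patch.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway-plugins
spec:
  template:
    spec:
      containers:
      - name: gateway-plugin
        env:
        - name: AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER
          value: "true"
EOF
}
```

Running `write_enable_overlay config/features/my-vllm-enabled` and then `kubectl apply -k config/features/my-vllm-enabled/` would apply the feature configuration with the flag turned on. The default base path `../vllm-remote-tokenizer` assumes the new overlay sits next to the feature directory.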

## Disable

To disable, set the environment variable to false:

```bash
kubectl set env deployment/gateway-plugins -n aibrix-system AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER=false
```

## Verification

Check if enabled:

```bash
kubectl get deployment gateway-plugins -n aibrix-system -o json | \
  jq '.spec.template.spec.containers[0].env[] | select(.name | startswith("AIBRIX_ENABLE_VLLM"))'
```

Check metrics:

```bash
kubectl port-forward -n aibrix-system svc/gateway-plugins 8080:8080
curl http://localhost:8080/metrics | grep aibrix_tokenizer_pool
```
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway-plugins
  namespace: system
spec:
  template:
    spec:
      containers:
      - name: gateway-plugin
        env:
        - name: AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER
          value: "false"
        - name: AIBRIX_VLLM_TOKENIZER_ENDPOINT_TEMPLATE
          value: "http://%s:8000"
        - name: AIBRIX_TOKENIZER_HEALTH_CHECK_PERIOD
          value: "30s"
        - name: AIBRIX_TOKENIZER_TTL
          value: "5m"
        - name: AIBRIX_MAX_TOKENIZERS_PER_POOL
          value: "100"
        - name: AIBRIX_TOKENIZER_REQUEST_TIMEOUT
          value: "10s"
```
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: aibrix-system

# This overlay enables vLLM remote tokenizer support
# Apply with: kubectl apply -k config/features/vllm-remote-tokenizer/

resources:
- ../../gateway/gateway-plugin

patches:
- path: gateway-plugins-env-patch.yaml
  target:
    kind: Deployment
    name: gateway-plugins
```

config/overlays/release/kustomization.yaml

Lines changed: 4 additions & 4 deletions
```diff
@@ -20,10 +20,10 @@ images:
 - name: redis
   newTag: latest
 - name: aibrix/gateway-plugins
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: aibrix/metadata-service
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: aibrix/controller-manager
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: aibrix/runtime
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
```

config/overlays/vke/default/kustomization.yaml

Lines changed: 4 additions & 4 deletions
```diff
@@ -20,13 +20,13 @@ images:
   newTag: latest
 - name: aibrix/gateway-plugins
   newName: aibrix-cn-beijing.cr.volces.com/aibrix/gateway-plugins
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: aibrix/metadata-service
   newName: aibrix-cn-beijing.cr.volces.com/aibrix/metadata-service
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: aibrix/controller-manager
   newName: aibrix-cn-beijing.cr.volces.com/aibrix/controller-manager
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: aibrix/runtime
   newName: aibrix-cn-beijing.cr.volces.com/aibrix/runtime
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
```

config/standalone/autoscaler-controller/kustomization.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ namePrefix: aibrix-autoscaling-
 images:
 - name: controller
   newName: aibrix/controller-manager
-  newTag: nightly
+  newTag: v0.4.0.rc.1

 patches:
 - path: patch.yaml
```

config/standalone/distributed-inference-controller/kustomization.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@ namePrefix: aibrix-orchestration-
 images:
 - name: controller
   newName: aibrix/controller-manager
-  newTag: v0.3.0
+  newTag: v0.4.0.rc.1
 - name: quay.io/kuberay/operator
   newName: aibrix/kuberay-operator
   newTag: v1.2.1-patch-20250726
```

config/standalone/kv-cache-controller/kustomization.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ namePrefix: aibrix-kv-cache-
 images:
 - name: controller
   newName: aibrix/controller-manager
-  newTag: nightly
+  newTag: v0.4.0.rc.1

 patches:
 - path: patch.yaml
```

config/standalone/model-adapter-controller/kustomization.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@ namePrefix: aibrix-lora-
 images:
 - name: controller
   newName: aibrix/controller-manager
-  newTag: nightly
+  newTag: v0.4.0.rc.1

 patches:
 - path: patch.yaml
```
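
The image-tag changes above (from `v0.3.0` or `nightly` to `v0.4.0.rc.1`) follow one pattern across the kustomization files. The bash sketch below shows one way such a bump could be scripted with awk; `set_image_tag` is a hypothetical helper, not the project's release tooling, and it assumes each `newTag:` line belongs to the closest preceding `- name:` entry, as in these files. Where the kustomize CLI is available, `kustomize edit set image` is the standard way to make the same change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set newTag for one image entry in a kustomization.yaml.
# Assumes the "newTag:" line follows its "- name: <image>" line in the same entry.
set_image_tag() {
  local file="$1" image="$2" tag="$3"
  local tmp status=0
  tmp="$(mktemp)" || return 1
  awk -v image="$image" -v tag="$tag" '
    # Track which "- name:" entry the following lines belong to.
    $1 == "-" && $2 == "name:" { current = $3 }
    # Rewrite the tag only inside the matching entry, keeping indentation.
    current == image && $1 == "newTag:" { sub(/newTag:.*/, "newTag: " tag) }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || status=$?
  rm -f "$tmp"
  return "$status"
}
```

Because the standalone controller files name their image `controller`, `set_image_tag config/standalone/kv-cache-controller/kustomization.yaml controller v0.4.0.rc.1` would reproduce one of the single-line changes above.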
