Commit c681ffb

Updating the guides in the doc site
1 parent a7738f6 commit c681ffb

File tree

6 files changed

+16
-439
lines changed


mkdocs.yml

Lines changed: 0 additions & 1 deletion
@@ -65,7 +65,6 @@ nav:
   - Getting started: guides/index.md
   - Use Cases:
     - Serve Multiple GenAI models: guides/serve-multiple-genai-models.md
-    - Serve Multiple LoRA adapters: guides/serve-multiple-lora-adapters.md
   - Rollout:
     - Adapter Rollout: guides/adapter-rollout.md
     - InferencePool Rollout: guides/inferencepool-rollout.md

site-src/guides/adapter-rollout.md

Lines changed: 2 additions & 48 deletions
@@ -49,36 +49,7 @@ data:
 
 The new adapter version is applied to the model servers live, without requiring a restart.
 
-
-### Direct traffic to the new adapter version
-
-Modify the InferenceModel to configure a canary rollout with traffic splitting. In this example, 10% of traffic for food-review model will be sent to the new ***food-review-2*** adapter.
-
-
-```bash
-kubectl edit inferencemodel food-review
-```
-
-Change the targetModels list in InferenceModel to match the following:
-
-
-```yaml
-apiVersion: inference.networking.x-k8s.io/v1alpha2
-kind: InferenceModel
-metadata:
-  name: food-review
-spec:
-  criticality: 1
-  poolRef:
-    name: vllm-llama3-8b-instruct
-  targetModels:
-  - name: food-review-1
-    weight: 90
-  - name: food-review-2
-    weight: 10
-```
-
-The above configuration means one in every ten requests should be sent to the new version. Try it out:
+Try it out:
 
 1. Get the gateway IP:
 ```bash
@@ -88,7 +59,7 @@ IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].va
 2. Send a few requests as follows:
 ```bash
 curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
-"model": "food-review",
+"model": "food-review-2",
 "prompt": "Write as if you were a critic: San Francisco",
 "max_tokens": 100,
 "temperature": 0
@@ -97,23 +68,6 @@ curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
 
 ### Finish the rollout
 
-
-Modify the InferenceModel to direct 100% of the traffic to the latest version of the adapter.
-
-```yaml
-apiVersion: inference.networking.x-k8s.io/v1alpha2
-kind: InferenceModel
-metadata:
-  name: food-review
-spec:
-  criticality: 1
-  poolRef:
-    name: vllm-llama3-8b-instruct
-  targetModels:
-  - name: food-review-2
-    weight: 100
-```
-
 Unload the older versions from the servers by updating the LoRA syncer ConfigMap to list the older version under the `ensureNotExist` list:
 
 ```yaml
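The canary stage removed by this commit weighted traffic 90/10 between the two adapters via `targetModels` weights. Purely as an illustration of that weighting behavior (this is not the gateway's actual selection code; the model names come from the removed example), weight-proportional picking can be sketched as:

```python
import random

def pick_target_model(target_models, rng):
    """Pick a target model name in proportion to its weight.

    Illustrative sketch only; the Inference Gateway's real
    selection logic may differ.
    """
    total = sum(m["weight"] for m in target_models)
    point = rng.uniform(0, total)
    cumulative = 0
    for m in target_models:
        cumulative += m["weight"]
        if point < cumulative:
            return m["name"]
    return target_models[-1]["name"]

# Weights mirror the removed canary config: 90% old adapter, 10% new.
targets = [
    {"name": "food-review-1", "weight": 90},
    {"name": "food-review-2", "weight": 10},
]
rng = random.Random(0)
counts = {"food-review-1": 0, "food-review-2": 0}
for _ in range(10_000):
    counts[pick_target_model(targets, rng)] += 1
# counts["food-review-2"] lands near 1,000, i.e. roughly one in ten requests.
```

With these weights, roughly one in every ten simulated requests is routed to the new ***food-review-2*** adapter, which is the behavior the removed guide text described.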

site-src/guides/epp-configuration/config-text.md

Lines changed: 9 additions & 15 deletions
@@ -1,17 +1,14 @@
-# Configuring Plugins via text
+# Configuring Plugins via YAML
 
 The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
-it is configured. The IGW can be configured in several ways, either by code or via text.
+it is configured. The IGW is primarily configured via a configuration file.
 
-If configured by code either a set of predetermined environment variables must be used or one must
-fork the IGW and change code.
-
-A simpler way to congigure the IGW is to use a text based configuration. This text is in YAML format
-and can either be in a file or specified in-line as a parameter. The configuration defines the set of
+The YAML file can either be specified as a path to a file or in-line as a parameter. The configuration defines the set of
 plugins to be instantiated along with their parameters. Each plugin can also be given a name, enabling
-the same plugin type to be instantiated multiple times, if needed.
+the same plugin type to be instantiated multiple times, if needed (such as when configuring multiple scheduling profiles).
 
-Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request. If one is not defailed, a default one names `default` will be added and will reference all of the
+Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request.
+If no scheduling profile is specified, a default profile named `default` will be added and will reference all of the
 instantiated plugins.
 
 The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
@@ -22,12 +19,9 @@ In addition, the set of instantiated plugins can also include a picker, which ch
 the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
 instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
 
-It should be noted that while the configuration text looks like a Kubernetes Custom Resource, it is
-**NOT** a Kubernetes Custom Resource. Kubernetes infrastructure is used to load the configuration
-text and in the future will also help in versioning the text.
-
-It should also be noted that even when the configuration text is loaded from a file, it is loaded at
-the Endpoint-Picker's (EPP) startup and changes to the file at runtime are ignored.
+***NOTE***: While the configuration text looks like a Kubernetes CRD, it is
+**NOT** a Kubernetes CRD. Specifically, the config is not reconciled upon, and is only read on startup.
+This behavior is intentional, as augmenting the scheduling config without redeploying the EPP is not supported.
 
 The configuration text has the following form:
 ```yaml
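The hunk above ends just as the guide's example configuration begins, so the actual config body is not shown in this diff. Purely as a hypothetical sketch of the shape the text describes, a named set of plugins plus SchedulingProfiles that reference them, it might look like the following (the `apiVersion`, `kind`, and plugin type names here are assumptions, not taken from this commit):

```yaml
# Hypothetical illustration of the described config shape; field values
# are assumptions and may not match the actual EPP configuration schema.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- type: queue-scorer        # assumed scorer plugin type
- type: max-score-picker    # picker; added by default if omitted
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
  - pluginRef: max-score-picker
```

Note how each SchedulingProfile references the instantiated plugins by name, matching the description above of a profile selecting which plugins run for a request.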

site-src/guides/index.md

Lines changed: 1 addition & 1 deletion
@@ -349,7 +349,7 @@ Tooling:
 The following instructions assume you would like to cleanup ALL resources that were created in this quickstart guide.
 Please be careful not to delete resources you'd like to keep.
 
-1. Uninstall the InferencePool, InferenceModel, and model server resources
+1. Uninstall the InferencePool, InferenceObjective and model server resources
 
 ```bash
 helm uninstall vllm-llama3-8b-instruct
