Commit c681ffb

Updating the guides in the doc site
1 parent a7738f6 commit c681ffb

File tree

6 files changed

+16
-439
lines changed


mkdocs.yml

Lines changed: 0 additions & 1 deletion
@@ -65,7 +65,6 @@ nav:
   - Getting started: guides/index.md
   - Use Cases:
     - Serve Multiple GenAI models: guides/serve-multiple-genai-models.md
-    - Serve Multiple LoRA adapters: guides/serve-multiple-lora-adapters.md
   - Rollout:
     - Adapter Rollout: guides/adapter-rollout.md
     - InferencePool Rollout: guides/inferencepool-rollout.md

site-src/guides/adapter-rollout.md

Lines changed: 2 additions & 48 deletions
@@ -49,36 +49,7 @@ data:
 
 The new adapter version is applied to the model servers live, without requiring a restart.
 
-
-### Direct traffic to the new adapter version
-
-Modify the InferenceModel to configure a canary rollout with traffic splitting. In this example, 10% of traffic for food-review model will be sent to the new ***food-review-2*** adapter.
-
-
-```bash
-kubectl edit inferencemodel food-review
-```
-
-Change the targetModels list in InferenceModel to match the following:
-
-
-```yaml
-apiVersion: inference.networking.x-k8s.io/v1alpha2
-kind: InferenceModel
-metadata:
-  name: food-review
-spec:
-  criticality: 1
-  poolRef:
-    name: vllm-llama3-8b-instruct
-  targetModels:
-  - name: food-review-1
-    weight: 90
-  - name: food-review-2
-    weight: 10
-```
-
-The above configuration means one in every ten requests should be sent to the new version. Try it out:
+Try it out:
 
 1. Get the gateway IP:
 ```bash
@@ -88,7 +59,7 @@ IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].va
 2. Send a few requests as follows:
 ```bash
 curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
-"model": "food-review",
+"model": "food-review-2",
 "prompt": "Write as if you were a critic: San Francisco",
 "max_tokens": 100,
 "temperature": 0
@@ -97,23 +68,6 @@ curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
 
 ### Finish the rollout
 
-
-Modify the InferenceModel to direct 100% of the traffic to the latest version of the adapter.
-
-```yaml
-apiVersion: inference.networking.x-k8s.io/v1alpha2
-kind: InferenceModel
-metadata:
-  name: food-review
-spec:
-  criticality: 1
-  poolRef:
-    name: vllm-llama3-8b-instruct
-  targetModels:
-  - name: food-review-2
-    weight: 100
-```
-
 Unload the older versions from the servers by updating the LoRA syncer ConfigMap to list the older version under the `ensureNotExist` list:
 
 ```yaml
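The canary stage removed by this commit weighted traffic 90/10 between the two adapters via `targetModels` weights. Purely as an illustration of that weighting behavior (this is not the gateway's actual selection code; the model names come from the removed example), weight-proportional picking can be sketched as:

```python
import random

def pick_target_model(target_models, rng):
    """Pick a target model name in proportion to its weight.

    Illustrative sketch only; the Inference Gateway's real
    selection logic may differ.
    """
    total = sum(m["weight"] for m in target_models)
    point = rng.uniform(0, total)
    cumulative = 0
    for m in target_models:
        cumulative += m["weight"]
        if point < cumulative:
            return m["name"]
    return target_models[-1]["name"]

# Weights mirror the removed canary config: 90% old adapter, 10% new.
targets = [
    {"name": "food-review-1", "weight": 90},
    {"name": "food-review-2", "weight": 10},
]
rng = random.Random(0)
counts = {"food-review-1": 0, "food-review-2": 0}
for _ in range(10_000):
    counts[pick_target_model(targets, rng)] += 1
# counts["food-review-2"] lands near 1,000, i.e. roughly one in ten requests.
```

With these weights, roughly one in every ten simulated requests is routed to the new ***food-review-2*** adapter, which is the behavior the removed guide text described.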

site-src/guides/epp-configuration/config-text.md

Lines changed: 9 additions & 15 deletions
@@ -1,17 +1,14 @@
-# Configuring Plugins via text
+# Configuring Plugins via YAML
 
 The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
-it is configured. The IGW can be configured in several ways, either by code or via text.
+it is configured. The IGW is primarily configured via a configuration file.
 
-If configured by code either a set of predetermined environment variables must be used or one must
-fork the IGW and change code.
-
-A simpler way to congigure the IGW is to use a text based configuration. This text is in YAML format
-and can either be in a file or specified in-line as a parameter. The configuration defines the set of
+The YAML file can either be specified as a path to a file or in-line as a parameter. The configuration defines the set of
 plugins to be instantiated along with their parameters. Each plugin can also be given a name, enabling
-the same plugin type to be instantiated multiple times, if needed.
+the same plugin type to be instantiated multiple times, if needed (such as when configuring multiple scheduling profiles).
 
-Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request. If one is not defailed, a default one names `default` will be added and will reference all of the
+Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request.
+If no scheduling profile is specified, a default profile named `default` will be added and will reference all of the
 instantiated plugins.
 
 The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
@@ -22,12 +19,9 @@ In addition, the set of instantiated plugins can also include a picker, which ch
 the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
 instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
 
-It should be noted that while the configuration text looks like a Kubernetes Custom Resource, it is
-**NOT** a Kubernetes Custom Resource. Kubernetes infrastructure is used to load the configuration
-text and in the future will also help in versioning the text.
-
-It should also be noted that even when the configuration text is loaded from a file, it is loaded at
-the Endpoint-Picker's (EPP) startup and changes to the file at runtime are ignored.
+***NOTE***: While the configuration text looks like a Kubernetes CRD, it is
+**NOT** a Kubernetes CRD. Specifically, the config is not reconciled upon, and is only read on startup.
+This behavior is intentional, as augmenting the scheduling config without redeploying the EPP is not supported.
 
 The configuration text has the following form:
 ```yaml
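The hunk above ends just as the guide's example configuration begins, so the actual config body is not shown in this diff. Purely as a hypothetical sketch of the shape the text describes, a named set of plugins plus SchedulingProfiles that reference them, it might look like the following (the `apiVersion`, `kind`, and plugin type names here are assumptions, not taken from this commit):

```yaml
# Hypothetical illustration of the described config shape; field values
# are assumptions and may not match the actual EPP configuration schema.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- type: queue-scorer        # assumed scorer plugin type
- type: max-score-picker    # picker; added by default if omitted
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
  - pluginRef: max-score-picker
```

Note how each SchedulingProfile references the instantiated plugins by name, matching the description above of a profile selecting which plugins run for a request.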

site-src/guides/index.md

Lines changed: 1 addition & 1 deletion
@@ -349,7 +349,7 @@ Tooling:
 The following instructions assume you would like to cleanup ALL resources that were created in this quickstart guide.
 Please be careful not to delete resources you'd like to keep.
 
-1. Uninstall the InferencePool, InferenceModel, and model server resources
+1. Uninstall the InferencePool, InferenceObjective and model server resources
 
 ```bash
 helm uninstall vllm-llama3-8b-instruct
