Iter8-kfserving enables metrics-driven experiments, progressive delivery, and automated rollouts for ML models served over Kubernetes and OpenShift clusters.
The picture below illustrates metrics-driven progressive canary release of a KFServing model using iter8-kfserving.
- Quick start on Minikube
- Installation
- Anatomy of an iter8 experiment
- Progressive canary release experiment
- Describe experiments using iter8ctl
- Iter8 metrics
- Concurrent experiments
- Reference
- Wiki with roadmap and developer documentation
- Contributing
Steps 1 to 7 demonstrate metrics-driven progressive canary release of a KFServing model using iter8-kfserving. This demo uses KFServing v0.5.0-rc2.
Before you begin, you will need Minikube, Kustomize v3, and Go 1.13+.
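Before starting, it can help to confirm that the required commands are on your PATH. The following is a minimal sketch, assuming the binaries are named `minikube`, `kustomize`, and `go`, plus `kubectl`, which the later steps invoke. The helper `missing_tools` is a hypothetical name, and the check does not verify versions.

```shell
#!/bin/sh
# Illustrative prerequisite check; missing_tools is a hypothetical helper.
# It only tests that each command is on PATH and does not check versions.

# Print the name of every given command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed binary names; kubectl is included because later steps call it.
missing=$(missing_tools minikube kustomize go kubectl)

if [ -n "$missing" ]; then
  echo "Missing prerequisites:"
  echo "$missing"
else
  echo "All prerequisite commands found."
fi
```

If anything is reported missing, install it before continuing with step 1.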
Step 1: Start Minikube with sufficient resources.
minikube start --cpus 6 --memory 12288 --kubernetes-version=v1.17.11 --driver=docker
Step 2: Install KFServing, kfserving-monitoring, and iter8-kfserving.
curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/platformsetup.sh | /bin/bash
Step 3: In a separate terminal, set up the Minikube tunnel. If prompted, enter your password.
minikube tunnel --cleanup
Step 4: Create a KFServing v1beta1 inferenceservice with a default model. Update it with a canary model. This step may take a couple of minutes.
curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/inferenceservicesetup.sh | /bin/bash
Step 5: In a separate terminal, generate prediction requests for the inferenceservice.
curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/predictionrequests.sh | /bin/bash
Step 6: Create the iter8-kfserving canary experiment.
kubectl apply -f https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/experiment.yaml

apiVersion: iter8.tools/v2alpha1
kind: Experiment
metadata:
  name: experiment-1
spec:
  target: default/my-model
  strategy:
    type: Canary
  criteria:
    indicators:
    - 95th-percentile-tail-latency
    objectives:
    - metric: mean-latency
      upperLimit: 1000
    - metric: error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 15
    maxIterations: 12
The above spec asks iter8 to run a canary release experiment for the inferenceservice named my-model in the default namespace. During the experiment, the default and canary model versions are assessed every 15 seconds for 12 iterations. When the experiment completes, the canary version is declared the winner if its mean latency is at most 1000 ms and its error rate is at most 1%. If the canary wins, it is rolled out, i.e., 100% of the traffic is shifted to it.
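As a rough illustration of how the objectives in this spec translate into a pass/fail verdict, the shell sketch below compares a version's metric values against the two upper limits and computes the nominal experiment duration. It is not iter8's implementation: the function name `meets_upper_limit` is hypothetical, and the metric values are copied from the sample output shown in step 7.

```shell
#!/bin/sh
# Illustrative only: mimics the upper-limit checks from the experiment spec.
# meets_upper_limit is a hypothetical helper, not part of iter8 or iter8ctl.

# Succeeds (exit status 0) when VALUE <= LIMIT; awk handles the decimals.
meets_upper_limit() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value <= limit) }'
}

# Canary metric values copied from the sample iter8ctl output in step 7.
mean_latency=230.090   # milliseconds
error_rate=0.000

if meets_upper_limit "$mean_latency" 1000 && meets_upper_limit "$error_rate" 0.01; then
  verdict="canary satisfies all objectives"
else
  verdict="canary violates at least one objective"
fi
echo "$verdict"

# Nominal duration: intervalSeconds * maxIterations from the spec.
total_seconds=$((15 * 12))
echo "experiment duration: ${total_seconds}s"
```

With these values, both checks pass and the nominal duration comes to 180 seconds, which matches the roughly 3 minutes the experiment is expected to take.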
Step 7: In a separate terminal, periodically describe the experiment.
Install iter8ctl. You can change the directory where iter8ctl binary is installed by changing GOBIN below.
GO111MODULE=on GOBIN=/usr/local/bin go get github.com/iter8-tools/[email protected]
Periodically describe the experiment.
while clear; do
  kubectl get experiment experiment-1 -o yaml | iter8ctl describe -f -
  sleep 15
done
You should see output similar to the following.
******
Experiment name: experiment-1
Experiment namespace: default
Experiment target: default/my-model
******
Number of completed iterations: 10
******
Winning version: canary
******
Objectives
+--------------------------+---------+--------+
|        OBJECTIVE         | DEFAULT | CANARY |
+--------------------------+---------+--------+
| mean-latency <= 1000.000 | true    | true   |
+--------------------------+---------+--------+
| error-rate <= 0.010      | true    | true   |
+--------------------------+---------+--------+
******
Metrics
+--------------------------------+---------+---------+
|             METRIC             | DEFAULT | CANARY  |
+--------------------------------+---------+---------+
| request-count                  | 132.294 |  73.254 |
+--------------------------------+---------+---------+
| 95th-percentile-tail-latency   | 298.582 | 294.597 |
| (milliseconds)                 |         |         |
+--------------------------------+---------+---------+
| mean-latency (milliseconds)    | 229.529 | 230.090 |
+--------------------------------+---------+---------+
| error-rate                     |   0.000 |   0.000 |
+--------------------------------+---------+---------+
The experiment should complete after 12 iterations (~3 mins). Once the experiment completes, inspect the InferenceService object.
kubectl get isvc/my-model
You should see 100% of the traffic shifted to the canary model, similar to the output below.
# output of the above command should be similar to the below
NAME       URL                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
my-model   http://my-model.default.example.com   True           100                              my-model-predictor-default-zwjbq   5m
