Tekton Orchestrated Benchmarking POC #423
base: main
Conversation
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}"; | ||
python -m pip install huggingface_hub; | ||
hf auth login --token "${HF_TOKEN}"; | ||
hf download "${HF_MODEL_ID}" --local-dir "/cache/${MODEL_PATH}" |
Does this skip the model download if it is already present in the path? I believe this is the entire idea of using the PVC to reuse models locally ... right?
I believe `hf download` detects whether or not the model has been downloaded. Furthermore, `mkdir -p` accepts folders that already exist, so I think this is good.
`hf download` is intelligent and will skip the work. In the current implementation, the PVC is created locally and used only once; each parallel experiment runs in a different namespace. In principle, we can define a PVC to an existing PV that holds only one copy of the model. If we do this, we probably want to add a task before the parallelism to download the model.
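To make the PV-sharing idea concrete, a minimal sketch of statically binding a PVC to a pre-provisioned PV follows. All names, the NFS backend, and the sizes are assumptions for illustration; note also that a PV binds to exactly one PVC, so each experiment namespace would need its own PV/PVC pair pointing at the same underlying storage.

```yaml
# Hypothetical pre-provisioned PV backed by shared storage (NFS assumed).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-cache-pv            # assumed name
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com       # assumed shared backend
    path: /models
---
# PVC in one experiment namespace, statically bound to the PV above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
  namespace: experiment-1         # one PV/PVC pair per namespace
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""            # disable dynamic provisioning
  volumeName: model-cache-pv      # bind to the existing PV
  resources:
    requests:
      storage: 100Gi
```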
```diff
@@ -0,0 +1,40 @@
apiVersion: v2
```
Is this chart needed?
Can we simply use the inference-perf container as part of the StepAction, and configure it so that the step sends the inference request directly?
This approach eliminates the chart, along with the need for a separate harness pod (the Tekton step will run as part of some pod anyway).
The step has now been updated to use the llm-d-benchmark image directly. It isn't as clean as using the inference-perf image directly would be; however, it should be possible to modify it to be easier to use.
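For illustration, a StepAction wrapping the image directly might look roughly like the sketch below. The image reference, the parameter, and the CLI invocation are assumptions, not the PR's actual definition:

```yaml
# Hypothetical StepAction that runs the benchmark image in-process,
# so no separate harness pod is needed.
apiVersion: tekton.dev/v1beta1
kind: StepAction
metadata:
  name: run-workload
spec:
  params:
    - name: endpoint
      type: string
  image: ghcr.io/llm-d/llm-d-benchmark:latest   # assumed image reference
  script: |
    #!/usr/bin/env bash
    set -euo pipefail
    # Send the inference requests from the step itself.
    llm-d-benchmark --endpoint "$(params.endpoint)"   # assumed entrypoint/flags
```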
```yaml
echo "✅ workload completed"

- name: upload-results
```
This section needs to be completed.
Let us start by implementing HTTP(S)-based results upload to S3-compatible storage.
See example here: https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=storage-connectivity-cloud-object
This has been done; the results folder is tarred and uploaded to an S3-compatible bucket.
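A rough sketch of what such an upload step can look like (the secret name and keys, workspace name, parameter names, and bucket layout are all assumptions; the PR may implement this differently):

```yaml
# Illustrative Tekton step: tar the results workspace and push it to an
# S3-compatible endpoint. Secret/workspace/param names are assumptions.
steps:
  - name: upload-results
    image: amazon/aws-cli:latest
    env:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: s3-credentials     # assumed secret name
            key: access-key
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: s3-credentials
            key: secret-key
    script: |
      #!/usr/bin/env bash
      set -euo pipefail
      tar -czf /tmp/results.tar.gz -C "$(workspaces.results.path)" .
      # --endpoint-url points the AWS CLI at any S3-compatible service (e.g. IBM COS).
      aws --endpoint-url "$(params.s3EndpointUrl)" \
          s3 cp /tmp/results.tar.gz "s3://$(params.s3Bucket)/$(context.taskRun.name).tar.gz"
```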
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}"; | ||
python -m pip install huggingface_hub; | ||
hf auth login --token "${HF_TOKEN}"; | ||
hf download "${HF_MODEL_ID}" --local-dir "/cache/${MODEL_PATH}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe hf download
detects whether or not the model has been downloaded. Furthermore mkdir -p
accepts folders that already exist, so I think this is good.
```diff
@@ -0,0 +1,150 @@
inferenceExtension:
```
I am guessing the user would be creating these values YAML files for the experiment pipeline?
Today, the pipeline takes the location (URL) as input. It can take a stringified values file as well. When sweeping through values, the values file is overridden using `--set`.

An alternative is to use a (YAML) description of the desired environment to generate the values files. This seems to assume we can express it more simply than the values files do today. It is not clear to me that this is the case.
A _matrix_ based `Task` can be unrolled into multiple tasks to reduce the parallelism.
The utility script `utility/transform-pr-parallel.py` does this as follows:

1. Unroll a single parameter into one `Task` per value. Each resulting Task defines a matrix over the remaining parameters.
Curious what the "unrolled" output looks like here.
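For illustration (the parameter names and values below are assumed, not taken from the PR), a matrix task such as:

```yaml
- name: run-benchmark
  taskRef:
    name: benchmark
  matrix:
    params:
      - name: replicas
        value: ["1", "2"]
      - name: qps
        value: ["10", "50"]
```

might be unrolled over `replicas` into one task per value, each still carrying a matrix over `qps`; sequencing the unrolled tasks with `runAfter` is presumably what caps the parallelism:

```yaml
- name: run-benchmark-replicas-1
  taskRef:
    name: benchmark
  params:
    - name: replicas
      value: "1"
  matrix:
    params:
      - name: qps
        value: ["10", "50"]
- name: run-benchmark-replicas-2
  runAfter: ["run-benchmark-replicas-1"]   # assumed sequencing to limit parallelism
  taskRef:
    name: benchmark
  params:
    - name: replicas
      value: "2"
  matrix:
    params:
      - name: qps
        value: ["10", "50"]
```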
1. Create a namespace where the Tekton pipeline will execute.
```shell
export $NAMESPACE=your_namespace
```
Suggested change:
```diff
-export $NAMESPACE=your_namespace
+export NAMESPACE=your_namespace
```
- the namespace (where the PipelineRun executes)
- S3 details: secret name, bucket name, and endpoint URL

Run by creating the PipelineRun:
This appears as a one-liner after rendering. May need to un-indent.
```shell
kubectl apply -f pipeline/stepactions.yaml
```
Suggested change:
```diff
+cd tekton-poc
 kubectl apply -f pipeline/stepactions.yaml
```
Tested and looks good! I was able to execute three experiments in parallel asynchronously, and saw the results uploaded to IBM COS. Let's get this merged!
This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).
Suggested change:
```diff
 This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).
+To change the Inference Scheduling configs for the experiment, update `tekton-poc/examples/inference-scheduling/gaie-values.yaml`, then `git push` to your fork, and supply the new URL to `inference-scheduling/` for the `experimentBaseUrl` value in the [pipeline run yaml](./pipeline/pipelinerun-matrix.yaml#L46).
```
See tekton_poc/README.md