Conversation

Contributor

@kalantar kalantar commented Oct 8, 2025

See tekton_poc/README.md

Signed-off-by: Michael Kalantar <[email protected]>
@kalantar kalantar marked this pull request as draft October 8, 2025 14:14
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}";
python -m pip install huggingface_hub;
hf auth login --token "${HF_TOKEN}";
hf download "${HF_MODEL_ID}" --local-dir "/cache/${MODEL_PATH}"

Does this skip the model download if the model is already present at the path? I believe this is the entire point of using the PVC to reuse models locally ... right?

Collaborator

I believe `hf download` detects whether or not the model has already been downloaded. Furthermore, `mkdir -p` accepts folders that already exist, so I think this is good.

Contributor Author

`hf download` is intelligent and will skip the work. In the current implementation, the PVC is created locally and used only once; each parallel experiment runs in a different namespace. In principle, we could define a PVC bound to an existing PV that holds a single copy of the model. If we do that, we probably want to add a task before the parallel stage to download the model, along the lines of the sketch below.
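A minimal sketch of such a guard, assuming `MOUNT_PATH` is the PVC mount point (the snippet above mixes `${MOUNT_PATH}` and `/cache`, which are presumably the same path); the emptiness check is an illustration, not part of the PR:

```shell
# Hedged sketch: only download when the model directory is empty, so a shared
# PV can be reused across runs. The guard itself is an assumption, not PR code.
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}"
if [ -z "$(ls -A "${MOUNT_PATH}/${MODEL_PATH}")" ]; then
  python -m pip install huggingface_hub
  hf auth login --token "${HF_TOKEN}"
  hf download "${HF_MODEL_ID}" --local-dir "${MOUNT_PATH}/${MODEL_PATH}"
else
  echo "model already present at ${MOUNT_PATH}/${MODEL_PATH}; skipping download"
fi
```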

@@ -0,0 +1,40 @@
apiVersion: v2

Is this chart needed?

Can we simply use the inference-perf container as part of the StepAction, and configure it so that the step directly sends the inference request?

This approach eliminates the chart, along with the need for a separate harness pod (the Tekton step will run as part of some pod anyway).

Contributor Author

The step has now been updated to use the llm-d-benchmark image directly. This isn't as clean as using the inference-perf image directly would be; however, it should also be possible to modify the image to make it easier to use.
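For illustration, a minimal sketch of a StepAction along those lines, assuming Tekton's v1beta1 StepAction API; the image, parameter names, and request body are assumptions, not the PR's actual definitions:

```shell
# Hedged sketch: a StepAction that sends the inference request directly from
# the step's pod, avoiding a separate harness pod. All names are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: tekton.dev/v1beta1
kind: StepAction
metadata:
  name: send-inference-request
spec:
  params:
    - name: endpoint
      type: string
    - name: model
      type: string
  image: curlimages/curl:latest  # or the inference-perf / llm-d-benchmark image
  script: |
    #!/bin/sh
    # send a single completion request to the deployed stack
    curl -sf "$(params.endpoint)/v1/completions" \
      -H 'Content-Type: application/json' \
      -d "{\"model\": \"$(params.model)\", \"prompt\": \"hello\", \"max_tokens\": 8}"
EOF
```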


echo "✅ workload completed"

- name: upload-results

This section needs to be completed.

Let us start by implementing HTTP(S)-based results upload to S3-compatible storage.

See example here: https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=storage-connectivity-cloud-object

Contributor Author

This has been done; the results folder is tarred and uploaded to an S3-compatible bucket.
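For reference, a hedged sketch of that step, assuming the standard AWS CLI; `RESULTS_DIR`, `S3_BUCKET`, and `S3_ENDPOINT_URL` are illustrative names:

```shell
# Hedged sketch: tar the results folder and push it to an S3-compatible endpoint.
# Credentials are assumed to come from the secret referenced by the pipeline.
tar -czf results.tar.gz -C "${RESULTS_DIR}" .
aws s3 cp results.tar.gz \
  "s3://${S3_BUCKET}/results/$(date +%Y%m%d-%H%M%S).tar.gz" \
  --endpoint-url "${S3_ENDPOINT_URL}"
```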

mkdir -p "${MOUNT_PATH}/${MODEL_PATH}";
python -m pip install huggingface_hub;
hf auth login --token "${HF_TOKEN}";
hf download "${HF_MODEL_ID}" --local-dir "/cache/${MODEL_PATH}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe hf download detects whether or not the model has been downloaded. Furthermore mkdir -p accepts folders that already exist, so I think this is good.

@@ -0,0 +1,150 @@
inferenceExtension:
Collaborator

I am guessing the user would create these values YAML files for the experiment pipeline?

Contributor Author

Today, the pipeline takes the location (URL) of the values file as input. It can take a stringified values file as well. When sweeping through values, the base values file is overridden using `--set`, as sketched below.

An alternative is to use a (YAML) description of the desired environment to generate the values files. This seems to assume we can express it more simply than today's values files, and it is not clear to me that this is the case.
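For example, a sweep step might look roughly like this (a sketch: the chart path, release names, and the swept key are illustrative; only the `inferenceExtension` prefix comes from the values file above):

```shell
# Hedged sketch: install the base values file, overriding one swept key per run.
# The chart path and the replicas key are assumptions, not the PR's actual names.
for replicas in 1 2 4; do
  helm upgrade --install "exp-replicas-${replicas}" ./charts/gaie \
    -f gaie-values.yaml \
    --set inferenceExtension.replicas="${replicas}" \
    --namespace "exp-replicas-${replicas}" --create-namespace
done
```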

A _matrix_-based `Task` can be unrolled into multiple tasks to reduce the parallelism.
The utility script `utility/transform-pr-parallel.py` does this as follows:

1. Unroll a single parameter into one `Task` per value. Each resulting Task defines a matrix over the remaining parameters.
Collaborator

Curious what the "unrolled" output looks like here.
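For illustration, a hedged sketch of the transformation; the script's actual command line is not shown in this excerpt, so the invocation is an assumption:

```shell
# Hedged sketch: unroll one parameter of a matrix PipelineRun.
# The exact arguments are assumptions; see utility/transform-pr-parallel.py.
python utility/transform-pr-parallel.py pipeline/pipelinerun-matrix.yaml \
  > pipeline/pipelinerun-unrolled.yaml
# Conceptually, one Task with matrix {replicas: [1, 2], qps: [10, 20]} becomes
# two Tasks, one per replicas value, each keeping matrix {qps: [10, 20]}; the
# generated Tasks can then be chained with runAfter to cap parallelism.
```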

Signed-off-by: Michael Kalantar <[email protected]>

1. Create a namespace where the Tekton pipeline will execute.
```shell
export $NAMESPACE=your_namespace
Collaborator

Suggested change
export $NAMESPACE=your_namespace
export NAMESPACE=your_namespace

- the namespace (where the PipelineRun executes)
- s3 details: secret name, bucket name and endpoint URL

Run by creating the PipelineRun:
Collaborator

This appears as a one-liner after rendering. May need to un-indent.
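For reference, starting the run would look something like the following; the file name appears elsewhere in this conversation, and whether the PipelineRun uses a fixed name or `generateName` is an assumption:

```shell
# Hedged sketch: create the PipelineRun in the namespace prepared earlier.
kubectl create -f pipeline/pipelinerun-matrix.yaml -n "${NAMESPACE}"
```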

Comment on lines +102 to +103
```shell
kubectl apply -f pipeline/stepactions.yaml
Collaborator

Suggested change
```shell
kubectl apply -f pipeline/stepactions.yaml
```shell
cd tekton-poc
kubectl apply -f pipeline/stepactions.yaml

Signed-off-by: Michael Kalantar <[email protected]>
Collaborator

@jgchn jgchn left a comment


Tested and looks good! I was able to execute three experiments in parallel asynchronously, and saw the results uploaded to IBM COS. Let's get this merged!

Comment on lines +28 to +29
This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).

Collaborator

Suggested change
This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).
This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).
To change the Inference Scheduling configs for the experiment, update `tekton-poc/examples/inference-scheduling/gaie-values.yaml`, then `git push` to your fork, and supply the new URL to `inference-scheduling/` for the `experimentBaseUrl` value in the [pipeline run yaml](./pipeline/pipelinerun-matrix.yaml#L46).

Signed-off-by: Michael Kalantar <[email protected]>