
Conversation

biswapanda

No description provided.

@nv-kmcgill53

Do you assume a homogeneous k8s environment with your proposal? How would you accommodate a heterogeneous HW cluster?

name: "blueprint-name"
model:
name: "RedHat/Llama-3.3-70B-Instruct"
path: "/path/to/model"


You mention you want a specific version of the model to be used. Can you add that field here to be explicit?
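
For illustration only, the quoted model block could carry an explicit version pin, and a hardware hint would be one way to address the heterogeneous-cluster question above; the version and hardware fields below are hypothetical and not part of the proposal:

name: "blueprint-name"
model:
  name: "RedHat/Llama-3.3-70B-Instruct"
  version: "<model-revision-or-tag>"   # hypothetical: pins the exact model version used
  path: "/path/to/model"
hardware:                              # hypothetical: hint for heterogeneous clusters
  gpuType: "<gpu-product-label>"       # e.g. matched via a nodeSelector on the workers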


b. wait for the model to be ready

2. Run benchmarking tests using the configs and a benchmark container (genai-perf, AIPerf, or a third-party tool)


I think this is hand-waving over a lot of other configuration that needs to happen for the particular setup. Does the user want aggregated or disaggregated serving? If disaggregated, how many workers of each kind? What is the GPU allocation for each of the workers, etc.?

How does this proposal address this, or are you making assumptions about the environment or use case? If so, that's fine, and I would like that explanation here.
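
For illustration only, the kind of topology configuration this comment asks about might look like the sketch below; the deployment block and its field names (mode, prefillWorkers, decodeWorkers, gpusPerWorker) are hypothetical and not defined by the proposal:

deployment:
  mode: "disaggregated"    # or "aggregated"
  prefillWorkers: 2        # hypothetical: number of prefill workers
  decodeWorkers: 4         # hypothetical: number of decode workers
  gpusPerWorker: 1         # hypothetical: GPUs allocated to each worker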


2. Capture configs for the experiment: deploy (config or a reference to the deployment), benchmark, model, etc.

3. We'd run benchmarks inside the k8s cluster using a k8s-native approach.


Do we still run into k8s DNS issues when trying to solve the "non-deterministic" problem? I think we have the same issue in Slurm, where we are not guaranteed the nodes are in the same topology. What are your tolerances for "reproducibility"?
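
As a minimal sketch of the k8s-native approach in step 3, the benchmark could run as a Kubernetes Job that targets the model's in-cluster Service DNS name; the image, Service name, and genai-perf arguments below are illustrative assumptions, not taken from the proposal:

apiVersion: batch/v1
kind: Job
metadata:
  name: llm-benchmark
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: genai-perf
          image: <benchmark-container-image>   # illustrative placeholder
          command: ["genai-perf", "profile"]   # exact flags depend on the serving setup
          args:
            - "-m"
            - "Llama-3.3-70B-Instruct"
            - "--url"
            - "http://<model-service>.<namespace>.svc.cluster.local:8000"

Targeting the Service name keeps the benchmark traffic inside the cluster network, though it does not by itself resolve the DNS/topology reproducibility concern raised above.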
