benchmarking harness #35
Does your proposal assume a homogeneous k8s environment? How would you accommodate a cluster with heterogeneous hardware?
name: "blueprint-name"
model:
  name: "RedHat/Llama-3.3-70B-Instruct"
  path: "/path/to/model"
You mention you want a specific version of the model to be used. Can you add that field here to be explicit?
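As a rough sketch of what an explicit version field could look like on the loading side: the `revision` key below is hypothetical (it is not in the proposed blueprint), and `ModelSpec` and `model_spec_from_blueprint` are illustrative names only. The idea is simply to reject a blueprint that does not pin an immutable model version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str      # e.g. "RedHat/Llama-3.3-70B-Instruct"
    path: str      # e.g. "/path/to/model"
    revision: str  # hypothetical field: commit hash or tag that pins the model version

    def __post_init__(self):
        # Reject empty or moving references so two runs cannot silently
        # benchmark different weights under the same blueprint.
        if not self.revision or self.revision in ("main", "latest"):
            raise ValueError("model.revision must pin an immutable version")


def model_spec_from_blueprint(blueprint: dict) -> ModelSpec:
    """Build a ModelSpec from an already-parsed blueprint mapping."""
    model = blueprint["model"]
    return ModelSpec(
        name=model["name"],
        path=model["path"],
        revision=model.get("revision", ""),
    )
```

With this shape, a blueprint that omits `revision` fails at load time rather than at analysis time.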
b. wait for the model to be ready
2. Run Benchmarking test using configs and benchmark container (genai-perf, ai perf or 3rd party tool)
I think this hand-waves over a lot of other configuration that the particular setup requires. Does the user want aggregated or disaggregated serving? If disaggregated, how many workers of each kind? What is the GPU allocation for each worker?
How does this proposal address this, or are you making assumptions about the environment or use case? If so, that's fine, but I would like that explanation here.
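To make the ask concrete, here is a minimal sketch of the topology information I would expect the config to carry. Everything in it is an assumption for illustration: the `Topology` and `WorkerGroup` names, the `mode` values, and the split of disaggregated workers into `prefill` and `decode` groups are not taken from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerGroup:
    replicas: int         # number of workers in this group
    gpus_per_worker: int  # GPU allocation for each worker

    def total_gpus(self) -> int:
        return self.replicas * self.gpus_per_worker


@dataclass(frozen=True)
class Topology:
    mode: str                              # "aggregated" or "disaggregated"
    workers: Optional[WorkerGroup] = None  # used only in aggregated mode
    prefill: Optional[WorkerGroup] = None  # assumed split for disaggregated mode
    decode: Optional[WorkerGroup] = None   # assumed split for disaggregated mode

    def __post_init__(self):
        has_split = self.prefill is not None or self.decode is not None
        if self.mode == "aggregated":
            if self.workers is None or has_split:
                raise ValueError("aggregated mode needs only 'workers'")
        elif self.mode == "disaggregated":
            if self.prefill is None or self.decode is None or self.workers is not None:
                raise ValueError("disaggregated mode needs 'prefill' and 'decode'")
        else:
            raise ValueError(f"unknown mode: {self.mode!r}")

    def total_gpus(self) -> int:
        # Sum the GPU demand of whichever groups are present.
        groups = (self.workers, self.prefill, self.decode)
        return sum(g.total_gpus() for g in groups if g is not None)
```

Validating up front like this would also let the harness check the requested GPU count against the cluster before deploying anything.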
2. capture configs for the experiment: deploy (config or a reference to deployment), benchmark, model etc
3. we'd run benchmarks inside the k8s cluster using a k8s-native approach.
Do we still run into k8s DNS issues when trying to solve the "non-deterministic" problem? I think we have the same issue in Slurm, where we are not guaranteed that the nodes are in the same topology. What are your tolerances for "reproducibility"?
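One possible way to state a tolerance, sketched under my own assumptions: treat a metric as reproducible when the coefficient of variation across repeated runs stays under a threshold. The `within_tolerance` helper and the 5% figure in the usage note are hypothetical, not something the proposal defines.

```python
import statistics


def within_tolerance(samples, max_cv):
    """Return True if the coefficient of variation of samples is <= max_cv.

    samples: one metric value per repeated run (e.g. throughput).
    max_cv:  allowed ratio of sample standard deviation to mean, e.g. 0.05.
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    cv = statistics.stdev(samples) / abs(mean)
    return cv <= max_cv
```

For example, `within_tolerance([100.0, 102.0, 98.0], 0.05)` passes, while runs that land on differently connected nodes and swing by tens of percent would not.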
No description provided.