
Conversation

biswapanda

No description provided.

@nv-kmcgill53

Do you assume a homogeneous k8s environment with your proposal? How would you accommodate a heterogeneous HW cluster?

name: "blueprint-name"
model:
name: "RedHat/Llama-3.3-70B-Instruct"
path: "/path/to/model"


You mention you want a specific version of the model to be used. Can you add that field here to be explicit?
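
For illustration only, the quoted model block could carry an explicit version pin, and a hardware hint would be one way to address the heterogeneous-cluster question above; the version and hardware fields below are hypothetical and not part of the proposal:

name: "blueprint-name"
model:
  name: "RedHat/Llama-3.3-70B-Instruct"
  version: "<model-revision-or-tag>"   # hypothetical: pins the exact model version used
  path: "/path/to/model"
hardware:                              # hypothetical: hint for heterogeneous clusters
  gpuType: "<gpu-product-label>"       # e.g. matched via a nodeSelector on the workers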


b. wait for the model to be ready

2. Run benchmarking tests using the configs and a benchmark container (genai-perf, AIPerf, or a third-party tool)


I think this is hand-waving over a lot of other configuration that needs to happen for the particular setup. Does the user want aggregated or disaggregated serving? If disaggregated, how many workers of each kind? What is the GPU allocation for each of the workers, etc.?

How does this proposal address this, or are you making assumptions about the environment or use case? If so, that's fine, and I would like that explanation here.
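
For illustration only, the kind of topology configuration this comment asks about might look like the sketch below; the deployment block and its field names (mode, prefillWorkers, decodeWorkers, gpusPerWorker) are hypothetical and not defined by the proposal:

deployment:
  mode: "disaggregated"    # or "aggregated"
  prefillWorkers: 2        # hypothetical: number of prefill workers
  decodeWorkers: 4         # hypothetical: number of decode workers
  gpusPerWorker: 1         # hypothetical: GPUs allocated to each worker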


2. Capture configs for the experiment: deploy (config or a reference to the deployment), benchmark, model, etc.

3. We'd run benchmarks inside the k8s cluster using a k8s-native approach.


Do we still run into k8s DNS issues when trying to solve the "non-deterministic" problem? I think we have the same issue in Slurm, where we are not guaranteed the nodes are in the same topology. What are your tolerances for "reproducibility"?
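
As a minimal sketch of the k8s-native approach in step 3, the benchmark could run as a Kubernetes Job that targets the model's in-cluster Service DNS name; the image, Service name, and genai-perf arguments below are illustrative assumptions, not taken from the proposal:

apiVersion: batch/v1
kind: Job
metadata:
  name: llm-benchmark
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: genai-perf
          image: <benchmark-container-image>   # illustrative placeholder
          command: ["genai-perf", "profile"]   # exact flags depend on the serving setup
          args:
            - "-m"
            - "Llama-3.3-70B-Instruct"
            - "--url"
            - "http://<model-service>.<namespace>.svc.cluster.local:8000"

Targeting the Service name keeps the benchmark traffic inside the cluster network, though it does not by itself resolve the DNS/topology reproducibility concern raised above.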
