benchmarking/benchmarking.md

# Benchmarking Harness

In its current state, benchmarking has a few problems:

1. UX: experimentation, debugging, and iteration are hard.

    Use case: As a user, I want to easily experiment with different configs, get results quickly, and compare them.

2. Reproducibility is hard: we don't store the input configs and results.

    Use case: As a user, I want to be able to reproduce my experiments and share them with others.

3. Benchmarking steps are tightly coupled: if a single step/benchmark config fails, the entire process is aborted/retried.

4. Benchmarking through port-forwarding has non-deterministic latency characteristics.

## Proposed plan:

1. Decouple all steps and then compose them together: prep a model, deploy a k8s CR, benchmark, collect data.

2. Capture the configs for the experiment: deploy (config or a reference to a deployment), benchmark, model, etc. (see the sketch after this list).

3. Run benchmarks inside the k8s cluster, in a k8s-native way.
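
A minimal sketch of the experiment record that could be captured for reproducibility, assuming it is stored next to the results; every field name and value here is illustrative rather than a finalized schema:

```yaml
# Illustrative experiment record (hypothetical schema) persisted with the results
experiment:
  id: "llama-3.3-70b-agg-run-001"          # illustrative unique identifier
  deploy:
    ref: "vllm-agg"                        # reference to the deployed k8s CR, or an inline deploy config
  model:
    name: "RedHat/Llama-3.3-70B-Instruct"
  benchmark:
    tool: "genai-perf"
    config: "benchmark.yaml"               # the benchmarking config file described in the Config section
  results:
    location: "s3://benchmarks/llama-3.3-70b-agg-run-001/"   # illustrative storage path
```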

## Steps:

The following steps are executed by the harness:

Note: These steps are reusable across different tests (LLM benchmarking, accuracy testing, functional testing, etc.), so we can swap the container used for each step.

1. Initialize experiment

    a. (Optional) Deploy the model

    b. Wait for the model to be ready

2. Run the benchmarking test using the configs and a benchmark container (genai-perf, AI Perf, or a 3rd-party tool); see the Job sketch after this list

    a. Prepare the configs (a matrix of params: isl/osl, concurrency, etc.) and pass them as a config file to the harness container

    b. Run the test for each config

3. Teardown

    a. (Optional) Collect artifacts: push files to upstream storage (s3/minio)

    b. Collect output results: push benchmark metrics to a data storage layer (s3/minio/database) using a CLI tool

4. Analytics:

    a. Generate charts, graphs, and tables from the benchmark metrics
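
One way to run step 2 in-cluster (and avoid the port-forwarding latency issue) is a k8s Job that mounts the benchmarking config from a ConfigMap. This is a minimal sketch; the image, args, and ConfigMap name are assumptions rather than an existing harness interface:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-benchmark-run              # illustrative name
spec:
  backoffLimit: 0                      # a failed config fails only this run instead of retrying the whole process
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: benchmark
          image: benchmark-harness:latest                # hypothetical benchmark container (e.g. genai-perf based)
          args: ["--config", "/config/benchmark.yaml"]   # hypothetical harness CLI
          volumeMounts:
            - name: benchmark-config
              mountPath: /config
      volumes:
        - name: benchmark-config
          configMap:
            name: benchmark-config     # holds the benchmarking config file shown in the Config section
```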

## Config

Benchmarking config file:

```yaml
name: "blueprint-name"
model:
  name: "RedHat/Llama-3.3-70B-Instruct"
  path: "/path/to/model"
concurrency: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
endpoint: "/v1/chat/completions"
endpoint_type: "chat"
benchmark:
  isl_osl:
    - [8192, 1024]
    - [1024, 1024]
    - [1024, 8192]
```
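
As an illustration of step 2a, the harness could expand the `concurrency` x `isl_osl` matrix above into one entry per benchmark run (11 concurrency values x 3 isl/osl pairs = 33 runs); the `runs` layout below is only an assumed shape for the expanded config, not a defined format:

```yaml
# Illustrative expansion of the matrix above (first and last entries shown)
runs:
  - {isl: 8192, osl: 1024, concurrency: 1}
  - {isl: 8192, osl: 1024, concurrency: 2}
  - {isl: 8192, osl: 1024, concurrency: 4}
  # ... remaining combinations ...
  - {isl: 1024, osl: 8192, concurrency: 1024}
```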

## Alternatives:

### Alternative 1: Benchmarking as a first-class citizen in dynamo

```
kind: DynamoBenchmark
metadata:
  name: vllm-agg-benchmark
spec:
  model:
    modelRef: llama-3-70b-instruct-v1
  config:
    model: "RedHat/Llama-3.3-70B-Instruct"
    path: "/path/to/model"
    concurrency: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
    endpoint: "/v1/chat/completions"
    endpoint_type: "chat"
    benchmark:
      isl_osl:
        - [8192, 1024]
        - [1024, 1024]
        - [1024, 8192]
```

### Alternative 2: Benchmarking helm chart + workflow manager

Simpler to manage and deploy.
Reuse Argo Workflows as the workflow manager to orchestrate dependencies and the overall workflow (see the sketch below).
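
A minimal sketch of how the decoupled steps could be composed as an Argo Workflows DAG, assuming one container per step; the image names and commands are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: llm-benchmark-
spec:
  entrypoint: benchmark-pipeline
  templates:
    - name: benchmark-pipeline
      dag:
        tasks:
          - name: deploy-model
            template: deploy-model
          - name: run-benchmark
            template: run-benchmark
            dependencies: [deploy-model]
          - name: collect-results
            template: collect-results
            dependencies: [run-benchmark]
    - name: deploy-model
      container:
        image: deploy-step:latest          # placeholder: deploys the k8s CR and waits for readiness
        command: ["deploy", "--wait"]      # placeholder command
    - name: run-benchmark
      container:
        image: benchmark-step:latest       # placeholder: genai-perf or another benchmark tool
        command: ["run", "--config", "/config/benchmark.yaml"]   # placeholder command
    - name: collect-results
      container:
        image: collect-step:latest         # placeholder: pushes metrics/artifacts to s3/minio
        command: ["push", "--dest", "s3://benchmarks/results/"]  # placeholder destination
```

Each task maps to one of the reusable step containers above, so swapping the benchmarking tool only means changing a single template.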
