Labels: internal (filed by core contributor or associate)
Description
Describe the bug
It takes several minutes to generate random requests.
guidellm benchmark --target=$URL --model=$MODEL --rate-type=concurrent --rate=200 --max-requests=200 --output-path=~/llama-70b.json --processor=$MODEL --data='{"prompt_tokens":20000, "output_tokens":5000}'
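As a rough illustration of the scale involved (this is a hypothetical sketch, not guidellm's actual implementation), 200 requests of 20,000 prompt tokens each means synthesizing roughly four million tokens up front. A naive single-threaded sampling loop like the one below shows why that preprocessing step can take noticeable time:

```python
import random
import time

# Toy vocabulary standing in for a real tokenizer's vocab (assumption:
# guidellm builds synthetic prompts by sampling tokens; the real code
# may differ, e.g. it may use the model's tokenizer directly).
VOCAB = [f"tok{i}" for i in range(1000)]

def make_prompt(n_tokens: int, rng: random.Random) -> str:
    # Naively sample n_tokens tokens and join them into one prompt string.
    return " ".join(rng.choices(VOCAB, k=n_tokens))

start = time.perf_counter()
rng = random.Random(0)
# 200 requests x 20,000 prompt tokens, matching the command above.
prompts = [make_prompt(20_000, rng) for _ in range(200)]
elapsed = time.perf_counter() - start
print(f"Generated {len(prompts)} prompts in {elapsed:.1f}s")
```

Even this toy loop touches four million samples; a real implementation that tokenizes or detokenizes with a full-size vocabulary would be correspondingly slower, which may explain the multi-minute delay at "Creating request loader...".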
This takes multiple minutes at:
Creating backend...
Backend openai_http connected to http://10.16.1.185:8000 for model meta-llama/Llama-3.3-70B-Instruct.
Creating request loader...
Expected behavior
- I would expect request generation to complete much more quickly (seconds rather than minutes)
Environment
Include all relevant environment information:
- OS [e.g. Ubuntu 20.04]:
- Python version [e.g. 3.12.2]:
To Reproduce
Exact steps to reproduce the behavior:
Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.
Additional context
Add any other context about the problem here. Also include any relevant files.
Reported by: ivanbaldo