
Guidellm takes several minutes to create random requests with long prompts #270

@robertgshaw2-redhat

Describe the bug
It takes several minutes to generate the random requests when prompts are long (20,000 prompt tokens per request in the command below).

guidellm benchmark --target=$URL --model=$MODEL --rate-type=concurrent --rate=200 --max-requests=200 --output-path=~/llama-70b.json --processor=$MODEL --data='{"prompt_tokens":20000, "output_tokens":5000}'

The run stalls for several minutes at this point in the startup output (see the timing sketch under Additional context below):

Creating backend...
Backend openai_http connected to http://10.16.1.185:8000 for model meta-llama/Llama-3.3-70B-Instruct.
Creating request loader...

Expected behavior

  • I would expect request creation to complete in seconds rather than minutes, even with 20,000-token prompts.


To Reproduce
Run the command above against an OpenAI-compatible server hosting meta-llama/Llama-3.3-70B-Instruct and observe the delay after "Creating request loader...".


Additional context
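For scale, here is a minimal standalone sketch of the suspected cost, assuming the request loader synthesizes each random prompt by sampling token IDs and decoding them with the model's tokenizer (an assumption about guidellm's internals, not confirmed from its source). With prompt_tokens=20000 and 200 requests, the decode step alone scales linearly with prompt length and could plausibly account for the multi-minute stall:

```python
import random
import time

from transformers import AutoTokenizer  # pip install transformers

# Model from the report; it is gated on the Hub, but any large-vocab
# tokenizer shows the same scaling for timing purposes.
MODEL = "meta-llama/Llama-3.3-70B-Instruct"
PROMPT_TOKENS = 20_000  # matches --data '{"prompt_tokens":20000, ...}'
NUM_REQUESTS = 200      # matches --max-requests=200

tokenizer = AutoTokenizer.from_pretrained(MODEL)

start = time.perf_counter()
for _ in range(NUM_REQUESTS):
    # Hypothesized hot path: sample random token IDs, then decode them
    # back into a prompt string, once per request.
    ids = random.choices(range(tokenizer.vocab_size), k=PROMPT_TOKENS)
    prompt = tokenizer.decode(ids, skip_special_tokens=True)
elapsed = time.perf_counter() - start

print(f"built {NUM_REQUESTS} random {PROMPT_TOKENS}-token prompts in {elapsed:.1f}s")
```

If the loader does work this way, batching the decodes or generating prompts lazily and concurrently would be the obvious directions to look.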
