Hi,
I successfully quantized the Llama3.1-8B-Instruct model, and the quantization process itself completed without issues. However, when evaluating with openllm, the estimated evaluation time is over 670 hours, which is impractical.
Here’s a snippet of the evaluation log:

```
Running generate_until requests:   0%| | 0/14042 [00:00<?, ?it/s]
Passed argument batch_size = auto. Detecting largest batch size
Determined Largest batch size: 1
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Running generate_until requests:   0%| | 9/14042 [26:03<667:31:34, 171.25s/it]
```

I’m running this on an RTX 4090, which should be more than capable for this task. Is there any way to speed this up?
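For reference, here is roughly what I’d try next, as a sketch only: it assumes the run is driven by lm-evaluation-harness (the `Running generate_until requests` progress bar in the log comes from that library), and the model path and task name below are placeholders, not my actual values.

```python
# Sketch, assuming lm-evaluation-harness; the auto batch-size probe in the
# log settled on 1, so pinning an explicit batch size (or switching to the
# vLLM backend, which batches generate_until requests itself) may help.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # alternative backend to "hf"; requires vllm to be installed
    model_args=(
        "pretrained=/path/to/quantized-llama3.1-8b,"  # placeholder path
        "dtype=auto,gpu_memory_utilization=0.9"
    ),
    tasks=["gsm8k"],  # placeholder; substitute the task(s) actually being run
    batch_size=16,    # explicit batch size instead of batch_size="auto"
)
print(results["results"])
```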