Commit 4052b26

add link to Frank's trtllm-bench blog post to perf_overview.md

Signed-off-by: zpatel <[email protected]>

1 parent: 1209001

1 file changed (+3 −1 lines)

docs/source/performance/perf-overview.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -12,6 +12,8 @@ Tuning batch sizes, parallelism configurations, and other options may lead to im
 
 For DeepSeek R1 performance, please check out our [performance guide](../blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md)
 
+For more information on benchmarking with `trtllm-bench` see this NVIDIA [blog post](https://developer.nvidia.com/blog/llm-inference-benchmarking-performance-tuning-with-tensorrt-llm/).
+
 ## Throughput Measurements
 
 The below table shows performance data where a local inference client is fed requests at an infinite rate (no delay between messages),
@@ -216,7 +218,7 @@ a model name (HuggingFace reference or path to a local model), a [generated data
 trtllm-bench --model $model_name throughput --dataset $dataset_file --backend pytorch --extra_llm_api_options $llm_options
 ```
 
-The data collected for the v0.20 benchmarks was run with the following file:
+The data collected for the v0.21 benchmarks was run with the following file:
 
 `llm_options.yml`
 ```yaml
````
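The `trtllm-bench` invocation in the diff above can be sketched as a small shell snippet. The model name, dataset filename, and options filename below are placeholder assumptions for illustration, not values from the commit; the snippet only assembles and prints the command rather than executing it.

```shell
#!/usr/bin/env sh
# Placeholder values (assumptions) -- substitute your own model and files.
model_name="meta-llama/Llama-3.1-8B-Instruct"  # HuggingFace reference or local path (hypothetical)
dataset_file="synthetic_dataset.json"          # generated dataset file (hypothetical name)
llm_options="llm_options.yml"                  # extra LLM API options file, as in the diff

# Assemble the trtllm-bench throughput command from the diff; echo it here
# rather than running it, since the tool and model may not be installed.
cmd="trtllm-bench --model $model_name throughput --dataset $dataset_file --backend pytorch --extra_llm_api_options $llm_options"
echo "$cmd"
```
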
