-|`return_log_probs`|[1]|`bool`| When `true`, include log probs in the output |
+|`return_log_probs`|[1]|`bool`| When `true`, include log probs in the output. Note: This requires at least one sampling parameter to be set (e.g., `runtime_top_k`, `runtime_top_p` for `tensorrt_llm` model, or `top_k`, `top_p` for `tensorrt_llm_bls` model).|
|`return_context_logits`|[1]|`bool`| When `true`, include context logits in the output |
|`return_generation_logits`|[1]|`bool`| When `true`, include generation logits in the output |
|`num_return_sequences`|[1]|`int32_t`| Number of generated sequences per request. (Default=1) |
@@ -272,7 +272,7 @@ Note: the timing metrics outputs are represented as the number of nanoseconds s
|`cum_log_probs`|[-1]|`float`| Cumulative log probabilities for each output |
-|`output_log_probs`|[beam_width, -1]|`float`|Log probabilities for each output |
+|`output_log_probs`|[beam_width, -1]|`float`|Per-token log probabilities for each output. Only returned when `return_log_probs` is `true` and sampling parameters are set.|
|`context_logits`|[-1, vocab_size]|`float`| Context logits for input |
|`generation_logits`|[beam_width, seq_len, vocab_size]|`float`| Generation logits for each output |
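
To illustrate the note added to `return_log_probs`, here is a minimal sketch of a request body for the `tensorrt_llm_bls` model that sets `top_k` or `top_p` alongside `return_log_probs`. The helper `build_log_prob_request` is hypothetical, and the `text_input` and `max_tokens` field names are assumptions not shown in this diff; only `return_log_probs`, `top_k`, and `top_p` come from the table above. The sketch only builds the JSON payload and does not contact a server.

```python
import json


def build_log_prob_request(prompt, max_tokens=16, top_k=1, top_p=None):
    """Build a JSON-serializable payload that asks for log probs.

    Hypothetical helper for illustration. `text_input` and `max_tokens`
    are assumed input names; `return_log_probs`, `top_k`, and `top_p`
    are the names used in the table for the `tensorrt_llm_bls` model.
    """
    # Per the note in the table, at least one sampling parameter must be set.
    if top_k is None and top_p is None:
        raise ValueError("set top_k or top_p when requesting log probs")

    payload = {
        "text_input": prompt,      # assumed input name
        "max_tokens": max_tokens,  # assumed input name
        "return_log_probs": True,
    }
    if top_k is not None:
        payload["top_k"] = top_k
    if top_p is not None:
        payload["top_p"] = top_p
    return payload


# Serialize the payload; how it is sent to the server is left out on purpose.
request_body = json.dumps(build_log_prob_request("Hello", top_k=1))
```

For the `tensorrt_llm` model the same idea would apply with `runtime_top_k` and `runtime_top_p` in place of `top_k` and `top_p`, as the updated table row states.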