docs: improve metrics documentation and fix naming
- Fix metric name from model_workers_total to model_workers
- Document model name deduplication behavior in README.md
- Add comments explaining gauge vs counter usage for runtime config metrics
- Clarify that some metrics use gauges because they're synchronized from upstream
Signed-off-by: Keiven Chang <[email protected]>
deploy/metrics/README.md (4 additions & 2 deletions)
@@ -98,9 +98,11 @@ These metrics come from the Model Deployment Card information provided by worker
 - `dynamo_frontend_model_migration_limit`: Request migration limit for a worker serving the model (gauge)
 
 **Worker Management Metrics:**
-- `dynamo_frontend_model_workers_total`: Number of worker instances currently serving the model (gauge)
+- `dynamo_frontend_model_workers`: Number of worker instances currently serving the model (gauge)
 
-**Note**: The `dynamo_frontend_inflight_requests_total` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests_total` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
+**Important Notes:**
+- The `dynamo_frontend_inflight_requests_total` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests_total` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
+- **Model Name Deduplication**: When multiple worker instances register with the same model name, only the first instance's configuration metrics (runtime config and MDC metrics) will be populated. Subsequent instances with duplicate model names will be skipped for configuration metric updates, though the worker count metric will reflect all instances.
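
The deduplication rule described in the added note can be illustrated with a minimal sketch. This is not the Dynamo implementation: the class `FrontendMetricsSketch`, its `register_worker` method, and the model name `example-model` are hypothetical, and plain dictionaries stand in for the real gauges. It only shows the stated behavior, assuming registrations arrive one at a time: the first instance for a model name sets the configuration value, later duplicates are skipped for configuration, and every instance is counted.

```python
from collections import defaultdict


class FrontendMetricsSketch:
    """Hypothetical stand-in for the frontend's per-model gauges."""

    def __init__(self):
        # Stands in for dynamo_frontend_model_workers (one value per model name).
        self.model_workers = defaultdict(int)
        # Stands in for dynamo_frontend_model_migration_limit (a configuration metric).
        self.migration_limit = {}

    def register_worker(self, model, migration_limit):
        # The worker count reflects every instance, duplicates included.
        self.model_workers[model] += 1
        # Configuration metrics are populated only by the first instance for a name.
        if model not in self.migration_limit:
            self.migration_limit[model] = migration_limit


metrics = FrontendMetricsSketch()
metrics.register_worker("example-model", migration_limit=3)
# Duplicate model name: counted as a worker, but its configuration is skipped.
metrics.register_worker("example-model", migration_limit=5)
```

In this sketch the second registration raises the worker count to 2 while the migration limit stays at the first instance's value, which is why workers that share a model name but differ in configuration would not be distinguishable from these metrics alone.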