feat: Add output token counter to frontend metrics #4202
Adds a `dynamo_frontend_output_tokens_total` counter metric that updates in real time during token generation for better observability.

Closes ai-dynamo#4131

Signed-off-by: Aryan Bagade <[email protected]>
Walkthrough

This pull request introduces a new output token counter metric to the frontend to enable real-time tracking of output token throughput. Changes include adding `OUTPUT_TOKENS_TOTAL` constant definitions across the Python and Rust metric modules, implementing an `IntCounterVec` in the HTTP service metrics, and integrating counter increments into the `ResponseMetricCollector`. The Python bindings also reorganize kvbm metrics.
Estimated code review effort: 3 (Moderate), ~20 minutes

Pre-merge checks: 5 passed
GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4202/merge) by AryanBagade.

lib/bindings/python/src/dynamo/prometheus_names.py
[error] 1-1: Black formatting failed. The hook reformatted lib/bindings/python/src/dynamo/prometheus_names.py. Run 'pre-commit run --all-files' again or run 'black' to format the file.
Signed-off-by: Aryan Bagade <[email protected]>
Signed-off-by: Aryan Bagade <[email protected]>
keivenchang
left a comment
No issue on my side, thank you for sync'ing the prometheus_names.* using the automated script.
/ok to test 9baaa64

@AryanBagade thank you for the contribution!

Thanks @itay
Signed-off-by: Aryan Bagade <[email protected]>
Overview:
Adds a `dynamo_frontend_output_tokens_total` counter metric that updates in real time during token generation, addressing the observability gap where the existing histogram only updates at request completion.

Details:
This PR implements a new Counter metric that increments immediately as tokens are generated, providing real-time visibility into output token throughput.
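To illustrate the gap being closed, here is a minimal, std-only sketch of why a counter gives earlier visibility than a completion-time histogram. It does not use the `prometheus` crate or any Dynamo code: `Counter`, `CompletionHistogram`, `stream_request`, and `throughput` are hypothetical names invented for this example, and the chunk sizes are made up.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Toy stand-in for a monotonically increasing counter (hypothetical type,
/// not the prometheus crate's counter).
struct Counter(AtomicU64);

impl Counter {
    fn new() -> Self {
        Counter(AtomicU64::new(0))
    }
    fn inc_by(&self, n: u64) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
    fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Toy stand-in for a histogram that only records finished requests.
struct CompletionHistogram {
    observations: Vec<u64>,
}

/// Streams one request made of `chunks` (tokens per chunk) and returns what a
/// scrape would see after each chunk: (counter value, histogram observation count).
fn stream_request(
    chunks: &[u64],
    counter: &Counter,
    hist: &mut CompletionHistogram,
) -> Vec<(u64, usize)> {
    let mut scrapes = Vec::new();
    let mut total = 0;
    for &n in chunks {
        counter.inc_by(n); // visible immediately, mid-request
        total += n;
        scrapes.push((counter.get(), hist.observations.len()));
    }
    hist.observations.push(total); // visible only once the request completes
    scrapes
}

/// Tokens per second between two counter samples taken `dt_secs` apart.
fn throughput(prev: u64, curr: u64, dt_secs: f64) -> f64 {
    curr.saturating_sub(prev) as f64 / dt_secs
}

fn main() {
    let counter = Counter::new();
    let mut hist = CompletionHistogram { observations: Vec::new() };
    let scrapes = stream_request(&[3, 5, 2], &counter, &mut hist);
    for (c, h) in &scrapes {
        println!("mid-request scrape: counter={c}, histogram observations={h}");
    }
    println!("after completion: histogram observations={:?}", hist.observations);
    println!("tokens/s over 2s window: {}", throughput(scrapes[0].0, scrapes[2].0, 2.0));
}
```

During the request the counter already reflects generated tokens while the histogram is still empty, so a rate computed from successive counter samples can track throughput for long-running generations.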
Changes:
- Add `OUTPUT_TOKENS_TOTAL` constant to `lib/runtime/src/metrics/prometheus_names.rs`
- Add `output_tokens_counter` IntCounterVec field to Metrics struct in `lib/llm/src/http/service/metrics.rs`
- Increment the counter in `ResponseMetricCollector::observe_response()`
- Sync `lib/bindings/python/src/dynamo/prometheus_names.py`

Technical Details:
- Increments by `num_tokens` (chunk size) for each response chunk
- Uses `with_label_values(&[&self.model])` for per-model tracking
- […] `request_counter` metric

Note: Integration tests will run in CI (macOS linking limitations prevent local execution)
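The per-model increment described above can be sketched without any external crates. This is only an approximation of the idea, not the PR's code: `TokenCounterVec` is a hypothetical stand-in for a labeled counter, the label name `model` is an assumption (the source only shows the label value coming from `self.model`), and the model names are invented. The rendered lines follow the Prometheus text exposition format.

```rust
use std::collections::BTreeMap;

/// Full metric name as given in the PR description.
const METRIC_NAME: &str = "dynamo_frontend_output_tokens_total";

/// Hypothetical std-only stand-in for a counter with one label.
/// The label name `model` is an assumption made for this sketch.
struct TokenCounterVec {
    per_model: BTreeMap<String, u64>,
}

impl TokenCounterVec {
    fn new() -> Self {
        TokenCounterVec { per_model: BTreeMap::new() }
    }

    /// Adds `num_tokens` (the chunk size) to the series for `model`.
    fn inc_by(&mut self, model: &str, num_tokens: u64) {
        *self.per_model.entry(model.to_string()).or_insert(0) += num_tokens;
    }

    /// Renders the series in Prometheus text exposition format.
    fn render(&self) -> String {
        let mut out = format!("# TYPE {} counter\n", METRIC_NAME);
        for (model, value) in &self.per_model {
            out.push_str(&format!("{}{{model=\"{}\"}} {}\n", METRIC_NAME, model, value));
        }
        out
    }
}

fn main() {
    let mut counter = TokenCounterVec::new();
    // Each call models one response chunk arriving for a given model.
    counter.inc_by("model-a", 4);
    counter.inc_by("model-b", 7);
    counter.inc_by("model-a", 1);
    print!("{}", counter.render());
}
```

Keeping one series per model means throughput can be compared across models from the same metric, at the cost of one time series per distinct model name.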
Where should the reviewer start?
- `lib/llm/src/http/service/metrics.rs` lines 848-852 - Core counter increment logic
- `lib/llm/src/http/service/metrics.rs` lines 1210-1355 - Unit tests
- `lib/runtime/src/metrics/prometheus_names.rs` line 117 - Constant definition

Related Issues: