feat: Add output token counter to frontend metrics #4202

AryanBagade · 2025-11-08T09:21:31Z

Overview:

Adds dynamo_frontend_output_tokens_total counter metric that updates in real-time during token generation, addressing the observability gap where the existing histogram only updates at request completion.

Details:

This PR implements a new Counter metric that increments immediately as tokens are generated, providing real-time visibility into output token throughput.

Changes:

Added OUTPUT_TOKENS_TOTAL constant to lib/runtime/src/metrics/prometheus_names.rs
Added output_tokens_counter IntCounterVec field to Metrics struct in lib/llm/src/http/service/metrics.rs
Implemented counter increment logic in ResponseMetricCollector::observe_response()
Added 3 comprehensive unit tests covering increment behavior, zero tokens, and multiple models
Regenerated Python bindings in lib/bindings/python/src/dynamo/prometheus_names.py

Technical Details:

Counter increments by num_tokens (chunk size) for each response chunk
Uses with_label_values(&[&self.model]) for per-model tracking
Placed before early return check to ensure all token chunks are counted
Follows same pattern as existing request_counter metric

Note: Integration tests will run in CI (macOS linking limitations prevent local execution)

Where should the reviewer start?

lib/llm/src/http/service/metrics.rs lines 848-852 - Core counter increment logic
lib/llm/src/http/service/metrics.rs lines 1210-1355 - Unit tests
lib/runtime/src/metrics/prometheus_names.rs line 117 - Constant definition

Related Issues:

closes: [FEATURE]: Add output token counter to frontend metrics #4131

Summary by CodeRabbit

New Features
- Added real-time output token tracking metrics for improved monitoring of token generation.
- Introduced new block management metrics to track offload/onload operations and token matching across models.
- Enhanced metric organization for better visibility into model performance and resource utilization.

Adds dynamo_frontend_output_tokens_total counter metric that updates in real-time during token generation for better observability. Closes ai-dynamo#4131 Signed-off-by: Aryan Bagade <[email protected]>

copy-pr-bot · 2025-11-08T09:21:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-11-08T09:21:38Z

👋 Hi AryanBagade! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2025-11-08T09:26:25Z

Walkthrough

This pull request introduces a new output token counter metric to the frontend to enable real-time tracking of output token throughput. Changes include adding OUTPUT_TOKENS_TOTAL constant definitions across Python and Rust metric modules, implementing an IntCounterVec in the HTTP service metrics, and integrating counter increments into the ResponseMetricCollector. The Python bindings also reorganize kvbm metrics.

Changes

Cohort / File(s)	Summary
Prometheus metric name definitions `lib/runtime/src/metrics/prometheus_names.rs`, `lib/bindings/python/src/dynamo/prometheus_names.py`	Added OUTPUT_TOKENS_TOTAL constant to frontend_service metric group. Python file also replaced kvbm_connector class with kvbm class and introduced new block-related constants (OFFLOAD_BLOCKS_D2H, OFFLOAD_BLOCKS_H2D, OFFLOAD_BLOCKS_D2D, ONBOARD_BLOCKS_H2D, ONBOARD_BLOCKS_D2D, MATCHED_TOKENS).
HTTP service metrics implementation `lib/llm/src/http/service/metrics.rs`	Added output_tokens_counter (IntCounterVec) metric to Metrics struct, integrated into registry initialization, and incremented in observe_response methods. Includes tests for single/multiple models, zero-token handling, and model isolation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

lib/llm/src/http/service/metrics.rs: Review counter integration points and test coverage for increments in ResponseMetricCollector—ensure counter updates align with streaming token delivery patterns
lib/bindings/python/src/dynamo/prometheus_names.py: Verify kvbm_connector→kvbm refactoring maintains backward compatibility and new block constants are correctly scoped

Poem

🐰 A counter springs forth, swift and bright,
Tokens flowing in real-time light,
No more waiting for histograms to land,
Throughput tracked with a gentle hand,
Output streams sing—hooray, hooray! 🎉

Pre-merge checks

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding an output token counter to frontend metrics.
Description check	✅ Passed	The PR description includes all required sections: Overview, Details with comprehensive changes list, reviewer guidance, and Related Issues with the appropriate action keyword.
Linked Issues check	✅ Passed	The implementation fully addresses issue #4131 by adding a Counter metric that updates in real-time during token generation, with per-chunk increments providing granular observability into output token throughput.
Out of Scope Changes check	✅ Passed	All code changes are directly related to the objective of adding an output token counter metric. Python bindings regeneration is a necessary byproduct and not out of scope.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04f7579 and 807a8c0.

📒 Files selected for processing (3)

lib/bindings/python/src/dynamo/prometheus_names.py (2 hunks)
lib/llm/src/http/service/metrics.rs (7 hunks)
lib/runtime/src/metrics/prometheus_names.rs (1 hunks)

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.

📚 Learning: 2025-09-16T00:26:43.641Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.

Applied to files:

lib/runtime/src/metrics/prometheus_names.rs
lib/bindings/python/src/dynamo/prometheus_names.py

📚 Learning: 2025-09-16T00:26:43.641Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The ai-dynamo/dynamo team uses _total as a semantic unit suffix across all metric types (including gauges like INFLIGHT_REQUESTS_TOTAL) for internal consistency, as evidenced by patterns in prometheus_names.rs. This is a deliberate architectural choice to prioritize uniform naming conventions over strict Prometheus conventions that reserve _total only for counters.

Applied to files:

lib/runtime/src/metrics/prometheus_names.rs
lib/bindings/python/src/dynamo/prometheus_names.py

📚 Learning: 2025-09-16T00:27:43.992Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:75-79
Timestamp: 2025-09-16T00:27:43.992Z
Learning: In the ai-dynamo/dynamo codebase, the project uses "_total" suffix for all Prometheus metrics including gauges like inflight_requests, which differs from standard Prometheus conventions. The constant work_handler::INFLIGHT_REQUESTS does not exist - only work_handler::INFLIGHT_REQUESTS_TOTAL exists and should be used for the inflight requests gauge metric.

Applied to files:

lib/bindings/python/src/dynamo/prometheus_names.py

🧬 Code graph analysis (1)

lib/llm/src/http/service/metrics.rs (2)

lib/bindings/python/src/dynamo/prometheus_metrics.pyi (1)

IntCounterVec (147-166)

lib/bindings/python/src/dynamo/prometheus_names.py (1)

frontend_service (37-77)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4202/merge) by AryanBagade.

lib/bindings/python/src/dynamo/prometheus_names.py

[error] 1-1: Black formatting failed. The hook reformatted lib/bindings/python/src/dynamo/prometheus_names.py. Run 'pre-commit run --all-files' again or run 'black' to format the file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: clippy (lib/runtime/examples)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: clippy (.)
GitHub Check: tests (lib/bindings/python)
GitHub Check: tests (.)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: clippy (lib/bindings/python)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (3)

lib/runtime/src/metrics/prometheus_names.rs (1)

116-118: OUTPUT_TOKENS_TOTAL constant matches naming policy.

Nice to see the real-time output token counter surfaced here with the canonical _total suffix so the bindings stay aligned across languages. Based on learnings

lib/llm/src/http/service/metrics.rs (2)

848-853: Counter increment sits in the right spot.

Incrementing the IntCounterVec ahead of the zero-token early return keeps the hot-path accurate without disturbing TTFT/ITL logic—good call.

1211-1355: Tests cover the critical scenarios.

Appreciate the focused cases for positive increments, zero-token skips, and per-model isolation—they give confidence the counter behaves under real workloads.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Aryan Bagade <[email protected]>

lib/llm/src/http/service/metrics.rs

keivenchang

No issue on my side, thank you for sync'ing the prometheus_names.* using the automated script.

lib/llm/src/http/service/metrics.rs

lib/bindings/python/src/dynamo/prometheus_names.py

rmccorm4 · 2025-11-11T04:15:50Z

/ok to test 9baaa64

itay · 2025-11-11T05:38:53Z

@AryanBagade thank you for the contribution!

AryanBagade · 2025-11-11T06:01:13Z

Thanks @itay
You mentioned a follow-up PR for worker labels might be good. I'd be happy to work on that if it's a priority.
Also, if there are any other issues or baclog/stale issues that need attention, I'd love to take them on. Feel free to tag me!
wanted to learn and contribute

Signed-off-by: Aryan Bagade <[email protected]>

feat: Add output token counter to frontend metrics

807a8c0

Adds dynamo_frontend_output_tokens_total counter metric that updates in real-time during token generation for better observability. Closes ai-dynamo#4131 Signed-off-by: Aryan Bagade <[email protected]>

AryanBagade requested review from a team as code owners November 8, 2025 09:21

pull-request-size bot added the size/L label Nov 8, 2025

github-actions bot added external-contribution Pull request is from an external contributor feat labels Nov 8, 2025

AryanBagade added 2 commits November 8, 2025 01:33

fix: Format Python bindings with black

92a2760

Signed-off-by: Aryan Bagade <[email protected]>

fix: Remove extra blank line in kvrouter class

73d3710

Signed-off-by: Aryan Bagade <[email protected]>

itay reviewed Nov 10, 2025

View reviewed changes

lib/llm/src/http/service/metrics.rs Show resolved Hide resolved

lib/llm/src/http/service/metrics.rs Show resolved Hide resolved

Merge branch 'main' into feat/add-output-token-counter

9baaa64

keivenchang approved these changes Nov 11, 2025

View reviewed changes

lib/llm/src/http/service/metrics.rs Show resolved Hide resolved

lib/llm/src/http/service/metrics.rs Show resolved Hide resolved

rmccorm4 reviewed Nov 11, 2025

View reviewed changes

lib/bindings/python/src/dynamo/prometheus_names.py Show resolved Hide resolved

AryanBagade requested review from itay and rmccorm4 November 11, 2025 03:32

copy-pr-bot bot temporarily deployed to GITLAB November 11, 2025 04:15 Inactive

rmccorm4 enabled auto-merge (squash) November 11, 2025 04:16

copy-pr-bot bot temporarily deployed to GITLAB November 11, 2025 04:18 Inactive

rmccorm4 merged commit 8f86898 into ai-dynamo:main Nov 11, 2025
35 of 36 checks passed

AryanBagade deleted the feat/add-output-token-counter branch November 11, 2025 06:07

rmccorm4 pushed a commit that referenced this pull request Nov 11, 2025

feat: Add output token counter to frontend metrics (#4202)

5095693

Signed-off-by: Aryan Bagade <[email protected]>

rmccorm4 mentioned this pull request Nov 11, 2025

feat: Add output token counter to frontend metrics #4202 #4231

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add output token counter to frontend metrics #4202

feat: Add output token counter to frontend metrics #4202

AryanBagade commented Nov 8, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Nov 8, 2025

Uh oh!

github-actions bot commented Nov 8, 2025

Uh oh!

coderabbitai bot commented Nov 8, 2025

Uh oh!

Uh oh!

Uh oh!

keivenchang left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rmccorm4 commented Nov 11, 2025

Uh oh!

Uh oh!

itay commented Nov 11, 2025

Uh oh!

AryanBagade commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: Add output token counter to frontend metrics #4202

feat: Add output token counter to frontend metrics #4202

Conversation

AryanBagade commented Nov 8, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues:

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Nov 8, 2025

Uh oh!

github-actions bot commented Nov 8, 2025

Uh oh!

coderabbitai bot commented Nov 8, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

Uh oh!

Uh oh!

keivenchang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rmccorm4 commented Nov 11, 2025

Uh oh!

Uh oh!

itay commented Nov 11, 2025

Uh oh!

AryanBagade commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

AryanBagade commented Nov 8, 2025 •

edited by coderabbitai bot

Loading