Skip to content

Conversation

@AryanBagade
Copy link
Contributor

@AryanBagade AryanBagade commented Nov 8, 2025

Overview:

Adds dynamo_frontend_output_tokens_total counter metric that updates in real-time during token generation, addressing the observability gap where the existing histogram only updates at request completion.

Details:

This PR implements a new Counter metric that increments immediately as tokens are generated, providing real-time visibility into output token throughput.

Changes:

  • Added OUTPUT_TOKENS_TOTAL constant to lib/runtime/src/metrics/prometheus_names.rs
  • Added output_tokens_counter IntCounterVec field to Metrics struct in lib/llm/src/http/service/metrics.rs
  • Implemented counter increment logic in ResponseMetricCollector::observe_response()
  • Added 3 comprehensive unit tests covering increment behavior, zero tokens, and multiple models
  • Regenerated Python bindings in lib/bindings/python/src/dynamo/prometheus_names.py

Technical Details:

  • Counter increments by num_tokens (chunk size) for each response chunk
  • Uses with_label_values(&[&self.model]) for per-model tracking
  • Placed before early return check to ensure all token chunks are counted
  • Follows same pattern as existing request_counter metric

Note: Integration tests will run in CI (macOS linking limitations prevent local execution)

Where should the reviewer start?

  1. lib/llm/src/http/service/metrics.rs lines 848-852 - Core counter increment logic
  2. lib/llm/src/http/service/metrics.rs lines 1210-1355 - Unit tests
  3. lib/runtime/src/metrics/prometheus_names.rs line 117 - Constant definition

Related Issues:

Summary by CodeRabbit

  • New Features
    • Added real-time output token tracking metrics for improved monitoring of token generation.
    • Introduced new block management metrics to track offload/onload operations and token matching across models.
    • Enhanced metric organization for better visibility into model performance and resource utilization.

Adds dynamo_frontend_output_tokens_total counter metric that updates in real-time during token generation for better observability.   Closes ai-dynamo#4131

Signed-off-by: Aryan Bagade <[email protected]>
@AryanBagade AryanBagade requested review from a team as code owners November 8, 2025 09:21
@copy-pr-bot
Copy link

copy-pr-bot bot commented Nov 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link

github-actions bot commented Nov 8, 2025

👋 Hi AryanBagade! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor feat labels Nov 8, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 8, 2025

Walkthrough

This pull request introduces a new output token counter metric to the frontend to enable real-time tracking of output token throughput. Changes include adding OUTPUT_TOKENS_TOTAL constant definitions across Python and Rust metric modules, implementing an IntCounterVec in the HTTP service metrics, and integrating counter increments into the ResponseMetricCollector. The Python bindings also reorganize kvbm metrics.

Changes

Cohort / File(s) Summary
Prometheus metric name definitions
lib/runtime/src/metrics/prometheus_names.rs, lib/bindings/python/src/dynamo/prometheus_names.py
Added OUTPUT_TOKENS_TOTAL constant to frontend_service metric group. Python file also replaced kvbm_connector class with kvbm class and introduced new block-related constants (OFFLOAD_BLOCKS_D2H, OFFLOAD_BLOCKS_H2D, OFFLOAD_BLOCKS_D2D, ONBOARD_BLOCKS_H2D, ONBOARD_BLOCKS_D2D, MATCHED_TOKENS).
HTTP service metrics implementation
lib/llm/src/http/service/metrics.rs
Added output_tokens_counter (IntCounterVec) metric to Metrics struct, integrated into registry initialization, and incremented in observe_response methods. Includes tests for single/multiple models, zero-token handling, and model isolation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • lib/llm/src/http/service/metrics.rs: Review counter integration points and test coverage for increments in ResponseMetricCollector—ensure counter updates align with streaming token delivery patterns
  • lib/bindings/python/src/dynamo/prometheus_names.py: Verify kvbm_connector→kvbm refactoring maintains backward compatibility and new block constants are correctly scoped

Poem

🐰 A counter springs forth, swift and bright,
Tokens flowing in real-time light,
No more waiting for histograms to land,
Throughput tracked with a gentle hand,
Output streams sing—hooray, hooray! 🎉

Pre-merge checks

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding an output token counter to frontend metrics.
Description check ✅ Passed The PR description includes all required sections: Overview, Details with comprehensive changes list, reviewer guidance, and Related Issues with the appropriate action keyword.
Linked Issues check ✅ Passed The implementation fully addresses issue #4131 by adding a Counter metric that updates in real-time during token generation, with per-chunk increments providing granular observability into output token throughput.
Out of Scope Changes check ✅ Passed All code changes are directly related to the objective of adding an output token counter metric. Python bindings regeneration is a necessary byproduct and not out of scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04f7579 and 807a8c0.

📒 Files selected for processing (3)
  • lib/bindings/python/src/dynamo/prometheus_names.py (2 hunks)
  • lib/llm/src/http/service/metrics.rs (7 hunks)
  • lib/runtime/src/metrics/prometheus_names.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.
📚 Learning: 2025-09-16T00:26:43.641Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.

Applied to files:

  • lib/runtime/src/metrics/prometheus_names.rs
  • lib/bindings/python/src/dynamo/prometheus_names.py
📚 Learning: 2025-09-16T00:26:43.641Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The ai-dynamo/dynamo team uses _total as a semantic unit suffix across all metric types (including gauges like INFLIGHT_REQUESTS_TOTAL) for internal consistency, as evidenced by patterns in prometheus_names.rs. This is a deliberate architectural choice to prioritize uniform naming conventions over strict Prometheus conventions that reserve _total only for counters.

Applied to files:

  • lib/runtime/src/metrics/prometheus_names.rs
  • lib/bindings/python/src/dynamo/prometheus_names.py
📚 Learning: 2025-09-16T00:27:43.992Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:75-79
Timestamp: 2025-09-16T00:27:43.992Z
Learning: In the ai-dynamo/dynamo codebase, the project uses "_total" suffix for all Prometheus metrics including gauges like inflight_requests, which differs from standard Prometheus conventions. The constant work_handler::INFLIGHT_REQUESTS does not exist - only work_handler::INFLIGHT_REQUESTS_TOTAL exists and should be used for the inflight requests gauge metric.

Applied to files:

  • lib/bindings/python/src/dynamo/prometheus_names.py
🧬 Code graph analysis (1)
lib/llm/src/http/service/metrics.rs (2)
lib/bindings/python/src/dynamo/prometheus_metrics.pyi (1)
  • IntCounterVec (147-166)
lib/bindings/python/src/dynamo/prometheus_names.py (1)
  • frontend_service (37-77)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4202/merge) by AryanBagade.
lib/bindings/python/src/dynamo/prometheus_names.py

[error] 1-1: Black formatting failed. The hook reformatted lib/bindings/python/src/dynamo/prometheus_names.py. Run 'pre-commit run --all-files' again or run 'black' to format the file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: clippy (lib/runtime/examples)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (.)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
lib/runtime/src/metrics/prometheus_names.rs (1)

116-118: OUTPUT_TOKENS_TOTAL constant matches naming policy.

Nice to see the real-time output token counter surfaced here with the canonical _total suffix so the bindings stay aligned across languages. Based on learnings

lib/llm/src/http/service/metrics.rs (2)

848-853: Counter increment sits in the right spot.

Incrementing the IntCounterVec ahead of the zero-token early return keeps the hot-path accurate without disturbing TTFT/ITL logic—good call.


1211-1355: Tests cover the critical scenarios.

Appreciate the focused cases for positive increments, zero-token skips, and per-model isolation—they give confidence the counter behaves under real workloads.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issue on my side, thank you for sync'ing the prometheus_names.* using the automated script.

@AryanBagade AryanBagade requested review from itay and rmccorm4 November 11, 2025 03:32
@rmccorm4
Copy link
Contributor

/ok to test 9baaa64

@rmccorm4 rmccorm4 enabled auto-merge (squash) November 11, 2025 04:16
@rmccorm4 rmccorm4 merged commit 8f86898 into ai-dynamo:main Nov 11, 2025
35 of 36 checks passed
@itay
Copy link

itay commented Nov 11, 2025

@AryanBagade thank you for the contribution!

@AryanBagade
Copy link
Contributor Author

Thanks @itay
You mentioned a follow-up PR for worker labels might be good. I'd be happy to work on that if it's a priority.
Also, if there are any other issues or baclog/stale issues that need attention, I'd love to take them on. Feel free to tag me!
wanted to learn and contribute

@AryanBagade AryanBagade deleted the feat/add-output-token-counter branch November 11, 2025 06:07
rmccorm4 pushed a commit that referenced this pull request Nov 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor feat size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add output token counter to frontend metrics

4 participants