@rmccorm4 rmccorm4 commented Nov 11, 2025

Overview:

Cherry-pick #4202 for better visibility into output token throughput at any given point in time

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added output tokens counter metric for real-time tracking of total output tokens generated.
    • Enhanced KVBM metrics with additional offload and onboard block tracking.
  • Chores

    • Bumped version to 0.7.0 across all packages and dependencies.
    • Migrated default container registry to NVIDIA's official registry (nvcr.io/nvidia/ai-dynamo) from custom registry.
    • Updated all deployment configurations and examples with new version tags.

@rmccorm4 rmccorm4 requested review from a team as code owners November 11, 2025 06:11
@github-actions github-actions bot added the feat label Nov 11, 2025
@rmccorm4 rmccorm4 changed the base branch from main to release/0.7.0 November 11, 2025 06:11
@rmccorm4 rmccorm4 requested a review from itay November 11, 2025 06:11

coderabbitai bot commented Nov 11, 2025

Caution

Review failed

Failed to post review comments

Walkthrough

This pull request performs a workspace-wide version bump from 0.6.1 to 0.7.0, updates container image registries from "my-registry" to "nvcr.io/nvidia/ai-dynamo" with tag changes from "my-tag" to "0.7.0", adds output token metrics to the observability layer, and refactors KVBM connector constants.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Cargo.toml & Python packaging version updates | Cargo.toml, lib/bindings/python/Cargo.toml, lib/bindings/python/pyproject.toml, lib/kvbm/Cargo.toml, lib/kvbm/pyproject.toml, lib/runtime/examples/Cargo.toml, pyproject.toml | Version bumped from 0.6.1 to 0.7.0 across workspace manifests. |
| Helm Chart version updates | deploy/cloud/helm/crds/Chart.yaml, deploy/cloud/helm/platform/Chart.yaml, deploy/cloud/helm/platform/components/operator/Chart.yaml, deploy/helm/chart/Chart.yaml | Chart versions and appVersions updated from 0.6.1 to 0.7.0; platform chart dynamo-operator dependency also bumped to 0.7.0. |
| Earthfile registry configuration | Earthfile, deploy/cloud/operator/Earthfile | DOCKER_SERVER ARG default changed from my-registry to nvcr.io/nvidia/ai-dynamo in three public ARG declarations. |
| Kubernetes deployment manifests—vLLM backend | benchmarks/incluster/benchmark_job.yaml, examples/backends/vllm/deploy/..., examples/basics/kubernetes/Distributed_Inference/..., examples/multimodal/deploy/agg_llava.yaml, examples/multimodal/deploy/agg_qwen.yaml, recipes/llama-3-70b/vllm/.../..., tests/fault_tolerance/deploy/templates/vllm/..., tests/planner/perf_test_configs/... | Container image references changed from my-registry/vllm-runtime:my-tag or placeholder tags to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.7.0 across Frontend and worker containers. |
| Kubernetes deployment manifests—TensorRT-LLM backend | examples/backends/trtllm/deploy/..., recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml, recipes/gpt-oss-120b/trtllm/agg/deploy.yaml, recipes/qwen3-32b-fp8/trtllm/.../deploy.yaml | Container image references changed from my-registry/trtllm-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.7.0 or tensorrtllm-runtime:0.7.0. |
| Kubernetes deployment manifests—SGLang backend | examples/backends/sglang/deploy/..., examples/deployments/GKE/sglang/disagg.yaml, recipes/deepseek-r1/sglang/disagg-.../deploy.yaml | Container image references changed from my-registry/sglang-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.7.0 or sglang-wideep-runtime:0.7.0. |
| Kubernetes deployment manifests—custom & other backends | examples/custom_backend/hello_world/deploy/hello_world.yaml, examples/deployments/GKE/vllm/disagg.yaml | Container image references changed to use nvcr.io/nvidia/ai-dynamo registry with version 0.7.0. |
| ECS task definitions | examples/deployments/ECS/task_definition_frontend.json, examples/deployments/ECS/task_definition_prefillworker.json | Container image tag updated from my-tag to 0.7.0. |
| Pre-deployment & benchmark configuration | benchmarks/profiler/utils/config.py, deploy/cloud/pre-deployment/nixl/README.md, deploy/cloud/pre-deployment/nixl/build_and_deploy.sh, deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml | Container image references changed from my-registry placeholder to nvcr.io/nvidia/ai-dynamo with version 0.7.0; deployment script updated to use new registry prefix pattern. |
| Docker secrets test | deploy/cloud/operator/internal/secrets/docker_test.go | Test data updated to reference nvcr.io/nvidia/ai-dynamo.com:5005/... instead of my-registry.com:5005/.... |
| Documentation—installation & guides | docs/_includes/install.rst, docs/backends/trtllm/gpt-oss.md, docs/benchmarks/benchmarking.md, docs/kubernetes/deployment/create_deployment.md, docs/reference/support-matrix.md | Installation, deployment, and benchmarking documentation updated with version 0.7.0 tags and nvcr.io/nvidia/ai-dynamo registry; imagePullSecrets and dependency matrix also updated. |
| Example backend documentation | examples/backends/sglang/deploy/README.md, examples/backends/trtllm/deploy/README.md, examples/backends/vllm/deploy/README.md, examples/basics/kubernetes/Distributed_Inference/README.md | README files updated with new container image references and version 0.7.0. |
| Python metrics & observability | lib/bindings/python/src/dynamo/prometheus_names.py, lib/runtime/src/metrics/prometheus_names.rs | Added OUTPUT_TOKENS_TOTAL metric constant; renamed kvbm_connector class to kvbm and refactored constants (KVBM_CONNECTOR_LEADER/WORKEROFFLOAD_BLOCKS_*, ONBOARD_BLOCKS_*, MATCHED_TOKENS). |
| Metrics service implementation | lib/llm/src/http/service/metrics.rs | Added output_tokens_counter (IntCounterVec) metric, initialized and registered in Prometheus; included counter incrementation in ResponseMetricCollector::observe_response; added comprehensive unit tests. |
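
For readers wondering how a cumulative output-token counter gives visibility into throughput at a point in time, here is a minimal sketch in plain Python. It is not the code from this PR: `Sample` and `tokens_per_second` are hypothetical names, and treating a decrease as a restart from zero is an assumption about how a consumer might handle counter resets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scrape of a monotonically increasing counter (hypothetical type)."""
    timestamp_s: float  # scrape time, in seconds
    total_tokens: int   # cumulative counter value at that time


def tokens_per_second(prev: Sample, curr: Sample) -> float:
    """Average output-token throughput between two scrapes."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.total_tokens - prev.total_tokens
    if delta < 0:
        # The counter went backwards. Assumption: the process restarted and
        # the counter began again from zero, so only the new value counts.
        delta = curr.total_tokens
    return delta / elapsed
```

Because the counter only ever increases while the process is alive, any two scrapes are enough to derive an average rate for the interval between them, which a latency histogram alone would not give directly.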

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Primary complexity drivers:

    • High file count (~90 files affected) but largely homogeneous changes (version bumps and image tag replacements reduce per-file cognitive load)
    • Metric additions in lib/llm/src/http/service/metrics.rs and constant refactoring in prometheus_names.py/.rs introduce logic verification needs
    • KVBM class rename and constant reorganization require careful tracking of renamed symbols
  • Areas requiring extra attention:

    • lib/llm/src/http/service/metrics.rs — verify metric initialization, registration, and test coverage for the new output tokens counter
    • lib/bindings/python/src/dynamo/prometheus_names.py — confirm KVBM constant migration is complete and backward compatibility implications (if any)
    • Cross-consistency between Python and Rust metric definitions across prometheus_names.py and prometheus_names.rs
    • Deployment manifests — spot-check a few YAML files to ensure image references are consistently updated without syntax errors
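
The manifest spot-check suggested above could be partly automated. The following is a rough sketch, assuming the only stale values to look for are the `my-registry` and `my-tag` placeholders named in the walkthrough; `find_stale_image_refs` is a hypothetical helper, not something in the repository.

```python
import re

# Placeholder registry and tag values that the walkthrough says were replaced.
PLACEHOLDER = re.compile(r"\bmy-registry\b|:my-tag\b")


def find_stale_image_refs(manifest_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that still contain a placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(manifest_text.splitlines(), start=1)
        if PLACEHOLDER.search(line)
    ]
```

Running it over the text of each YAML or JSON file and expecting an empty list would flag any image reference the bulk replacement missed. It does not validate YAML syntax, so a separate parse step would still be needed for that.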

Poem

🐰 From 0.6 to 0.7 we hop,
A version bump that will not stop!
Registries renamed, metrics shine bright,
KVBM refactored—all feels right!
With images tagged and paths so clean,
The finest release we've ever seen!

Pre-merge checks

❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description is largely incomplete, missing detailed explanation of changes and reviewer guidance despite following the template structure. | Complete the 'Details' section explaining the changes made, and fill 'Where should the reviewer start?' with specific files to review (e.g., lib/llm/src/http/service/metrics.rs, lib/bindings/python/src/dynamo/prometheus_names.py). |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the primary change: adding an output token counter metric to frontend monitoring systems. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@nv-anants nv-anants merged commit e0b9af1 into release/0.7.0 Nov 12, 2025
39 of 42 checks passed
@nv-anants nv-anants deleted the rmccormick/0.7.0-cp4202 branch November 12, 2025 20:18

4 participants