[Roadmap]: 0.4.1 - 0.5.0 roadmap and key dates #2649

@harryskim

Feature request

Hi Dynamo developers!

We wanted to provide visibility into the near-term roadmap for the Dynamo v0.4.1 and Dynamo v0.5.0 releases.

We are continuing to make progress on five major focus areas:

  1. Performance
  2. Fault tolerance
  3. K8s deployment
  4. KV cache management and transfer
  5. Scheduling with smart router and planner

📅 Timeline

The target dates for the releases are below:

v0.4.1: 8/27
v0.5.0: 9/17

Dynamo v0.4.1 Features

1. Performance

Develop reproducible benchmark script for DS-R1.

2. Fault Tolerance & Observability

[High Availability] Support multiple KV routers (+frontends)

Request Migration Docs and E2E Tests vLLM

bug: metrics collection timed out

vLLM: Add a "model" label to Component metrics

k8s: Add Guide on Deploying Prometheus and Grafana in Dynamo Cloud (K8s) Deployment

fix: Frontend metrics to be renamed from nv_llm_* to dynamo_*

Create a new guide on using the MetricsRegistry APIs

fix: component metric names to be called dynamo_component_*, and labels to not collide with Kubernetes

Base metrics: NATS, drt & component grouped metrics

fix: endpoint.rs does not propagate Runtime errors back on the stack

Parameterize /health and /live endpoints

Refactor: System Server (http_server.rs) is often confused with the Dynamo frontend
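
The two metric-renaming items above describe a prefix change from nv_llm_* to dynamo_*. For anyone updating dashboards or alert rules, the change can be sketched as a plain prefix swap. This is a minimal sketch, not the Dynamo implementation: the helper name `rename_metric` and the sample metric name are hypothetical, and it assumes the rest of each metric name is left unchanged, which the roadmap does not state.

```python
# Hypothetical helper: maps old frontend metric names to the new prefix.
# Assumption: only the prefix changes; the remainder of the name is kept.
OLD_PREFIX = "nv_llm_"
NEW_PREFIX = "dynamo_"


def rename_metric(name: str) -> str:
    """Swap the old prefix for the new one; leave other names untouched."""
    if name.startswith(OLD_PREFIX):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name


if __name__ == "__main__":
    # "nv_llm_requests_total" is an illustrative name, not a real Dynamo metric.
    print(rename_metric("nv_llm_requests_total"))
```

Names that do not start with the old prefix pass through unchanged, so the helper can be applied to a mixed list of metric names.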

3. K8s Deployment

Grove

Grove integration: multi-node support

Implement workaround to scale components when using Grove

Inference Gateway

Dynamo integration with API Gateway - EPP customization

Processor/router unit needs to not queue in NATS and instead return to the Envoy shim/proxy

Update Documentation for Dynamo Inference Gateway

Metrics

Create a reference guide to collecting and viewing Dynamo metrics in Kubernetes

4. KV Cache Management & Transfer

KV Block Manager

Note: G1 = HBM, G2 = Host memory, G3 = Local disk, G4 = Remote storage

vLLM: G2 - G3 offloading, onboarding, unit tests - functionality (<2% of LMCache perf test suites)

  • G2 unit tests
  • Connector API implementation
  • G3 onboarding
  • G3 offloading
  • G3 unit tests
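
The G1 to G4 naming in the note above can be pictured as an ordered set of memory tiers, with offloading moving KV blocks toward slower storage and onboarding moving them back. The sketch below is illustrative only: the `Tier` enum and both helper functions are hypothetical names, not the KV Block Manager API, and it assumes blocks move one tier at a time, which the roadmap does not state.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Storage tiers as named in the roadmap note."""
    G1 = 1  # HBM
    G2 = 2  # Host memory
    G3 = 3  # Local disk
    G4 = 4  # Remote storage


def offload_target(tier: Tier) -> Optional[Tier]:
    """Next slower tier, or None at G4 (assumes one-step moves)."""
    return Tier(tier + 1) if tier < Tier.G4 else None


def onboard_target(tier: Tier) -> Optional[Tier]:
    """Next faster tier, or None at G1 (assumes one-step moves)."""
    return Tier(tier - 1) if tier > Tier.G1 else None


if __name__ == "__main__":
    # The v0.4.1 vLLM item covers the G2 <-> G3 pair.
    print(offload_target(Tier.G2).name, onboard_target(Tier.G3).name)
```

Under this reading, "G2 - G3 offloading" is the host memory to local disk step and "G3 onboarding" is the reverse direction.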

5. Planning & Routing

Planner

SLA Planner integration support for SGLang (Dense models)

Dynamo v0.5.0 Features

1. Performance

Develop reproducible benchmark script for DS-R1.

Ensure the top 4 most popular models are benchmarked on all three backends (SGLang, TRT-LLM, vLLM)

  • Target models: Qwen 32B, Llama 70B, GPT-OSS 120B, and DS R1
  • Reproducible benchmarking guides using K8s

2. Fault Tolerance & Observability

Metrics

  • Runtime, Frontend, and Engine Metrics

Logging & tracing

  • E2E Request Level Tracing

Request handling

  • E2E Request Cancellation

Component availability & recovery

  • All components can fail and be recovered individually without restarting others

Worker availability and recovery

  • Engine GPU Health Monitoring

3. K8s Deployment

Production release - Grove for Dynamo K8s deployments

  • Helm charts with Grove + Dynamo operators (Dynamo Cloud platform)

Multi-tenancy

  • Frontend needs to support dynamo-NS scoping for backend models, so that deployments can stay isolated

Detecting failure scenarios with DCGM controller

OME Integration

  • Dynamo as a Cluster/ServingRuntime for OME

Support Dynamo namespace isolation

4. KV Cache Management & Transfer

KV Block Manager

Merge LMCache multi-connector path

Modularization: separate repo and artifacts

G4 (remote storage) support

LMCache

  • KV events and KV Routing path verification
  • e2e NIXL southbound integration verification
  • e2e NIXL southbound performance verification

TRT-LLM: G2 - G3 offloading, onboarding, unit tests
Dynamo KVBM integration

5. Planning & Routing

Router

Separate frontend and Router, so frontend and Router can be scaled independently

  • Make API server, processor and router composable
  • Create Frontend container (e.g. dynamo-frontend)

Allow serving multiple routers at the same time, and also allow for warm restarts for Router if one goes down

Planner

Test scaling Dynamo Planner

SLA Planner integration with TRT-LLM (Dense models)

6. Other

Multi-LoRA enablement

Guided Decoding

If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into GA Dynamo in December 🙏.
