[Roadmap]: 0.4.1 - 0.5.0 roadmap and key dates

### Feature request

Hi Dynamo developers! 

We wanted to provide visibility into the near-term roadmap for the Dynamo v0.4.1 and Dynamo v0.5.0 releases. 

We are continuing to make progress on five major focus areas: 
1. Performance
2. Fault tolerance
3. K8s deployment
4. KV cache management and transfer
5. Scheduling with smart router and planner 


## 📅 Timeline
The target dates for the releases are below: 

| v0.4.1 | v0.5.0 |
| :-------: | :------: | 
| 8/27     | 9/17     | 

## Dynamo v0.4.1 Features

### 1. Performance 
Develop a reproducible benchmark script for DS-R1. 

- #2387

### 2. Fault Tolerance & Observability

[High Availability] Support multiple KV routers (+frontends)
- #2324

Request Migration Docs and E2E Tests vLLM
- #2177

bug: metrics collection timed out 
- #2480 
- #2506 

vLLM: Add a "model" label to Component metrics
- #2389

k8s: Add Guide on Deploying Prometheus and Grafana in Dynamo Cloud (K8s) Deployment

fix: Frontend metrics to be renamed from nv_llm_* to dynamo_*
- #2176
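
For anyone maintaining dashboards or alert rules, the rename above amounts to a prefix swap. A minimal sketch of that mapping, assuming a plain prefix replacement; the helper name and the example metric name are hypothetical and not taken from the Dynamo codebase:

```python
OLD_PREFIX = "nv_llm_"
NEW_PREFIX = "dynamo_"


def rename_frontend_metric(name: str) -> str:
    """Map an old nv_llm_* metric name to its dynamo_* equivalent.

    Names that do not carry the old prefix are returned unchanged.
    """
    if name.startswith(OLD_PREFIX):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name


# Hypothetical metric name, used only for illustration.
print(rename_frontend_metric("nv_llm_example_requests_total"))
```

A helper like this could be applied to saved queries during the transition; the authoritative list of renamed metrics is in the linked issue.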

Create a new guide on using the MetricsRegistry APIs
- #2159
- #2160

fix: component metric names to be called dynamo_component_*, and labels to not collide with Kubernetes
- #2180 

Base metrics: NATS, drt & component grouped metrics
- #2292

fix: endpoint.rs does not propagate Runtime errors back on the stack
- #2156

Parameterize /health and /live endpoints
- #2230
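
As a rough illustration of what parameterized probe endpoints could look like, here is a minimal sketch. It assumes that the liveness path reports only that the process is up and that the health path reflects readiness; `ProbeConfig`, `probe_status`, and the `/healthz` and `/livez` overrides are hypothetical names, not Dynamo APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeConfig:
    # Hypothetical settings; the defaults mirror the paths named in the roadmap item.
    health_path: str = "/health"
    live_path: str = "/live"


def probe_status(path: str, config: ProbeConfig, ready: bool) -> int:
    """Return the HTTP status code for a probe request.

    The liveness path answers 200 whenever the process is running. The health
    path answers 200 only when `ready` is true, otherwise 503. Any other path
    gets 404.
    """
    if path == config.live_path:
        return 200
    if path == config.health_path:
        return 200 if ready else 503
    return 404


# Example: a deployment that overrides both probe paths.
custom = ProbeConfig(health_path="/healthz", live_path="/livez")
print(probe_status("/healthz", custom, ready=False))
```

Keeping the paths in a single config object means a Kubernetes probe definition and the server only need to agree on two strings.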

Refactor: System Server (http_server.rs) is often confused with the Dynamo frontend
- #2318
- #2354

### 3. K8s Deployment 

#### [Grove](https://github.com/NVIDIA/grove)
Grove integration: multi-node support
- #2269
- #2405

Implement workaround to scale components when using Grove
- #2531

#### Inference Gateway

Dynamo integration with API Gateway - EPP customization
- #2345

Processor/router unit should not queue in NATS and should instead return to the Envoy shim/proxy
- #1787

Update Documentation for Dynamo Inference Gateway
- #2257
- #2260

#### Metrics
Create a reference guide to collecting and viewing Dynamo metrics in Kubernetes
- #2271

 
### 4. KV Cache Management & Transfer

#### KV Block Manager

Note: G1 = HBM, G2 = Host memory, G3 = Local disk, G4 = Remote storage

vLLM: G2-G3 offloading, onboarding, unit tests - functionality (<2% of LMCache perf test suites)
- G2 unit tests
- Connector API implementation
- G3 onboarding
- G3 offloading
- G3 unit tests
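
The tier naming in the note above can be sketched as a small enum. This is an illustration only: it assumes that offloading moves a block one tier further from the GPU and onboarding moves it one tier closer, which may not match how the KV Block Manager actually schedules transfers. `Tier`, `offload_target`, and `onboard_target` are hypothetical names:

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    G1 = 1  # HBM
    G2 = 2  # Host memory
    G3 = 3  # Local disk
    G4 = 4  # Remote storage


def offload_target(tier: Tier) -> Optional[Tier]:
    """Next tier further from the GPU, or None if already at G4."""
    return Tier(tier + 1) if tier < Tier.G4 else None


def onboard_target(tier: Tier) -> Optional[Tier]:
    """Next tier closer to the GPU, or None if already at G1."""
    return Tier(tier - 1) if tier > Tier.G1 else None


# The v0.4.1 work item covers the G2 <-> G3 pair.
print(offload_target(Tier.G2).name, onboard_target(Tier.G3).name)
```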

### 5. Planning & Routing

#### Planner 
SLA Planner integration support for SGLang (Dense models) 

## Dynamo v0.5.0 Features

### 1. Performance 
Develop a reproducible benchmark script for DS-R1. 

Ensure the top 4 popular models are benchmarked on all three backends (SGLang, TRT-LLM, vLLM)
- Target models: Qwen 32B, Llama 70B, GPT-OSS 120B, and DS-R1
- Reproducible benchmarking guides using K8s

### 2. Fault Tolerance & Observability

Metrics
- Runtime, Frontend, and Engine Metrics

Logging & tracing
- E2E Request Level Tracing

Request handling 
- E2E Request Cancellation

Component availability & recovery 
- All components can fail and be recovered individually without restarting others

Worker availability and recovery
- Engine GPU Health Monitoring 

### 3. K8s Deployment 
Production release - Grove for Dynamo k8s deployments
- Helm charts with Grove + Dynamo operators (Dynamo Cloud platform)

Multi-tenancy
- Frontend needs to support dynamo-NS scoping for backend models, so that deployments can stay isolated

Detecting failure scenarios with DCGM controller

OME Integration
- Dynamo as a Cluster/ServingRuntime for OME

Support Dynamo namespace isolation
- #2394
- #2475
- #2460

### 4. KV Cache Management & Transfer

#### KV Block Manager

Merge LMCache multi-connector path

Modularization: separate repo and artifacts

G4 (remote storage) support

LMCache
- KV events and KV Routing path verification
- e2e NIXL southbound integration verification
- e2e NIXL southbound performance verification

TRT-LLM: G2-G3 offloading, onboarding, unit tests
Dynamo KVBM integration
- #2544

### 5. Planning & Routing

#### Router

Separate frontend and Router, so frontend and Router can be scaled independently
- Make API server, processor and router composable 
- Create Frontend container (e.g. dynamo-frontend)

Allow serving multiple routers at the same time, and also allow for warm restarts for Router if one goes down

#### Planner
Test scaling Dynamo Planner 
- #2525

SLA Planner integration with TRT-LLM (Dense models)

### 6. Other
Multi-LoRA enablement

Guided Decoding

If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into the Dynamo GA release in December 🙏.

### Describe the problem you're encountering

N/A

### Describe alternatives you've tried

_No response_

[Roadmap]: 0.4.1 - 0.5.0 roadmap and key dates #2649
