Commit 3800caf
Minsung-commit
[V1 Engine][Metrics] Add token-level KV cache metrics
This commit adds token-level KV cache metrics to the V1 engine,
enabling more granular monitoring of KV cache utilization beyond
the existing percentage-based metrics.
This PR addresses the V1 metrics initiative mentioned in #14101.
Currently, the vLLM V1 engine exposes only kv_cache_usage, a float in
the range 0.0-1.0 giving the fraction of the KV cache in use. While
useful, this does not give users absolute token counts, which are critical for:
- Capacity Planning: Knowing "65% used" doesn't tell you when you'll run out
- Cost Accounting: Token-based billing requires absolute counts
- Metrics Collection: Prometheus/Grafana dashboards need concrete numbers
- Debugging: Understanding exact cache state during issues
Add three new fields to SchedulerStats dataclass:
- kv_cache_total_tokens: int = 0
- kv_cache_used_tokens: int = 0
- kv_cache_free_tokens: int = 0
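A minimal sketch of how the three defaulted fields sit next to the existing fractional metric. `SchedulerStatsSketch` is a simplified, hypothetical stand-in, not the real vLLM class, which carries other fields omitted here.

```python
from dataclasses import dataclass


@dataclass
class SchedulerStatsSketch:
    """Simplified stand-in for SchedulerStats (hypothetical, for illustration)."""

    kv_cache_usage: float = 0.0     # existing fractional metric (0.0-1.0)
    kv_cache_total_tokens: int = 0  # new: total capacity in tokens
    kv_cache_used_tokens: int = 0   # new: tokens currently occupied
    kv_cache_free_tokens: int = 0   # new: tokens still available


# A call site that only knows about the old field still constructs the object,
# because every new field has a default value.
legacy = SchedulerStatsSketch(kv_cache_usage=0.65)
```

Because the new fields default to 0, a consumer cannot distinguish "not populated" from "zero tokens" by value alone; that is a property of this sketch's defaults as listed above.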
Add get_num_total_blocks() method to BlockPool:
- Returns total GPU blocks available for allocation
- Excludes 1 block reserved for system use (-1)
- Matches internal allocation behavior
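The accessor described above might look roughly like the following. `BlockPoolSketch` and its `num_gpu_blocks` attribute are assumptions made for illustration; only the method name and the "minus one reserved block" rule come from the commit message.

```python
class BlockPoolSketch:
    """Hypothetical, stripped-down block pool used only to show the arithmetic."""

    def __init__(self, num_gpu_blocks: int) -> None:
        if num_gpu_blocks < 1:
            raise ValueError("need at least the one reserved block")
        self.num_gpu_blocks = num_gpu_blocks

    def get_num_total_blocks(self) -> int:
        # One block is reserved and never handed out, so it is excluded
        # from the allocatable total.
        return self.num_gpu_blocks - 1


pool = BlockPoolSketch(num_gpu_blocks=5154)
allocatable = pool.get_num_total_blocks()
```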
Add three read-only properties to KVCacheManager:
- total_tokens: Total capacity (num_total_blocks × block_size)
- free_tokens: Available space (num_free_blocks × block_size)
- used_tokens: Occupied space (total_tokens - free_tokens)
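As a sketch of the three properties, assuming a pool object that exposes `get_num_total_blocks()` and `get_num_free_blocks()`. The `_StubPool` and `KVCacheManagerSketch` classes are hypothetical; the block size of 16 and the block counts are example values chosen for the illustration.

```python
class _StubPool:
    """Hypothetical pool exposing only the two counters the properties need."""

    def __init__(self, total_blocks: int, free_blocks: int) -> None:
        self._total = total_blocks
        self._free = free_blocks

    def get_num_total_blocks(self) -> int:
        return self._total

    def get_num_free_blocks(self) -> int:
        return self._free


class KVCacheManagerSketch:
    """Simplified stand-in showing the token arithmetic, not the real manager."""

    def __init__(self, block_pool: _StubPool, block_size: int) -> None:
        self.block_pool = block_pool
        self.block_size = block_size

    @property
    def total_tokens(self) -> int:
        return self.block_pool.get_num_total_blocks() * self.block_size

    @property
    def free_tokens(self) -> int:
        return self.block_pool.get_num_free_blocks() * self.block_size

    @property
    def used_tokens(self) -> int:
        # Derived, so total == used + free always holds.
        return self.total_tokens - self.free_tokens


manager = KVCacheManagerSketch(_StubPool(total_blocks=5153, free_blocks=1804), block_size=16)
```

Since both counts are products of a block count and the block size, every value reported this way is a multiple of the block size; partially filled blocks count as fully used.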
Update make_stats() to populate new token metrics:
- kv_cache_total_tokens from kv_cache_manager.total_tokens
- kv_cache_used_tokens from kv_cache_manager.used_tokens
- kv_cache_free_tokens from kv_cache_manager.free_tokens
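The population step can be sketched as a plain copy from manager properties into the stats object. `StatsSketch`, `make_stats_sketch`, and the `usage` attribute on the fake manager are assumptions for illustration; the real make_stats() fills in other scheduler fields as well.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class StatsSketch:
    """Hypothetical stats container with only the KV cache fields."""

    kv_cache_usage: float = 0.0
    kv_cache_total_tokens: int = 0
    kv_cache_used_tokens: int = 0
    kv_cache_free_tokens: int = 0


def make_stats_sketch(kv_cache_manager) -> StatsSketch:
    # Each new field is read straight from the matching manager property.
    return StatsSketch(
        kv_cache_usage=kv_cache_manager.usage,
        kv_cache_total_tokens=kv_cache_manager.total_tokens,
        kv_cache_used_tokens=kv_cache_manager.used_tokens,
        kv_cache_free_tokens=kv_cache_manager.free_tokens,
    )


# Fake manager carrying example values in place of a live KV cache manager.
fake_manager = SimpleNamespace(
    usage=0.65, total_tokens=82448, used_tokens=53591, free_tokens=28857
)
stats = make_stats_sketch(fake_manager)
```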
Benefits:
- Actionable Metrics: "29k tokens left" vs "35% free"
- Prometheus Export: Direct token counts for dashboards
- Cost Attribution: Token-based billing becomes trivial
- Capacity Planning: Know exactly when to scale
- Backward Compatible: Existing code continues to work
- Minimal Overhead: Simple arithmetic, no new allocations
Before (only percentage):
```
kv_cache_usage: 0.65
```
After (percentage + tokens):
```
kv_cache_usage: 0.65
kv_cache_total_tokens: 82448
kv_cache_used_tokens: 53591
kv_cache_free_tokens: 28857
```
Now operators can see: "We have ~29k tokens left before we need to scale"
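To make the capacity-planning use concrete, here is a rough operator-side calculation on the example numbers above. The helper `requests_that_fit` and the 2048-token request size are hypothetical; the estimate ignores block rounding and any cache sharing between requests, so treat it as an upper bound.

```python
def requests_that_fit(free_tokens: int, tokens_per_request: int) -> int:
    """Upper bound on how many more requests of a given size the free cache can hold."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return free_tokens // tokens_per_request


total_tokens = 82448
free_tokens = 28857

free_fraction = free_tokens / total_tokens          # what the percentage alone conveys
headroom = requests_that_fit(free_tokens, 2048)     # what the absolute count adds
```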
Testing:
- All modified files pass Python syntax check (py_compile)
- No breaking changes to existing metrics
- New fields have default values (backward compatible)
Related issues:
- Closes #12283 - Add KV Cache Metrics to Usage Object
- Addresses #26850 - Add new stats metrics for available_kv_cache_memory
- Supersedes #14101 - Frontend KV cache metrics PR
Signed-off-by: dlalstjd931203 <[email protected]>
Signed-off-by: Minsung-commit <[email protected]>
1 parent 6fc5841 commit 3800caf
4 files changed (+53, -0) in the vllm/v1 tree (core, sched, metrics)