Commit 83cd248
Minsung-commit
[V1 Engine][Metrics] Add token-level KV cache metrics
This commit adds token-level KV cache metrics to the V1 engine,
enabling more granular monitoring of KV cache utilization beyond
the existing percentage-based metrics.
This PR addresses the V1 metrics initiative mentioned in #14101.
Currently, the vLLM V1 engine exposes only kv_cache_usage, a float in the
range 0.0-1.0 giving the fraction of the KV cache in use. While useful, this
doesn't give users absolute token counts, which are critical for:
- Capacity Planning: Knowing "65% used" doesn't tell you when you'll run out
- Cost Accounting: Token-based billing requires absolute counts
- Metrics Collection: Prometheus/Grafana dashboards need concrete numbers
- Debugging: Understanding exact cache state during issues
Add three new fields to SchedulerStats dataclass:
- kv_cache_total_tokens: int = 0
- kv_cache_used_tokens: int = 0
- kv_cache_free_tokens: int = 0
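A minimal sketch of how these fields might sit on the dataclass. Only kv_cache_usage and the three token fields are taken from this commit message; the class name SchedulerStatsSketch is a stand-in, and every other field of the real SchedulerStats is omitted.

```python
from dataclasses import dataclass


@dataclass
class SchedulerStatsSketch:
    """Stand-in for SchedulerStats; only fields named in this commit appear."""

    # Existing fraction-based metric (0.0-1.0).
    kv_cache_usage: float = 0.0

    # New token-level metrics. The zero defaults mean call sites that
    # predate these fields keep working without modification.
    kv_cache_total_tokens: int = 0
    kv_cache_used_tokens: int = 0
    kv_cache_free_tokens: int = 0


# Construction that only knows about the old metric still succeeds.
legacy = SchedulerStatsSketch(kv_cache_usage=0.65)
```

Because every new field has a default, the change is additive: older code that never sets the token fields simply reports zeros for them.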
Add get_num_total_blocks() method to BlockPool:
- Returns total GPU blocks available for allocation
- Excludes the 1 block reserved for system use (hence the -1)
- Matches internal allocation behavior
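The counting rule described above can be sketched as follows. BlockPoolSketch is a hypothetical stand-in, and the attribute name num_gpu_blocks is an assumption made for illustration; only the method name and the "minus one reserved block" behavior come from this commit message.

```python
class BlockPoolSketch:
    """Hypothetical stand-in for BlockPool; only the counting logic is shown."""

    def __init__(self, num_gpu_blocks: int) -> None:
        if num_gpu_blocks < 1:
            raise ValueError("need at least the one reserved block")
        # Assumed attribute: the raw number of GPU blocks in the pool.
        self.num_gpu_blocks = num_gpu_blocks

    def get_num_total_blocks(self) -> int:
        # One block is reserved for system use and is never handed out,
        # so it is excluded from the allocatable total.
        return self.num_gpu_blocks - 1


pool = BlockPoolSketch(num_gpu_blocks=5154)
usable_blocks = pool.get_num_total_blocks()
```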
Add three read-only properties to KVCacheManager:
- total_tokens: Total capacity (num_total_blocks × block_size)
- free_tokens: Available space (num_free_blocks × block_size)
- used_tokens: Occupied space (total_tokens - free_tokens)
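The three formulas above could be written as read-only properties along these lines. This is a sketch under assumptions: BlockPoolStub and KVCacheManagerSketch are hypothetical stand-ins, and get_num_free_blocks() is assumed to be the accessor for the free-block count.

```python
class BlockPoolStub:
    """Hypothetical pool exposing only the two counts the properties need."""

    def __init__(self, total_blocks: int, free_blocks: int) -> None:
        self._total_blocks = total_blocks
        self._free_blocks = free_blocks

    def get_num_total_blocks(self) -> int:
        return self._total_blocks

    def get_num_free_blocks(self) -> int:  # assumed accessor name
        return self._free_blocks


class KVCacheManagerSketch:
    """Stand-in for KVCacheManager; only the token properties are shown."""

    def __init__(self, block_pool: BlockPoolStub, block_size: int) -> None:
        self.block_pool = block_pool
        self.block_size = block_size

    @property
    def total_tokens(self) -> int:
        # Total capacity: allocatable blocks times tokens per block.
        return self.block_pool.get_num_total_blocks() * self.block_size

    @property
    def free_tokens(self) -> int:
        # Available space: free blocks times tokens per block.
        return self.block_pool.get_num_free_blocks() * self.block_size

    @property
    def used_tokens(self) -> int:
        # Occupied space, derived so that used + free always equals total.
        return self.total_tokens - self.free_tokens


manager = KVCacheManagerSketch(BlockPoolStub(5153, 1803), block_size=16)
```

Under these formulas the counts are block-granular: every value is a multiple of block_size, so a partially filled block is counted as fully used.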
Update make_stats() to populate new token metrics:
- kv_cache_total_tokens from kv_cache_manager.total_tokens
- kv_cache_used_tokens from kv_cache_manager.used_tokens
- kv_cache_free_tokens from kv_cache_manager.free_tokens
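The mapping above might look roughly like this. It is a sketch only: StatsSketch and make_stats_sketch are hypothetical names, the manager is faked with SimpleNamespace, and the usage attribute is assumed to be where the existing fraction comes from.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class StatsSketch:
    """Stand-in for SchedulerStats with only the KV cache fields."""

    kv_cache_usage: float = 0.0
    kv_cache_total_tokens: int = 0
    kv_cache_used_tokens: int = 0
    kv_cache_free_tokens: int = 0


def make_stats_sketch(kv_cache_manager) -> StatsSketch:
    # Copy the existing fraction plus the three new token counts
    # straight from the manager's read-only properties.
    return StatsSketch(
        kv_cache_usage=kv_cache_manager.usage,  # assumed attribute name
        kv_cache_total_tokens=kv_cache_manager.total_tokens,
        kv_cache_used_tokens=kv_cache_manager.used_tokens,
        kv_cache_free_tokens=kv_cache_manager.free_tokens,
    )


# Fake manager carrying the values used in this commit's example output.
fake_manager = SimpleNamespace(
    usage=0.65, total_tokens=82448, used_tokens=53591, free_tokens=28857
)
stats = make_stats_sketch(fake_manager)
```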
Benefits:
- Actionable Metrics: "~29k tokens left" vs "35% free"
- Prometheus Export: Direct token counts for dashboards
- Cost Attribution: Token-based billing becomes trivial
- Capacity Planning: Know exactly when to scale
- Backward Compatible: Existing code continues to work
- Minimal Overhead: Simple arithmetic, no new allocations
Before (only percentage):
```
kv_cache_usage: 0.65
```
After (percentage + tokens):
```
kv_cache_usage: 0.65
kv_cache_total_tokens: 82448
kv_cache_used_tokens: 53591
kv_cache_free_tokens: 28857
```
Now operators can see: "We have ~29k tokens left before we need to scale"
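As a rough illustration of the capacity-planning use, the sketch below turns the free-token count into an estimate of how many more requests fit. The helper requests_until_full and the 2,000-token average request size are assumptions chosen for the example, not part of this commit.

```python
def requests_until_full(free_tokens: int, avg_tokens_per_request: int) -> int:
    """Estimate how many more average-sized requests fit in the free space."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    return free_tokens // avg_tokens_per_request


# Values from the "After" example above.
total_tokens, used_tokens, free_tokens = 82448, 53591, 28857

# The token counts reproduce the existing fraction-based metric.
usage = used_tokens / total_tokens

# Hypothetical workload: requests averaging 2,000 tokens each.
headroom = requests_until_full(free_tokens, avg_tokens_per_request=2000)
```

With these inputs the estimate is 14 more requests, a number the 0.65 fraction alone cannot provide without also knowing the total capacity.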
Testing:
- All modified files pass Python syntax check (py_compile)
- No breaking changes to existing metrics
- New fields have default values (backward compatible)
Related issues:
- Closes #12283 - Add KV Cache Metrics to Usage Object
- Addresses #26850 - Add new stats metrics for available_kv_cache_memory
- Supersedes #14101 - Frontend KV cache metrics PR
Signed-off-by: dlalstjd931203 <[email protected]>
Signed-off-by: Minsung-commit <[email protected]>
1 parent f72a817 commit 83cd248
File tree
4 files changed, +53 -0 lines changed
- vllm/v1
  - core
    - sched
  - metrics