Commit b7d77eb

Change wording
Signed-off-by: Timothy Gao <[email protected]>
1 parent 1627f0e commit b7d77eb

1 file changed

+2
-2
lines changed

examples/disaggregated/README.md

Lines changed: 2 additions & 2 deletions
@@ -202,7 +202,7 @@ Additionally, we offer a fully executable script—please refer to [Disaggregate
 
 ## Mixed Precision Context and Generation
 
-In disaggregated serving, the context (prefill) workers and generation (decode) workers have different performance characteristics: prefill workers are more compute-bound while decode workers are more memory-bound. Therefore, it may be beneficial to run prefill workers in higher precision. Running these workers with different precisions also enables the ability to interpolate between performance/compute trade-offs of different quantization levels.
+In disaggregated serving, the context workers and generation workers have different performance characteristics: context workers are compute-bound while generation workers are memory-bound. Therefore, it may be beneficial to run context workers in higher precision. Running these workers with different precisions also enables the ability to interpolate between performance/compute trade-offs of different quantization levels.
 
 ### Prerequisites
 
@@ -211,7 +211,7 @@ To enable mixed precision serving, you'll need:
 2. The original unquantized checkpoint
 3. Both checkpoints must use the same KV cache dtype to ensure compatibility during transfer
 
-### Example (BF 16 Prefill, FP 8 Decode)
+### Example (BF 16 Gen, FP 8 Ctx)
 
 A quantized checkpoint can be created `--kv_cache_qformat none`.
 
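The third prerequisite in the second hunk (both checkpoints must share a KV cache dtype) can be illustrated with a minimal sketch. It assumes a hypothetical `WorkerConfig` record and a hypothetical `check_kv_cache_compatible` helper; the field names and dtype strings are illustrative only and are not the project's real checkpoint schema. The example follows the README prose and gives the context worker BF16 weights and the generation worker FP8 weights. It keeps the KV cache in BF16 on both sides, which appears to be what the `--kv_cache_qformat none` note is for: leaving the KV cache unquantized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    """Hypothetical description of one disaggregated worker pool (illustrative fields only)."""
    role: str            # "context" or "generation"
    weight_dtype: str    # precision of the model weights, e.g. "bf16" or "fp8"
    kv_cache_dtype: str  # dtype of the KV cache handed from context to generation


def check_kv_cache_compatible(ctx: WorkerConfig, gen: WorkerConfig) -> None:
    """Raise ValueError unless ctx is a context worker, gen is a generation
    worker, and both store their KV cache in the same dtype."""
    if ctx.role != "context" or gen.role != "generation":
        raise ValueError("expected one context and one generation worker config")
    # Weight precision may differ; only the transferred KV cache must agree.
    if ctx.kv_cache_dtype != gen.kv_cache_dtype:
        raise ValueError(
            f"KV cache dtype mismatch: context={ctx.kv_cache_dtype}, "
            f"generation={gen.kv_cache_dtype}"
        )


# Mixed precision: weights differ, KV cache dtype matches, so no error is raised.
ctx = WorkerConfig(role="context", weight_dtype="bf16", kv_cache_dtype="bf16")
gen = WorkerConfig(role="generation", weight_dtype="fp8", kv_cache_dtype="bf16")
check_kv_cache_compatible(ctx, gen)
```

The check compares only the KV cache dtype because, per the README context, that is the data that crosses from context workers to generation workers. The weight precision of each pool is free to differ.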