
Conversation

**pavanimajety:** WIP. Will try merging into existing DeepSeek

@@ -0,0 +1,459 @@
# DSR1 Status with vLLM: Aggregated Serving on B200

**Overall Health**: Most paths work. DP Attention is failing in combination with FlashInfer MoE kernels.

> **Contributor:** Could we link this to a GitHub issue?

**How to Invoke:**

FlashInfer:
- Automatic on SM100 (requires flashinfer installed)

What does "Automatic" mean here? it automatically uses FlashInfer gemm on SM100? But this seems to conflict with line 26 which says DeepGemm is the default?

- `VLLM_USE_DEEP_GEMM_E8M0=1` (default)

CUTLASS BlockScale:
- Automatic fallback (requires CUDA 12.8+ for SM100)

What does "Automatic fallback" mean? Fall back from what to what?

- `VLLM_USE_FLASHINFER_MOE_FP8=1 VLLM_FLASHINFER_MOE_BACKEND=throughput`

CUTLASS BlockScale:
- Default on SM100 (auto-selected with block quant)

> **Contributor:** If either FlashInfer TRTLLM-Gen or DeepGemm is the most performant, why do we default to CUTLASS? Should we just use DeepGemm by default?
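
For concreteness, a minimal sketch of pairing the FP8 env vars quoted above with a serve command. The model name and parallelism settings below are illustrative assumptions, not taken from the document under review:

```bash
# Sketch only: force the FlashInfer MoE FP8 throughput path (env vars from the doc above).
# deepseek-ai/DeepSeek-R1 and TP=8 are assumptions for illustration.
VLLM_USE_FLASHINFER_MOE_FP8=1 \
VLLM_FLASHINFER_MOE_BACKEND=throughput \
vllm serve deepseek-ai/DeepSeek-R1 \
  --tensor-parallel-size=8 \
  --enable-expert-parallel
```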

**Available Backends:**

TP/EP:
- Flashinfer TRTLLM-Gen

> **Contributor:** This should be Flashinfer TRTLLM-Gen and CUTLASS, right? (according to the "How to Invoke" section)


> **Contributor:** Oh, maybe CUTLASS does not support this yet.

CUDA_VISIBLE_DEVICES=0,1,2,3 \
VLLM_USE_STANDALONE_COMPILE=0 \
VLLM_USE_FLASHINFER_MOE_FP4=1 \
VLLM_FLASHINFER_MOE_BACKEND="latency" \
vllm serve nvidia/DeepSeek-R1-FP4 \
  --quantization="modelopt_fp4" \
  --trust-remote-code \
  --max-model-len=2048 \
  --block-size=128 \
  --enable-expert-parallel \
  --gpu-memory-utilization=0.8 \
  --tensor-parallel-size=1 \
  --data-parallel-size=4

> **Contributor:** (on `VLLM_FLASHINFER_MOE_BACKEND="latency"`) If we recommend using latency mode, should we just make it the default?

> **Contributor:** (on `--quantization="modelopt_fp4"`) Is this flag needed?

> **Contributor:** (on `--trust-remote-code`) Is this flag needed?

> **Contributor:** (on `--block-size=128`) What does this flag do?

> **Contributor:** (on the command above) Maybe we should also disable prefix caching for perf benchmarking?
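
Following up on the prefix-caching comment, a minimal sketch of a benchmarking-oriented launch plus a smoke test. The `--no-enable-prefix-caching` flag and the curl request assume the standard vLLM CLI and OpenAI-compatible server API; verify the flag name against the vLLM version in use:

```bash
# Sketch only: same FP4 launch with prefix caching disabled for perf benchmarking.
VLLM_USE_FLASHINFER_MOE_FP4=1 \
VLLM_FLASHINFER_MOE_BACKEND="latency" \
vllm serve nvidia/DeepSeek-R1-FP4 \
  --quantization="modelopt_fp4" \
  --trust-remote-code \
  --max-model-len=2048 \
  --block-size=128 \
  --no-enable-prefix-caching \
  --enable-expert-parallel \
  --gpu-memory-utilization=0.8 \
  --tensor-parallel-size=1 \
  --data-parallel-size=4

# Quick smoke test against the OpenAI-compatible endpoint (default port 8000).
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/DeepSeek-R1-FP4", "prompt": "Hello", "max_tokens": 16}'
```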
