@Spycsh Spycsh commented Nov 10, 2025

Overview:

Draft PR enabling Intel Gaudi on Dynamo. It also fixes the issue reported in #4208.

Details:

Validates running a vLLM prefill/decode (PD) disaggregation example in Dynamo on Intel Gaudi. vLLM's NixlConnector is enabled through the vLLM-Gaudi support, which uses an on-host buffer over UCX.

Here are the steps:

export VLLM_NIXL_DEVICE_TO_DEVICE=false
export VLLM_SKIP_WARMUP=true
NIXL_BUFFER_DEVICE=cpu
VLLM_NIXL_BACKEND=UCX
export no_proxy=localhost,127.0.0.1
export ETCD_ENDPOINTS=http://localhost:2381
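
The prefill and decode commands in the steps below interpolate NIXL_BUFFER_DEVICE and VLLM_NIXL_BACKEND into a hand-escaped JSON string. As a minimal sketch (not part of this PR), the same value could be generated with Python so the quoting is handled automatically. The helper name `build_kv_transfer_config` is hypothetical; the keys and values are copied from the commands in this description. Since the two variables above are set without `export`, a child Python process would not see them, so the sketch falls back to the same defaults.

```python
import json
import os
import shlex


def build_kv_transfer_config(buffer_device: str, backend: str) -> str:
    """Return the JSON string passed to --kv-transfer-config.

    Hypothetical helper; the keys mirror the commands in this PR description.
    """
    config = {
        "kv_connector": "NixlConnector",
        "kv_role": "kv_both",
        "kv_buffer_device": buffer_device,
        "kv_connector_extra_config": {"backends": [backend]},
    }
    return json.dumps(config)


# Fall back to the values used in this description when the variables
# are not present in the process environment.
buffer_device = os.environ.get("NIXL_BUFFER_DEVICE", "cpu")
backend = os.environ.get("VLLM_NIXL_BACKEND", "UCX")

kv_config = build_kv_transfer_config(buffer_device, backend)

# shlex.quote makes the JSON safe to paste into a shell command line.
print("--kv-transfer-config", shlex.quote(kv_config))
```

The printed argument can be pasted into either worker command in place of the escaped string.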

	0) Frontend, NATS, etcd
nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2381/ --advertise-client-urls http://0.0.0.0:2381/ --data-dir /tmp/etcd &
export ETCD_ENDPOINTS=http://localhost:2381


python -m dynamo.frontend &


	1) Prefill
VLLM_NIXL_SIDE_CHANNEL_PORT=5601 HABANA_VISIBLE_DEVICES=0 python3 -m dynamo.vllm   --model Qwen/Qwen3-0.6B   --kv-transfer-config "{\"kv_connector\": \"NixlConnector\", \"kv_role\": \"kv_both\", \"kv_buffer_device\": \"${NIXL_BUFFER_DEVICE}\", \"kv_connector_extra_config\": {\"backends\": [\"${VLLM_NIXL_BACKEND}\"]}}"  --no-enable-prefix-caching --is-prefill-worker

	2) Decode
VLLM_NIXL_SIDE_CHANNEL_PORT=5602 HABANA_VISIBLE_DEVICES=1 python3 -m dynamo.vllm   --model Qwen/Qwen3-0.6B   --kv-transfer-config "{\"kv_connector\": \"NixlConnector\", \"kv_role\": \"kv_both\", \"kv_buffer_device\": \"${NIXL_BUFFER_DEVICE}\", \"kv_connector_extra_config\": {\"backends\": [\"${VLLM_NIXL_BACKEND}\"]}}"  --no-enable-prefix-caching

	3) Test
curl -X POST http://localhost:8000/v1/chat/completions   -H 'Content-Type: application/json'   -H 'x-request-id: 8372eac7-5f43-4d76-beca-0a94cfb311d0'   -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {
        "role": "user",
        "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"
      }
    ],
    "stream": true,
    "max_tokens": 1000
  }'
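
For scripted checks, the same request can be sent from Python. This is a rough sketch that assumes the frontend streams OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), which is not confirmed in this PR. The function names `iter_content` and `stream_chat` are made up for illustration.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Same endpoint as the curl call above.
URL = "http://localhost:8000/v1/chat/completions"


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(prompt: str, model: str = "Qwen/Qwen3-0.6B",
                max_tokens: int = 1000) -> Iterator[str]:
    """POST a streaming chat request and yield the generated text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    request = urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response object iterates line by line as bytes.
        yield from iter_content(response)
```

With the frontend and both workers running, `print("".join(stream_chat("Hello")))` would print the full completion. The parsing is kept in `iter_content` so it can be exercised without a live server.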

Where should the reviewer start?

For #4208, the fix is in components/src/dynamo/vllm/args.py and components/src/dynamo/vllm/handlers.py.

Related Issues:

  • Closes GitHub issue #4208.

copy-pr-bot bot commented Nov 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi Spycsh! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) label Nov 10, 2025
@Spycsh Spycsh changed the title from "Enable intel gaudi on nv-dynamo" to "feat: Enable intel gaudi on nv-dynamo" Nov 10, 2025
@github-actions github-actions bot added the feat label Nov 10, 2025
@Spycsh Spycsh changed the title from "feat: Enable intel gaudi on nv-dynamo" to "feat: Enable intel gaudi on dynamo" Nov 11, 2025
Contributor

@rmccorm4 rmccorm4 Nov 13, 2025

@tmonty12 @julienmancuso can you help review?

Author

@Spycsh Spycsh Nov 14, 2025

Thanks. The k8s-related paths are not yet validated on my side, and I think it would be better and easier to add the Gaudi-related resource type/plugin after PR #3548 is merged.

Contributor

@alec-flowers @ziqifan617 can you review the vLLM changes and make sure there are no issues for either the disagg or the kvbm connector logic?

@rmccorm4
Contributor

Related: #3548

Contributor

rmccorm4 commented Nov 13, 2025

@Spycsh thanks for contributing this! Can you --signoff your commit to pass the DCO check, and fix the failing pre-commit check as well?

Comment on lines +279 to +280
# if a specific --kv_transfer_config is passed, ignore the --connector handling
if has_connector_flag and not has_kv_transfer_config:
Contributor

I think it's worth keeping the ValueError. Without it, we might get a silent failure, which is frustrating from a user's perspective.
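
To make the trade-off concrete, here is a minimal sketch of the two behaviors under discussion. It is not the code in args.py: `resolve_kv_transfer` and its `strict` parameter are hypothetical, the connector value "nixl" is a placeholder, and the sketch assumes the earlier behavior raised a ValueError when both options were supplied.

```python
from typing import Optional


def resolve_kv_transfer(connector: Optional[str],
                        kv_transfer_config: Optional[str],
                        strict: bool = True) -> str:
    """Decide which option drives the KV-transfer setup.

    Hypothetical helper for illustration only.
    strict=True  -> reject conflicting options loudly.
    strict=False -> let the explicit config win and drop --connector.
    """
    if connector and kv_transfer_config:
        if strict:
            raise ValueError(
                "--connector and --kv-transfer-config were both provided; "
                "pass only one of them"
            )
        # Lenient path: the user gets no signal that --connector was ignored.
        return "kv_transfer_config"
    if kv_transfer_config:
        return "kv_transfer_config"
    if connector:
        return "connector"
    return "none"


# Strict mode surfaces the conflict immediately.
try:
    resolve_kv_transfer("nixl", '{"kv_connector": "NixlConnector"}')
except ValueError as err:
    print(f"rejected: {err}")

# Lenient mode proceeds with the explicit config.
print(resolve_kv_transfer("nixl", '{"kv_connector": "NixlConnector"}', strict=False))
```

The lenient path is what makes a silent failure possible: the process starts, but one of the two options has no effect.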

Contributor

Actually, could you please not change this in this PR? #4317 has a nice fix, which will benefit this PR as well.

Author

@Spycsh Spycsh Nov 14, 2025

Sure. I will revert this fix in this PR after #4317 is merged.

Labels: external-contribution (Pull request is from an external contributor), feat, size/L