-
Notifications
You must be signed in to change notification settings - Fork 603
feat: Sglang canary health check #3103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bb5411b
to
e01016c
Compare
WalkthroughIntroduces a backend-specific health check payload class for sglang and wires it into the sglang backend flow. The payload is instantiated and converted to a dict, then passed to serve_endpoint during model registration and prefill initialization. No other control, readiness, or error-handling logic is changed. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Caller
participant SgMain as sglang.main
participant HC as SglangHealthCheckPayload
participant Serve as serve_endpoint
rect rgba(220,235,245,0.5)
Note over SgMain: register_model()
Caller->>SgMain: register_model(...)
SgMain->>HC: new()
HC-->>SgMain: to_dict() -> health_check_payload
SgMain->>Serve: serve_endpoint(..., health_check_payload)
Serve-->>SgMain: ready/not ready
SgMain-->>Caller: result
end
sequenceDiagram
autonumber
participant Caller as Caller
participant SgMain as sglang.main
participant HC as SglangHealthCheckPayload
participant Serve as serve_endpoint
rect rgba(235,245,220,0.5)
Note over SgMain: init_prefill()
Caller->>SgMain: init_prefill(...)
SgMain->>HC: new()
HC-->>SgMain: to_dict() -> health_check_payload
SgMain->>Serve: serve_endpoint(..., health_check_payload)
Serve-->>SgMain: prefill initialized
SgMain-->>Caller: ack
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
components/backends/sglang/src/dynamo/sglang/health_check.py (2)
35-42
: Make canary runs deterministic (seed=None -> fixed seed).For stable health signals, prefer a fixed seed. Proposal:
- "sampling_options": { + "sampling_options": { "n": 1, - "temperature": 0.0, + "temperature": 0.0, "top_p": 1.0, "top_k": -1, - "seed": None, + "seed": 0, "use_beam_search": False, },If sglang treats seed=None specially, confirm that 0 is accepted and produces deterministic behavior.
10-18
: Optional: export symbol for clarity.Helps static import tools and avoids accidental re-exports.
from dynamo.health_check import HealthCheckPayload class SglangHealthCheckPayload(HealthCheckPayload): + __all__ = ["SglangHealthCheckPayload"]
components/backends/sglang/src/dynamo/sglang/main.py (1)
161-167
: Pass the prefill‑shaped payload to serve_endpoint.Ensure you’re using the wrapped payload for prefill.
generate_endpoint.serve_endpoint( handler.generate, graceful_shutdown=True, metrics_labels=[("model", server_args.served_model_name)], - health_check_payload=health_check_payload, + health_check_payload=health_check_payload, )Please sanity‑test the prefill endpoint’s health check locally; it should at least yield bootstrap_info and not KeyError on sampling_params.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/backends/sglang/src/dynamo/sglang/health_check.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/main.py
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/backends/sglang/src/dynamo/sglang/health_check.py (1)
lib/bindings/python/src/dynamo/health_check.py (1)
HealthCheckPayload
(61-96)
components/backends/sglang/src/dynamo/sglang/main.py (4)
components/backends/sglang/src/dynamo/sglang/health_check.py (1)
SglangHealthCheckPayload
(13-46)lib/bindings/python/src/dynamo/health_check.py (1)
to_dict
(86-96)components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py (1)
generate
(56-94)components/backends/sglang/src/dynamo/sglang/request_handlers/prefill_handler.py (1)
generate
(49-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - sglang
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/sglang/src/dynamo/sglang/main.py (2)
18-18
: Import path looks right.Assuming module is packaged with backend, this import should resolve.
If CI fails on import, check packaging of components/backends/sglang/src in PYTHONPATH.
123-128
: Confirm serve_endpoint supports health_check_payload kwarg.
lib/bindings/python/src/dynamo/_core.pyi declares serve_endpoint with a health_check_payload parameter (Optional[Dict], default None), so the Python API exposes the kwarg and existing call sites remain compatible. Verify the underlying binding/runtime implementation actually accepts/forwards/uses this parameter and update docs/examples if needed.components/backends/sglang/src/dynamo/sglang/health_check.py (1)
24-45
: Validate sampling field names against the builders
- components/backends/sglang/src/dynamo/sglang/health_check.py sets many sampling_options keys, but components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py currently only maps/reads temperature — either (A) translate/accept the exact keys the decode/prefill builders expect (e.g. max_tokens → max_new_tokens where applicable), or (B) shrink the default_payload to only the supported keys.
- Treat top_k = -1 as “disabled” (omit or map to None) and ensure temperature = 0.0 is accepted (don’t treat 0 as falsy).
- Verify vLLM backend mapping (components/backends/vllm/src/dynamo/vllm/handlers.py uses SamplingParams(**self.default_sampling_params) and iterates request["sampling_options"]) and keep key semantics consistent across backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at the rabbit comments. I think they are useful to read here.
I've added SglangPrefillHealthCheckPayload for prefill health check request. Created https://linear.app/nvidia/issue/DIS-649/use-bos-for-the-token-id-in-health-check-request for the BOS token id. As this is a general issue for all backends' payload. I will work on this in another PR |
Overview:
Add canary health check to Sglang. Main logic in PR#2903
Details:
Add canary health check to Sglang.
Update the input type of PrefillWorkerHandler::generate() to dict
If we pass a JSON string (wrong):
If we pass a dict (correct):
Where should the reviewer start?
components/backends/sglang/src/dynamo/sglang/health_check.py: default payload
components/backends/sglang/src/dynamo/sglang/main.py: add payload to endpoint
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
DIS-636
Summary by CodeRabbit