Skip to content

Conversation

tzulingk
Copy link
Contributor

@tzulingk tzulingk commented Sep 18, 2025

Overview:

Add canary health check to Sglang. Main logic in PR#2903

Details:

Add canary health check to Sglang.
Update the input type of PrefillWorkerHandler::generate() to dict

If we pass a JSON string (wrong):

Python: "{"request": {...}}"  (string)
   ↓ pythonize::depythonize
Rust: Value::String("{"request": {...}}")  
   ↓ network serialization
Wire: "\"{\\"request\\": {...}}\""  (double-encoded!)
   ↓ server receives
Python: "{"request": {...}}"  (string, needs decoding)

If we pass a dict (correct):

Python: {"request": {...}}  (dict)
   ↓ pythonize::depythonize  
Rust: Value::Object({"request": ...})
   ↓ network serialization
Wire: {"request": {...}}  (proper JSON)
   ↓ server receives
Python: {"request": {...}}  (dict, ready to use!)

Where should the reviewer start?

components/backends/sglang/src/dynamo/sglang/health_check.py: default payload
components/backends/sglang/src/dynamo/sglang/main.py: add payload to endpoint

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-636

Summary by CodeRabbit

  • New Features
    • Introduced a customizable health-check payload for the sglang backend with sensible defaults, overridable via environment variables.
    • Endpoints now accept a health_check_payload, enabling tailored readiness checks during model registration and prefill initialization.
    • Improved startup validation and health monitoring for more reliable readiness detection and reduced false negatives.

@tzulingk tzulingk force-pushed the tzulingk/health_check_sglang_cherry branch from bb5411b to e01016c Compare September 18, 2025 01:26
@tzulingk tzulingk enabled auto-merge (squash) September 18, 2025 01:27
Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

Walkthrough

Introduces a backend-specific health check payload class for sglang and wires it into the sglang backend flow. The payload is instantiated and converted to a dict, then passed to serve_endpoint during model registration and prefill initialization. No other control, readiness, or error-handling logic is changed.

Changes

Cohort / File(s) Summary
Health check payload module
components/backends/sglang/src/dynamo/sglang/health_check.py
Adds SglangHealthCheckPayload (subclass of HealthCheckPayload) that sets a sglang-specific default payload (token_ids, stop_conditions with max_tokens=1, sampling_options, empty eos_token_ids/annotations) and calls super().__init__() for env overrides.
Main integration
components/backends/sglang/src/dynamo/sglang/main.py
Imports the new payload class, constructs health_check_payload = SglangHealthCheckPayload().to_dict(), and passes it to serve_endpoint in both register_model() and init_prefill(). Public-facing change: serve_endpoint now receives a health_check_payload kwarg at these call sites.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as Caller
  participant SgMain as sglang.main
  participant HC as SglangHealthCheckPayload
  participant Serve as serve_endpoint

  rect rgba(220,235,245,0.5)
  Note over SgMain: register_model()
  Caller->>SgMain: register_model(...)
  SgMain->>HC: new()
  HC-->>SgMain: to_dict() -> health_check_payload
  SgMain->>Serve: serve_endpoint(..., health_check_payload)
  Serve-->>SgMain: ready/not ready
  SgMain-->>Caller: result
  end
Loading
sequenceDiagram
  autonumber
  participant Caller as Caller
  participant SgMain as sglang.main
  participant HC as SglangHealthCheckPayload
  participant Serve as serve_endpoint

  rect rgba(235,245,220,0.5)
  Note over SgMain: init_prefill()
  Caller->>SgMain: init_prefill(...)
  SgMain->>HC: new()
  HC-->>SgMain: to_dict() -> health_check_payload
  SgMain->>Serve: serve_endpoint(..., health_check_payload)
  Serve-->>SgMain: prefill initialized
  SgMain-->>Caller: ack
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • feat: Trtllm canary health check #3082 — Introduces a backend-specific HealthCheckPayload and updates main flow to pass health_check_payload into serve_endpoint, matching the same change pattern.

Poem

A nibble of tokens, just one will do,
I twitch my ears—health checks run through.
Hop to the endpoint, payload in tow,
Prefill primed, we’re ready to go.
In burrows of bytes, I thump with delight—
All green carrots, systems bright! 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "feat: Sglang canary health check" succinctly and accurately summarizes the primary change—adding a canary health check for the Sglang backend—and follows conventional commit styling. It is concise, on-topic, and gives a teammate scanning history a clear sense of the main purpose without unnecessary detail.
Description Check ✅ Passed The PR description includes the required template headings (Overview, Details, Where should the reviewer start, and Related Issues) and provides concrete details about the change, including the canary health check addition, the PrefillWorkerHandler::generate() input-type update, example JSON vs dict behavior, and files to review. The Details section is specific and actionable and the Overview references the related PR. The only shortcoming is that the Related Issues line does not use an action keyword (Closes/Fixes/Resolves/Relates to) as required by the repository template.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
components/backends/sglang/src/dynamo/sglang/health_check.py (2)

35-42: Make canary runs deterministic (seed=None -> fixed seed).

For stable health signals, prefer a fixed seed. Proposal:

-            "sampling_options": {
+            "sampling_options": {
                 "n": 1,
-                "temperature": 0.0,
+                "temperature": 0.0,
                 "top_p": 1.0,
                 "top_k": -1,
-                "seed": None,
+                "seed": 0,
                 "use_beam_search": False,
             },

If sglang treats seed=None specially, confirm that 0 is accepted and produces deterministic behavior.


10-18: Optional: export symbol for clarity.

Helps static import tools and avoids accidental re-exports.

 from dynamo.health_check import HealthCheckPayload
 
 
 class SglangHealthCheckPayload(HealthCheckPayload):
+    __all__ = ["SglangHealthCheckPayload"]
components/backends/sglang/src/dynamo/sglang/main.py (1)

161-167: Pass the prefill‑shaped payload to serve_endpoint.

Ensure you’re using the wrapped payload for prefill.

         generate_endpoint.serve_endpoint(
             handler.generate,
             graceful_shutdown=True,
             metrics_labels=[("model", server_args.served_model_name)],
-            health_check_payload=health_check_payload,
+            health_check_payload=health_check_payload,
         )

Please sanity‑test the prefill endpoint’s health check locally; it should at least yield bootstrap_info and not KeyError on sampling_params.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd3326 and e01016c.

📒 Files selected for processing (2)
  • components/backends/sglang/src/dynamo/sglang/health_check.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/main.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/backends/sglang/src/dynamo/sglang/health_check.py (1)
lib/bindings/python/src/dynamo/health_check.py (1)
  • HealthCheckPayload (61-96)
components/backends/sglang/src/dynamo/sglang/main.py (4)
components/backends/sglang/src/dynamo/sglang/health_check.py (1)
  • SglangHealthCheckPayload (13-46)
lib/bindings/python/src/dynamo/health_check.py (1)
  • to_dict (86-96)
components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py (1)
  • generate (56-94)
components/backends/sglang/src/dynamo/sglang/request_handlers/prefill_handler.py (1)
  • generate (49-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/sglang/src/dynamo/sglang/main.py (2)

18-18: Import path looks right.

Assuming module is packaged with backend, this import should resolve.

If CI fails on import, check packaging of components/backends/sglang/src in PYTHONPATH.


123-128: Confirm serve_endpoint supports health_check_payload kwarg.
lib/bindings/python/src/dynamo/_core.pyi declares serve_endpoint with a health_check_payload parameter (Optional[Dict], default None), so the Python API exposes the kwarg and existing call sites remain compatible. Verify the underlying binding/runtime implementation actually accepts/forwards/uses this parameter and update docs/examples if needed.

components/backends/sglang/src/dynamo/sglang/health_check.py (1)

24-45: Validate sampling field names against the builders

  • components/backends/sglang/src/dynamo/sglang/health_check.py sets many sampling_options keys, but components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py currently only maps/reads temperature — either (A) translate/accept the exact keys the decode/prefill builders expect (e.g. max_tokens → max_new_tokens where applicable), or (B) shrink the default_payload to only the supported keys.
  • Treat top_k = -1 as “disabled” (omit or map to None) and ensure temperature = 0.0 is accepted (don’t treat 0 as falsy).
  • Verify vLLM backend mapping (components/backends/vllm/src/dynamo/vllm/handlers.py uses SamplingParams(**self.default_sampling_params) and iterates request["sampling_options"]) and keep key semantics consistent across backends.

Copy link
Contributor

@ishandhanani ishandhanani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the rabbit comments. I think they are useful to read here.

@tzulingk
Copy link
Contributor Author

Please take a look at the rabbit comments. I think they are useful to read here.

I've added SglangPrefillHealthCheckPayload for prefill health check request. Created https://linear.app/nvidia/issue/DIS-649/use-bos-for-the-token-id-in-health-check-request for the BOS token id. As this is a general issue for all backends' payload. I will work on this in another PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants