update disagg slurm scripts #6248
Conversation
Signed-off-by: lingjiew <[email protected]>
Walkthrough
The changes update a SLURM script.

Estimated code review effort: 2 (~12 minutes)
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/source/scripts/disaggregated/disaggr_torch.slurm (1 hunks)
- docs/source/scripts/disaggregated/gen_yaml.py (2 hunks)
🔇 Additional comments (2)
docs/source/scripts/disaggregated/gen_yaml.py (1)
200-202: LGTM - Consistent kv_cache configuration restructuring for generation servers. The configuration change is applied consistently to both context and generation servers, maintaining structural coherence.
docs/source/scripts/disaggregated/disaggr_torch.slurm (1)
99: Verify `trtllm-serve` availability in the container image.

I didn't find any `trtllm-serve` installation steps in Dockerfiles or build scripts, though it's registered as a CLI entry point in `setup.py:247`. Please confirm that:
- The container image used on Qwen3 installs the `tensorrt_llm` package (so that `trtllm-serve` is on `$PATH`).
- `trtllm-serve` runs successfully in that environment with the disaggregated flags.

If it's missing, add the appropriate `pip install .` or package installation step to your Dockerfile or build pipeline.
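As a quick sanity check inside the image, one could verify the entry point is on `$PATH` before running the SLURM script. A minimal sketch (generic, not tied to any specific Dockerfile; the control binary `sh` is just there to show what a positive result looks like):

```python
import shutil

def check_cli(name: str) -> str:
    """Return a short status line saying whether a CLI entry point is on $PATH."""
    path = shutil.which(name)
    return f"{name}: found at {path}" if path else f"{name}: NOT on $PATH"

# 'trtllm-serve' is the entry point named in the review comment;
# 'sh' is a control that should exist on any POSIX system.
for cli in ('sh', 'trtllm-serve'):
    print(check_cli(cli))
```

If `trtllm-serve` is reported missing, that points at the `pip install .` step being absent from the image build.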
```python
'kv_cache_config': {
    'dtype': 'fp8'
},
```
💡 Verification agent
🧩 Analysis chain
LGTM - Consistent kv_cache configuration restructuring for context servers.
The change from `'kv_cache_dtype': 'fp8'` to `'kv_cache_config': {'dtype': 'fp8'}` aligns with the updated TensorRT-LLM serving API structure.
Please verify that this configuration format change is compatible with the target TensorRT-LLM version and that existing deployments using the old format have been updated accordingly.
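For illustration, the old flat key and the new nested structure side by side. Only the kv_cache keys come from the diff in this PR; the `tensor_parallel_size` key is a hypothetical placeholder for the rest of the server config:

```python
# Sketch of the per-server config dicts that gen_yaml.py emits.
# Only the kv_cache keys are taken from the PR diff; the
# tensor_parallel_size key is illustrative, not from the change.

old_style = {
    'tensor_parallel_size': 4,
    'kv_cache_dtype': 'fp8',     # old flat key
}

new_style = {
    'tensor_parallel_size': 4,
    'kv_cache_config': {         # new nested structure
        'dtype': 'fp8',
    },
}

print(new_style['kv_cache_config']['dtype'])  # -> fp8
```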
🏁 Script executed:
```shell
#!/bin/bash
# Description: Check for any remaining usage of the old kv_cache_dtype format
# Expected: No occurrences of the old format should remain
echo "Searching for old kv_cache_dtype format..."
rg -n "kv_cache_dtype" --type py
echo "Searching for new kv_cache_config format to confirm consistency..."
rg -A 3 "kv_cache_config" --type py
```
Length of output: 186878
Incomplete migration of `kv_cache_dtype` to nested `kv_cache_config`

The search revealed numerous remaining references to the old flat `kv_cache_dtype` format across core scripts, examples, tests, and CLI definitions. Updating only `docs/source/scripts/disaggregated/gen_yaml.py` will break those workflows. Please either provide backward-compatibility logic or fully migrate all usages to the nested format.
Key areas to address:
- Core quantization tool: `tensorrt_llm/quantization/quantize_by_modelopt.py`
- C++ engine build helper: `cpp/tests/resources/scripts/build_gptj_engines.py`
- Example gen-YAML scripts: `examples/wide_ep/slurm_scripts/gen_yaml.py`
- CLI argument definitions and validation: `tensorrt_llm/llmapi/llm_args.py` & `examples/quantization/quantize.py`
- All examples under `examples/llm-api/` and `examples/llm-eval/`
- Integration and unit tests referencing `--kv_cache_dtype=` or `kv_cache_dtype` in code
Suggested action:
- Replace every `kv_cache_dtype` occurrence with `kv_cache_config: { dtype: … }` in those files, or
- Retain a temporary fallback that maps the old key to the new nested structure.
🤖 Prompt for AI Agents
In docs/source/scripts/disaggregated/gen_yaml.py around lines 177 to 179, the
migration from the flat kv_cache_dtype to the nested kv_cache_config structure
is incomplete and inconsistent across the codebase. To fix this, update all
references to kv_cache_dtype in the listed core scripts, examples, CLI
definitions, and tests to use the nested kv_cache_config dictionary with a dtype
key. Alternatively, implement backward-compatibility logic that detects the old
kv_cache_dtype format and converts it to the new nested structure to avoid
breaking existing workflows.
@Shixiaowei02 @qiaoxj07 Could you help review this PR? It seems I don't have permission to add reviewers.
Sure, thanks.
Tested the scripts and fixed some issues. These scripts work on Qwen3 as well.