fix(deps): update dependency vllm to ^0.10.0 #257
This PR contains the following updates:
vllm: ^0.5.0 -> ^0.10.0
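Under Poetry-style caret semantics (an assumption; the dependency manifest this constraint lives in is not shown in the PR body), ^0.10.0 allows any 0.10.x release, i.e. >=0.10.0,<0.11.0. A rough pip equivalent for checking the resolved range locally:

```
# Rough pip equivalent of the updated constraint, assuming Poetry-style caret semantics;
# the actual dependency file in this repository is not shown in this PR body.
pip install "vllm>=0.10.0,<0.11.0"
```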
Release Notes
vllm-project/vllm (vllm)
v0.10.1
Highlights
The v0.10.1 release includes 727 commits from 245 committers (105 new contributors), spanning Model Support, Engine Core, Hardware & Performance, Quantization, API & Frontend, and Dependencies.
Hardware & Performance: reshape_and_cache_flash CUDA kernel (#22036), CPU transfer support in NixlConnector (#18293).
Dependencies: pip install vllm[flashinfer] for flexible installation (#21959).
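A minimal install sketch for the optional extra noted above; the extra name comes from the release notes (#21959), while the version pin below is illustrative rather than taken from this repository:

```
# Install vLLM with the optional FlashInfer extra introduced in this release line (#21959).
# The version pin is illustrative; adjust it to match this repository's actual constraint.
pip install "vllm[flashinfer]>=0.10.0,<0.11.0"
```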
V0 Deprecation
Important: As part of the ongoing V0 engine cleanup, several breaking changes have been introduced:
Replaced --task with --runner and --convert options (#21470), deprecated --disable-log-requests in favor of --enable-log-requests for clearer semantics (#21739), and renamed --expand-tools-even-if-tool-choice-none to --exclude-tools-when-tool-choice-none for consistency (#20544).
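A hedged sketch of a server launch after the bump, exercising one of the flags named above; <your-model> is a placeholder, and none of this is taken from this repository's actual deployment:

```
# Hedged sketch only: launching the OpenAI-compatible server on vllm 0.10.x.
# Per the release notes above: --disable-log-requests is deprecated in favor of
# --enable-log-requests (note the inverted sense: it now opts IN to request logging),
# --task is replaced by --runner/--convert, and --expand-tools-even-if-tool-choice-none
# is renamed to --exclude-tools-when-tool-choice-none. Check `vllm serve --help` on
# your installed version for the exact semantics before relying on any of them.
vllm serve <your-model> --enable-log-requests
```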
What's Changed
- … SpecializedManager by @zhouwfang in #21407
- Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 by @okdshin in #20544
- … flashinfer to v0.2.8 by @cjackal in #21385
- … cutlass_fp4_group_mm illegal memory access by @yewentao256 in #21465
- … run-batch supports V1 by @DarkLight1337 in #21541
- … site_url for RunLLM by @hmellor in #21564
- … requirements/common.txt to run unit tests by @zhouwfang in #21572
- … 2025072 by @yaochengji in #21555
- … has_flashinfer_moe Import Error when it is not installed by @yewentao256 in #21634
- … moe_align_block_size_triton by @yewentao256 in #21335
- … torch.compile for bailing moe by @jinzhen-lin in #21664
- Replace --task with --runner and --convert by @DarkLight1337 in #21470
- [Ernie 4.5] Name Change for Base 0.3B Model by @vasqu in #21735
- … metavar to list the choices for a CLI arg when custom values are also accepted by @hmellor in #21760
- … dynamic_scaled_fp8_quant and static_scaled_fp8_quant by @yewentao256 in #21773
- … _lazy_init() by @smarterclayton in #21472
- … CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod by @yewentao256 in #21775
- … uv in GPU installation docs by @davidxia in #20277
- … flashinfer_python to CUDA wheel requirements by @mgoin in #21389
- … __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant by @yewentao256 in #21867
- … ZE_AFFINITY_MASK for device select on xpu by @jikunshang in #21815
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.