Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ Please use the following recommended versions to get started quickly:
| Version | Release type | Doc |
|------------|--------------|--------------------------------------|
|v0.9.2rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
|v0.9.1rc2|Next stable release|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
|v0.9.1rc3|Next stable release|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
|v0.7.3.post1|Latest stable version|[QuickStart](https://vllm-ascend.readthedocs.io/en/stable/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/stable/installation.html) for more details|

## Contributing
Expand Down
2 changes: 1 addition & 1 deletion README.zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP
| Version | Release type | Doc |
|------------|--------------|--------------------------------------|
|v0.9.2rc1| 最新RC版本 |请查看[快速开始](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html)[安装指南](https://vllm-ascend.readthedocs.io/en/latest/installation.html)了解更多|
|v0.9.1rc2| 下一个正式/稳定版 |[快速开始](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [安装指南](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html)了解更多|
|v0.9.1rc3| 下一个正式/稳定版 |[快速开始](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [安装指南](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html)了解更多|
|v0.7.3.post1| 最新正式/稳定版本 |请查看[快速开始](https://vllm-ascend.readthedocs.io/en/stable/quick_start.html)[安装指南](https://vllm-ascend.readthedocs.io/en/stable/installation.html)了解更多|

## 贡献
Expand Down
2 changes: 2 additions & 0 deletions docs/source/community/versioning_policy.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
|-------------|--------------|------------------|-------------|--------------------|--------------|
| v0.9.2rc1 | v0.9.2 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250619 | |
| v0.9.1rc3 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1 | |
| v0.9.1rc2 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1| |
| v0.9.1rc1 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250528 | |
| v0.9.0rc2 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
Expand All @@ -38,6 +39,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:

| Date | Event |
|------------|-------------------------------------------|
| 2025.08.22 | Release candidates, v0.9.1rc3 |
| 2025.08.06 | Release candidates, v0.9.1rc2 |
| 2025.07.11 | Release candidates, v0.9.2rc1 |
| 2025.06.22 | Release candidates, v0.9.1rc1 |
Expand Down
4 changes: 2 additions & 2 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,10 @@
# the branch of vllm-ascend, used in vllm-ascend clone and image tag
# - main branch: 'main'
# - vX.Y.Z branch: latest vllm-ascend release tag
'vllm_ascend_version': 'v0.9.1rc2',
'vllm_ascend_version': 'v0.9.1rc3',
# the newest release version of vllm-ascend and matched vLLM, used in pip install.
# This value should be updated when cut down release.
'pip_vllm_ascend_version': "0.9.1rc2",
'pip_vllm_ascend_version': "0.9.1rc3",
'pip_vllm_version': "0.9.1",
# CANN image tag
'cann_image_tag': "8.2.rc1-910b-ubuntu22.04-py3.11",
Expand Down
2 changes: 1 addition & 1 deletion docs/source/faqs.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
## Version Specific FAQs

- [[v0.7.3.post1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1007)
- [[v0.9.0rc2] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1115)
- [[v0.9.1rc3] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2410)

## General FAQs

Expand Down
32 changes: 31 additions & 1 deletion docs/source/user_guide/release_notes.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,35 @@
# Release note

## v0.9.1rc3 - 2025.08.22

This is the 3rd release candidate of v0.9.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/) to get started.

### Core

- MTP supports V1 scheduler [#2371](https://github.com/vllm-project/vllm-ascend/pull/2371)
- Add LMhead TP communication groups [#1956](https://github.com/vllm-project/vllm-ascend/pull/1956)
- Fix the bug that qwen3 moe doesn't work with aclgraph [#2478](https://github.com/vllm-project/vllm-ascend/pull/2478)
- Fix `grammar_bitmask` IndexError caused by outdated `apply_grammar_bitmask` method [#2314](https://github.com/vllm-project/vllm-ascend/pull/2314)
- Remove `chunked_prefill_for_mla` [#2177](https://github.com/vllm-project/vllm-ascend/pull/2177)
- Fix bugs and refactor cached mask generation logic [#2326](https://github.com/vllm-project/vllm-ascend/pull/2326)
- Fix configuration check logic about ascend scheduler [#2327](https://github.com/vllm-project/vllm-ascend/pull/2327)
- Cancel the verification between deepseek-mtp and non-ascend scheduler in disaggregated-prefill deployment [#2368](https://github.com/vllm-project/vllm-ascend/pull/2368)
- Fix issue that failed with ray distributed backend [#2306](https://github.com/vllm-project/vllm-ascend/pull/2306)
- Fix incorrect req block length in ascend scheduler [#2394](https://github.com/vllm-project/vllm-ascend/pull/2394)
- Fix header include issue in rope [#2398](https://github.com/vllm-project/vllm-ascend/pull/2398)
- Fix mtp config bug [#2412](https://github.com/vllm-project/vllm-ascend/pull/2412)
- Fix error info and adapt `attn_metedata` refactor [#2402](https://github.com/vllm-project/vllm-ascend/pull/2402)
- Fix torchair runtime errror caused by configuration mismtaches and `.kv_cache_bytes` file missing [#2312](https://github.com/vllm-project/vllm-ascend/pull/2312)
- Move `with_prefill` allreduce from cpu to npu [#2230](https://github.com/vllm-project/vllm-ascend/pull/2230)

### Docs

- Add document for deepseek large EP [#2339](https://github.com/vllm-project/vllm-ascend/pull/2339)

### Known Issues

- `test_aclgraph.py` failed with `"full_cuda_graph": True` on A2 (910B1) [#2182](https://github.com/vllm-project/vllm-ascend/issues/2182)

## v0.9.1rc2 - 2025.08.06
This is the 2nd release candidate of v0.9.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/) to get started.

Expand Down Expand Up @@ -120,7 +150,7 @@ This is the 1st release candidate of v0.9.2 for vLLM Ascend. Please follow the [
- Fix the accuracy problem with deploy models with parallel parameters. [#1678](https://github.com/vllm-project/vllm-ascend/pull/1678)
- The pre-built wheel package now requires lower version of glibc. Users can use it by `pip install vllm-ascend` directly. [#1582](https://github.com/vllm-project/vllm-ascend/pull/1582)

## Other
### Other
- Official doc has been updated for better read experience. For example, more deployment tutorials are added, user/developer docs are updated. More guide will coming soon.
- Fix accuracy problem for deepseek V3/R1 models with torchair graph in long sequence predictions. [#1331](https://github.com/vllm-project/vllm-ascend/pull/1331)
- A new env variable `VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP` has been added. It enables the fused allgather-experts kernel for Deepseek V3/R1 models. The default value is `0`. [#1335](https://github.com/vllm-project/vllm-ascend/pull/1335)
Expand Down