Commit cb81bf2

r2.8 release note (#3817)
1 parent ab949ae commit cb81bf2

File tree: 1 file changed, +27 -0 lines


docs/tutorials/releases.md

Releases
========

## 2.8.0

We are excited to announce the release of Intel® Extension for PyTorch* 2.8.0+cpu, which accompanies PyTorch 2.8. This release mainly brings you new LLM model optimizations, including Qwen3 and Whisper large-v3, an enhanced API for multi-LoRA inference kernels, and optimizations of the LLM generation sampler. It also includes a set of bug fixes and small optimizations. We want to sincerely thank our dedicated community for your contributions.

Besides providing optimizations in Intel® Extension for PyTorch*, over the past years we have also upstreamed most of our features and optimizations for Intel® platforms into PyTorch*, and we will continue pushing the remaining ones into PyTorch* in the future. Moving forward, effective after the 2.8 release, we will change our working model to prioritize developing new features and optimizations directly in PyTorch* and de-prioritize development in Intel® Extension for PyTorch*. We will continue to provide critical bug fixes and security patches as needed throughout the PyTorch* 2.9 timeframe to ensure a smooth transition for our partners and community.

### Highlights

* Qwen3 support

[Qwen3](https://qwenlm.github.io/blog/qwen3/), the latest addition to the Qwen family of large language models, has recently been released. Intel® Extension for PyTorch* has provided [support for Qwen3](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-qwen3-large-language-models.html) since its launch date, with an early release version for MoE models like [Qwen3-30B](https://huggingface.co/Qwen/Qwen3-30B-A3B) and mid-size dense models like [Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B). The related optimizations are included in this official release.

* Whisper large-v3 support

Intel® Extension for PyTorch* provides optimizations for [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), a state-of-the-art model for automatic speech recognition (ASR) and speech translation. Key improvements include replacing the cross-attention mechanism with the Indirect Access Key-Value (IAKV) Cache kernel, delivering strong performance with weight-only INT8 quantization on Intel® Xeon® processors.
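
The indirect-access idea behind the IAKV cache can be sketched in a few lines of plain Python (all names here are illustrative, not the extension's API): instead of physically reordering cached key/value pairs when beam search reorders beams, each beam keeps a list of slot indices into a shared store, and a beam reorder copies only those index lists.

```python
# Minimal sketch of an indirect-access KV cache for beam search.
# Names are hypothetical; the real IAKV kernel works on tensors, not lists.

class IndirectKVCache:
    def __init__(self, num_beams):
        self.slots = []  # shared storage: one entry per appended (k, v) pair
        self.beam_trace = [[] for _ in range(num_beams)]  # per-beam slot indices

    def append(self, beam, k, v):
        """Store (k, v) once and record its slot index for this beam."""
        self.slots.append((k, v))
        self.beam_trace[beam].append(len(self.slots) - 1)

    def reorder(self, beam_idx):
        """Beam reorder: copy index lists only, never the cached data."""
        self.beam_trace = [list(self.beam_trace[src]) for src in beam_idx]

    def gather(self, beam):
        """Resolve a beam's full history through the indirection table."""
        return [self.slots[i] for i in self.beam_trace[beam]]

cache = IndirectKVCache(num_beams=2)
cache.append(0, "k0a", "v0a")
cache.append(1, "k1a", "v1a")
cache.reorder([0, 0])          # both beams now continue from beam 0's history
cache.append(0, "k0b", "v0b")
cache.append(1, "k1b", "v1b")
print(cache.gather(1))         # [('k0a', 'v0a'), ('k1b', 'v1b')]
```

The reorder step costs time proportional to the number of indices rather than the size of the cached tensors, which is what makes the indirect-access variant attractive for beam search decoding.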

* General Large Language Model (LLM) optimization

Intel® Extension for PyTorch* adds SGMV (segmented gather matrix-vector multiplication) support in the API for multi-LoRA inference kernels used by LLM serving frameworks, and optimizes the LLM generation sampler. A full list of optimized models can be found at [LLM optimization](https://github.com/intel/intel-extension-for-pytorch/tree/v2.8.0+cpu/examples/cpu/llm/inference).
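
Conceptually, an SGMV-style kernel groups the requests in a batch by the LoRA adapter they use (the "segments") and applies each adapter's low-rank update y = y + B(Ax) once per group. A minimal pure-Python sketch of that grouping, with hypothetical names and list-based math standing in for the real batched tensor kernels:

```python
# Conceptual sketch of SGMV-style multi-LoRA batching (illustrative only).

def matvec(m, x):
    return [sum(r * v for r, v in zip(row, x)) for row in m]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def sgmv(xs, base_out, adapter_ids, adapters):
    """For each request i, add its adapter's low-rank update B @ (A @ x)
    to the base model output. Requests are grouped (segmented) by adapter,
    so each adapter's weights are fetched once per segment."""
    out = [list(o) for o in base_out]
    segments = {}                      # adapter id -> request indices
    for i, aid in enumerate(adapter_ids):
        segments.setdefault(aid, []).append(i)
    for aid, idxs in segments.items():
        A, B = adapters[aid]           # A: r x d, B: d_out x r
        for i in idxs:
            out[i] = vadd(out[i], matvec(B, matvec(A, xs[i])))
    return out

# Two rank-1 adapters over 2-dim inputs (toy numbers).
adapters = {
    "lora_a": ([[1.0, 0.0]], [[2.0], [0.0]]),   # scales x[0] by 2 into dim 0
    "lora_b": ([[0.0, 1.0]], [[0.0], [3.0]]),   # scales x[1] by 3 into dim 1
}
xs = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
base = [[0.0, 0.0] for _ in range(3)]
result = sgmv(xs, base, ["lora_a", "lora_b", "lora_a"], adapters)
print(result)  # [[2.0, 0.0], [0.0, 6.0], [2.0, 0.0]]
```

Segmenting by adapter is what lets a serving framework mix many LoRA-customized requests in one batch without paying for a separate forward pass per adapter.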

* Bug fixes and other optimizations

- Optimized LLM performance [#3688](https://github.com/intel/intel-extension-for-pytorch/commit/9659e2e76c610c4a01d5579a9775c6a071679cb6) [#3708](https://github.com/intel/intel-extension-for-pytorch/commit/84834fbc1747458660b837491316b10e9a43e6d5) [#3754](https://github.com/intel/intel-extension-for-pytorch/commit/2978fd620fbdf347264a5e0d502cd019e1ab639b)
- Removed the dependency on torch-ccl and oneCCL [#3690](https://github.com/intel/intel-extension-for-pytorch/commit/4d00e5a49b38257f28d13b076f2c8564740afa71)

**Full Changelog**: https://github.com/intel/intel-extension-for-pytorch/compare/v2.7.0+cpu...v2.8.0+cpu

## 2.7.0

We are excited to announce the release of Intel® Extension for PyTorch* 2.7.0+cpu, which accompanies PyTorch 2.7. This release mainly brings you new LLM model optimizations, including DeepSeek-R1-671B and Phi-4, and new APIs for LLM serving frameworks, including sliding window and softcap support in the PagedAttention APIs, the MambaMixer API for the Jamba and Mamba models, and an API for multi-LoRA inference kernels. It also includes a set of bug fixes and small optimizations. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try this release and provide feedback to help us further improve the product.
