Releases: intel/neural-compressor
Intel® Neural Compressor v1.14 Release
- Highlights
- New Features
- Improvements
- Bug Fixes
- Productivity
- Examples
Highlights
We are excited to announce the release of Intel® Neural Compressor v1.14! This release delivers a new Pruning API for PyTorch that lets users select combinations of criteria, patterns, and schedulers to achieve better pruning accuracy. It also supports Keras input for TensorFlow quantization and self-distilled quantization for better quantization accuracy.
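For orientation, here is a minimal, hedged sketch of a YAML-driven pruning run using the experimental component API. The exact entry point and configuration keys of the new v1.14 Pruning API may differ, so treat every name below as an assumption and consult the release documentation:

```python
# Hedged sketch: pruning driven by a YAML config via the experimental
# component API. "pruning.yaml" is assumed to declare the criterion,
# pattern, and scheduler; fp32_model and training_func are placeholders
# for a user-defined torch.nn.Module and training loop.
from neural_compressor.experimental import Pruning, common

prune = Pruning("pruning.yaml")
prune.model = common.Model(fp32_model)
prune.train_func = training_func  # hook name assumed from the 1.x API
pruned_model = prune.fit()
```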
New Features
- Pruning/Sparsity
- Quantization
- GUI
- Add mixed precision (commit 26e902)
Improvements
- Enhance tuning for Quantization with IPEX 1.12 to remove additional Quant/DeQuant (commit 192100)
- Add upstream and download APIs for the HuggingFace model hub, which handle configuration files, tokenizer files, and INT8 model weights in the Transformers format (commit 46d945)
- Align with the new Intel Extension for PyTorch API (commit cc368a)
- Add loading from YAML and .pt files for compatibility with the older PyTorch model saving format (commit a28705); see the sketch after this list
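Both of the items above can be illustrated with a short, hedged sketch. The helper locations follow the 1.x codebase (OptimizedModel in neural_compressor.utils.load_huggingface, load in neural_compressor.utils.pytorch), and the model id is hypothetical:

```python
# Part 1 (hedged): download an INT8 Transformers-format model from the
# HuggingFace model hub. The model id below is hypothetical.
from neural_compressor.utils.load_huggingface import OptimizedModel

int8_model = OptimizedModel.from_pretrained("Intel/example-int8-model")

# Part 2 (hedged): load an older-style checkpoint (YAML config plus .pt
# weights) back into the original FP32 architecture. fp32_model is a
# placeholder for your torch.nn.Module.
from neural_compressor.utils.pytorch import load

restored_model = load("./saved_results", fp32_model)
```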
Bug Fixes
- Quantization
- Export
- Fix export_to_onnx API (commit 158c7f)
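A hedged sketch of the repaired export path follows; the method name comes from this note, but its exact signature is an assumption, so verify it against the model wrapper's documentation:

```python
# Hedged sketch: quantize a PyTorch model, then export the INT8 result to
# ONNX. "ptq.yaml", fp32_model, and the positional path argument are
# placeholders/assumptions.
from neural_compressor.experimental import Quantization

quantizer = Quantization("ptq.yaml")
quantizer.model = fp32_model
q_model = quantizer.fit()
q_model.export_to_onnx("int8_model.onnx")  # signature assumed
```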
Productivity
- Support TensorFlow 2.10.0 (commit d6b6c9 & 8130e7)
- Support OnnxRuntime 1.12 (commit 498ac4)
- Export PyTorch QAT to ONNX (commit 029a63)
- Add TensorFlow and PyTorch container TPP files (commit d245b5)
Examples
- Add example of download from HuggingFace model hub and example of upstream models to the hub (commit 46d945)
- Add notebooks for Neural Coder (commit 105db7)
- Add 2 IPEX examples: bert_large (squad), distilbert_base (squad) (commit 192100)
- Add 2 DDP Prune Once for All examples: RoBERTa-Base and BERT-Base (commit 26a476)
Validated Configurations
- Python 3.7, 3.8, 3.9, 3.10
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.9, 2.10
- Intel TensorFlow 2.7, 2.8, 2.9
- PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
- IPEX 1.10.0, 1.11.0, 1.12.0
- MXNet 1.7, 1.9
- ONNX Runtime 1.10, 1.11, 1.12
Intel® Neural Compressor v1.13.1 Release
Features
- Support experimental auto-coding quantization for PyTorch
  - Post-training static and dynamic quantization for PyTorch (see the sketch after this list)
  - Post-training static quantization for IPEX
  - Mixed-precision (BF16, INT8, and FP32) for PyTorch
- Refactor quantization utilities for ONNX Runtime
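A minimal, hedged sketch of post-training static quantization with the 1.x experimental API (the YAML file and its contents are assumptions; dynamic quantization simply omits the calibration dataloader):

```python
# Hedged sketch: post-training static quantization. "ptq.yaml" is assumed
# to select the post_training_static_quant approach; fp32_model and
# calib_loader are placeholders for a torch.nn.Module and calibration data.
from neural_compressor.experimental import Quantization, common

quantizer = Quantization("ptq.yaml")
quantizer.model = common.Model(fp32_model)
quantizer.calib_dataloader = calib_loader  # not needed for dynamic PTQ
q_model = quantizer.fit()
```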
Bug Fixes
- Fixed model compression orchestration issue caused by PyTorch v1.11
- Fixed GUI issues
Validated Configurations
- Python 3.8
- CentOS 8.4
- TensorFlow 2.9
- Intel TensorFlow 2.9
- PyTorch 1.12.0+cpu
- IPEX 1.12.0
- MXNet 1.7.0
- ONNX Runtime 1.11.0
Intel® Neural Compressor v1.13 Release
Features
- Quantization
  - Support new quantization APIs for Intel TensorFlow
  - Support FakeQuant (QDQ) quantization format for ITEX
  - Improve INT8 quantization recipes for ONNX Runtime
- Mixed Precision
  - Enhance mixed precision interface to support BF16 (FP16) mixed with FP32 (see the sketch after this list)
- Neural Architecture Search
  - Support SuperNet-based neural architecture search (DyNAS)
- Sparsity
  - Support training for block-wise structured sparsity
- Strategy
  - Support operator-type based tuning strategy
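A hedged sketch of the BF16/FP32 conversion path via the experimental interface; attribute names follow the 1.x API, and the FP16 path mentioned above may additionally require ITEX on supported hardware:

```python
# Hedged sketch: convert an FP32 model to BF16/FP32 mixed precision.
# fp32_model is a placeholder for a TensorFlow or PyTorch model.
from neural_compressor.experimental import MixedPrecision

converter = MixedPrecision()
converter.precisions = "bf16"  # low precision to mix with FP32
converter.model = fp32_model
optimized_model = converter()  # returns the converted model
```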
Productivity
- Support light (default) and full binary packages (default package size 0.5MB, full package size 2MB); see the note after this list
- Add experimental accuracy diagnostic feature for INT8 quantization, including tensor statistics visualization and fine-grained precision setting
- Add experimental one-click BF16/INT8 low-precision enabling and inference optimization, a first-of-its-kind code-free solution in the industry
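For reference, the light package is what `pip install neural-compressor` installs, while the full package ships under a separate PyPI name (`neural-compressor-full` at the time of this release; treat the exact package name as an assumption and check PyPI).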
Ecosystem
- Upstream 4 more quantized models (emotion_ferplus, ultraface, arcface, bidaf) to ONNX Model Zoo
- Upstream 10 quantized Transformers-based models to HuggingFace Model Hub
Examples
- Add notebooks for Quantization on Intel DevCloud, Distillation/Sparsity/Quantization for BERT-Mini SST-2, and Neural Architecture Search (DyNAS)
- Add more quantization examples from TensorFlow Model Zoo
Validated Configurations
- Python 3.8, 3.9, 3.10
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.7, 2.8, 2.9
- Intel TensorFlow 2.7, 2.8, 2.9
- PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
- IPEX 1.10.0, 1.11.0, 1.12.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.9.0, 1.10.0, 1.11.0
Intel® Neural Compressor v1.12 Release
Features
- Quantization
  - Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
  - Improve post-training quantization (static & dynamic) on PyTorch
  - Improve post-training quantization on TensorFlow
  - Improve QLinear and QDQ quantization modes on ONNX Runtime
  - Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
- Pruning
  - Improve pruning-once-for-all for NLP models
- Sparsity
  - Support experimental sparse kernel for reference examples
Productivity
- Support model deployment by loading INT8 models directly from HuggingFace model hub
- Improve GUI with optimized model downloading, performance profiling, etc.
Ecosystem
- Highlight simple quantization usage with a few clicks on ONNX Model Zoo
- Upstream INC quantized models (ResNet101, Tiny YoloV3) to ONNX Model Zoo
Examples
- Add Bert-mini distillation + quantization notebook example
- Add DLRM & SSD-ResNet34 quantization examples on IPEX
- Improve BERT structured sparsity training example
Validated Configurations
- Python 3.8, 3.9, 3.10
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.6.2, 2.7, 2.8
- Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
- PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
- IPEX 1.8.0, 1.9.0, 1.10.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.8.0, 1.9.0, 1.10.0
Intel® Neural Compressor v1.11 Release
Features
- Quantization
- Supported QDQ as experimental quantization format for ONNX Runtime
- Improved FX symbolic tracing for PyTorch
- Supported multi-metrics for quantization tuning
- Knowledge distillation
- Improved distillation algorithm for intermediate layer knowledge transfer
- Productivity
- Improved quantization productivity for ONNX Runtime through GUI
- Improved PyTorch INT8 model save/load methods
- Ecosystem
- Upstreamed INC quantized Yolov3, DenseNet, Mask-Rcnn, Yolov4 models to ONNX Model Zoo
- Became a PyTorch ecosystem tool shortly after the PyTorch INC tutorial was published
- Examples
- Added INC quantized ResNet50 v1.5 and BERT-Large model for IPEX
- Supported dynamic quantization & weight sharing on the bare-metal reference engine
Intel® Neural Compressor v1.10 Release
Features
- Quantization
- Supported quantization on the latest deep learning frameworks
- Supported quantization for a new model domain (audio)
- Supported compatible quantization recipes across framework upgrades
- Pruning & Knowledge distillation
- Supported fine-tuning and quantization using INC & Optimum for “Prune Once for All: Sparse Pre-Trained Language Models” published at ENLSP NeurIPS Workshop 2021
- Structured sparsity
- Proved out sparsity training recipes across multiple model domains (CV, NLP, and recommendation systems)
Productivity
- Improved INC GUI for easy quantization
- Supported Windows OS conda installation
Ecosystem
- Upgraded the INC integration in HuggingFace Optimum to v1.9
- Upstreamed INC quantized mobilenet & faster-rcnn models to ONNX Model Zoo
Examples
- Supported quantization on 300 random models
- Added bare-metal examples for Bert-mini and DLRM
Validated Configurations
- Python 3.7, 3.8, 3.9
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.6.2, 2.7, 2.8
- Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
- PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
- IPEX 1.8.0, 1.9.0, 1.10.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.8.0, 1.9.0, 1.10.0
Distribution:
| Channel | Links | Install Command |
| --- | --- | --- |
| Source | GitHub: https://github.com/intel/neural-compressor.git | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | Pip: https://pypi.org/project/neural-compressor | `$ pip install neural-compressor` |
| Binary | Conda: https://anaconda.org/intel/neural-compressor | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.9 Release
Features
- Knowledge distillation
  - Supported one-shot compression pipelines (knowledge distillation during quantization-aware training) on PyTorch
  - Added more distillation examples on TensorFlow and PyTorch
- Quantization
  - Supported multi-objective tuning for quantization
  - Supported Intel Extension for PyTorch v1.10
  - Improved quantization-aware training support on PyTorch v1.10
- Pruning
  - Added more magnitude pruning examples on TensorFlow
- Reference bare-metal examples
  - Supported BF16 optimizations on NLP models
  - Added sparse DLRM model (experimental)
- Productivity
  - Added a Python-friendly API as an alternative to the YAML configuration file (see the sketch after this list)
  - Made user-facing APIs more Pythonic
- Ecosystem
  - Integrated pruning API into HuggingFace Optimum
  - Added ssd-mobilenetv1, efficientnet, ssd, fcn_rn50, inception_v1 quantized models to ONNX Model Zoo
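A hedged sketch of the YAML-free path: constructing the component without a configuration file and relying on defaults. Whether every YAML field already had a Python equivalent in v1.9 is an assumption, and passing a YAML path (e.g., `Quantization("conf.yaml")`) continues to work:

```python
# Hedged sketch: configure quantization from Python with no YAML file.
# fp32_model and calib_loader are placeholders; default-config behavior
# is assumed.
from neural_compressor.experimental import Quantization

quantizer = Quantization()  # default configuration, no YAML required
quantizer.model = fp32_model
quantizer.calib_dataloader = calib_loader
q_model = quantizer.fit()   # equivalently: quantizer()
```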
Validated Configurations
- Python 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.2 & 2.7
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
| --- | --- | --- |
| Source | GitHub: https://github.com/intel/neural-compressor.git | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | Pip: https://pypi.org/project/neural-compressor | `$ pip install neural-compressor` |
| Binary | Conda: https://anaconda.org/intel/neural-compressor | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.8.1 Release
Features
- Knowledge distillation
- Supported knowledge distillation on TensorFlow
- Pruning
- Supported multi-node training on TensorFlow
- Acceleration library
- Supported Hugging Face minilm_l6_h384_uncased_sst2, bert_base_cased_mrpc, and bert_base_nli_mean_tokens_stsb models
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.2 & 2.7
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
| --- | --- | --- |
| Source | GitHub: https://github.com/intel/neural-compressor.git | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | Pip: https://pypi.org/project/neural-compressor | `$ pip install neural-compressor` |
| Binary | Conda: https://anaconda.org/intel/neural-compressor | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.8 Release
Features
- Knowledge distillation
- Implemented the algorithms from the paper “Prune Once for All: Sparse Pre-Trained Language Models”, accepted by the NeurIPS 2021 ENLSP workshop
- Supported optimization pipelines (knowledge distillation & quantization-aware training) on PyTorch
- Quantization
- Added support for ONNX Runtime 1.7
- Added support for TensorFlow 2.6.2 and 2.7
- Added support for PyTorch 1.10
- Pruning
- Supported magnitude pruning on TensorFlow
- Acceleration library
- Supported the top 10 most-downloaded Hugging Face NLP models
Productivity
- Added a performance profiling feature to the INC UI service
- Improved the ease of use of the user interface, enabling quantization with a few clicks
Ecosystem
- Added a notebook on using the HuggingFace optimization library (Optimum) with Transformers
- Enabled the top 20 most-downloaded Hugging Face NLP models with Optimum
- Upstreamed more INC quantized models to ONNX Model Zoo
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.2 & 2.7
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
| --- | --- | --- |
| Source | GitHub: https://github.com/intel/neural-compressor.git | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | Pip: https://pypi.org/project/neural-compressor | `$ pip install neural-compressor` |
| Binary | Conda: https://anaconda.org/intel/neural-compressor | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.7.1 Release
The Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) v1.7.1 release features:
Features
- Acceleration library
- Supported a unified buffer memory allocation policy
Ecosystem
- Upstreamed INC quantized models (alexnet/caffenet/googlenet/squeezenet) to ONNX Model Zoo
Documentation
- Updated performance and accuracy data
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.0
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
| --- | --- | --- |
| Source | GitHub: https://github.com/intel/neural-compressor.git | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | Pip: https://pypi.org/project/neural-compressor | `$ pip install neural-compressor` |
| Binary | Conda: https://anaconda.org/intel/neural-compressor | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact the INC Maintainers if you have any questions.