ashwins990
Contributor

This PR improves the performance of the INT8 dynamic quantization path on ARM. The gain comes from:

  1. Replacing the KleidiAI dotprod microkernel with the imm version.
  2. Performing quantization and packing inside the parallel section, closer to the matmul execution.
  3. Blocking both rows and columns.
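
To illustrate points 2 and 3, here is a minimal C++ sketch of a matmul tiled over both rows and columns, where the activation rows are quantized inside each tile task just before they are consumed. It is not the code from this PR and does not call KleidiAI. The names `quantize_rows` and `blocked_int8_matmul`, the symmetric per-row quantization scheme, and the column-major weight layout are all assumptions made for illustration. The serial loop over tiles stands in for a parallel loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: symmetric per-row INT8 quantization of rows [m0, m1)
// of a row-major M x K float matrix. Produces one scale per quantized row.
static void quantize_rows(const std::vector<float>& a, std::size_t K,
                          std::size_t m0, std::size_t m1,
                          std::vector<std::int8_t>& q, std::vector<float>& scales) {
    q.assign((m1 - m0) * K, 0);
    scales.assign(m1 - m0, 1.0f);
    for (std::size_t m = m0; m < m1; ++m) {
        float max_abs = 0.0f;
        for (std::size_t k = 0; k < K; ++k)
            max_abs = std::max(max_abs, std::fabs(a[m * K + k]));
        const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
        scales[m - m0] = scale;
        for (std::size_t k = 0; k < K; ++k)
            q[(m - m0) * K + k] =
                static_cast<std::int8_t>(std::lround(a[m * K + k] / scale));
    }
}

// Hypothetical blocked matmul: C[M x N] = A[M x K] * dequant(B).
// Assumption: B is INT8, stored column by column (K contiguous values per
// column), with one float scale per column.
std::vector<float> blocked_int8_matmul(const std::vector<float>& a,
                                       const std::vector<std::int8_t>& b,
                                       const std::vector<float>& b_scales,
                                       std::size_t M, std::size_t K, std::size_t N,
                                       std::size_t block_m, std::size_t block_n) {
    assert(a.size() == M * K && b.size() == K * N && b_scales.size() == N);
    assert(block_m > 0 && block_n > 0);
    std::vector<float> c(M * N, 0.0f);
    const std::size_t m_blocks = (M + block_m - 1) / block_m;
    const std::size_t n_blocks = (N + block_n - 1) / block_n;
    // Every (row block, column block) tile writes a disjoint part of C, so
    // this serial loop could be replaced by a parallel-for over tiles.
    for (std::size_t task = 0; task < m_blocks * n_blocks; ++task) {
        const std::size_t m0 = (task / n_blocks) * block_m;
        const std::size_t n0 = (task % n_blocks) * block_n;
        const std::size_t m1 = std::min(M, m0 + block_m);
        const std::size_t n1 = std::min(N, n0 + block_n);
        // Quantize the activation rows inside the task, next to their use.
        std::vector<std::int8_t> qa;
        std::vector<float> a_scales;
        quantize_rows(a, K, m0, m1, qa, a_scales);
        for (std::size_t m = m0; m < m1; ++m) {
            for (std::size_t n = n0; n < n1; ++n) {
                std::int32_t acc = 0;  // INT8 products accumulate in INT32
                for (std::size_t k = 0; k < K; ++k)
                    acc += static_cast<std::int32_t>(qa[(m - m0) * K + k]) *
                           static_cast<std::int32_t>(b[n * K + k]);
                c[m * N + n] =
                    static_cast<float>(acc) * a_scales[m - m0] * b_scales[n];
            }
        }
    }
    return c;
}
```

In this sketch each tile re-quantizes its row block, which trades some redundant work for locality and for removing a separate serial quantization pass. A real implementation would pick block sizes to match the microkernel and cache sizes.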

The results below were obtained by running the vLLM benchmark_serving.py script at different request rates.

[image]

@ashwins990 requested review from a team as code owners on August 19, 2025, 18:16
@ashwins990
Contributor Author

@maxnick Kindly review the changes. Thanks!

@maxnick self-assigned this on Aug 27, 2025
@mg-intel added the labels "category: CPU" (OpenVINO CPU plugin) and "platform: arm" (OpenVINO on ARM / ARM64) on Aug 29, 2025