v0.1.4.2
Local Inference Support
The major update in this release is an upgrade to the ExecuTorch v0.5.0 framework for on-device inference. Notable improvements include:
- Support for KleidiAI blockwise kernels in XNNPACK, giving a 20%+ gain in Llama prefill
- Support for models quantized via torchao’s quantize_ API (see the sketch after this list)
- Stable lowering into XNNPACK
- Features and fixes for the Qualcomm and MediaTek backends (support to come in a future release)
- Bug fixes
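Before lowering, a model can be quantized with torchao’s quantize_ API and then handed to the ExecuTorch export flow. A minimal sketch of the quantization step, assuming a recent torchao where int8_weight_only is one of the available schemes; the toy Sequential model here is purely illustrative:

```python
import torch
from torchao.quantization.quant_api import quantize_, int8_weight_only

# Toy module standing in for a real checkpoint (illustrative only).
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()

# quantize_ rewrites eligible layers in place according to the chosen
# scheme, so the model can be exported as usual afterwards.
quantize_(model, int8_weight_only())
```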
This release remains compatible with models (.pte files) exported with the previous ExecuTorch 0.4 release; a sketch of the export flow follows.
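For reference, exporting a .pte file generally means tracing the PyTorch module, lowering it (here into XNNPACK), and serializing the result. A minimal sketch, assuming the standard torch.export and executorch.exir APIs; TinyModel and the output path are placeholders:

```python
import torch
from torch.export import export
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(32, 32)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 32),)

# Trace to an exported program, then lower eligible subgraphs
# to the XNNPACK backend.
program = to_edge_transform_and_lower(
    export(model, example_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

# Serialize the program; the .pte file is what gets loaded on device.
with open("tiny_model.pte", "wb") as f:
    f.write(program.buffer)
```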
Demo App Location
To help consolidate reference material, we’ve moved the demo apps from llama-stack-apps to llama-stack-client-kotlin.
Contributors
@ashwinb, @cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830