Quantized Llama 3.2 1B achieves on-par decode performance and faster (>1x) prefill on S24 compared to ET + XNNPACK (4-bit quantized). cc @SS-JIA @manuelcandales @cbilgin