[Vulkan] Improve LLM Prefill Performance #12920

Closed

Labels

Milestone

opened

on Jul 28, 2025

Quantized Llama3.2 1B on par decode, >1x prefill performance on S24 compared to ET + XNNPACK (4-bit quantized).

cc @SS-JIA @manuelcandales @cbilgin

Metadata

Assignees

No one assigned

Labels

Type

Projects

ExecuTorch Vulkan Backend

Status

Done

Milestone

1.0.0

Relationships

None yet

Development

No branches or pull requests