[Vulkan] Improve LLM Prefill Performance #12920

@digantdesai

Description

Quantized Llama3.2 1B achieves on-par decode performance and >1x prefill performance on the Samsung Galaxy S24, compared to ExecuTorch + XNNPACK (4-bit quantized).

cc @SS-JIA @manuelcandales @cbilgin

Metadata

Assignees

No one assigned

    Labels

    module: vulkan — Issues related to the Vulkan delegate and code under backends/vulkan/

    Projects

    Status: In progress
