Add Flash Attention 2 support #1265

@rajveer43

Description

Is your feature request related to a problem? Please describe.
Flash Attention 2 is a library that provides attention kernels for faster and more memory-efficient inference and training: https://github.com/Dao-AILab/flash-attention

Describe the solution you'd like
Add Flash Attention 2 support.
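
For context, a minimal sketch of how the fused kernel is typically invoked (assuming the flash-attn package is installed, a CUDA device is available, and tensors are fp16/bf16; argument names and the (batch, seqlen, heads, head_dim) layout follow the upstream README):

```python
# Sketch only, not part of this issue: calling the flash-attn kernel directly.
import torch
from flash_attn import flash_attn_func

batch, seqlen, num_heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seqlen, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seqlen, num_heads, head_dim, dtype=torch.float16, device="cuda")

# Fused attention: softmax(q @ k^T / sqrt(head_dim)) @ v, computed without
# materializing the full (seqlen x seqlen) attention matrix in memory.
out = flash_attn_func(q, k, v, causal=True)  # -> (batch, seqlen, num_heads, head_dim)
```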


Labels

scoping required (Features that need significant design and planning before being actionable)
type:feature (New feature or request)