@Bissmella

What does this PR do?

This is a draft implementation of the Unified SP (sequence parallel) attention approach.

  • Implements _all_to_all_dim_exchange with custom scatter and gather indices
  • Implements TemplatedUnifiedAttention
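
For readers unfamiliar with the exchange, here is a minimal single-process sketch of what an all-to-all with separate scatter and gather dimensions does to the data layout. It assumes the usual Ulysses-style pattern (split each rank's shard along the scatter dimension, send chunk j to rank j, concatenate what arrives along the gather dimension). The names `simulate_all_to_all_dim_exchange`, `split_dim`, and `concat_dim` are hypothetical and are not the PR's `_all_to_all_dim_exchange`; nested lists stand in for tensors and no real communication happens.

```python
def split_dim(x, dim, parts):
    """Split a 2-D nested list into `parts` equal chunks along `dim` (0 or 1)."""
    if dim == 0:
        n = len(x) // parts
        return [x[i * n:(i + 1) * n] for i in range(parts)]
    n = len(x[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in x] for i in range(parts)]


def concat_dim(chunks, dim):
    """Concatenate 2-D nested lists along `dim` (0 or 1)."""
    if dim == 0:
        return [row for chunk in chunks for row in chunk]
    rows = len(chunks[0])
    return [sum((chunk[i] for chunk in chunks), []) for i in range(rows)]


def simulate_all_to_all_dim_exchange(shards, scatter_idx, gather_idx):
    """Simulate the exchange for all ranks at once (hypothetical helper).

    shards[r] is the local 2-D block held by rank r. Each rank splits its
    block along `scatter_idx`, sends chunk j to rank j, and concatenates
    the chunks it receives (ordered by source rank) along `gather_idx`.
    """
    world_size = len(shards)
    # sent[src][dst] is the chunk that rank `src` sends to rank `dst`.
    sent = [split_dim(shard, scatter_idx, world_size) for shard in shards]
    return [
        concat_dim([sent[src][dst] for src in range(world_size)], gather_idx)
        for dst in range(world_size)
    ]


# Toy layout: 4 tokens x 2 heads, entries are (token, head) labels.
world_size, seq_len, num_heads = 2, 4, 2
full = [[(t, h) for h in range(num_heads)] for t in range(seq_len)]

# Sequence-sharded input: rank r holds 2 consecutive tokens and all heads.
shards = split_dim(full, 0, world_size)

# Scatter heads (dim 1), gather sequence (dim 0):
# each rank now holds every token for a single head.
head_sharded = simulate_all_to_all_dim_exchange(shards, scatter_idx=1, gather_idx=0)

# Swapping the two indices undoes the exchange.
restored = simulate_all_to_all_dim_exchange(head_sharded, scatter_idx=0, gather_idx=1)
```

In this toy layout, rank 0 ends up with head 0 for all four tokens after the first call, which is the shape attention needs to run over the full sequence locally. The second call with the indices swapped returns each rank to its original sequence shard.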

The core implementation is complete. It still needs:

  • Testing
  • Validation
  • Documentation
  • Performance benchmarks

@sayakpaul
Member

It would be nice to get a testing script so that we can quickly check things.

@KarthikSundar2002

I added a basic test script with a simple forward and backward op. Would it be better to have a test script that uses flash_attention_backward and forward?
