
Conversation

@sayakpaul (Member) commented on Oct 6, 2025

What does this PR do?

Adds a `sage_hub` attention backend so SageAttention can be used through a kernel pulled from the Hub.

Code to test:

from diffusers import DiffusionPipeline 
import torch 

repo_id = "black-forest-labs/FLUX.1-dev"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")
pipe.transformer.set_attention_backend("sage_hub")

image = pipe(
    prompt="a dog sitting by the sea, waiting for its companion to come",
    guidance_scale=3.5,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.manual_seed(0)
).images[0]
image.save("sage_flux.png")

Result: (generated image attached in the original PR; saved as sage_flux.png)


Notes

  1. It would be nice to get torch.compile support when using sage attention, like we have for flash and flash 3. Currently, this fails:
Code to test
from diffusers import DiffusionPipeline 
import torch 

repo_id = "black-forest-labs/FLUX.1-dev"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")
pipe.transformer.set_attention_backend("sage_hub")
pipe.transformer.compile_repeated_blocks(fullgraph=True)

with (
    torch._inductor.utils.fresh_inductor_cache(),
    torch._dynamo.config.patch(error_on_recompile=True),
):
    image = pipe(
        prompt="a dog sitting by the sea, waiting for its companion to come",
        guidance_scale=3.5,
        num_inference_steps=30,
        max_sequence_length=512,
        generator=torch.manual_seed(0)
    ).images[0]
image.save("sage_flux.png")

Error: https://pastebin.com/3HS6HNzR

  2. We have other sageattn variants (see here), which would be cool to expose from the Hub kernel. A rough sketch of what that could look like from the user side follows.
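For illustration only, a sketch of how exposing a specific variant might look on the user side; `sage_hub` is the backend added in this PR, while the variant-specific backend name in the comment is made up:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Backend added in this PR: lets sageattn dispatch to whatever kernel fits the GPU.
pipe.transformer.set_attention_backend("sage_hub")

# Hypothetical variant-specific backend (not implemented); it would pin the
# fp8 PV-accumulation kernel instead of letting sageattn choose.
# pipe.transformer.set_attention_backend("sage_qk_int8_pv_fp8_hub")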

Cc: @MekkCyber

@sayakpaul added the performance label on Oct 6, 2025
@sayakpaul requested a review from DN6 on October 6, 2025 05:48
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@MekkCyber left a comment

Very cool! I'll try to look into the torch.compile compatibility. As for the other variants, they are the same as sageattn: sageattn is just a wrapper that dispatches to the correct kernel depending on the hardware used, see https://github.com/thu-ml/SageAttention/blob/main/sageattention/core.py#L140
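For context, a rough sketch (not the actual SageAttention code) of the kind of hardware dispatch the linked core.py performs; the real mapping is more detailed than this:

import torch

def which_sage_kernel(device=None):
    # Illustrative only: pick a kernel name from the GPU's compute capability,
    # roughly the way the sageattn wrapper does in core.py.
    major, minor = torch.cuda.get_device_capability(device)
    arch = major * 10 + minor
    if arch >= 90:      # Hopper: sm90 fp8 PV-accumulation kernel
        return "qk_int8_pv_fp8_cuda_sm90"
    if arch == 89:      # Ada: fp8 PV-accumulation kernel
        return "qk_int8_pv_fp8_cuda"
    if arch >= 80:      # Ampere: fp16 PV-accumulation kernel
        return "qk_int8_pv_fp16_cuda"
    raise RuntimeError(f"no SageAttention kernel for sm{arch}")

print(which_sage_kernel())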

@sayakpaul (Member, PR author)

"they are the same as sageattn, what i mean is sageattn is just a wrapper that dispatches to the correct kernel depending on the hardware used"

So, you mean we shouldn't need separate dispatch functions like this?

_SAGE_QK_INT8_PV_FP8_CUDA = "_sage_qk_int8_pv_fp8_cuda"

@MekkCyber

Yes, I don't think we need that, because it depends on the hardware. For example, if a user chooses `_sage_qk_int8_pv_fp8_cuda` on an A100 (compute capability 8.0), it will fail, because this function is only supported and compiled for 8.9 GPUs.
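To make that concrete, a minimal, hypothetical guard along these lines would have to accompany every variant-specific backend (the helper name is made up; the 8.9 threshold comes from the comment above):

import torch

def check_fp8_pv_support(device=None):
    # Hypothetical validation sketch: the fp8 PV-accumulation kernel behind
    # "_sage_qk_int8_pv_fp8_cuda" is compiled for compute capability 8.9+,
    # so an A100 (8.0) needs to be rejected up front.
    major, minor = torch.cuda.get_device_capability(device)
    if (major, minor) < (8, 9):
        raise RuntimeError(
            f"_sage_qk_int8_pv_fp8_cuda requires compute capability >= 8.9, "
            f"but this GPU reports {major}.{minor}"
        )

check_fp8_pv_support()  # passes on 8.9/9.0 GPUs, raises on A100 (8.0)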

@sayakpaul sayakpaul marked this pull request as draft October 7, 2025 13:38
Comment on lines -165 to -167
_SAGE_ATTENTION_PV_ACCUM_DTYPE = Literal["fp32", "fp32+fp32"]
_SAGE_ATTENTION_QK_QUANT_GRAN = Literal["per_thread", "per_warp"]
_SAGE_ATTENTION_QUANTIZATION_BACKEND = Literal["cuda", "triton"]
@sayakpaul (Member, PR author):

I don't see these being used anywhere, hence removed.
