Fixes #12673. `record_stream` in group offloading is not working properly #12721

KimbingNg · 2025-11-26T10:56:00Z

What does this PR do?

Fixes #12673 and Tencent-Hunyuan/HunyuanVideo-1.5#10. I identified the root cause: `record_stream` isn't functioning as expected. As a result, the `offload_to_memory` function updates `param.data` without waiting for the default stream (`current_stream`) to finish the forward pass.

Update [Nov 27]: the root cause
The `current_stream()` call inside `_transfer_tensor_to_device` runs under `with torch.cuda.stream(self.stream)`, so it returns `self.stream` rather than the default stream that `record_stream` should be called on.
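
To make the scoping issue concrete, here is a toy model in plain Python. It is not PyTorch code: `current_stream`, `stream`, `transfer_buggy`, and `transfer_fixed` are hypothetical stand-ins that only mimic how a "current stream" lookup behaves inside a stream context manager.

```python
from contextlib import contextmanager

# Toy stand-in for a "current stream" registry (hypothetical, not PyTorch).
_stack = ["default_stream"]


def current_stream():
    """Return whichever stream is active right now."""
    return _stack[-1]


@contextmanager
def stream(s):
    """Make `s` the current stream for the duration of the block."""
    _stack.append(s)
    try:
        yield
    finally:
        _stack.pop()


def transfer_buggy(side_stream):
    with stream(side_stream):
        # Looked up inside the context: this is the side stream itself,
        # so recording it would not protect against the compute stream.
        return current_stream()


def transfer_fixed(side_stream):
    # Looked up before switching: this is the stream doing the forward pass.
    compute_stream = current_stream()
    with stream(side_stream):
        return compute_stream


print(transfer_buggy("transfer_stream"))
print(transfer_fixed("transfer_stream"))
```

In this model the buggy variant hands back the transfer stream, while the fixed variant hands back the stream that was current before the context was entered.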

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

cc @asomoza @sayakpaul @a-r-r-o-w @DN6

KimbingNg · 2025-11-26T11:09:27Z

Also relevant to Tencent-Hunyuan/HunyuanVideo-1.5#10

sayakpaul

Thanks for your PR!

Could we also run the test_group_offloading tests?

pytest tests/models -k "test_group_offloading"

src/diffusers/hooks/group_offloading.py

KimbingNg · 2025-11-26T12:06:06Z

pytest tests/models -k "test_group_offloading"

Running pytest:

=============================================== 365 passed, 37 skipped, 3604 deselected, 757 warnings in 100.34s (0:01:40) ================================================

HuggingFaceDocBuilderDev · 2025-11-26T12:36:08Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

DN6

Thanks @KimbingNg I think we might also need to make a similar change to _offload_to_disk.

src/diffusers/hooks/group_offloading.py

Wrong default_stream is used, leading to wrong execution order when record_stream is enabled.

KimbingNg · 2025-11-27T07:44:15Z

@DN6, I found the root cause!

The `current_stream()` call inside `_transfer_tensor_to_device` runs under `with torch.cuda.stream(self.stream)`, so it does not return the correct default stream to call `record_stream` on.

As for the onload_from_disk case, the current stream is obtained outside the `with` context, so it works fine.

I force-pushed a new commit. Please review it again, thanks!
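
A minimal sketch of the corrected pattern, assuming a CUDA-enabled PyTorch install. This is not the actual diffusers code: the function `onload` and its parameter names are hypothetical, and only the `torch.cuda` calls are real API. The point is that the compute stream is captured before entering the side-stream context.

```python
import torch


def onload(cpu_tensor, transfer_stream):
    """Copy a CPU tensor to the GPU on a side stream (hypothetical helper)."""
    # Capture the compute stream BEFORE switching streams. Inside the
    # `with` block, torch.cuda.current_stream() would return transfer_stream.
    compute_stream = torch.cuda.current_stream()
    with torch.cuda.stream(transfer_stream):
        gpu_tensor = cpu_tensor.to("cuda", non_blocking=True)
        # Tell the caching allocator that the compute stream also uses this
        # memory, so it is not reused until that stream's queued work is done.
        gpu_tensor.record_stream(compute_stream)
    return gpu_tensor


if torch.cuda.is_available():
    side = torch.cuda.Stream()
    weights = torch.ones(4).pin_memory()
    on_gpu = onload(weights, side)
    # The compute stream must wait for the copy before reading the tensor.
    torch.cuda.current_stream().wait_stream(side)
    print(on_gpu.sum().item())
```

The demo at the bottom is skipped on machines without a GPU, where only the function definition is loaded.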

DN6 · 2025-11-27T08:25:49Z

@KimbingNg Ah great catch! 👍🏽

sayakpaul

We can also remove the following lines:

if self.__class__.__name__ == "AutoencoderKLCosmosTests" and offload_type == "leaf_level":
    pytest.skip("With `leaf_type` as the offloading type, it fails. Needs investigation.")

in the test_group_offloading_with_disk() test implementation.

KimbingNg · 2025-11-27T12:53:08Z

> We can also remove the following lines:
>
>     if self.__class__.__name__ == "AutoencoderKLCosmosTests" and offload_type == "leaf_level":
>         pytest.skip("With `leaf_type` as the offloading type, it fails. Needs investigation.")
>
> in the test_group_offloading_with_disk() test implementation.

I ran pytest tests/models -k "test_group_offloading" after removing these two lines:

=============================================== 366 passed, 36 skipped, 3604 deselected, 1177 warnings in 123.01s (0:02:03) ===============================================

Should I push a commit that removes these two lines?

sayakpaul · 2025-11-27T13:05:15Z

Yes

KimbingNg · 2025-11-27T13:09:49Z

The two lines are removed

KimbingNg mentioned this pull request Nov 26, 2025

QwenImagepipeline imcompatable with group offloading with record_stream=True #12673

Open

KimbingNg force-pushed the record_stream_bug branch from 0de99c0 to cef1ed6 November 26, 2025 11:07

sayakpaul reviewed Nov 26, 2025

src/diffusers/hooks/group_offloading.py (outdated)

KimbingNg changed the title ~~Fixes #12673. record_stream is not working properly~~ Fixes #12673. record_stream in group offloading is not working properly Nov 26, 2025

sayakpaul requested a review from DN6 November 26, 2025 12:30

DN6 reviewed Nov 27, 2025

src/diffusers/hooks/group_offloading.py (outdated)

Fixes huggingface#12673.

9159ed7

Wrong default_stream is used, leading to wrong execution order when record_stream is enabled.

KimbingNg force-pushed the record_stream_bug branch from 60444c5 to 9159ed7 November 27, 2025 07:41

update

4d39e8c

Merge branch 'main' into record_stream_bug

e5533c1

sayakpaul reviewed Nov 27, 2025

Update test

e56f511

sayakpaul requested a review from DN6 November 27, 2025 13:16
