[Wan 2.2 VAE] fix VAE tiling encode/decode #12191

miaojinc · 2025-08-19T12:37:21Z

What does this PR do?

The current AutoencoderKLWan does not apply patchify/unpatchify when tiling is enabled. This PR adds the missing steps to the tiled encode/decode paths and adds the patch_size config to the Wan VAE unit tests.
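For context, as we understand it, a Wan 2.2 VAE with `patch_size` p > 1 folds each p×p spatial patch into the channel dimension before the encoder and unfolds it after the decoder. The following is a minimal NumPy sketch of that round trip on a (batch, channels, frames, height, width) array. It treats `patch_size=None` or 1 as a no-op ("if needed"). These are illustrative re-implementations, not diffusers' own `patchify`/`unpatchify`; the channel ordering is an assumption and may differ.

```python
import numpy as np


def patchify(x, patch_size):
    """Fold each p x p spatial patch into channels (illustrative, not diffusers' code).

    (B, C, T, H, W) -> (B, C * p * p, T, H // p, W // p)
    """
    if patch_size is None or patch_size == 1:
        return x  # nothing to do: only apply patchify when needed
    b, c, t, h, w = x.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError(f"H and W must be divisible by patch_size={p}")
    x = x.reshape(b, c, t, h // p, p, w // p, p)
    # Move the two in-patch axes next to the channel axis (assumed ordering).
    x = x.transpose(0, 1, 4, 6, 2, 3, 5)
    return x.reshape(b, c * p * p, t, h // p, w // p)


def unpatchify(x, patch_size):
    """Inverse of patchify: (B, C * p * p, T, h, w) -> (B, C, T, h * p, w * p)."""
    if patch_size is None or patch_size == 1:
        return x
    b, cpp, t, h, w = x.shape
    p = patch_size
    c = cpp // (p * p)
    x = x.reshape(b, c, p, p, t, h, w)
    # Put each in-patch axis back next to its spatial axis.
    x = x.transpose(0, 1, 4, 5, 2, 6, 3)
    return x.reshape(b, c, t, h * p, w * p)


video = np.random.rand(1, 3, 5, 8, 8)
packed = patchify(video, 2)
print(packed.shape)  # (1, 12, 5, 4, 4)
assert np.array_equal(unpatchify(packed, 2), video)
```

With p = 2, an RGB input becomes 12 channels at half the spatial resolution. A tile that skips this step therefore has the wrong channel count and spatial size for an encoder or decoder built around patchified inputs.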

Without this fix, tiled decoding fails with an error like the following:

  File "/home/mjc/diffusers/src/diffusers/pipelines/wan/pipeline_wan.py", line 645, in __call__
    video = self.vae.decode(latents, return_dict=False)[0]
  File "/home/mjc/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/mjc/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1248, in decode
    decoded = self._decode(z).sample
  File "/home/mjc/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1204, in _decode
    return self.tiled_decode(z, return_dict=return_dict)
  File "/home/mjc/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1374, in tiled_decode
    decoded = self.decoder(tile, feat_cache=self._feat_map, feat_idx=self._conv_idx)
  File "/root/miniforge3/envs/wan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/envs/wan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mjc/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 893, in forward
    x = up_block(x, feat_cache, feat_idx, first_chunk=first_chunk)
  File "/root/miniforge3/envs/wan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/envs/wan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mjc/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 709, in forward
    x = x + self.avg_shortcut(x_copy, first_chunk=first_chunk)
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2
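The traceback shows that `tiled_decode` calls `self.decoder` directly on each latent tile. Any unpatchify step that the non-tiled path applies around the decoder is therefore skipped. The following is a minimal NumPy sketch of the idea, with several simplifications and stand-ins:

- `toy_decoder` is a hypothetical local decoder: a per-position channel mix plus nearest-neighbour upsampling.
- `tiled_decode` is a hypothetical helper that omits the tile overlap and blending diffusers performs.
- The `unpatchify` channel ordering is an assumption.

Each decoded tile is unpatchified before stitching, and the stitched result then matches decoding the whole latent at once. The sketch does not attempt to explain the specific RuntimeError above.

```python
import numpy as np


def unpatchify(x, patch_size):
    """(B, C * p * p, T, h, w) -> (B, C, T, h * p, w * p); illustrative ordering."""
    if patch_size is None or patch_size == 1:
        return x
    b, cpp, t, h, w = x.shape
    p = patch_size
    c = cpp // (p * p)
    x = x.reshape(b, c, p, p, t, h, w).transpose(0, 1, 4, 5, 2, 6, 3)
    return x.reshape(b, c, t, h * p, w * p)


def toy_decoder(z, weight, scale):
    """Hypothetical stand-in for the real decoder.

    A per-position channel mix followed by nearest-neighbour spatial
    upsampling. Because it is purely local, non-overlapping tiles
    reproduce the full-frame result exactly.
    """
    out = np.einsum("oc,bcthw->bothw", weight, z)
    return out.repeat(scale, axis=3).repeat(scale, axis=4)


def tiled_decode(z, decoder, tile, patch_size):
    """Decode z tile by tile, unpatchifying every tile before stitching."""
    _, _, _, h, w = z.shape
    rows = []
    for i in range(0, h, tile):
        row = []
        for j in range(0, w, tile):
            out = decoder(z[..., i:i + tile, j:j + tile])
            # The step the tiled path must not skip when patch_size > 1.
            row.append(unpatchify(out, patch_size))
        rows.append(np.concatenate(row, axis=-1))
    return np.concatenate(rows, axis=-2)


patch, scale = 2, 4
# 4 latent channels -> 3 * 2 * 2 = 12 patchified output channels.
weight = np.arange(3 * patch * patch * 4).reshape(3 * patch * patch, 4)
z = np.arange(1 * 4 * 2 * 6 * 6).reshape(1, 4, 2, 6, 6)


def dec(latent):
    return toy_decoder(latent, weight, scale)


full = unpatchify(dec(z), patch)
tiled = tiled_decode(z, dec, tile=4, patch_size=patch)
print(full.shape)  # (1, 3, 2, 48, 48)
assert np.array_equal(tiled, full)
```

Without the per-tile `unpatchify`, each stitched tile would still carry 12 channels at the patchified resolution. Its shape would then disagree with the full-frame output.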

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[*] Did you read the contributor guideline?
[*] Did you read our philosophy doc (important for complex PRs)?
[ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

Hi @yiyixuxu @a-r-r-o-w, could you please help review this? Thanks.

miaojinc added 3 commits August 19, 2025 08:22

022e259 [Wan 2.2 VAE] fix VAE tiling encode/decode
Current AutoencoderKLWan lacks some patchify stuff when tiling. Also add patch_size config for Wan VAE unit tests.
Signed-off-by: Jincheng Miao <[email protected]>

0018f77 Merge branch 'main' into main

0adfad4 [Wan2.2 VAE] revert modification of encoder/decoder out channels
Apply patchify/unpatchify if needed.
Signed-off-by: Jincheng Miao <[email protected]>
