Propose to update & upgrade SkyReels-V2 #12167

Open · wants to merge 23 commits into base: main

Conversation


@tolgacangoz tolgacangoz commented Aug 17, 2025

Skywork/SkyReels-V2-DF-1.3B-540P, seed=0

- main: ~14 min. (main.mp4)
- Wan's RoPE (Wan.s_RoPE.mp4)
- Wan's RoPE + compile_repeated_blocks(fullgraph=True): ~12 min. (Wan.s_RoPE+regional.mp4)
- Wan's RoPE + compile_repeated_blocks(fullgraph=True) + "_native_cudnn" for attn1 and "flash" for attn2, FA=2.8.3: ~8 min. (Wan.s_RoPE+regional+FA.mp4)
Reproducer
!uv pip install git+https://github.com/tolgacangoz/diffusers.git@update-skyreels-v2

import torch, os
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video

# Load model shards in parallel for faster initialization
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"

model_id = "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers"
vae = AutoModel.from_pretrained(model_id,
                                subfolder="vae",
                                torch_dtype=torch.float32,
                                device_map="cuda")
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    model_id,
    vae=vae,
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
flow_shift = 8.0  # 8.0 for T2V, 5.0 for I2V
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)

# Some acceleration helpers
# Be sure to install Flash Attention: https://github.com/Dao-AILab/flash-attention#installation-and-features
#for block in pipeline.transformer.blocks:
#    block.attn1.set_attention_backend("_native_cudnn")
#    block.attn2.set_attention_backend("flash")
#pipeline.transformer.compile_repeated_blocks(fullgraph=True)

prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."

output = pipeline(
    prompt=prompt,
    num_inference_steps=30,
    height=544,  # 720 for 720P
    width=960,   # 1280 for 720P
    num_frames=97,
    base_num_frames=97,  # 121 for 720P
    ar_step=5,  # Controls asynchronous inference (0 for synchronous mode)
    causal_block_size=5,  # Number of frames in each block for asynchronous processing
    overlap_history=None,  # Number of frames to overlap for smooth transitions in long videos; 17 for long video generations
    addnoise_condition=20,  # Improves consistency in long video generation
    generator=torch.Generator("cpu").manual_seed(0)
).frames[0]
export_to_video(output, "T2V.mp4", fps=24, quality=8)
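To make the frame-related parameters above concrete, here is a minimal sketch of how `num_frames=97` and `causal_block_size=5` relate, assuming the Wan-style VAE's 4x temporal compression (the 4x factor and the helper names are assumptions for illustration, not diffusers internals):

```python
# Sketch: map pixel-space frame counts to latent frames and causal blocks.
# Assumes a Wan-style VAE with 4x temporal compression (an assumption here).

def latent_frames(num_frames, temporal_compression=4):
    # The first frame is encoded alone; the rest are compressed 4:1.
    return (num_frames - 1) // temporal_compression + 1

def num_causal_blocks(num_frames, causal_block_size):
    # Blocks of latent frames processed together during diffusion forcing.
    return latent_frames(num_frames) // causal_block_size

print(latent_frames(97))         # 97 pixel frames -> 25 latent frames
print(num_causal_blocks(97, 5))  # 25 latent frames -> 5 causal blocks of 5
```

This is why `num_frames=97` pairs cleanly with `causal_block_size=5`: the latent sequence divides evenly into blocks.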
Environment
- 🤗 Diffusers version: 0.35.0 or this branch
- Platform: Linux-4.4.0-x86_64-with-glibc2.36
- Running on Google Colab?: No
- Python version: 3.12.6
- PyTorch version (GPU?): 2.8.0+cu126 (True)
- Flax version (CPU?/GPU?/TPU?): 0.11.0 (gpu)
- Jax version: 0.7.0
- JaxLib version: 0.7.0
- Huggingface_hub version: 0.34.3
- Transformers version: 4.55.0
- Accelerate version: 1.9.0
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.6.1
- xFormers version: not installed
- Accelerator: NVIDIA A100-SXM4-40GB, 40960 MiB

@a-r-r-o-w @yiyixuxu @stevhliu

tolgacangoz and others added 12 commits August 17, 2025 22:28
Wraps the visual demonstration section in a Markdown code block.

This change corrects the rendering of ASCII diagrams and examples, improving the overall readability of the document.
Improves the readability of the `step_matrix` examples by replacing long sequences of repeated numbers with a more compact `value×count` notation.

This change makes the underlying data patterns in the examples easier to understand at a glance.
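The value×count compaction described in this commit message can be sketched as follows (a minimal illustration of the notation, not the actual code from the PR):

```python
from itertools import groupby

def compact(seq):
    # Render consecutive runs of equal values as "value×count",
    # e.g. [0, 0, 0, 5, 5] -> "0×3, 5×2".
    return ", ".join(f"{v}×{sum(1 for _ in g)}" for v, g in groupby(seq))

print(compact([0, 0, 0, 5, 5]))  # 0×3, 5×2
```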
@@ -39,79 +40,121 @@
logger = logging.get_logger(__name__) # pylint: disable=invalid-name


class SkyReelsV2AttnProcessor2_0:
def _get_qkv_projections(
Contributor Author
`Copied from` doesn't work here?

@tolgacangoz tolgacangoz marked this pull request as draft August 20, 2025 15:20
@tolgacangoz tolgacangoz changed the title Propose to update SkyReels-V2 Propose to update & upgrade SkyReels-V2 Aug 21, 2025
@tolgacangoz tolgacangoz marked this pull request as ready for review August 21, 2025 12:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@stevhliu stevhliu left a comment
Thanks for improving the docs!


Key Pattern: Block i lags behind Block i-1 by exactly ar_step=5 timesteps, creating the
staggered "diffusion forcing" effect where later blocks condition on cleaner earlier blocks.
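The staggered schedule described above can be sketched as follows (illustrative names, not the diffusers `step_matrix` internals):

```python
# Sketch: with ar_step=5, block i starts denoising 5 global steps after
# block i-1, so at any point earlier blocks are always further along
# ("cleaner") than the later blocks that condition on them.

def block_progress(global_step, num_blocks, ar_step=5):
    # Denoising steps each block has completed at a given global step.
    return [max(0, global_step - i * ar_step) for i in range(num_blocks)]

print(block_progress(12, 4))  # [12, 7, 2, 0]: each block lags the previous by 5
```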
```text
Member
Thanks for improving! I think the text block should only be used for the graph and chart visuals.

Contributor Author
I modified it accordingly. I also used backticks for the row representations; otherwise the columns don't appear aligned. How is it now?


## Notes

- SkyReels-V2 supports LoRAs with [`~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights`].
Member
Why is the LoRA example being removed?

Contributor Author

@tolgacangoz tolgacangoz Aug 22, 2025
This part was copied from Wan's page. Since I didn't test anything with LoRAs, I removed the specific example. But SkyReels-V2 and Wan have almost the same architecture, so I preserved `- SkyReels-V2 supports LoRAs with [~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights].`

@tolgacangoz tolgacangoz requested a review from stevhliu August 22, 2025 12:03