Commit 8173a29

Merge branch 'main' into torchao-compile-tests
2 parents 38c213f + 8adc600 commit 8173a29

50 files changed: +5743 -282 lines changed

.github/workflows/pr_style_bot.yml

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ jobs:
     with:
       python_quality_dependencies: "[quality]"
     secrets:
-      bot_token: ${{ secrets.GITHUB_TOKEN }}
+      bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -283,6 +283,8 @@
     title: AllegroTransformer3DModel
   - local: api/models/aura_flow_transformer2d
     title: AuraFlowTransformer2DModel
+  - local: api/models/chroma_transformer
+    title: ChromaTransformer2DModel
   - local: api/models/cogvideox_transformer3d
     title: CogVideoXTransformer3DModel
   - local: api/models/cogview3plus_transformer2d
@@ -405,6 +407,8 @@
     title: AutoPipeline
   - local: api/pipelines/blip_diffusion
     title: BLIP-Diffusion
+  - local: api/pipelines/chroma
+    title: Chroma
   - local: api/pipelines/cogvideox
     title: CogVideoX
   - local: api/pipelines/cogview3
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ChromaTransformer2DModel
+
+A modified Flux Transformer model from [Chroma](https://huggingface.co/lodestones/Chroma).
+
+## ChromaTransformer2DModel
+
+[[autodoc]] ChromaTransformer2DModel
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Chroma
+
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
+</div>
+
+Chroma is a text-to-image generation model based on Flux.
+
+Original model checkpoints for Chroma can be found [here](https://huggingface.co/lodestones/Chroma).
+
+<Tip>
+
+Chroma can use all the same optimizations as Flux.
+
+</Tip>
+
+## Inference (Single File)
+
+The `ChromaTransformer2DModel` supports loading checkpoints in the original format. This is also useful when trying to load finetunes or quantized versions of the models that have been published by the community.
+
+The following example demonstrates how to run Chroma from a single file.
+
+The transformer is loaded from the original single-file checkpoint, while the T5 text encoder and tokenizer are loaded from the `black-forest-labs/FLUX.1-dev` repository.
+
+```python
+import torch
+from diffusers import ChromaTransformer2DModel, ChromaPipeline
+from transformers import T5EncoderModel, T5Tokenizer
+
+bfl_repo = "black-forest-labs/FLUX.1-dev"
+dtype = torch.bfloat16
+
+transformer = ChromaTransformer2DModel.from_single_file("https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v35.safetensors", torch_dtype=dtype)
+
+text_encoder = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
+tokenizer = T5Tokenizer.from_pretrained(bfl_repo, subfolder="tokenizer_2")
+
+pipe = ChromaPipeline.from_pretrained(bfl_repo, transformer=transformer, text_encoder=text_encoder, tokenizer=tokenizer, torch_dtype=dtype)
+
+pipe.enable_model_cpu_offload()
+
+prompt = "A cat holding a sign that says hello world"
+image = pipe(
+    prompt,
+    guidance_scale=4.0,
+    output_type="pil",
+    num_inference_steps=26,
+    generator=torch.Generator("cpu").manual_seed(0)
+).images[0]
+
+image.save("image.png")
+```
+
+## ChromaPipeline
+
+[[autodoc]] ChromaPipeline
+  - all
+  - __call__
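
The single-file example in the hunk above passes a Hub `blob` URL to `from_single_file`. As a rough illustration of what such a URL encodes, the sketch below splits it into a repository id, a revision, and a file name. `split_hub_blob_url` is a hypothetical helper written for this note, not part of diffusers, and it assumes the revision contains no `/`.

```python
from urllib.parse import urlparse


def split_hub_blob_url(url: str) -> tuple[str, str, str]:
    """Hypothetical helper (not a diffusers API): split a Hub `blob` URL
    into (repo_id, revision, filename)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout: <owner>/<repo>/blob/<revision>/<path/to/file>
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a Hub blob URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]  # assumes the revision itself has no "/"
    filename = "/".join(parts[4:])
    return repo_id, revision, filename


url = "https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v35.safetensors"
repo_id, revision, filename = split_hub_blob_url(url)
print(repo_id, revision, filename)
```

For the URL used in the documentation example, this separates the `lodestones/Chroma` repository from the `chroma-unlocked-v35.safetensors` checkpoint file on the `main` revision.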

docs/source/en/api/pipelines/cosmos.md

Lines changed: 16 additions & 0 deletions
@@ -36,6 +36,22 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
   - all
   - __call__

+## Cosmos2TextToImagePipeline
+
+[[autodoc]] Cosmos2TextToImagePipeline
+  - all
+  - __call__
+
+## Cosmos2VideoToWorldPipeline
+
+[[autodoc]] Cosmos2VideoToWorldPipeline
+  - all
+  - __call__
+
 ## CosmosPipelineOutput

 [[autodoc]] pipelines.cosmos.pipeline_output.CosmosPipelineOutput
+
+## CosmosImagePipelineOutput
+
+[[autodoc]] pipelines.cosmos.pipeline_output.CosmosImagePipelineOutput

docs/source/en/quantization/bitsandbytes.md

Lines changed: 39 additions & 0 deletions
@@ -416,6 +416,45 @@ text_encoder_2_4bit.dequantize()
 transformer_4bit.dequantize()
 ```

+## torch.compile
+
+Speed up inference with `torch.compile`. Make sure you have the latest `bitsandbytes` installed. We also recommend installing [PyTorch nightly](https://pytorch.org/get-started/locally/).
+
+<hfoptions id="bnb">
+<hfoption id="8-bit">
+```py
+torch._dynamo.config.capture_dynamic_output_shape_ops = True
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AutoModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+transformer_8bit.compile(fullgraph=True)
+```
+
+</hfoption>
+<hfoption id="4-bit">
+
+```py
+quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True)
+transformer_4bit = AutoModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+transformer_4bit.compile(fullgraph=True)
+```
+</hfoption>
+</hfoptions>
+
+On an RTX 4090 with compilation, 4-bit Flux generation completed in 25.809 seconds versus 32.570 seconds without.
+
+Check out the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) for more details.
+
 ## Resources

 * [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
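
The bitsandbytes hunk above quotes two wall-clock timings for 4-bit Flux on an RTX 4090 (25.809 s compiled, 32.570 s uncompiled). To put those figures in relative terms, here is a small arithmetic sketch. The function names are illustrative and not taken from the linked benchmarking script, and the calculation assumes both runs used the same prompt and step count.

```python
# Timings quoted in the documentation change above, in seconds.
COMPILED_S = 25.809
UNCOMPILED_S = 32.570


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Ratio of baseline time to optimized time (greater than 1 means faster)."""
    return baseline_s / optimized_s


def time_saved_pct(baseline_s: float, optimized_s: float) -> float:
    """Percentage of the baseline wall-clock time that was saved."""
    return 100.0 * (baseline_s - optimized_s) / baseline_s


print(f"speedup: {speedup(UNCOMPILED_S, COMPILED_S):.3f}x")
print(f"time saved: {time_saved_pct(UNCOMPILED_S, COMPILED_S):.1f}%")
```

With the quoted numbers this works out to roughly a 1.26x speedup, or about a fifth less time per generation.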

docs/source/en/quantization/torchao.md

Lines changed: 3 additions & 0 deletions
@@ -65,6 +65,9 @@ transformer = torch.compile(transformer, mode="max-autotune", fullgraph=True)

 For speed and memory benchmarks on Flux and CogVideoX, please refer to the table [here](https://github.com/huggingface/diffusers/pull/10009#issue-2688781450). You can also find some torchao [benchmarks](https://github.com/pytorch/ao/tree/main/torchao/quantization#benchmarks) numbers for various hardware.

+> [!TIP]
+> The FP8 post-training quantization schemes in torchao are effective for GPUs with compute capability of at least 8.9 (RTX-4090, Hopper, etc.). FP8 often provides the best speed, memory, and quality trade-off when generating images and videos. We recommend combining FP8 and torch.compile if your GPU is compatible.
+
 torchao also supports an automatic quantization API through [autoquant](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md#autoquantization). Autoquantization determines the best quantization strategy applicable to a model by comparing the performance of each technique on chosen input types and shapes. Currently, this can be used directly on the underlying modeling components. Diffusers will also expose an autoquant configuration option in the future.

 The `TorchAoConfig` class accepts three parameters:
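
The tip added in the torchao hunk above ties FP8 quantization to a compute capability of at least 8.9. A minimal sketch of how that threshold could be checked before choosing an FP8 scheme follows. The helper names are hypothetical and the 8.9 cutoff is taken only from the tip. `torch.cuda.get_device_capability()` is used when PyTorch and a CUDA device are available, and the check otherwise reports `False`.

```python
# Threshold taken from the tip above: compute capability of at least 8.9.
FP8_MIN_CAPABILITY = (8, 9)


def supports_fp8(capability) -> bool:
    """Hypothetical helper: compare a (major, minor) capability to the FP8 cutoff."""
    return tuple(capability) >= FP8_MIN_CAPABILITY


def current_gpu_supports_fp8() -> bool:
    """Check the active CUDA device, returning False if PyTorch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    return supports_fp8(torch.cuda.get_device_capability())


print(supports_fp8((8, 9)))  # RTX 4090 class
print(supports_fp8((9, 0)))  # Hopper class
print(supports_fp8((8, 6)))  # below the cutoff
```

Tuple comparison keeps the check correct across major versions: `(9, 0)` passes even though its minor version is lower than 9.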

examples/community/ip_adapter_face_id.py

Lines changed: 1 addition & 4 deletions
@@ -282,10 +282,7 @@ def load_ip_adapter_face_id(self, pretrained_model_name_or_path_or_dict, weight_
         revision = kwargs.pop("revision", None)
         subfolder = kwargs.pop("subfolder", None)

-        user_agent = {
-            "file_type": "attn_procs_weights",
-            "framework": "pytorch",
-        }
+        user_agent = {"file_type": "attn_procs_weights", "framework": "pytorch"}
         model_file = _get_model_file(
             pretrained_model_name_or_path_or_dict,
             weights_name=weight_name,
