Add regional AoT compilation #3057
Conversation
@sayakpaul after re-thinking about regional compilation, I think that the current process is still a bit too complex to be included in the blog post. I think that simplifying this process at the library level (either in …

@cbensimon good point. However, I think since the post is the only go-to resource out there for devs building on ZeroGPU, it's nice to include the regional compilation section. Once we have an API in Spaces or anywhere else, we can simply swap it back. Regarding using …
Vaibhavs10 left a comment
Probably okay to go ahead and merge this as-is, and then you can refine as you abstract away the complexities a bit more.
Yeah, pretty much. Things are already in progress, so it should be just a few days once we swap things out from here. So, waiting to hear what Charles thinks.
> - [LTX Video](https://huggingface.co/spaces/zerogpu-aoti/ltx-dev-fast)
>
> ### Regional compilation
>
> - [Regional compilation recipe](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html)
👏
I initially thought that it was your recent tutorial on regional AoT. Still nice to include this one, though.
It's about to be merged: pytorch/tutorials#3543
Approved. Only TODO link left @sayakpaul (link to the push and re-use collection).
Co-authored-by: Charles <[email protected]>
Will merge after updating the link.
zerogpu-aoti.md (Outdated)
> In our example, we can compile the repeated blocks of the Flux transformer ahead of time like so. The [Flux Transformer](https://github.com/huggingface/diffusers/blob/c2e5ece08bf22d249c62e964f91bc326cf9e3759/src/diffusers/models/transformers/transformer_flux.py) has two kinds of repeated blocks: `FluxTransformerBlock` and `FluxSingleTransformerBlock`.
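The "like so" code itself isn't captured in this view. Below is a minimal sketch of what regional AoT compilation of these repeated blocks could look like, assuming the `spaces` AoT helpers (`aoti_capture`, `aoti_compile`, `aoti_apply`) shown earlier in the post; the prompt, GPU duration, and loop structure are illustrative, not the post's actual code:

```python
import spaces
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

@spaces.GPU(duration=1500)  # duration is a placeholder
def compile_block(block):
    # Capture real example inputs for this block by running the pipeline once.
    with spaces.aoti_capture(block) as call:
        pipe("example prompt to trigger a forward pass")
    # Export and AoT-compile just this one block.
    exported = torch.export.export(block, args=call.args, kwargs=call.kwargs)
    return spaces.aoti_compile(exported)

# Compile the first instance of each kind of repeated block, then reuse the
# compiled graph for all of its identical siblings.
for blocks in (
    pipe.transformer.transformer_blocks,         # FluxTransformerBlock
    pipe.transformer.single_transformer_blocks,  # FluxSingleTransformerBlock
):
    compiled = compile_block(blocks[0])
    for block in blocks:
        spaces.aoti_apply(compiled, block)
```

Because identical blocks share one compiled graph, export time scales with the number of block kinds (two here) rather than the total block count, which is what makes the ahead-of-time step tractable on ZeroGPU.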
> You can check out [this Space](https://huggingface.co/spaces/zerogpu-aoti/Qwen-Image-Edit-AoT-Regional) for a complete example.
This code was clarifying to me, rather than the demo space itself. Perhaps we could link to both and use the code to illustrate the explanations.
However, I only see `pipeline.transformer.transformer_blocks[0]` being compiled, whereas we mentioned two different kinds of repeated blocks in the description.
The writing demonstrates with Flux. The demo uses Qwen, which has a single kind of repeated block. I have changed the link to the Flux one from @cbensimon. But just a link to the demo is fine, IMO.
> ### Use a compiled graph from the Hub
>
> Once a model (or even a model block) is compiled ahead of time, we can serialize the compiled graph module as an artifact and reuse it later. In the context of a ZeroGPU-powered demo on Spaces, this will significantly cut down the demo startup time.
>
> To keep the storage light, we can just save the compiled model graph without including any model parameters inside the artifact.
>
> Check out [this collection](TODO) that shows a full workflow of obtaining a compiled model graph, pushing it to the Hub, and then using it to build a demo.
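As a hypothetical sketch of the workflow that collection is meant to show (pushing a compiled graph artifact to the Hub, then loading it at startup), assuming the artifact is a `.pt2` package, for example one produced by `torch._inductor.aoti_compile_and_package`; the repo id and filenames are placeholders:

```python
import torch
from huggingface_hub import hf_hub_download, upload_file

# One-time, after ahead-of-time compilation: push the packaged graph
# (model parameters excluded) to a Hub repo.
upload_file(
    path_or_fileobj="transformer_blocks.pt2",
    path_in_repo="transformer_blocks.pt2",
    repo_id="your-username/flux-aot-artifacts",
)

# At demo startup: download the artifact and load the compiled graph,
# skipping export and compilation entirely.
package_path = hf_hub_download(
    repo_id="your-username/flux-aot-artifacts",
    filename="transformer_blocks.pt2",
)
compiled_block = torch._inductor.aoti_load_package(package_path)
```

Since the artifact stores only the compiled graph, the demo still loads the model parameters as usual and only swaps in the precompiled forward, which is why startup pays neither the export nor the compile cost.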
I don't understand this section. What are the benefits of persisting the serialization vs the code demonstrated in the previous example? Also, the collection is missing.
> Also, the collection is missing.

> I don't understand this section. What are the benefits of persisting the serialization vs the code demonstrated in the previous example?

We skip the compilation time by reusing a compiled graph.
Co-authored-by: Pedro Cuenca <[email protected]>