Skip to content

Conversation

@stduhpf
Copy link
Contributor

@stduhpf stduhpf commented Dec 13, 2024

Forked off #454
Would also probably replace #416

  • Move the preview decoding logic from examples/cli/main.cpp to stable-diffusion.cpp
  • Image preview is disabled by default
  • Adds possibility to chose between previewing image with latent projection (as demonstrated in fast latent image preview #454), TAE, or VAE
  • Adds possibility to load TAE for preview only (decode final image with VAE)
  • Default image preview path is preview.png

Related to #354, if the user uses an image viewer that updates its render when the image file changes, then it's possible to see the progress in real time.

@stduhpf stduhpf force-pushed the image-preview branch 2 times, most recently from bab1323 to d235ded Compare December 29, 2024 21:02
@wandbrandon
Copy link

+1 for this in main, great work!

@stduhpf
Copy link
Contributor Author

stduhpf commented Oct 25, 2025

Before this is merged, should I rename the "proj" preview method to "latent2rgb" like it's called in ComfyUI?

@leejet
Copy link
Owner

leejet commented Oct 26, 2025

I think the naming doesn’t really matter. Once the potential license issue I mentioned in the review comments is resolved, this PR can be merged.

@wbruna
Copy link
Contributor

wbruna commented Oct 26, 2025

I think the naming doesn’t really matter. Once the potential license issue I mentioned in the review comments is resolved, this PR can be merged.

@leejet , your comments aren't showing up for me. But I guess you could be referring to where the projection matrices come from?

@stduhpf
Copy link
Contributor Author

stduhpf commented Oct 26, 2025

I think the naming doesn’t really matter. Once the potential license issue I mentioned in the review comments is resolved, this PR can be merged.

@leejet , your comments aren't showing up for me. But I guess you could be referring to where the projection matrices come from?

I'm not seeing them either, I was very confused.

@stduhpf
Copy link
Contributor Author

stduhpf commented Oct 26, 2025

But I guess to avoid any licensing issues I could just train the projection matrices myself. It kinda feels like reinventing a perfectly working wheel though.

@wbruna
Copy link
Contributor

wbruna commented Oct 26, 2025

It could be argued that the matrices are just the product of an algorithm (training, a simple least-squares approximation, etc), and thus not restricted by copyright.

The problem is the "arguing" part 😕 Even if that argument is sound (and I personally believe it is), sidestepping the issue through an independent implementation would completely avoid that kind of headache.

@leejet
Copy link
Owner

leejet commented Oct 26, 2025

My original review comment:

There may be potential licensing issues here: ComfyUI is under the GPL license, while sd.cpp is under the MIT license. Unless it can be proven that this data is not exclusive to ComfyUI and instead comes from a permissively licensed source, there could be conflicts.
For example, the mean/std values in Wan2.2 come directly from the official Wan2.2 repository (https://github.com/Wan-Video/Wan2.2/blob/main/wan/modules/vae2_2.py#L904), which is licensed under the Apache License 2.0.

@leejet
Copy link
Owner

leejet commented Oct 26, 2025

As far as I know, algorithms themselves are not protected by copyright law — only the specific source code implementations are.
Therefore, rewriting the Python code in C++ does not trigger the GPL restrictions.
However, directly copying data embedded in the original code may fall under the GPL if that data is original or creative in nature.
If the data consists solely of factual or non-creative information, then it is generally not subject to copyright protection and thus not restricted by the GPL.

@stduhpf
Copy link
Contributor Author

stduhpf commented Oct 26, 2025

SD3's projection was taken directly from the official inference code (MIT). For the others I'm pretty the data is distilled from the VAEs. I don't think it counts as "creative", but if we really want to be extra safe, we could re-train them. As far as I know, ComfyUI doesn't say where these weights come from.

@stduhpf
Copy link
Contributor Author

stduhpf commented Oct 27, 2025

Ok I'm doing it, It will take some time to get them for all supported VAEs because my process might not be the most efficient, but here are the weight I came up with for sd1 VAE already:

const float sd_latent_rgb_proj[4][3] = {
    {0.303418f, 0.205030f, 0.223200f},
    {0.158560f, 0.272113f, 0.092085f},
    {-0.229890f, 0.170979f, 0.213735f},
    {-0.155664f, -0.226876f, -0.498111f}};
float sd_latent_rgb_bias[3] = {-0.054481f, -0.125704f, -0.211548f};

My way of training it is to get a very large set of various images (my output folder), encode them with vae to get the latents while also downscaling them (using the RMS value of the 8x8 patches for gamma-correctish downscaling), and putting the average RGB and latent channels in a very big CSV (>900 MB). Then I take a large random sample of these rows, biased towards the more saturated colors (otherwise the previews are washed out), and do a least square regression with bias.

Here it is compared to the one from ComfyUI (labeled as old) and a "ground truth" downscaling of the decoded image (that is probably not achievable with a simple affine regression like this one)

old projection downscaled original new projection
old8x downscaled8x new8x

Full res original:
output

Edit: The results I'm getting with SDXL VAE aren't as good for some reason (visibly worse than ComfyUI's, but still usable).
Flux one seems good.

@stduhpf
Copy link
Contributor Author

stduhpf commented Oct 28, 2025

Ok I updated all latent to RGB projections except for sd3.x.

Only SDXL projection feels like a small downgrade, everything else seems about on par or better than the previous version.

I trained Wan's 21 and 2.2 proj on still images only, but it seems to handle motion fine (not perfect but good enough for now).

Comment on lines +162 to +165
// change range
r = r * .5f + .5f;
g = g * .5f + .5f;
b = b * .5f + .5f;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see I could easily bake this into the proj matrices and bias (new_proj = proj*0.5, new_bias = bias*0.5+0.5) , not sure if it's worth putting time into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chroma Radiance would still need this though, so probably not worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants