Image preview #522

stduhpf · 2024-12-13T18:17:20Z

Forked off #454
Would also probably replace #416

Moves the preview decoding logic from examples/cli/main.cpp to stable-diffusion.cpp
Image preview is disabled by default
Adds the option to choose between previewing the image with latent projection (as demonstrated in fast latent image preview #454), TAE, or VAE
Adds the option to load a TAE for preview only (the final image is still decoded with the VAE)
Default image preview path is preview.png
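To make the option combinations above concrete, here is a hypothetical sketch of how a frontend might map them to decoders. None of these names (`PreviewMethod`, `PreviewOptions`, `preview_decoder`, `final_decoder`) come from the PR. The rule that a TAE loaded *without* the preview-only option also decodes the final image is an assumption, not something stated in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration only; not the actual sd.cpp API.
enum class PreviewMethod { None, Proj, Tae, Vae };
enum class Decoder { None, LatentProjection, Tae, Vae };

struct PreviewOptions {
    PreviewMethod method = PreviewMethod::None;  // preview is off by default
    std::string path = "preview.png";            // default preview output path
    bool tae_preview_only = false;               // load a TAE for previews only
};

// Decoder used for intermediate previews.
Decoder preview_decoder(const PreviewOptions& opt) {
    switch (opt.method) {
        case PreviewMethod::Proj: return Decoder::LatentProjection;  // cheap per-pixel affine map
        case PreviewMethod::Tae:  return Decoder::Tae;
        case PreviewMethod::Vae:  return Decoder::Vae;
        case PreviewMethod::None: break;
    }
    return Decoder::None;
}

// Decoder used for the final image. With tae_preview_only, the TAE is kept
// for previews and the full VAE still decodes the result (assumed: otherwise
// a loaded TAE decodes the final image too).
Decoder final_decoder(const PreviewOptions& opt, bool tae_loaded) {
    assert(!opt.tae_preview_only || tae_loaded);  // preview-only requires a TAE
    return (tae_loaded && !opt.tae_preview_only) ? Decoder::Tae : Decoder::Vae;
}
```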

Related to #354: if the user's image viewer refreshes its display when the image file changes, the progress can be followed in real time.

wandbrandon · 2025-02-17T19:37:42Z

+1 for this in main, great work!


stduhpf · 2025-10-25T19:41:09Z

Before this is merged, should I rename the "proj" preview method to "latent2rgb" like it's called in ComfyUI?

leejet · 2025-10-26T03:13:31Z

I think the naming doesn’t really matter. Once the potential license issue I mentioned in the review comments is resolved, this PR can be merged.

wbruna · 2025-10-26T11:23:51Z

> I think the naming doesn’t really matter. Once the potential license issue I mentioned in the review comments is resolved, this PR can be merged.

@leejet, your comments aren't showing up for me. But I guess you could be referring to where the projection matrices come from?

stduhpf · 2025-10-26T11:51:05Z

> > I think the naming doesn’t really matter. Once the potential license issue I mentioned in the review comments is resolved, this PR can be merged.
>
> @leejet, your comments aren't showing up for me. But I guess you could be referring to where the projection matrices come from?

I'm not seeing them either, I was very confused.

stduhpf · 2025-10-26T13:04:33Z

But I guess to avoid any licensing issues I could just train the projection matrices myself. It kinda feels like reinventing a perfectly working wheel though.

wbruna · 2025-10-26T13:42:38Z

It could be argued that the matrices are just the product of an algorithm (training, a simple least-squares approximation, etc), and thus not restricted by copyright.

The problem is the "arguing" part 😕 Even if that argument is sound (and I personally believe it is), sidestepping the issue through an independent implementation would completely avoid that kind of headache.

leejet · 2025-10-26T14:40:45Z

My original review comment:

There may be potential licensing issues here: ComfyUI is under the GPL license, while sd.cpp is under the MIT license. Unless it can be proven that this data is not exclusive to ComfyUI and instead comes from a permissively licensed source, there could be conflicts.
For example, the mean/std values in Wan2.2 come directly from the official Wan2.2 repository (https://github.com/Wan-Video/Wan2.2/blob/main/wan/modules/vae2_2.py#L904), which is licensed under the Apache License 2.0.

leejet · 2025-10-26T14:48:52Z

As far as I know, algorithms themselves are not protected by copyright law — only the specific source code implementations are.
Therefore, rewriting the Python code in C++ does not trigger the GPL restrictions.
However, directly copying data embedded in the original code may fall under the GPL if that data is original or creative in nature.
If the data consists solely of factual or non-creative information, then it is generally not subject to copyright protection and thus not restricted by the GPL.

stduhpf · 2025-10-26T15:01:45Z

SD3's projection was taken directly from the official inference code (MIT). For the others, I'm pretty sure the data is distilled from the VAEs. I don't think it counts as "creative", but if we really want to be extra safe, we could re-train them. As far as I know, ComfyUI doesn't say where these weights come from.

stduhpf · 2025-10-27T00:26:05Z

OK, I'm doing it. It will take some time to get them for all supported VAEs because my process might not be the most efficient, but here are the weights I came up with for the SD1 VAE already:

```cpp
// SD1 VAE: 4 latent channels -> RGB (rows: latent channel, columns: R, G, B)
const float sd_latent_rgb_proj[4][3] = {
    {0.303418f, 0.205030f, 0.223200f},
    {0.158560f, 0.272113f, 0.092085f},
    {-0.229890f, 0.170979f, 0.213735f},
    {-0.155664f, -0.226876f, -0.498111f}};
const float sd_latent_rgb_bias[3] = {-0.054481f, -0.125704f, -0.211548f};
```
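A minimal sketch of how a latent-projection preview could be computed with weights like these. It assumes a channel-major float latent (`latent[c * h * w + y * w + x]`) and 8-bit interleaved RGB output at latent resolution (1/8 of the image size for the SD1 VAE); `latent_to_rgb` is a hypothetical helper, not the PR's code. It applies the same `* .5f + .5f` range change that appears in the latent-preview.h excerpt later in this thread, then clamps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: project a latent image to an 8-bit RGB preview.
// latent: channel-major floats, latent[c * h * w + y * w + x] (assumed layout)
// proj:   channels x 3 row-major weights, e.g. &sd_latent_rgb_proj[0][0]
// bias:   3 floats, one per output channel (R, G, B)
std::vector<uint8_t> latent_to_rgb(const std::vector<float>& latent,
                                   int channels, int w, int h,
                                   const float* proj, const float* bias) {
    const size_t plane = static_cast<size_t>(w) * h;
    assert(latent.size() == plane * static_cast<size_t>(channels));
    std::vector<uint8_t> rgb(plane * 3);  // interleaved RGB, one pixel per latent pixel
    for (size_t i = 0; i < plane; ++i) {
        for (int k = 0; k < 3; ++k) {
            float v = bias[k];
            for (int c = 0; c < channels; ++c)
                v += latent[c * plane + i] * proj[c * 3 + k];
            v = v * 0.5f + 0.5f;                    // change range: [-1, 1] -> [0, 1]
            v = std::min(std::max(v, 0.0f), 1.0f);  // clamp out-of-range values
            rgb[i * 3 + k] = static_cast<uint8_t>(v * 255.0f + 0.5f);  // round to 8 bits
        }
    }
    return rgb;
}
```

Because the latent channel count is a parameter, the same loop would work for 16-channel latents (Flux, SD3) given a 16 x 3 matrix.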

My way of training it is to take a very large set of varied images (my output folder), encode them with the VAE to get the latents while also downscaling the images (using the RMS value of each 8x8 patch for gamma-correct-ish downscaling), and put the average RGB and latent channel values in a very big CSV (>900 MB). Then I take a large random sample of these rows, biased towards the more saturated colors (otherwise the previews are washed out), and do a least-squares regression with bias.
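The fitting procedure described above can be sketched as follows. This is a hedged reconstruction, not the author's script: `rms_patch8` and `fit_latent_rgb` are hypothetical names, the saturation-biased sampling step is omitted (rows are assumed to be already selected), and the affine regression is solved through the normal equations with Gauss-Jordan elimination instead of a library solver.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Gamma-correct-ish downscale: RMS of one channel over the 8x8 patch (px, py)
// of a row-major single-channel image of width w, values in [0, 1].
float rms_patch8(const std::vector<float>& img, int w, int px, int py) {
    double sum_sq = 0.0;
    for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x) {
            double v = img[static_cast<size_t>(py * 8 + y) * w + (px * 8 + x)];
            sum_sq += v * v;
        }
    return static_cast<float>(std::sqrt(sum_sq / 64.0));
}

// Least-squares fit of rgb ~= latent * proj + bias over n sample rows.
// latents: n rows of c floats; rgb: n rows of 3 floats.
// Returns (c + 1) x 3 row-major: rows 0..c-1 are proj, row c is the bias.
std::vector<double> fit_latent_rgb(const std::vector<float>& latents,
                                   const std::vector<float>& rgb, int c) {
    const int d = c + 1;  // latent features plus a constant 1 for the bias
    const size_t n = rgb.size() / 3;
    assert(latents.size() == n * static_cast<size_t>(c));
    // Accumulate the normal equations: A = X^T X (d x d), B = X^T Y (d x 3).
    std::vector<double> A(d * d, 0.0), B(d * 3, 0.0), x(d);
    for (size_t r = 0; r < n; ++r) {
        for (int i = 0; i < c; ++i) x[i] = latents[r * c + i];
        x[c] = 1.0;
        for (int i = 0; i < d; ++i) {
            for (int j = 0; j < d; ++j) A[i * d + j] += x[i] * x[j];
            for (int k = 0; k < 3; ++k) B[i * 3 + k] += x[i] * rgb[r * 3 + k];
        }
    }
    // Gauss-Jordan elimination with partial pivoting: reduce A to a diagonal.
    for (int col = 0; col < d; ++col) {
        int piv = col;
        for (int i = col + 1; i < d; ++i)
            if (std::fabs(A[i * d + col]) > std::fabs(A[piv * d + col])) piv = i;
        for (int j = 0; j < d; ++j) std::swap(A[col * d + j], A[piv * d + j]);
        for (int k = 0; k < 3; ++k) std::swap(B[col * 3 + k], B[piv * 3 + k]);
        const double p = A[col * d + col];
        assert(p != 0.0 && "singular system: samples are not varied enough");
        for (int i = 0; i < d; ++i) {
            if (i == col) continue;
            const double f = A[i * d + col] / p;
            for (int j = 0; j < d; ++j) A[i * d + j] -= f * A[col * d + j];
            for (int k = 0; k < 3; ++k) B[i * 3 + k] -= f * B[col * 3 + k];
        }
    }
    for (int i = 0; i < d; ++i)
        for (int k = 0; k < 3; ++k) B[i * 3 + k] /= A[i * d + i];
    return B;  // solution W of (X^T X) W = X^T Y
}
```

For a 4-channel latent this returns 15 values: the first 12 fill a `[4][3]` matrix like `sd_latent_rgb_proj` and the last 3 a bias like `sd_latent_rgb_bias`.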

Here it is compared to the one from ComfyUI (labeled as old) and a "ground truth" downscaling of the decoded image (that is probably not achievable with a simple affine regression like this one)

[Images: old projection | downscaled original | new projection]

Full-res original:

[Image: full-resolution original]

Edit: The results I'm getting with the SDXL VAE aren't as good for some reason (visibly worse than ComfyUI's, but still usable).
The Flux one seems good.

stduhpf · 2025-10-28T14:04:42Z

OK, I updated all latent-to-RGB projections except for SD3.x.

Only SDXL projection feels like a small downgrade, everything else seems about on par or better than the previous version.

I trained the Wan 2.1 and 2.2 projections on still images only, but they seem to handle motion fine (not perfect, but good enough for now).

stduhpf · 2025-10-28T14:09:42Z

latent-preview.h

```diff
+                // change range
+                r = r * .5f + .5f;
+                g = g * .5f + .5f;
+                b = b * .5f + .5f;
```


I now see I could easily bake this into the proj matrices and bias (new_proj = proj*0.5, new_bias = bias*0.5+0.5), but I'm not sure it's worth spending time on.
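The baking works because the range change is itself affine: (l·P + b)·0.5 + 0.5 = l·(0.5P) + (0.5b + 0.5), so it can be folded into the weights once at load time. A minimal sketch, assuming the same channels x 3 row-major layout as the SD1 arrays earlier in the thread; `bake_range_into_proj` is a hypothetical name. Pixel-space models that use no projection would still need the explicit range change.

```cpp
#include <cassert>

// Fold the post-projection range change v' = v * 0.5 + 0.5 into the weights:
//   (l . P + b) * 0.5 + 0.5 == l . (0.5 * P) + (0.5 * b + 0.5)
// proj: channels x 3 row-major, bias: 3 floats; both are updated in place.
void bake_range_into_proj(float* proj, float* bias, int channels) {
    assert(channels > 0);
    for (int i = 0; i < channels * 3; ++i) proj[i] *= 0.5f;  // new_proj = proj * 0.5
    for (int k = 0; k < 3; ++k) bias[k] = bias[k] * 0.5f + 0.5f;  // new_bias = bias * 0.5 + 0.5
}
```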

Chroma Radiance would still need this though, so probably not worth it.

stduhpf force-pushed the image-preview branch 2 times, most recently from bab1323 to d235ded (December 29, 2024 21:02)

stduhpf force-pushed the image-preview branch from 65f636a to a539cb5 (February 6, 2025 15:04)

stduhpf force-pushed the image-preview branch from dfd9c4c to efc6db8 (February 22, 2025 15:16)

iwr-redmond mentioned this pull request Mar 12, 2025 in [Feature Request] Generation Preview Option Teriks/dgenerate#28 (Open)

SkutteOleg mentioned this pull request May 6, 2025 in Intermediate step images #672 (Open)

stduhpf force-pushed the image-preview branch 3 times, most recently from f03d84c to 1bef24d (July 15, 2025 17:36)

stduhpf added 20 commits August 30, 2025 18:23

fast latent image preview (e8ac336)
fix posix compile (de9c492)
move latent preview code to a separate file (ee4aef8)
Latent preview support for img2img and img2vid (75a9abd)
add latent-preview to .gitignore (8dcb814)
Refactor latent preview + support tae/vae preview (ef62078)
update usage (2cedeb5)
Fix build + add warning (be0a442)
Disable preview by default in sdcpp too (31b0fdd)
Done not preload preview tensor when preview is disabled. (95fd31c)
Fix VAE preview darkening (cbd8c99)
Increase context memory when loading multiple auto encoders (c3d72c0)
Increase context memory when previewing with auto encoder instead (8059ac3)
fix compile warnings (8e6024f)
fix print-params (19ac567)
fix preview with unet inpaint models (430f7d8)
do not spam pretty progress when using tiled vae/tae as preview (2272068)
change log level of "processing %i tiles" (eeca697)
Refactor preview to match the other callbacks (beb0e91)
preview: new API (d465a70)
remove tensor shape spam (059f025)

stduhpf commented Oct 22, 2025 on stable-diffusion.cpp (outdated, resolved)

Fix progress display (6563d46)

wbruna mentioned this pull request Oct 23, 2025 in qwen-image-edit progress it/s possible wrong #892 (Closed)

stduhpf added 9 commits October 25, 2025 19:33

Merge branch 'master' into image-preview (fff9930)
preview: support pixel space diffusion (b1fc7cd)
include preview (and apply_mask) in speed stats properly (31d36b2)
support noisy preview via API (4e3500c)
missing includes (27af5a4)
supports noisy preview in main (07c61f1)
fix tae-preview-only (bad merge issue) (f80f61a)
format code (6c68e39)
update help in readme (fc2a71e)

stduhpf added 2 commits October 28, 2025 14:55

use bespoke latent to rgb projection to prevent licensing issues (8a3346f)
fix sd3 null bias breaking build (b5e73f9)

stduhpf commented Oct 28, 2025

stduhpf added 3 commits October 28, 2025 16:35

Merge branch 'master' into image-preview (a50e2ce)
use new ggml_ext function names (c1226d6)
Fix radiance proj support (3db7fb1)

5 participants
