RFE: encoder: add stride & various raw input formats support #11

elmarco · 2025-02-12T14:35:16Z

The input format is not necessarily RGB or RGBA, and doing a conversion pre-pass can be quite costly (adding about 15-20% to the total time, based on empirical measurements).

For simplicity, I made sure not to break the existing API.

These changes don't seem to affect encoder performance significantly.
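To make the idea concrete, here is a minimal, self-contained sketch of how a set of raw input layouts could be mapped to the RGBA pixels a QOI encoder works with. The enum `RawLayout` and its methods are hypothetical illustrations, not the `RawChannels` API added by this PR; `X` marks a padding byte that is ignored.

```rust
/// Hypothetical raw input layouts; `X` marks a padding byte that is ignored.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RawLayout {
    Rgb,
    Rgba,
    Bgr,
    Bgra,
    Rgbx,
    Bgrx,
    Xrgb,
    Xbgr,
}

impl RawLayout {
    /// Number of bytes one pixel occupies in the source buffer.
    pub fn bytes_per_pixel(self) -> usize {
        match self {
            RawLayout::Rgb | RawLayout::Bgr => 3,
            _ => 4,
        }
    }

    /// Whether the layout carries a real alpha channel (padding bytes do not count).
    pub fn has_alpha(self) -> bool {
        matches!(self, RawLayout::Rgba | RawLayout::Bgra)
    }

    /// Converts one source pixel to RGBA; padding bytes and missing alpha become 255.
    pub fn to_rgba(self, p: &[u8]) -> [u8; 4] {
        match self {
            RawLayout::Rgb | RawLayout::Rgbx => [p[0], p[1], p[2], 255],
            RawLayout::Rgba => [p[0], p[1], p[2], p[3]],
            RawLayout::Bgr | RawLayout::Bgrx => [p[2], p[1], p[0], 255],
            RawLayout::Bgra => [p[2], p[1], p[0], p[3]],
            RawLayout::Xrgb => [p[1], p[2], p[3], 255],
            RawLayout::Xbgr => [p[3], p[2], p[1], 255],
        }
    }
}
```

For example, a BGRX pixel `[10, 20, 30, 0]` becomes RGBA `[30, 20, 10, 255]`. Doing this per pixel inside the encoder avoids allocating and filling a separate converted buffer.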

elmarco · 2025-02-13T11:13:04Z

See also: Devolutions/IronRDP#670

elmarco · 2025-03-18T15:42:30Z

@aldanor hi, wdyt?

elmarco · 2025-04-15T19:46:09Z

@aldanor ping :thanks:

elmarco · 2025-05-09T10:47:06Z

@aldanor are you still maintaining the crate? thanks

Joshix-1 · 2025-05-22T20:31:57Z

The API feels kind of confusing to me. It would be cool to control the input format and the output format independently. My use case: I have an image with RGBA data that I know has no translucency, so I want to save it as RGB. I don't know how I would do this with this PR, which is why I created #15.

It would be cool if the output format could be controlled independently, like in my PR.

And I don't really understand the stride argument. That should maybe be documented better.

elmarco · 2025-05-24T17:32:05Z

> The API feels kind of confusing to me. It would be cool to control the input format and the output format independently. My use case: I have an image with RGBA data that I know has no translucency, so I want to save it as RGB. I don't know how I would do this with this PR, which is why I created #15.

You can use RawChannels::Rgbx, Bgrx, Xbgr, etc. from this PR for that.

> It would be cool if the output format could be controlled independently, like in my PR.

Well, you can't really mix input and output formats freely: you need alpha channel input for alpha channel output. And QOI has only two output formats, RGB and RGBA...

> And I don't really understand the stride argument. That should maybe be documented better.

https://learn.microsoft.com/en-us/windows/win32/medfound/image-stride
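In short, the stride is the number of bytes from the start of one row to the start of the next. It can be larger than width × bytes-per-pixel when rows are padded (for alignment, or because the image is a sub-rectangle of a larger buffer). The sketch below uses illustrative function names that are not from the PR; it shows how a pixel's byte offset is computed and why the minimum buffer length is `stride * (height - 1) + width * bytes_per_pixel`: the last row needs no trailing padding.

```rust
/// Byte offset of pixel (x, y) in a buffer whose rows are `stride` bytes apart.
pub fn pixel_offset(x: usize, y: usize, stride: usize, bytes_per_pixel: usize) -> usize {
    y * stride + x * bytes_per_pixel
}

/// Smallest buffer that holds `height` rows of `width` pixels; None on overflow
/// or if the stride is shorter than one row (rows would overlap).
pub fn min_buffer_len(width: usize, height: usize, stride: usize, bytes_per_pixel: usize) -> Option<usize> {
    let row_bytes = width.checked_mul(bytes_per_pixel)?;
    if stride < row_bytes {
        return None;
    }
    if height == 0 {
        return Some(0);
    }
    stride.checked_mul(height - 1)?.checked_add(row_bytes)
}

/// Copies the visible pixels of a padded buffer into a tightly packed Vec.
pub fn pack_rows(buf: &[u8], width: usize, height: usize, stride: usize, bpp: usize) -> Option<Vec<u8>> {
    let min = min_buffer_len(width, height, stride, bpp)?;
    if buf.len() < min {
        return None;
    }
    // Cannot overflow: min_buffer_len already checked width * bpp.
    let row_bytes = width * bpp;
    let mut out = Vec::with_capacity(row_bytes * height);
    for y in 0..height {
        let start = pixel_offset(0, y, stride, bpp);
        out.extend_from_slice(&buf[start..start + row_bytes]);
    }
    Some(out)
}
```

For a 2×2 RGB image with stride 8, each row carries 2 padding bytes, and the buffer only needs 8 + 6 = 14 bytes.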

Joshix-1 · 2025-05-25T15:27:57Z

> You can use RawChannels::Rgbx, Bgrx, Xbgr, etc. from this PR for that.

True, didn't think about that. But what if someone wants to add an alpha channel?

> You need alpha channel input for alpha channel output.

Adding an alpha channel with a default value of 255 would be fine and not unexpected, IMHO.
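For reference, the expansion being proposed is straightforward. A minimal sketch (not part of the PR; the function name is illustrative) that widens tightly packed RGB data to RGBA with an opaque alpha of 255:

```rust
/// Expands tightly packed RGB bytes into RGBA, filling alpha with 255 (fully opaque).
/// Trailing bytes that do not form a complete pixel are ignored.
pub fn rgb_to_rgba(rgb: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(rgb.len() / 3 * 4);
    for px in rgb.chunks_exact(3) {
        out.extend_from_slice(px);
        out.push(255);
    }
    out
}
```

Note that doing this as a separate pass is exactly the kind of pre-conversion the PR description says can be costly; inside the encoder it would instead be applied per pixel.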

elmarco · 2025-05-27T17:41:32Z

> You need alpha channel input for alpha channel output.
>
> Adding an alpha channel with a default value of 255 would be fine and not unexpected, IMHO.

Well, this use case goes beyond the typical simple encoding imo.

Joshix-1 · 2025-05-28T14:13:30Z

Why is RawChannels::bytes_per_pixel not public? There should either be another Encoder constructor without a stride argument, or bytes_per_pixel should be public. Otherwise, the Encoder::new_raw API is IMHO too clunky to use if you have no stride.

elmarco · 2025-05-28T14:29:49Z

> Why is RawChannels::bytes_per_pixel not public? There should either be another Encoder constructor without a stride argument, or bytes_per_pixel should be public. Otherwise, the Encoder::new_raw API is IMHO too clunky to use if you have no stride.

We could make stride an Option. Alternatively, we could have an EncoderBuilder, but this might be overkill.

Joshix-1 · 2025-05-28T14:32:55Z

Stride could even be Option<NonZeroUsize>, but Option<usize> would be fine as well.
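A rough sketch of how the two ideas above could fit together: an optional `Option<NonZeroUsize>` stride that falls back to the tightly packed row size, wrapped in a small builder. `EncoderConfig`, `EncoderConfigBuilder` and their methods are hypothetical names for illustration; this is not the PR's actual `EncoderBuilder`.

```rust
use std::num::NonZeroUsize;

/// Hypothetical validated encoder settings.
#[derive(Debug, PartialEq, Eq)]
pub struct EncoderConfig {
    pub width: u32,
    pub height: u32,
    pub bytes_per_pixel: usize,
    pub stride: usize,
}

/// Hypothetical builder: the stride is optional and defaults to width * bytes_per_pixel.
pub struct EncoderConfigBuilder {
    width: u32,
    height: u32,
    bytes_per_pixel: usize,
    stride: Option<NonZeroUsize>,
}

impl EncoderConfigBuilder {
    pub fn new(width: u32, height: u32, bytes_per_pixel: usize) -> Self {
        Self { width, height, bytes_per_pixel, stride: None }
    }

    /// Sets an explicit row stride in bytes.
    pub fn stride(mut self, stride: NonZeroUsize) -> Self {
        self.stride = Some(stride);
        self
    }

    /// Resolves the stride and rejects strides shorter than one row of pixels.
    pub fn build(self) -> Result<EncoderConfig, String> {
        let row_bytes = (self.width as usize)
            .checked_mul(self.bytes_per_pixel)
            .ok_or("row size overflows usize")?;
        let stride = self.stride.map_or(row_bytes, NonZeroUsize::get);
        if stride < row_bytes {
            return Err(format!("stride {stride} is smaller than row size {row_bytes}"));
        }
        Ok(EncoderConfig {
            width: self.width,
            height: self.height,
            bytes_per_pixel: self.bytes_per_pixel,
            stride,
        })
    }
}
```

With this shape, callers with tightly packed data never need to know bytes_per_pixel; only callers with padded rows call `.stride(...)`.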

Joshix-1

I like the EncoderBuilder. It really improves usability. I have a few nitpicks.

src/encode.rs

Joshix-1 · 2025-06-04T17:09:30Z

src/encode.rs

+        if stride * (height - 1) as usize + width as usize * raw_channels.bytes_per_pixel() < size {
+            return Err(Error::InvalidImageLength { size, width, height });
+        }
+        if guess_stride && size != width as usize * height as usize * raw_channels.bytes_per_pixel()


width as usize * raw_channels.bytes_per_pixel() is a common term in both if statements; maybe it could be put in a variable.

width_usize? Hmm, I don't know if this is a common pattern. I am a bit unsure.
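As a sketch of the suggestion above (names are illustrative; this is not the PR's code), the shared `width * bytes_per_pixel` term can be hoisted into a `row_bytes` variable. This version also uses checked arithmetic and guards `height == 0`, since `height - 1` would otherwise underflow. It only rejects buffers that are too short to hold the last row; whether a longer buffer should be accepted is a policy choice the PR's own check may make differently.

```rust
/// Hypothetical error mirroring the idea of `Error::InvalidImageLength`.
#[derive(Debug, PartialEq, Eq)]
pub enum LengthError {
    InvalidImageLength { size: usize, width: u32, height: u32 },
}

/// Validates a raw buffer of `size` bytes against the image geometry.
/// `guess_stride == true` means no stride was given, so rows must be tightly packed.
pub fn check_len(
    size: usize,
    width: u32,
    height: u32,
    stride: usize,
    bpp: usize,
    guess_stride: bool,
) -> Result<(), LengthError> {
    let invalid = LengthError::InvalidImageLength { size, width, height };
    let h = height as usize;
    // The shared term: bytes taken by the visible pixels of one row.
    let Some(row_bytes) = (width as usize).checked_mul(bpp) else {
        return Err(invalid);
    };
    if guess_stride {
        // No stride given: the buffer must be exactly width * height * bpp bytes.
        return if row_bytes.checked_mul(h) == Some(size) { Ok(()) } else { Err(invalid) };
    }
    if stride < row_bytes {
        return Err(invalid); // rows would overlap
    }
    // Every row but the last spans `stride` bytes; the last needs only `row_bytes`.
    let required = if h == 0 {
        0
    } else {
        match stride.checked_mul(h - 1).and_then(|n| n.checked_add(row_bytes)) {
            Some(n) => n,
            None => return Err(invalid),
        }
    };
    if size < required { Err(invalid) } else { Ok(()) }
}
```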

src/encode.rs

src/encode.rs

src/error.rs

src/encode.rs

src/types.rs

tests/test_misc.rs

aldanor · 2025-07-28T00:20:04Z

@elmarco I added some comments; please let me know whether you want to continue working on this PR.

Also, you'd need to rebase your branch after removing some commits from it, as lots of the fixes are already on main:

* bench CLI fix (plus --stream mode add-on)
* fuzz package fix
* large buffer fix
* run-starting edge-case fix
* there are also commits in this PR that seemingly belong to the fork, like renaming the package

There are also a few questions:

* Can you run the benchmark before/after on the encoder and post the results, to make sure that the double nested loop in the encoder isn't affecting things much?
* Should there be any tests with the streaming mode? (It should just work, but still.)

Signed-off-by: Marc-André Lureau <[email protected]>

elmarco · 2025-07-28T12:55:58Z

@aldanor thanks for the review

> * Can you run the benchmark before/after on the encoder and post the results, to make sure that the double nested loop in the encoder isn't affecting things much?

This is the biggest problem: it seems I didn't benchmark properly. I see a 33% drop in encoding performance. I am trying to understand where it comes from and how to fix it. Thanks.

aldanor · 2025-07-28T13:06:25Z

> I am trying to understand where it comes from and how to fix it

It might come from the double loop, with a now-unknown number of iterations in each, preventing some common loop optimizations? I guess you could godbolt something equivalent.

Yeah, -33% is pretty bad for sure... -3% would still not be nice, but it would be within noise bounds.
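One common way to keep such a row-by-row loop friendly to the optimizer is to slice each row to its exact visible length up front and then iterate with `chunks_exact`, so the inner loop works on fixed-size chunks and typically needs no per-pixel bounds checks. A minimal sketch, assuming 4 bytes per pixel; the function is hypothetical and `f` merely stands in for the per-pixel encoding work:

```rust
/// Visits every visible pixel of a strided 4-byte-per-pixel buffer, row by row.
/// `f` stands in for the per-pixel encoding work.
pub fn for_each_pixel<F: FnMut([u8; 4])>(buf: &[u8], width: usize, height: usize, stride: usize, mut f: F) {
    let row_bytes = width * 4;
    // Checked once per call, outside the hot loops.
    assert!(stride >= row_bytes, "stride shorter than a row");
    for y in 0..height {
        let start = y * stride;
        // One slice bounds check per row instead of one per pixel.
        let row = &buf[start..start + row_bytes];
        for px in row.chunks_exact(4) {
            f([px[0], px[1], px[2], px[3]]);
        }
    }
}
```

Whether this actually recovers the lost performance would need to be confirmed with the bench tool or on godbolt.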

elmarco · 2025-07-28T17:03:41Z

> Yeah, -33% is pretty bad for sure... -3% would still not be nice, but it would be within noise bounds.

It turns out it's the assert_eq!(): it adds a bunch of panic-handling code and prevents some optimizations. I switched it to debug_assert!(). Now no performance regression is observable (with perf stat).
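For context: `assert_eq!` is checked in every build profile, so inside a hot loop it keeps a comparison plus a cold panic/formatting path alive in release builds, whereas `debug_assert_eq!` is only compiled in when `debug_assertions` is enabled (on by default for the `dev` profile, off for `release`). A minimal illustration of the pattern, using a hypothetical helper rather than the encoder itself:

```rust
/// Sums all bytes of pre-sliced rows. The row-length invariant is verified in
/// debug builds only; in a default release build the check is compiled out.
pub fn checksum(rows: &[&[u8]], row_bytes: usize) -> u32 {
    let mut acc = 0u32;
    for row in rows {
        // A plain assert_eq! here would keep the comparison and its panic path
        // in release builds too.
        debug_assert_eq!(row.len(), row_bytes, "row has unexpected length");
        for &b in row.iter() {
            acc = acc.wrapping_add(u32::from(b));
        }
    }
    acc
}
```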

Joshix-1 · 2025-07-28T17:14:52Z

assert_eq

Overall results: (7 images, 7.78 MB raw, 2.48 MP):
--------------------------------------
                     qoi.h    qoi-rust
--------------------------------------
decode   Mp/s        279.8       381.8
         MB/s        876.6      1196.4
--------------------------------------
encode   Mp/s        224.4       206.9
         MB/s        703.3       648.2
--------------------------------------

debug_assert_eq

Overall results: (7 images, 7.78 MB raw, 2.48 MP):
--------------------------------------
                     qoi.h    qoi-rust
--------------------------------------
decode   Mp/s        280.2       381.9
         MB/s        878.0      1196.6
--------------------------------------
encode   Mp/s        225.6       208.9
         MB/s        706.9       654.7
--------------------------------------

81c14c4

Overall results: (7 images, 7.78 MB raw, 2.48 MP):
--------------------------------------
                     qoi.h    qoi-rust
--------------------------------------
decode   Mp/s        278.7       364.3
         MB/s        873.4      1141.6
--------------------------------------
encode   Mp/s        218.3       254.8
         MB/s        684.2       798.3
--------------------------------------

For me, it's still slower with debug_assert_eq (encode: 208.9 Mp/s, vs. 254.8 Mp/s at 81c14c4).

Signed-off-by: Marc-André Lureau <[email protected]>

elmarco · 2025-07-28T17:16:59Z

Did you compile with --release?

Joshix-1 · 2025-07-28T17:18:09Z

I ran cargo run --release ../assets/ in the bench folder.

This extra feature makes it possible to turn the extra input formats on or off, and shows that encode_impl() isn't optimized correctly independently of the various existing formats. This should probably be reported to, or analyzed by, the compiler team; at least, I am not able to explain the reason.

Signed-off-by: Marc-André Lureau <[email protected]>

elmarco · 2025-07-28T18:49:01Z

(I switched to stable Rust, as I get slightly different results with nightly, which can create confusion.)

@Joshix-1 try with the latest commit. You should not see any performance loss (I actually observe a slight improvement, something like 0.5%). Something is weird when enabling the "extra-source" feature: encode_impl() doesn't get optimized the same way, and we can observe a 5-10% slowdown. It seems to be related to the Fn/closure somehow. But shouldn't each format or fn specialization receive independent analysis and optimization? It may be worth asking/reporting to the compiler team. In the meantime, perhaps the "extra-source" feature flag is acceptable?
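To illustrate the expectation being described: a generic function taking `F: Fn(...)` is monomorphized, meaning a separate copy is compiled for each distinct closure type, so in principle every input format gets its own specialized `encode_impl`. Whether LLVM then optimizes each copy equally well (inlining budgets, code-size heuristics) is exactly the open question here. The sketch uses hypothetical names and a trivial body; it shows only the shape of the dispatch, not the PR's encoder.

```rust
/// Generic per-format worker: one copy is compiled per distinct closure type `F`.
fn encode_impl<F: Fn(&[u8]) -> [u8; 4]>(data: &[u8], bpp: usize, to_rgba: F) -> Vec<[u8; 4]> {
    data.chunks_exact(bpp).map(to_rgba).collect()
}

/// Runtime format selection that dispatches to a specialized instantiation.
pub fn encode(data: &[u8], bgr: bool) -> Vec<[u8; 4]> {
    if bgr {
        encode_impl(data, 3, |p| [p[2], p[1], p[0], 255])
    } else {
        encode_impl(data, 3, |p| [p[0], p[1], p[2], 255])
    }
}
```

Adding more formats adds more instantiations (and more code), which is one plausible reason optimization behavior changes when the feature is enabled; this is a hypothesis, not a confirmed diagnosis.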

Joshix-1 · 2025-07-28T18:57:12Z

ce61683e8682edb68f8f040f4cbbce1eae143fb7 has a similar effect. It seems to be some weird case where the compiler doesn't optimize.

elmarco · 2025-07-28T19:11:21Z

@Joshix-1 I am really curious to know how you came up with adding this extra E/Infallible!

Joshix-1 · 2025-07-28T19:15:00Z

Created a flamegraph and saw Try take a big chunk.

elmarco · 2025-07-29T08:50:34Z

I tried to make a reduced test case to report to the compiler team, without success so far.

@Joshix-1 what should we do next?

* use your workaround
* add the feature flag I proposed
* look for another solution (is the stream writer really useful?)
* wait for a compiler fix

Thanks

Joshix-1 · 2025-07-29T12:44:49Z

I think my commit could also improve the performance of master, which would mean that this PR has to get even faster.

I have some ideas I want to test out. I'll do some more testing and benchmarking. The stream writer is IMHO useful; I don't see a reason to remove it.

Joshix-1 · 2025-07-29T17:20:00Z

I could not improve the performance of master. I would suggest using 50293f3.

With

[profile.release]
opt-level = 3
debug = true

in bench/Cargo.toml I get

$ cargo run --release ../assets/
Overall results: (7 images, 7.78 MB raw, 2.48 MP):
--------------------------------------
                     qoi.h    qoi-rust
--------------------------------------
decode   Mp/s        274.2       383.2
         MB/s        859.3      1200.9
--------------------------------------
encode   Mp/s        220.7       260.4
         MB/s        691.7       816.0
--------------------------------------

Without debug = true I get:

Overall results: (7 images, 7.78 MB raw, 2.48 MP):
--------------------------------------
                     qoi.h    qoi-rust
--------------------------------------
decode   Mp/s        280.9       370.7
         MB/s        880.2      1161.4
--------------------------------------
encode   Mp/s        226.5       252.6
         MB/s        709.8       791.6
--------------------------------------

on master I get:

Overall results: (7 images, 7.78 MB raw, 2.48 MP):
--------------------------------------
                     qoi.h    qoi-rust
--------------------------------------
decode   Mp/s        284.8       368.8
         MB/s        892.4      1155.7
--------------------------------------
encode   Mp/s        220.2       257.1
         MB/s        690.0       805.7
--------------------------------------

But I don't really know anymore; it's all really weird and unexpected.

elmarco · 2025-07-29T18:50:30Z

Indeed, I am OK with 50293f3; we can revert it later.

Joshix-1 · 2025-07-29T19:53:05Z

Why revert it later? The change makes sense. Infallible methods shouldn't return results that could contain errors

elmarco · 2025-07-29T20:04:13Z

The compiler should be able to infer that the BytesMut writer is in fact infallible when inlining; I think that's what it does when "extra-source" is off.
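A minimal sketch of the idea behind the workaround, with hypothetical trait and type names (not the PR's code): give the writer an associated error type and use `std::convert::Infallible` for the in-memory sink. A `Result<T, Infallible>` can never be `Err`, so the `?` branches carry no real error path, which should make them easier for the optimizer to drop; callers wanting a plain value can unpack it with an empty `match` on the uninhabited error. The header bytes follow the QOI layout (magic `qoif`, then big-endian width and height), written partially here for illustration.

```rust
use std::convert::Infallible;
use std::io;

/// Hypothetical byte sink with an associated error type.
pub trait Sink {
    type Error;
    fn put(&mut self, bytes: &[u8]) -> Result<(), Self::Error>;
}

/// In-memory sink: appending to a Vec cannot fail.
impl Sink for Vec<u8> {
    type Error = Infallible;
    fn put(&mut self, bytes: &[u8]) -> Result<(), Infallible> {
        self.extend_from_slice(bytes);
        Ok(())
    }
}

/// Streaming sink wrapping any io::Write: here errors are real.
pub struct IoSink<W: io::Write>(pub W);

impl<W: io::Write> Sink for IoSink<W> {
    type Error = io::Error;
    fn put(&mut self, bytes: &[u8]) -> Result<(), io::Error> {
        self.0.write_all(bytes)
    }
}

/// Generic body: `?` propagates S::Error, which is uninhabited for Vec<u8>.
pub fn write_magic_and_size<S: Sink>(sink: &mut S, width: u32, height: u32) -> Result<(), S::Error> {
    sink.put(b"qoif")?;
    sink.put(&width.to_be_bytes())?;
    sink.put(&height.to_be_bytes())?;
    Ok(())
}

/// In-memory convenience wrapper returning a plain Vec.
pub fn magic_and_size_bytes(width: u32, height: u32) -> Vec<u8> {
    let mut out = Vec::new();
    match write_magic_and_size(&mut out, width, height) {
        Ok(()) => out,
        Err(never) => match never {},
    }
}
```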

elmarco force-pushed the raw branch from ce6013e to 08963ce, February 13, 2025 08:00

elmarco force-pushed the raw branch from 08963ce to c795db5, February 26, 2025 19:01

elmarco force-pushed the raw branch from c795db5 to 9fc76e8, March 17, 2025 11:13

elmarco force-pushed the raw branch from 9fc76e8 to 37d063f, June 4, 2025 14:06

Joshix-1 approved these changes Jun 4, 2025

elmarco force-pushed the raw branch from 37d063f to 2c0bb0d, June 4, 2025 19:25

Joshix-1 mentioned this pull request Jul 1, 2025

New Release? #14

Open

elmarco force-pushed the raw branch from 2c0bb0d to 9b0e41c, July 22, 2025 09:54

elmarco added a commit to elmarco/qoi-rust that referenced this pull request Jul 22, 2025

Merge pull request #6 from elmarco/raw

a24a542

RFE: encoder: add stride & various raw input formats support aldanor#11

aldanor reviewed Jul 28, 2025

Fix a typo

d1c57d2

Signed-off-by: Marc-André Lureau <[email protected]>

elmarco force-pushed the raw branch from 9b0e41c to e815d42, July 28, 2025 17:02

elmarco force-pushed the raw branch 3 times, most recently from 9fd810a to 5b9aa9b, July 28, 2025 17:13

encoder: add stride & various input formats support

2302767

Signed-off-by: Marc-André Lureau <[email protected]>

elmarco force-pushed the raw branch from 5b9aa9b to 2302767, July 28, 2025 17:15
