Avoid the explicit `CATransaction` #275

madsmtm · 2025-08-12T22:31:04Z

Transactions are expensive, and the layer should be able to figure out the timing of when to render by itself (by virtue of being installed in a view). The only reason why we did it before was to avoid a fade transition between layer content changes.

Part of #83. I have not benchmarked this, but I have visibly confirmed less stuttering when resizing.

notgull · 2025-08-13T03:10:24Z

src/backends/cg.rs

 use objc2_foundation::{
    ns_string, NSDictionary, NSKeyValueChangeKey, NSKeyValueChangeNewKey,
-    NSKeyValueObservingOptions, NSNumber, NSObject, NSObjectNSKeyValueObserverRegistration,
+    NSKeyValueObservingOptions, NSNull, NSNumber, NSObject, NSObjectNSKeyValueObserverRegistration,
    NSString, NSValue,
 };


Never liked these big mass imports. Any chance we could instead do:

use objc2_foundation as found;

...then do:

found::NSNull

Hmm, I'd rather avoid that. When reading the implementation, it doesn't actually matter much which framework a specific thing is from, and besides, everything is already prefixed ("NS", "CG", etc.).

I'd rather do:

use objc2_core_foundation::*;
use objc2_core_graphics::*;
use objc2_foundation::*;
use objc2_quartz_core::*;

?

(I guess objc2_* could have taken the same approach as cidre, e.g. cidre::ns::Null, which is admittedly much more "Rusty", though I decided not to, since it makes it harder to figure out what a specific type corresponds to underneath).

I think it makes the code later on harder to read for humans. My vote's still on having a module prefix. I take the same approach in new Win32 code that I write.

For the record, I do something similar in the Windows part of the code.

If you think it's out of scope for this PR, I can file one later.

nicoburns · 2025-08-13T11:40:54Z

Wow. This is dramatically faster (up to 1000x !!!) for me. I'm seeing present times measured in 10s of microseconds rather than 10s of milliseconds. Specifically, running Blitz (a winit application) on my 14" MacBook Pro (M1), I get the following times for the call to `surface_buffer.present().unwrap();`:

| Test | `softbuffer` 0.4 | This PR | buffer_mut | `pixels` 0.15 |
| --- | --- | --- | --- | --- |
| 800x600 1x | 1.5ms | 17us | 200us | 500us |
| 800x600 2x | 6ms | 25us | 650us | 1ms |
| 1512x982 2x | 18ms | 30us | 1.5ms | 3ms |
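
As a rough sanity check on the `present` columns, the ratios between `softbuffer` 0.4 and this PR can be computed straight from the table. This is a minimal sketch: `speedup` is a hypothetical helper (not part of softbuffer), and the durations are copied from the rows above.

```rust
use std::time::Duration;

/// How many times faster `new` is than `old` (hypothetical helper).
fn speedup(old: Duration, new: Duration) -> f64 {
    old.as_secs_f64() / new.as_secs_f64()
}

fn main() {
    // (test, softbuffer 0.4 present time, this PR's present time)
    let rows = [
        ("800x600 1x", Duration::from_micros(1_500), Duration::from_micros(17)),
        ("800x600 2x", Duration::from_millis(6), Duration::from_micros(25)),
        ("1512x982 2x", Duration::from_millis(18), Duration::from_micros(30)),
    ];
    for (test, old, new) in rows {
        println!("{test}: {:.0}x", speedup(old, new));
    }
}
```

These ratios only compare time spent inside `present` itself.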

To reproduce:

1. Clone https://github.com/DioxusLabs/blitz/
2. Check out the `cpu` branch (optional, but makes other parts of rendering faster)

Then:

To test softbuffer 0.4: cargo run -rp readme --no-default-features --features comrak,log_frame_times,log_phase_times,cpu-softbuffer .
To test this PR, add the following to the bottom of the root-level (workspace) Cargo.toml:
```
[patch.crates-io]
softbuffer = { git = "https://github.com/rust-windowing/softbuffer", branch = "cg-avoid-transaction" }
```
and then run the same command as for softbuffer 0.4.
To test the pixels crate: cargo run -rp readme --no-default-features --features comrak,log_frame_times,log_phase_times,cpu-pixels .

You should then see output like:

Resolve: 11ms (style: 76us, construct: 10ms, flush: 32us, layout: 224us)
Frame time: 12ms (cmd: 11ms, flush: 105us, render: 277us, swizel: 336us, present: 19us)

The `present` time is the time taken by the call to softbuffer's (or pixels's) `present`.
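
For anyone wanting to reproduce this kind of number outside Blitz, here is a minimal sketch of timing a single call with `std::time::Instant`. The `time_call` helper and the `present` stand-in are hypothetical (the real call would be `surface_buffer.present()`), and this only captures wall-clock time spent inside the call itself.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
fn time_call<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Hypothetical stand-in for `surface_buffer.present()`.
fn present() -> Result<(), String> {
    std::thread::sleep(Duration::from_micros(50));
    Ok(())
}

fn main() {
    let (result, elapsed) = time_call(present);
    result.unwrap();
    println!("present: {elapsed:?}");
}
```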

madsmtm · 2025-08-13T15:11:17Z

Well, to be fair, this PR is not actually doing any work in the present call any more.

The actual work now happens in buffer_mut (allocation of a new buffer, which is deallocated once the CGImage is no longer referenced) and internally somewhere in Apple's rendering pipeline (maybe -[CALayer display]). I think if you're going to benchmark this, you'll need to use Instruments.app, flamegraph, or some other whole-program benchmarking.

nicoburns · 2025-08-13T15:29:41Z

> Well, to be fair, this PR is not actually doing any work in the present call any more.
> The actual work now happens in buffer_mut ... and internally somewhere in Apple's rendering pipeline (maybe -[CALayer display]).

Hmm... I wasn't previously timing buffer_mut, but I just added it and the amount of time spent there doesn't seem to have changed much (~200us to ~1ms depending on buffer size for both 0.4 and this PR). I guess there could be some time being spent elsewhere within Apple frameworks that I'm not capturing. But this PR is visually much smoother for me, so I suspect it is genuinely faster overall.

madsmtm · 2025-08-13T15:37:04Z

> Hmm... I wasn't previously timing buffer_mut, but I just added it and the amount of time spent there doesn't seem to have changed much (~200us to ~1ms depending on buffer size for both 0.4 and this PR).

Sorry, that wasn't particularly clear; I meant that buffer_mut was, and still is, doing a large part of the work (and some of this work would be lessened by using IOSurface and/or swapping between buffers instead of reallocating).
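
To illustrate the "swapping between buffers instead of reallocating" idea, here is a minimal sketch in plain Rust. `DoubleBuffer` is a hypothetical type, not softbuffer's implementation, and it ignores the real constraint that a buffer may only be reused once the compositor no longer references it.

```rust
/// A pair of pixel buffers that are reused across frames (hypothetical type).
struct DoubleBuffer {
    front: Vec<u32>,
    back: Vec<u32>,
}

impl DoubleBuffer {
    /// Allocates both buffers once, up front.
    fn new(width: usize, height: usize) -> Self {
        let len = width * height;
        Self { front: vec![0; len], back: vec![0; len] }
    }

    /// The buffer the application draws into.
    fn back_mut(&mut self) -> &mut [u32] {
        &mut self.back
    }

    /// The buffer that would currently be presented.
    fn front(&self) -> &[u32] {
        &self.front
    }

    /// Makes the drawn buffer the presented one without allocating.
    fn swap(&mut self) {
        std::mem::swap(&mut self.front, &mut self.back);
    }
}

fn main() {
    let mut buffers = DoubleBuffer::new(4, 4);
    let drawn = buffers.back_mut().as_ptr();
    buffers.back_mut().fill(0x00ff_0000);
    buffers.swap();
    // The allocation that was drawn into is now the front buffer.
    assert_eq!(buffers.front().as_ptr(), drawn);
}
```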

> But this PR is visually much smoother for me, so I suspect it is genuinely faster overall.

Definitely agree.

MarijnS95 · 2025-08-13T16:15:41Z

Reading the definition of CATransaction, doesn't this simply offload/postpone the cost to somewhere else (e.g. the "when the thread’s runloop next iterates.")?

Or were we accidentally waiting for the transaction to have completed, while this new model allows multiple implicit transactions to be created and submitted "asynchronously"?

Just curious to map this to all other platforms' "compositor transaction" abstractions :)

madsmtm · 2025-08-13T16:51:58Z

> Reading the definition of CATransaction, doesn't this simply offload/postpone the cost to somewhere else (e.g. the "when the thread’s runloop next iterates.")?

I think that's true, yeah. Disassembling setContents:, I found that it locks the CATransaction and inserts the change into that (such that it will be applied later).

I suspect that the real problem is actually in the way that Winit schedules redraws such that they happen outside display/drawRect: (and thereby outside the transaction) in the first place (good old rust-windowing/winit#2640 strikes again).

MarijnS95 · 2025-08-13T21:12:34Z

Thanks for looking into that! Some quick local testing shows that ::commit() on a MacBook Air M4 takes about 3.7ms on average on the animation example.

Could it be that this call is blocking, when applications use it directly? Assigning a completion handler shows that it completes at around the same time. Curious how you "disassembled" so that we can look into ::commit() instead.

Never ::begin()ing a new transaction shows that the completion handler is now called between 5-15ms (with some 21ms outliers) after setContents(), so I'm really curious if we're just trading some predictable/visible "CPU overhead" (or blocking? - need to attach a profiler) for increased latency?
EDIT: Adding a frame index shows that these transactions are completing out of order a bunch of times...

Note that animationDuration is equal to 0.25 by default (and the animationTimingFunction is None) - setting that to 0f64 shows a consistent delay of 250µs in the completion handler...
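
The out-of-order observation from the EDIT above can be checked mechanically once each completion logs its frame index. A minimal sketch, assuming the indices are collected in the order the completion handlers fire (`count_out_of_order` is a hypothetical helper):

```rust
/// Counts completions whose frame index is lower than one already seen.
fn count_out_of_order(completed: &[u32]) -> usize {
    let mut highest_seen: Option<u32> = None;
    let mut out_of_order = 0;
    for &frame in completed {
        match highest_seen {
            // A newer frame has already completed before this one.
            Some(highest) if frame < highest => out_of_order += 1,
            _ => highest_seen = Some(frame),
        }
    }
    out_of_order
}

fn main() {
    // Made-up sequence for illustration: frames 2 and 5 complete late.
    let completed = [0, 1, 3, 2, 4, 6, 5];
    println!("out of order: {}", count_out_of_order(&completed));
}
```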

madsmtm · 2025-08-13T21:33:38Z

> Curious how you "disassembled" so that we can look into ::commit() instead.

I did lldb target/debug/examples/rectangle and set a few breakpoints.

> Never ::begin()ing a new transaction

Uhh, pretty sure that's invalid use of the API, otherwise you may be committing work done by something higher in your call stack.

> Could it be that this call is blocking, when applications use it directly? Assigning a completion handler shows that it completes at around the same time.

> I'm really curious if we're just trading some predictable/visible "CPU overhead" (or blocking? - need to attach a profiler) for increased latency?

I don't completely know how CATransaction::commit works when outside of a draw call issued by the OS (such as -[NSView drawRect:]), but I think it submits the result to the compositor immediately. Testing on current master by calling .present() a hundred times per requested redraw in the rectangle example seems to back this theory up.

And yeah, with this PR, you will get a bit of latency here, in that the result is not actually presented immediately, but instead only presented the next time the OS renders.

(I'm pretty sure all of these issues would just go away if the Winit issue were fixed, since then the CATransaction would know that it was run inside a draw call by the OS, and the commit wouldn't render immediately).

MarijnS95 · 2025-08-13T21:38:46Z

Apologies, I meant that I also skipped ::commit(), i.e. the layer update and the callback that times it would run later, when that thread's runloop next iterates (as in the docs I linked before), which is what you do in this PR.

Curious to see those complete out of order, some frames taking very long.

I should've assumed the debugger might have "enough" debug symbols to see what is going on under the hood 😅

And yeah, I've been wanting to write better present-timing abstractions in Winit for years. RedrawRequested is also fundamentally broken on Android. Curious if you get less delay on Mac if it's running at a closer time before vblank (or does it always run right after the previous vblank?).

madsmtm · 2025-08-13T22:29:59Z

> Curious if you get less delay on Mac if it's running at a closer time before vblank (or does it always run right after the previous vblank?).

No idea honestly, and unsure of how I'd test it?

MarijnS95 · 2025-08-13T22:45:42Z

> No idea honestly, and unsure of how I'd test it?

This is what I did, perhaps we could add it to that draw(Rect): callback and see how much of a delay it has, respectively?

let s = Instant::now();
unsafe { self.imp.layer.setContents(Some(image.as_ref())) };

static mut FRAME: u32 = 0;
let frame = unsafe { FRAME };
unsafe { FRAME += 1 };

unsafe {
    CATransaction::setCompletionBlock(Some(
        // Does this clone or otherwise reference the block? After the move,
        // the closure lifetime is 'static and could use StackBlock as well?
        block2::RcBlock::new(move || {
            println!("{frame:0>6}: {:?}", s.elapsed());
        })
        .deref(),
    ))
};
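
A variant of the frame counter that avoids `static mut` could look like the sketch below. It is self-contained and uses a plain closure as a stand-in for the completion block, so `completion_timer` is a hypothetical helper and nothing here touches `CATransaction` or `block2`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{Duration, Instant};

/// Global frame index, incremented once per presented frame.
static FRAME: AtomicU32 = AtomicU32::new(0);

/// Captures the current frame index and start time, and returns a closure
/// that reports how long after creation it was invoked.
fn completion_timer() -> impl Fn() -> (u32, Duration) {
    let start = Instant::now();
    let frame = FRAME.fetch_add(1, Ordering::Relaxed);
    move || (frame, start.elapsed())
}

fn main() {
    let first = completion_timer();
    let second = completion_timer();

    // Invoke in reverse order to mimic completions arriving out of order.
    for done in [second, first] {
        let (frame, elapsed) = done();
        println!("{frame:0>6}: {elapsed:?}");
    }
}
```

Because the closure only captures `Copy` values, it implements `Fn`, which is the kind of closure a completion block would typically be built from.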

nicoburns · 2025-08-20T21:15:27Z

This PR seems to have a memory problem. I am able to get memory usage to spike as high as 3GB+ with this PR just by scrolling my app (which causes it to render frames). Interestingly, it does drop if I resize the window, but only down to ~700MB. Rendering this same app with pixels, it sits at around 150MB.

madsmtm added the enhancement (New feature or request) and CoreGraphics (macOS/iOS/tvOS/watchOS/visionOS backend) labels Aug 12, 2025

madsmtm changed the title ~~Avoid the explicit CATransaction commit~~ Avoid the explicit CATransaction Aug 12, 2025

madsmtm mentioned this pull request Aug 12, 2025

Investigate more optimal way to implement CoreGraphics backend #83

Open

madsmtm force-pushed the cg-avoid-transaction branch from cecb0bc to e8ddecc Compare August 12, 2025 22:40

notgull requested changes Aug 13, 2025

madsmtm force-pushed the cg-avoid-transaction branch from e8ddecc to 91c1904 Compare August 13, 2025 15:04

Avoid an explicit CATransaction commit

e23a8fb

These are expensive, and the layer should be able to figure out the timing of when to render by itself.

madsmtm force-pushed the cg-avoid-transaction branch from 91c1904 to e23a8fb Compare August 13, 2025 15:26
