Improve pacing, latency, and add tweakable options #106221

Open
darksylinc wants to merge 1 commit into base: master

Contributor

@darksylinc darksylinc commented May 9, 2025

Supersedes #105496
Supersedes #105435

It merges both PRs into one unified solution.

Fixes an unrelated bug when NAVIGATION_2D_DISABLED or NAVIGATION_3D_DISABLED are defined.

This is a heavily modified version of #100031

Reasons

Waitable Swapchains (#105496) were written after the AUTO code (#105435), but it became abundantly clear that waitable swapchains were many times superior at doing the same job as AUTO.

However, not all GPUs support the necessary Vulkan extension on all platforms (notably, AMD doesn't support it on Windows), which is why the AUTO code was kept around as a fallback for when waitable swapchains are not available.

The way the user interacts with these settings has been simplified.

New Features

This PR adds:

  • rendering/rendering_device/vsync/latency_mode which supports 4 options:
    • low_extreme (only available through the GDScript API and command line interface. Cannot be set by default)
    • low (default)
    • medium
    • high_throughput
  • PacingMethod which describes which method is being used (None, AUTO, Waitable Swapchains, Android Swappy).
  • A CLI parameter --latency-mode to override the latency mode (low_extreme, low, etc).
  • A debug CLI parameter --pacing-mode-mask <mask> (mask is a hex number, where valid combinations are bits OR'ed from the PacingMethod enum). This prevents Godot from using certain pacing modes, which is useful for debugging (or troubleshooting a bug in pacing methods).
  • Monitors to debug and understand why AUTO decides to use SEQ or PAR. Even if Waitable Swapchains are being used, these monitor values are very useful for detecting jitter and stutter.
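
As a rough illustration of how a hex mask built from OR'ed PacingMethod bits could be parsed and applied, here is a minimal sketch. The bit values, the helper names (parse_pacing_mode_mask, has_pacing_method, filter_pacing_methods) and the reading that a set bit means "this method stays allowed" are all assumptions; the PR description does not spell them out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical bit values for illustration; the real PacingMethod enum may differ.
enum PacingMethod : uint32_t {
	PACING_METHOD_NONE = 0,
	PACING_METHOD_SEQUENTIAL_SYNC = 1u << 0,
	PACING_METHOD_WAITABLE_SWAPCHAIN = 1u << 1,
	PACING_METHOD_ANDROID_SWAPPY = 1u << 2,
};

// Parses a hexadecimal mask such as "0x3" or "3".
// Returns false (leaving r_mask untouched) when the text is not a valid 32-bit hex number.
bool parse_pacing_mode_mask(const std::string &p_text, uint32_t &r_mask) {
	if (p_text.empty()) {
		return false;
	}
	char *end = nullptr;
	const unsigned long long value = std::strtoull(p_text.c_str(), &end, 16);
	if (*end != '\0' || value > 0xFFFFFFFFull) {
		return false;
	}
	r_mask = static_cast<uint32_t>(value);
	return true;
}

// Returns true when p_method (a single bit) is set in p_mask.
bool has_pacing_method(uint32_t p_mask, PacingMethod p_method) {
	assert(p_method != PACING_METHOD_NONE); // NONE has no bit to test.
	return (p_mask & p_method) != 0;
}

// Keeps only the methods that the driver supports and the mask allows (assumed semantics).
uint32_t filter_pacing_methods(uint32_t p_supported, uint32_t p_allowed_mask) {
	return p_supported & p_allowed_mask;
}
```

With these hypothetical bit values, a mask of 0x1 would leave only the sequential-sync (AUTO) path usable even on a driver that also offers waitable swapchains.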

Changes from previous PRs

  • AUTO is the only option. The user cannot choose between AUTO, CPU_GPU_SYNC_SEQUENTIAL & CPU_GPU_SYNC_PARALLEL; which one is used is derived from the latency_mode setting.
  • Setting low_extreme will force CPU_GPU_SYNC_SEQUENTIAL. This behavior is consistent with Waitable Swapchains when using low_extreme. Because of how aggressive this setting can be, it's not possible to set it by default.
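
One plausible reading of "derived from the latency_mode setting" is sketched below. Only the low_extreme case (forced sequential) is stated in the description; treating low as "let AUTO decide" and medium/high_throughput as always parallel is an assumption, as are the enum order and the function name resolve_sync_mode.

```cpp
#include <cassert>

// Order assumed from the option list: low_extreme, low, medium, high_throughput.
enum LatencyMode {
	LATENCY_MODE_LOW_EXTREME,
	LATENCY_MODE_LOW,
	LATENCY_MODE_MEDIUM,
	LATENCY_MODE_HIGH_THROUGHPUT,
};

enum SyncMode {
	CPU_GPU_SYNC_PARALLEL,
	CPU_GPU_SYNC_SEQUENTIAL,
};

// p_auto_prefers_sequential stands in for whatever the AUTO heuristic decided this frame.
SyncMode resolve_sync_mode(LatencyMode p_mode, bool p_auto_prefers_sequential) {
	switch (p_mode) {
		case LATENCY_MODE_LOW_EXTREME:
			// Stated in the PR: low_extreme forces sequential sync.
			return CPU_GPU_SYNC_SEQUENTIAL;
		case LATENCY_MODE_LOW:
			// Assumption: AUTO is free to pick either mode.
			return p_auto_prefers_sequential ? CPU_GPU_SYNC_SEQUENTIAL : CPU_GPU_SYNC_PARALLEL;
		case LATENCY_MODE_MEDIUM:
		case LATENCY_MODE_HIGH_THROUGHPUT:
			// Assumption: these modes never go sequential.
			return CPU_GPU_SYNC_PARALLEL;
	}
	assert(false && "unknown latency mode");
	return CPU_GPU_SYNC_PARALLEL;
}
```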

Documentation

I have written documentation about these new features and about troubleshooting pacing issues. It will be pushed to godot-docs, but it's not ready yet.

This has been a monumental testing task, and this PR should close most of the tickets that complain about pacing issues.

After thorough testing, I've come to the conclusion that most (if not all?) of the pacing issues that remain are not caused by Godot, but rather by external SW or HW, which is why I'm writing the documentation to troubleshoot those issues.

Testing

The latency tester written by KeyboardDanni is very good.

The improvements are massive. Before this PR, latency could be as bad as 5 frames on some systems. With the new settings it's anywhere between 0 and 3 frames depending on settings, OS and hardware.

| Device | API | Setting | OS | PresentMode | FrameType | Low Extreme | Low | Medium | Vanilla |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA GeForce 1060 3GB | Vulkan | Windowed | W10 | DXGI | Composed Flip | 1 | 2 | 3 | 5 |
| NVIDIA GeForce 1060 3GB | Vulkan | Fullscreen | W10 | DXGI | Composed Flip | 1 | 2 | 3 | 5 |
| NVIDIA GeForce 1060 3GB | Vulkan | Windowed | W10 | Native | Composed Copy with GPU GDI | 1 | 1 | 1 | 1 |
| NVIDIA GeForce 1060 3GB | Vulkan | Fullscreen | W10 | Native | Hardware Legacy Flip | 0 | 1 | 2 | 3 |
| NVIDIA GeForce 1060 3GB | D3D12 | Windowed | W10 | N/A | Composed Flip | 3 | 3 | 3 | 4 |
| NVIDIA GeForce 1060 3GB | D3D12 | Fullscreen | W10 | N/A | Composed Flip | 3 | 4 | 3 | 4 |
| AMD Radeon RX 6800 XT 16GB | Vulkan | Windowed | W10 | N/A | Hardware Composed Independent Flip | N/A | N/A | N/A | 3 |
| AMD Radeon RX 6800 XT 16GB | Vulkan | Fullscreen | W10 | N/A | Hardware Composed Independent Flip | N/A | N/A | N/A | 3 |
| AMD Radeon RX 6800 XT 16GB | D3D12 | Windowed | W10 | N/A | Hardware Composed Independent Flip | 2 | 2 | 2 | 3 |
| AMD Radeon RX 6800 XT 16GB | D3D12 | Fullscreen | W10 | N/A | Hardware Composed Independent Flip | 2 | 3 | 2 | 3 |
| AMD Radeon RX 560 2GB | Vulkan | Windowed | Linux | N/A | X11 No Compositor | 0 | 1 | 2 | 3 |
| AMD Radeon RX 560 2GB | Vulkan | Fullscreen | Linux | N/A | X11 No Compositor | 0 | 1 | 2 | 2 |
| AMD Radeon RX 6800 XT 16GB | Vulkan | Windowed | Linux | N/A | X11 picom | 2 | 3 | 4 | 5 |
| AMD Radeon RX 6800 XT 16GB | Vulkan | Fullscreen | Linux | N/A | X11 picom | 1 | 2 | 3 | 4 |
| AMD Radeon RX 6800 XT 16GB | Vulkan | Windowed | Linux | N/A | X11 No Compositor | 1 | 2 | 3 | 4 |
| AMD Radeon RX 6800 XT 16GB | Vulkan | Fullscreen | Linux | N/A | X11 No Compositor | 0 | 1 | 1 | 1 |
| AMD Radeon RX 560 2GB | Vulkan | Windowed | W11 | N/A | Composed Copy with GPU GDI | N/A | N/A | N/A | 5 |
| AMD Radeon RX 560 2GB | Vulkan | Fullscreen | W11 | N/A | Hardware Legacy Flip | N/A | N/A | N/A | 2 |
| AMD Radeon RX 560 2GB | D3D12 | Windowed | W11 | N/A | Composed Flip | 3 | 3 | 3 | 4 |
| AMD Radeon RX 560 2GB | D3D12 | Fullscreen | W11 | N/A | Hardware Independent Flip | 2 | 3 | 2 | 3 |
| NVIDIA GeForce 1060 3GB | Vulkan | Windowed | W11 | DXGI | Composed Flip | 1 | 2 | 3 | 5 |
| NVIDIA GeForce 1060 3GB | Vulkan | Fullscreen | W11 | DXGI | Hardware Independent Flip | 1 | 2 | 3 | 5 |
| NVIDIA GeForce 1060 3GB | Vulkan | Windowed | W11 | Native | Composed Copy with GPU GDI | 1 | 1 | 1 | 1 |
| NVIDIA GeForce 1060 3GB | Vulkan | Fullscreen | W11 | Native | Hardware Legacy Flip | 0 | 1 | 2 | 3 |
| NVIDIA GeForce 1060 3GB | D3D12 | Windowed | W10 | N/A | Composed Flip | 3 | 3 | 3 | 4 |
| NVIDIA GeForce 1060 3GB | D3D12 | Fullscreen | W10 | N/A | Hardware Independent Flip | 2 | 3 | 2 | 3 |
| AMD Radeon RX 6800 XT 16GB* | Vulkan | Windowed | W10 | N/A | Composed Flip | N/A | N/A | N/A | 4 |
| AMD Radeon RX 6800 XT 16GB* | Vulkan | Fullscreen | W10 | N/A | Hardware Independent Flip | N/A | N/A | N/A | 3 |

* Dual Monitor
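
The latency figures above are counted in frames. To get a feel for what they mean in wall-clock time, the small helper below converts a frame count to milliseconds at a given refresh rate. The refresh rate is an assumption supplied by the caller, since the table does not state which rate was used, and the function names are made up for this sketch.

```cpp
#include <cassert>

// Duration of one refresh interval in milliseconds.
double frame_time_ms(double p_refresh_hz) {
	assert(p_refresh_hz > 0.0);
	return 1000.0 / p_refresh_hz;
}

// Latency of p_frames whole frames, expressed in milliseconds.
double latency_ms(int p_frames, double p_refresh_hz) {
	return p_frames * frame_time_ms(p_refresh_hz);
}
```

For example, at 60 Hz a latency of 5 frames is roughly 83.3 ms, while 3 frames is 50 ms.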

Documentation

See godotengine/godot-docs#10952

@darksylinc darksylinc requested review from a team as code owners May 9, 2025 21:28
@darksylinc darksylinc force-pushed the low-lat-waitable-swapchain branch from e17ebba to 961faae Compare May 9, 2025 22:01
@darksylinc
Contributor Author

darksylinc commented May 9, 2025

Does anybody understand why MSVC thinks this is unreachable code?

RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	if (available_pacing_methods.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		return PACING_METHOD_ANDROID_SWAPPY;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		return PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && latency_mode <= LATENCY_MODE_LOW) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}

At first I thought it was smart enough to figure out PACING_METHOD_ANDROID_SWAPPY is never set on Windows, so I commented it out. But then it complained on the next one:

> complains on this line <	if (available_pacing_methods.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		return PACING_METHOD_WAITABLE_SWAPCHAIN;
	}

This is just wrong.

Update: I tried changing it to this code:

RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	RDD::PacingMethod method = PACING_METHOD_NONE;
	if (available_pacing_methods.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		method = PACING_METHOD_ANDROID_SWAPPY;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		method = PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && latency_mode <= LATENCY_MODE_LOW) {
		method = PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return method;
}

And it complains on this line! RDD::PacingMethod method = PACING_METHOD_NONE; WTF???
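
Independently of the MSVC warning, note that a rewrite using plain consecutive `if` statements is not equivalent to the early-return version: with early returns the first matching method wins, whereas with overwriting assignments the last one wins. The self-contained sketch below shows the difference and an `else if` chain that keeps a single return without changing the priority. The enum values and the PacingState struct are stand-ins invented for this example, not the actual RenderingDevice members.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in definitions; bit values and enum order are assumptions.
enum PacingMethod : uint32_t {
	PACING_METHOD_NONE = 0,
	PACING_METHOD_SEQUENTIAL_SYNC = 1u << 0,
	PACING_METHOD_WAITABLE_SWAPCHAIN = 1u << 1,
	PACING_METHOD_ANDROID_SWAPPY = 1u << 2,
};
enum LatencyMode { LATENCY_MODE_LOW_EXTREME, LATENCY_MODE_LOW, LATENCY_MODE_MEDIUM, LATENCY_MODE_HIGH_THROUGHPUT };

struct PacingState {
	uint32_t available_pacing_methods;
	LatencyMode latency_mode;
	bool has_flag(PacingMethod p_method) const { return (available_pacing_methods & p_method) != 0; }
};

// Early returns: the first matching method wins.
PacingMethod pick_early_return(const PacingState &s, bool p_sequential_sync) {
	if (s.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		return PACING_METHOD_ANDROID_SWAPPY;
	}
	if (s.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && s.latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		return PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (s.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && s.latency_mode <= LATENCY_MODE_LOW) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}

// Independent ifs with assignments: the last matching method wins.
PacingMethod pick_overwriting(const PacingState &s, bool p_sequential_sync) {
	PacingMethod method = PACING_METHOD_NONE;
	if (s.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		method = PACING_METHOD_ANDROID_SWAPPY;
	}
	if (s.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && s.latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		method = PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (s.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && s.latency_mode <= LATENCY_MODE_LOW) {
		method = PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return method;
}

// else-if chain: single return, same priority as the early-return version.
PacingMethod pick_single_return(const PacingState &s, bool p_sequential_sync) {
	PacingMethod method = PACING_METHOD_NONE;
	if (s.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		method = PACING_METHOD_ANDROID_SWAPPY;
	} else if (s.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && s.latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		method = PACING_METHOD_WAITABLE_SWAPCHAIN;
	} else if (s.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && s.latency_mode <= LATENCY_MODE_LOW) {
		method = PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return method;
}
```

When both waitable swapchains and sequential sync are available in low latency mode, the first and third functions pick the waitable swapchain, while the overwriting variant ends up with sequential sync.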

Update 2:

// Complains
RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	if (p_sequential_sync && latency_mode <= LATENCY_MODE_LOW) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}

// Does NOT complain
RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	if (p_sequential_sync) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}

@AThousandShips AThousandShips changed the title Improve pacing, latency and add tweakable options Improve pacing, latency, and add tweakable options May 10, 2025
@AThousandShips AThousandShips added this to the 4.x milestone May 10, 2025
@darksylinc darksylinc force-pushed the low-lat-waitable-swapchain branch from 961faae to 7344985 Compare May 12, 2025 14:27
@darksylinc darksylinc force-pushed the low-lat-waitable-swapchain branch 2 times, most recently from d2e535c to 1a38bba Compare May 19, 2025 18:34
Comment on lines 2763 to 2764
Godot will prefer maximizing FPS (frames per second), with no consideration for latency.
This setting is ideal for apps that have no user interaction, like servers or headless processes.
Member

Suggested change
Godot will prefer maximizing FPS (frames per second), with no consideration for latency.
This setting is ideal for apps that have no user interaction, like servers or headless processes.
Godot will prefer maximizing FPS (frames per second), with no consideration for latency. This setting is ideal for apps that have no user interaction, like servers or headless processes.

The high throughput mode may also be useful for Movie Maker mode, but I'll need to benchmark that beforehand to make sure.

Contributor Author

To be clear: High Throughput makes Godot behave just the way it did before this PR. This PR (potentially) sacrifices throughput to improve latency, which is undesirable for tasks that run without user interaction, hence the high_throughput mode. But it won't improve performance compared to Godot before the PR.

Comment on lines +429 to +433
RD *device = RD::get_singleton();
if (device) {
const int latency_mode = int(GLOBAL_GET("rendering/rendering_device/vsync/latency_mode")) + 1;
device->set_latency_mode((RD::LatencyMode)latency_mode);
}
Member

This makes me wonder if we should have an editor setting for the latency mode, like we do for V-Sync mode already. Most of the time, you'll want the editor to be using a low-latency setting to ensure good UI responsiveness.

In particular, I foresee some users willing to go for the low_extreme latency mode, which isn't available in the project settings but could be made available in the editor settings. Of course, if we decide to make it possible, we'll have to add a warning label that's displayed below the FPS counter in the 3D editor's View Frame Time panel.
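
The `+ 1` in the snippet under review suggests that the project setting enumerates only low, medium and high_throughput (indices 0 to 2), while the enum has low_extreme in front. A minimal sketch of that mapping follows, together with a purely hypothetical editor-setting variant that would expose all four values. The enum values and both function names are assumptions made for illustration.

```cpp
#include <cassert>

// Assumed enum layout: low_extreme comes first, so project-setting indices are shifted by one.
enum LatencyMode {
	LATENCY_MODE_LOW_EXTREME = 0,
	LATENCY_MODE_LOW = 1,
	LATENCY_MODE_MEDIUM = 2,
	LATENCY_MODE_HIGH_THROUGHPUT = 3,
};

// Project setting (assumed indices): 0 = low, 1 = medium, 2 = high_throughput.
LatencyMode latency_mode_from_project_setting(int p_index) {
	assert(p_index >= 0 && p_index <= 2);
	return static_cast<LatencyMode>(p_index + 1);
}

// Hypothetical editor setting that also offers low_extreme, so no offset is needed.
LatencyMode latency_mode_from_editor_setting(int p_index) {
	assert(p_index >= 0 && p_index <= 3);
	return static_cast<LatencyMode>(p_index);
}
```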

</constant>
<constant name="FRAME_PACING_EVALUATED_SYNC_MODE" value="62" enum="Monitor">
The mode decided by Godot that we should be in for each frame based on Total Time. "1" means we should be in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], "2" means we should be in [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL].
[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfect flat line of either 1s or 2s. If you see the game going back and forth between 1 and 2, then the system is not fast enough for a smooth low-latency experience; or the game should be optimized further until it is.
Member

Suggested change
[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfect flat line of either 1s or 2s. If you see the game going back and forth between 1 and 2, then the system is not fast enough for a smooth low-latency experience; or the game should be optimized further until it is.
[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfect flat line of either 1s or 2s. If you see the game going back and forth between 1 and 2, then the system is not fast enough for a smooth low-latency experience; or the game should be optimized further until it is. The higher the monitor refresh rate, the higher the system requirements are for the line to be flat.
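
The note says the decision is averaged over time so that Godot does not flip between PARALLEL and SEQUENTIAL every frame. The PR text does not describe how the averaging works, so the following is only one way such smoothing could look: an exponential moving average with two thresholds (hysteresis). The class name, the smoothing factor and the thresholds are all invented for this sketch.

```cpp
#include <cassert>

// Hypothetical tuning values; not taken from the PR.
const double SMOOTHING = 0.1; // Weight of the newest sample.
const double SWITCH_TO_SEQUENTIAL = 1.75; // Average must rise above this to go sequential.
const double SWITCH_TO_PARALLEL = 1.25; // Average must fall below this to go back to parallel.

class SyncModeSmoother {
public:
	// p_evaluated_mode: 1 = PARALLEL, 2 = SEQUENTIAL (the values the monitor reports).
	// Returns the mode actually in use after smoothing.
	int update(int p_evaluated_mode) {
		assert(p_evaluated_mode == 1 || p_evaluated_mode == 2);
		average += SMOOTHING * (p_evaluated_mode - average);
		if (average > SWITCH_TO_SEQUENTIAL) {
			current_mode = 2;
		} else if (average < SWITCH_TO_PARALLEL) {
			current_mode = 1;
		}
		// Between the two thresholds the previous mode is kept, which prevents flip-flopping.
		return current_mode;
	}

private:
	double average = 1.0;
	int current_mode = 1;
};
```

With these values, an evaluated mode that alternates between 1 and 2 every frame never leaves PARALLEL, while a steady run of 2s eventually switches the smoother to SEQUENTIAL.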

Member

@Calinou Calinou left a comment

Tested locally on Linux + KDE X11 with compositing off + NVIDIA, it works as expected in all rendering methods. With the default settings, this gets rid of 1 frame of latency on Forward+/Mobile compared to master.

However, latency is still 1 frame higher than OpenGL (which achieves the same latency in master and this PR). You can achieve the same latency as OpenGL by using the low_extreme latency mode, but this comes at the cost of throughput. I wonder if we can do anything to match OpenGL latency without compromising on throughput.

Code looks good to me.

Monitors on the 3D Platformer demo, running around the level:

60 Hz monitor

Image

240 Hz monitor

Image

480 Hz monitor

Image

The higher your monitor refresh rate, the less likely the low-latency mode will be used as part of automatic detection (due to frametime variations). However, it probably won't make much of a noticeable difference when each frame is only 2.1 milliseconds.

On Compatibility, some of the monitors will always be 0:

image

We don't have an explicit way of marking them as unsupported, but we should investigate implementing something like that in the future.

- Add CPU_GPU_SYNC_AUTO
- Remove redundant calls to get_ticks_usec()
- Add waitable swapchain
- Add rendering/rendering_device/vsync/latency_mode which supports 4
options:
    low_extreme (only available through the GDScript API and command
    line interface. Cannot be set by default)
    low (default)
    medium
    high_throughput
- Add PacingMethod which describes which method is being used (None,
AUTO, Waitable Swapchains, Android Swappy).
- Add CLI parameter --latency-mode to override the latency mode
(low_extreme, low, etc).
- Add debug CLI parameter --pacing-mode-mask <mask> (mask is a hex
number, where valid combinations are bits OR'ed from PacingMethod enum).
This prevents Godot from using certain pacing modes, which is useful
for debugging (or troubleshooting a bug in pacing methods).
- Add Monitors to debug and understand why AUTO decides to use SEQ or
PAR. Even if Waitable Swapchains are being used, these monitor values
are very useful for detecting jitter and stutter.

Fixes an unrelated bug when NAVIGATION_2D_DISABLED or
NAVIGATION_3D_DISABLED are defined

Co-authored-by: Danni <[email protected]>
Co-authored-by: Matias N. Goldberg <[email protected]>
@darksylinc darksylinc force-pushed the low-lat-waitable-swapchain branch from 1a38bba to d3369a8 Compare May 20, 2025 14:34
@KeyboardDanni
Contributor

KeyboardDanni commented Jun 25, 2025

Thanks for your work on this!

Tested on Windows 11, NVIDIA GeForce RTX 3070, driver version 572.60. Seems to have a few unexpected issues.

  • The default low_latency value of low seems to limit the FPS in windowed mode even with V-Sync Off (and the resulting framepacing seems worse than V-Sync On). At 240hz with VRR enabled, it's limited to 240 FPS. At 60hz with VRR disabled, it's limited to 90 FPS, strangely.
  • When fullscreen, it often prints ERROR: vkWaitForPresentKHR timeout exceeded, which has me wondering if the waitable swapchain is working correctly. Supposedly there have been some recent driver fixes for this extension, so I will update my drivers and try again. (Edit: Updated drivers, problem persists)

It's unclear to me what exactly the different latency modes are doing behind the scenes, and when I would want to pick one over the other. It seems like the latency modes and framepacing are going hand in hand somehow, but how framepacing is affected is unclear to me. I would rather have these separately adjustable especially since manual framepacing sometimes does more harm than good when compositing is involved.

I think enforcing a reasonable config is fine, as long as there's some form of transparency. Part of the reason the current presentation/framepacing story is such a mess is because of everyone (drivers, compositors, apps, tweak tools) trying to be a framerate limiter (that, and users making ill-advised changes to their system configuration). That is, stacking more and more solutions on top of each other instead of starting over with the basics and only adding complexity as necessary. Early versions of the Unity Linux runtime forced V-Sync off and limited the framerate if a compositor was detected, and the result felt pretty choppy (worse, there was no way for the user to override this if their system supported direct scanout for fullscreen apps).

Ideally everything should just be direct scanout with the app trusting that the swapchain will do its job, unless the user intentionally limits the FPS to a certain number in game settings, either to save power or have a more stable framerate. Sadly direct scanout isn't always available (a recent Windows 11 update broke windowed optimizations when I have both monitors enabled), so I think we should take the time to investigate how best to handle this situation, but leave such framepacing as an optional tweakable on its own in case it causes problems.

> However, latency is still 1 frame higher than OpenGL (which achieves the same latency in master and this PR). You can achieve the same latency as OpenGL by using the low_extreme latency mode, but this comes at the cost of throughput. I wonder if we can do anything to match OpenGL latency without compromising on throughput.

I haven't been able to get latency to match sequential OpenGL with either this PR or #100031. And the only way I can get this PR to the same latency as mine is via low_extreme, which is strange as the default low reports CPU_GPU_SYNC_SEQUENTIAL for the actual sync mode. If both modes report the same sync mode, I would expect them to have the same display latency with the same swapchain settings.

Granted, a latency of 2 frames is still pretty good, but with the level of performance hit, I would expect 1 frame like OpenGL can achieve. So there's probably something else causing the latency here.

@KeyboardDanni
Contributor

KeyboardDanni commented Jun 26, 2025

Tried D3D12 to see if waitable swapchain works there, but I can't seem to launch Godot reliably. Sometimes I get a crash on D3D12 initialization in LoadLibraryW(L"DXGI.dll"), sometimes I can get to the editor but it's very slow, and sometimes the window stays blank and it spams the console with

RenderingDeviceDriverD3D12::swap_chain_resize: Parameter "pfn_DCompositionCreateDevice" is null. <C++ Source> drivers\d3d12\rendering_device_driver_d3d12.cpp:2647 @ RenderingDeviceDriverD3D12::swap_chain_resize()

Attempting to close normally throws an exception in HeapFree

godot.windows.editor.x86_64.exe!_free_base(void * block) Line 105
[Inline Frame] godot.windows.editor.x86_64.exe!memdelete(MethodBind * p_class) Line 142
godot.windows.editor.x86_64.exe!ClassDB::cleanup() Line 2361
godot.windows.editor.x86_64.exe!unregister_core_types() Line 484
godot.windows.editor.x86_64.exe!Main::cleanup(bool p_force) Line 5128

This is on Windows 11 24H2, latest nVidia drivers (576.80), and the following monitor setup:

Monitor 1: 1920x1080, 239.96hz, G-Sync Compatible enabled
Monitor 2: 1360x768 (active signal mode 1920x1080), 60hz, no VRR

Monitor 1 is the primary, and where I tried to launch Godot. MPO is enabled. Both monitors are set to 100% DPI scale. Neither monitor has HDR.

D3D12 seems to work fine in master.

@@ -2628,6 +2635,11 @@ Error RenderingDeviceDriverD3D12::swap_chain_resize(CommandQueueID p_cmd_queue,
ERR_FAIL_COND_V(!SUCCEEDED(res), ERR_CANT_CREATE);
}

if (creation_flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) {
swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(frames.size()));
Contributor

@KeyboardDanni KeyboardDanni Jun 26, 2025

I believe SetMaximumFrameLatency() is meant to control the effective swapchain size, i.e. how many backbuffers can be queued on the swapchain before present calls will block, with 1 being a double-buffered swapchain, and 2 being triple-buffered. This line is setting it to the CPU submission queue size, which doesn't seem correct. By changing this line to:

swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(CLAMP(p_desired_framebuffer_count - 1, 1, 3)));

I was able to get 2 frames of latency. Otherwise, 3 was the best I could manage, which is a regression compared to #100031.
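
The suggested expression can be tried in isolation. The sketch below reproduces the clamp with a local helper instead of Godot's CLAMP macro; the function name max_frame_latency_for is made up here, and the mapping of values to double or triple buffering follows the description in this comment rather than a verified reading of the DXGI documentation.

```cpp
#include <cassert>

// Local replacement for Godot's CLAMP macro so the example is self-contained.
static int clamp_int(int p_value, int p_min, int p_max) {
	assert(p_min <= p_max);
	return p_value < p_min ? p_min : (p_value > p_max ? p_max : p_value);
}

// Derives the value passed to SetMaximumFrameLatency() from the desired
// framebuffer count, as proposed above: count - 1, limited to the range [1, 3].
unsigned int max_frame_latency_for(int p_desired_framebuffer_count) {
	return static_cast<unsigned int>(clamp_int(p_desired_framebuffer_count - 1, 1, 3));
}
```

A desired framebuffer count of 2 gives 1 and a count of 3 gives 2, while out-of-range inputs are pinned to 1 or 3.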

@darksylinc
Contributor Author

> The default low_latency value of low seems to limit the FPS in windowed mode even with V-Sync Off (and the resulting framepacing seems worse than V-Sync On). At 240hz with VRR enabled, it's limited to 240 FPS. At 60hz with VRR disabled, it's limited to 90 FPS, strangely.

That sounds like our bug. I will check it out.

> When fullscreen, it often prints ERROR: vkWaitForPresentKHR timeout exceeded, which has me wondering if the waitable swapchain is working correctly. Supposedly there have been some recent driver fixes for this extension, so I will update my drivers and try again. (Edit: Updated drivers, problem persists)

NVIDIA or Windows 11 bug. I'll have to talk about it with NV. It definitely stems from your multimonitor setup. Disabling one monitor will fix it, but this shouldn't happen at all.

> Part of the reason the current presentation/framepacing story is such a mess is because of everyone (drivers, compositors, apps, tweak tools) trying to be a framerate limiter (that, and users making ill-advised changes to their system configuration).

Agreed!

> I haven't been able to get latency to match sequential OpenGL

OpenGL was left out from this PR because there were various issues with different driver/HW causing massive stutter.

> Tried D3D12 to see if waitable swapchain works there, but I can't seem to launch Godot reliably. Sometimes I get a crash on D3D12 initialization in LoadLibraryW(L"DXGI.dll"), sometimes I can get to the editor but it's very slow, and sometimes the window stays blank and it spams the console with

That sounds weird. As if something's wrong with your installation or the way it was compiled. (I mean, DXGI.dll and pfn_DCompositionCreateDevice should be included with Win 11) and there were not significant deviations from master.

> I believe SetMaximumFrameLatency() is meant to control the effective swapchain size, i.e. how many backbuffers can be queued on the swapchain before present calls will block, with 1 being a double-buffered swapchain, and 2 being triple-buffered. This line is setting it to the CPU submission queue size, which doesn't seem correct. By changing this line to:

I'll have to think about it. It's true that SetMaximumFrameLatency() and swapchain count are extremely related.
But there are 3 things "running" in parallel:

  1. CPU preparing commands
  2. GPU executing work
  3. Presentation

A very low swapchain count makes GPU wait on Presentation (i.e. GPU wait on GPU) while a very low SetMaximumFrameLatency makes the CPU wait on Presentation. I need to think about this detail.

@KeyboardDanni
Contributor

KeyboardDanni commented Jun 27, 2025

> > The default low_latency value of low seems to limit the FPS in windowed mode even with V-Sync Off (and the resulting framepacing seems worse than V-Sync On). At 240hz with VRR enabled, it's limited to 240 FPS. At 60hz with VRR disabled, it's limited to 90 FPS, strangely.
>
> That sounds like our bug. I will check it out.

I think it's specific to the presentation mode being "Composed Flip". I don't have the problem with "Hardware Composed: Independent Flip".

Aside: it's true that you can't really get tearing in this mode, but having an uncapped framerate is still useful for a few things (benchmarking, and it also kind of acts like "mailbox" V-Sync). It's not great for energy use but that's orthogonal to what we're doing here.

> > When fullscreen, it often prints ERROR: vkWaitForPresentKHR timeout exceeded, which has me wondering if the waitable swapchain is working correctly. Supposedly there have been some recent driver fixes for this extension, so I will update my drivers and try again. (Edit: Updated drivers, problem persists)
>
> NVIDIA or Windows 11 bug. I'll have to talk about it with NV. It definitely stems from your multimonitor setup. Disabling one monitor will fix it, but this shouldn't happen at all.

It's definitely a weird setup. The second monitor is a drawing tablet I just sorta repurposed as a second monitor for displaying widgets and watching videos while doing other things. Whenever I wake my system from sleep, there's usually some point where the graphics driver will freeze and then crash, killing Godot and most other game engines and causing issues in other apps due to the graphics reset.

But even with this setup, "Hardware Composed: Independent Flip" used to work in windowed mode. It worked in Windows 11 23H2. But since the update to 24H2, if I have both monitors on, it only does "Composed Flip", as if it thinks the window is spread across both monitors even though it's only on the first display. If I turn off the second monitor, windowed optimizations work again.

Having said that, I did try turning off the second monitor for this test and I still get the vkWaitForPresentKHR errors in fullscreen. It doesn't seem to hurt the FPS at all (I still get 1800+), I just get an error in the console about it every second I'm in fullscreen due to repeatedly hitting the timeout.

Well, okay, I did some more testing and the errors only happen if the driver is layering the swapchain on DXGI. If I force native presentation in the nVidia Control Panel, the errors go away. So it seems like nVidia hasn't hooked this extension up to their layered DXGI feature (might be easier said than done though, as Vulkan and DXGI both handle this a bit differently).

> > I haven't been able to get latency to match sequential OpenGL
>
> OpenGL was left out from this PR because there were various issues with different driver/HW causing massive stutter.

Oh, by that comment I meant that I can still only get latency in RenderingDevice down to 2 frames, while sequential OpenGL can manage 1 frame. It's something I'd like to investigate. I was hoping that waitable swapchain would be the last part of that puzzle, but it looks like the solution sits elsewhere.

> That sounds weird. As if something's wrong with your installation or the way it was compiled. (I mean, DXGI.dll and pfn_DCompositionCreateDevice should be included with Win 11) and there were not significant deviations from master.

It doesn't always happen. And the crash isn't on the DLL load itself, but the DXGI initialization that happens as a result of loading the DLL. Almost like there's some memory corruption going on. I'll poke this further and see if I can figure out what's going on.

> > I believe SetMaximumFrameLatency() is meant to control the effective swapchain size, i.e. how many backbuffers can be queued on the swapchain before present calls will block, with 1 being a double-buffered swapchain, and 2 being triple-buffered. This line is setting it to the CPU submission queue size, which doesn't seem correct. By changing this line to:
>
> I'll have to think about it. It's true that SetMaximumFrameLatency() and swapchain count are extremely related. But there are 3 things "running" in parallel:
>
>   1. CPU preparing commands
>   2. GPU executing work
>   3. Presentation
>
> A very low swapchain count makes GPU wait on Presentation (i.e. GPU wait on GPU) while a very low SetMaximumFrameLatency makes the CPU wait on Presentation. I need to think about this detail.

Even without SetMaximumFrameLatency, at some point the CPU will get stuck waiting until there's a swapchain image to acquire. Or at least, that's how it works in older 3D APIs. I'm less familiar with the model used in newer APIs since there's a focus on doing things as part of the GPU's command queue. I suppose in that case, the stall would eventually occur on the CPU submission queue being full... which is kind of like a CPU wait on swapchain, by proxy.
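
To make the "CPU eventually stalls on a full queue" intuition concrete, here is a deliberately simplified toy model, not a description of how any driver or Godot actually behaves. It assumes an infinitely fast CPU and GPU, a display that consumes exactly one queued frame per vsync, and a fixed cap on how many frames may wait in the queue. The function name and the model itself are assumptions made for illustration.

```cpp
#include <cassert>
#include <deque>

// Returns the latency, in vsync intervals, of the last frame presented after
// p_vsyncs refreshes when at most p_max_queued_frames may wait for presentation.
int simulate_queue_latency(int p_max_queued_frames, int p_vsyncs) {
	assert(p_max_queued_frames >= 1 && p_vsyncs >= 1);
	std::deque<int> queue; // Each entry is the vsync index at which the frame was produced.
	int last_latency = 0;
	for (int tick = 0; tick < p_vsyncs; ++tick) {
		// The CPU fills every free slot immediately, then blocks until one frees up.
		while (static_cast<int>(queue.size()) < p_max_queued_frames) {
			queue.push_back(tick);
		}
		// The display takes the oldest frame at the next vsync (tick + 1).
		last_latency = (tick + 1) - queue.front();
		queue.pop_front();
	}
	return last_latency;
}
```

In this model the steady-state latency equals the queue cap: a producer that is never the bottleneck keeps the queue full, so every extra slot adds one refresh interval of delay. Real pipelines add CPU and GPU time on top, which this sketch ignores.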

By the way, if you have a line of communication with nVidia, I'd like to hear what they think about using WGL_NV_DX_interop2 to present OpenGL using DXGI. I've attempted to salvage #94503 but between the two techniques available, the "direct to framebuffer" method seriously harms max FPS, while the "intermediate buffer" method causes other apps to become choppy when Godot's at high FPS. The latency is better than the driver's layered DXGI, but I'm not happy with a solution that causes the whole desktop to get sluggish. That's just not a good experience, especially if you want to watch a video on another display, for example.
