Improve pacing, latency, and add tweakable options #106221
base: master
e17ebba to 961faae
Does anybody understand why MSVC thinks this is unreachable code?

```cpp
RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	if (available_pacing_methods.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		return PACING_METHOD_ANDROID_SWAPPY;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		return PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && latency_mode <= LATENCY_MODE_LOW) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}
```

At first I thought it was smart enough to figure out [...]

```cpp
// > complains on this line <
if (available_pacing_methods.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
	return PACING_METHOD_WAITABLE_SWAPCHAIN;
}
```

This is just wrong.

Update: I tried changing it to this code:

```cpp
RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	RDD::PacingMethod method = PACING_METHOD_NONE;
	if (available_pacing_methods.has_flag(PACING_METHOD_ANDROID_SWAPPY)) {
		method = PACING_METHOD_ANDROID_SWAPPY;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_WAITABLE_SWAPCHAIN) && latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		method = PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (available_pacing_methods.has_flag(PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && latency_mode <= LATENCY_MODE_LOW) {
		method = PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return method;
}
```

And it complains on this line!

Update 2:

```cpp
// Complains
RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	if (p_sequential_sync && latency_mode <= LATENCY_MODE_LOW) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}

// Does NOT complain
RDD::PacingMethod RenderingDevice::get_current_pacing_method(bool p_sequential_sync) const {
	if (p_sequential_sync) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}
```
961faae to 7344985
d2e535c to 1a38bba
doc/classes/RenderingDevice.xml
Godot will prefer maximizing FPS (frames per second), with no consideration for latency.
This setting is ideal for apps that have no user interaction, like servers or headless processes.

Suggested change:
Godot will prefer maximizing FPS (frames per second), with no consideration for latency. This setting is ideal for apps that have no user interaction, like servers or headless processes.
The high throughput mode may also be useful for Movie Maker mode, but I'll need to benchmark that beforehand to make sure.
To be clear: High Throughput makes Godot behave just the way it did before this PR. This PR (potentially) reduces throughput to improve latency, but that is undesirable for tasks that run without user interaction, hence the high_throughput mode. It won't improve performance compared to Godot before this PR, though.
```cpp
RD *device = RD::get_singleton();
if (device) {
	const int latency_mode = int(GLOBAL_GET("rendering/rendering_device/vsync/latency_mode")) + 1;
	device->set_latency_mode((RD::LatencyMode)latency_mode);
}
```
This makes me wonder if we should have an editor setting for the latency mode, like we already have for V-Sync mode. Most of the time, you'll want the editor to use a low-latency setting to ensure good UI responsiveness.
In particular, I foresee some users wanting to use the low_extreme latency mode, which isn't available in the project settings but could be made available in the editor settings. Of course, if we decide to make that possible, we'll have to add a warning label below the FPS counter in the 3D editor's View Frame Time panel.
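The + 1 in the snippet above appears to be an offset that skips low_extreme, which the project setting doesn't expose. The following is a minimal standalone sketch of that mapping, plus parsing of the --latency-mode values. It assumes the enum order matches the order in the commit message (low_extreme, low, medium, high_throughput) and that the project setting lists only low, medium and high_throughput. LatencyMode, latency_mode_from_project_setting() and parse_latency_mode_arg() are hypothetical names, not Godot's API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for RD::LatencyMode, assuming the order listed in the PR.
enum class LatencyMode {
	LowExtreme = 0,
	Low = 1,
	Medium = 2,
	HighThroughput = 3,
};

// Assumption: the project setting enumerates low (0), medium (1) and
// high_throughput (2), so adding 1 skips LowExtreme.
LatencyMode latency_mode_from_project_setting(int p_setting_index) {
	assert(p_setting_index >= 0 && p_setting_index <= 2);
	return static_cast<LatencyMode>(p_setting_index + 1);
}

// Maps a --latency-mode value to the enum. low_extreme is reachable here
// even though the project setting cannot select it.
std::optional<LatencyMode> parse_latency_mode_arg(const std::string &p_value) {
	if (p_value == "low_extreme") {
		return LatencyMode::LowExtreme;
	}
	if (p_value == "low") {
		return LatencyMode::Low;
	}
	if (p_value == "medium") {
		return LatencyMode::Medium;
	}
	if (p_value == "high_throughput") {
		return LatencyMode::HighThroughput;
	}
	return std::nullopt; // Unrecognized value.
}
```

An editor setting could reuse the same name-to-enum table and expose low_extreme directly, while the project setting keeps the offset mapping.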
doc/classes/Performance.xml
```xml
</constant>
<constant name="FRAME_PACING_EVALUATED_SYNC_MODE" value="62" enum="Monitor">
	The mode decided by Godot that we should be in for each frame based on Total Time. "1" means we should be in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], "2" means we should be in [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL].
	[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfect flat line of either 1s or 2s. If you see the game going back and forth between 1 and 2, then the system is not fast enough for a smooth low-latency experience; or the game should be optimized further until it is.
```
Suggested change:
[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfect flat line of either 1s or 2s. If you see the game going back and forth between 1 and 2, then the system is not fast enough for a smooth low-latency experience; or the game should be optimized further until it is. The higher the monitor refresh rate, the higher the system requirements are for the line to be flat.
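The note says the evaluated decision is averaged over time so that Godot doesn't keep switching between PARALLEL and SEQUENTIAL. The PR text doesn't describe how the averaging works. The following is only a hypothetical illustration of the general idea, not Godot's AUTO code: an exponential moving average of the per-frame decision combined with a hysteresis band. SyncModeFilter, alpha and margin are invented for this sketch.

```cpp
#include <cassert>

// Hypothetical illustration only; not how Godot implements AUTO.
enum class SyncMode {
	Parallel = 1, // Same numbering as the FRAME_PACING_EVALUATED_SYNC_MODE monitor.
	Sequential = 2,
};

class SyncModeFilter {
public:
	// p_alpha: weight of the newest per-frame decision in the moving average.
	// p_margin: hysteresis band around the midpoint (1.5) between the two modes.
	SyncModeFilter(double p_alpha, double p_margin) :
			alpha(p_alpha), margin(p_margin) {
		assert(p_alpha > 0.0 && p_alpha <= 1.0);
	}

	// Feeds one evaluated decision and returns the mode actually in use.
	SyncMode update(SyncMode p_evaluated) {
		const double sample = p_evaluated == SyncMode::Sequential ? 2.0 : 1.0;
		average = alpha * sample + (1.0 - alpha) * average;
		// Only switch once the average is clearly past the midpoint.
		if (current == SyncMode::Parallel && average > 1.5 + margin) {
			current = SyncMode::Sequential;
		} else if (current == SyncMode::Sequential && average < 1.5 - margin) {
			current = SyncMode::Parallel;
		}
		return current;
	}

	SyncMode get_current() const { return current; }

private:
	double alpha;
	double margin;
	double average = 1.0; // Start out assuming PARALLEL.
	SyncMode current = SyncMode::Parallel;
};
```

With alpha = 0.1 and margin = 0.2, a decision that alternates every frame keeps the average near 1.5, so the filter never leaves PARALLEL. A sustained run of SEQUENTIAL decisions starting from PARALLEL switches after about a dozen frames. That trade-off between stability and reaction time is the one the monitor note describes.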
Tested locally on Linux + KDE X11 with compositing off + NVIDIA, and it works as expected in all rendering methods. With the default settings, this gets rid of 1 frame of latency on Forward+/Mobile compared to master.
However, latency is still 1 frame higher than OpenGL, which has the same latency in master and in this PR. You can match OpenGL's latency by using the low_extreme latency mode, but this comes at the cost of throughput. I wonder if we can do anything to match OpenGL latency without compromising throughput.
Code looks good to me.
Monitors on the 3D Platformer demo, running around the level:
60 Hz monitor
240 Hz monitor
480 Hz monitor
The higher your monitor refresh rate, the less likely the low-latency mode will be used as part of automatic detection (due to frametime variations). However, it probably won't make much of a noticeable difference when each frame is only 2.1 milliseconds.
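As a quick check on these numbers, the per-frame budget is 1000 ms divided by the refresh rate, and a latency measured in frames scales with it. The helpers below (hypothetical names) reproduce the 2.1 ms figure for 480 Hz.

```cpp
#include <cassert>
#include <cmath>

// Per-frame time budget in milliseconds for a display refresh rate in Hz.
constexpr double frame_time_ms(double p_refresh_hz) {
	return 1000.0 / p_refresh_hz;
}

// Converts a latency measured in whole frames into milliseconds.
constexpr double latency_ms(int p_frames, double p_refresh_hz) {
	return p_frames * frame_time_ms(p_refresh_hz);
}
```

One frame of latency is about 16.7 ms at 60 Hz but only about 2.1 ms at 480 Hz. That is why losing the low-latency mode matters much less on very high refresh rate monitors.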
On Compatibility, some of the monitors will always be 0:
We don't have an explicit way of marking them as unsupported, but we should investigate implementing something like that in the future.
- Add CPU_GPU_SYNC_AUTO
- Remove redundant calls to get_ticks_usec()
- Add waitable swapchain
- Add rendering/rendering_device/vsync/latency_mode, which supports 4 options:
  - low_extreme (only available through the GDScript API and command line interface; cannot be set by default)
  - low (default)
  - medium
  - high_throughput
- Add PacingMethod, which describes which method is being used (None, AUTO, Waitable Swapchains, Android Swappy).
- Add CLI parameter --latency-mode to override the latency mode (low_extreme, low, etc.).
- Add debug CLI parameter --pacing-mode-mask <mask> (mask is a hex number, where valid combinations are bits OR'ed from the PacingMethod enum). This prevents Godot from using certain pacing modes, which is useful for debugging (or troubleshooting a bug in pacing methods).
- Add Monitors to debug and understand why AUTO decides to use SEQ or PAR. Even if Waitable Swapchains are being used, these monitor values are very useful for detecting jitter and stutter.

Fixes an unrelated bug when NAVIGATION_2D_DISABLED or NAVIGATION_3D_DISABLED are defined.

Co-authored-by: Danni <[email protected]>
Co-authored-by: Matias N. Goldberg <[email protected]>
1a38bba to d3369a8
Thanks for your work on this! Tested on Windows 11, NVIDIA GeForce RTX 3070, driver version 572.60. Seems to have a few unexpected issues.
It's unclear to me what exactly the different latency modes are doing behind the scenes, and when I would want to pick one over the other. It seems like the latency modes and framepacing go hand in hand somehow, but how framepacing is affected is unclear to me. I would rather have these separately adjustable, especially since manual framepacing sometimes does more harm than good when compositing is involved. I think enforcing a reasonable config is fine, as long as there's some form of transparency.

Part of the reason the current presentation/framepacing story is such a mess is that everyone (drivers, compositors, apps, tweak tools) is trying to be a framerate limiter (that, and users making ill-advised changes to their system configuration). That is, stacking more and more solutions on top of each other instead of starting over with the basics and only adding complexity as necessary. Early versions of the Unity Linux runtime forced V-Sync off and limited the framerate if a compositor was detected, and the result felt pretty choppy (worse, there was no way for the user to override this if their system supported direct scanout for fullscreen apps).

Ideally everything should just be direct scanout, with the app trusting that the swapchain will do its job, unless the user intentionally limits the FPS to a certain number in game settings, either to save power or to have a more stable framerate. Sadly, direct scanout isn't always available (a recent Windows 11 update broke windowed optimizations when I have both monitors enabled). So I think we should take the time to investigate how best to handle this situation, but leave such framepacing as a separate optional tweakable in case it causes problems.
I haven't been able to get latency to match sequential OpenGL with either this PR or #100031. And the only way I can get this PR to the same latency as mine is via [...] Granted, a latency of 2 frames is still pretty good, but given the performance hit, I would expect 1 frame like OpenGL can achieve. So there's probably something else causing the latency here.
Tried D3D12 to see if waitable swapchain works there, but I can't seem to launch Godot reliably. Sometimes I get a crash on D3D12 initialization in
Attempting to close normally throws an exception in
This is on Windows 11 24H2, latest nVidia drivers (576.80), and the following monitor setup:
Monitor 1: 1920x1080, 239.96 Hz, G-Sync Compatible enabled
Monitor 1 is the primary, and it is where I tried to launch Godot. MPO is enabled. Both monitors are set to 100% DPI scale. Neither monitor has HDR. D3D12 seems to work fine in
@@ -2628,6 +2635,11 @@ Error RenderingDeviceDriverD3D12::swap_chain_resize(CommandQueueID p_cmd_queue,

```cpp
		ERR_FAIL_COND_V(!SUCCEEDED(res), ERR_CANT_CREATE);
	}

	if (creation_flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) {
		swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(frames.size()));
```
I believe SetMaximumFrameLatency() is meant to control the effective swapchain size, i.e. how many backbuffers can be queued on the swapchain before present calls block, with 1 being a double-buffered swapchain and 2 being triple-buffered. This line sets it to the CPU submission queue size, which doesn't seem correct. By changing this line to:

```cpp
swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(CLAMP(p_desired_framebuffer_count - 1, 1, 3)));
```

I was able to get 2 frames of latency. Otherwise, 3 was the best I could manage, which is a regression compared to #100031.
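The suggested expression can be isolated into a tiny standalone helper to make the framebuffer-count-to-latency relationship explicit. This sketch only restates the reviewer's suggestion, with std::clamp standing in for Godot's CLAMP macro. max_frame_latency_for() is a hypothetical name, and the "N images allow N - 1 queued frames" reading comes from the review comment, not from the PR code.

```cpp
#include <algorithm>
#include <cassert>

// Mirrors the suggested CLAMP(p_desired_framebuffer_count - 1, 1, 3).
// Following the reviewer's reading of SetMaximumFrameLatency(), a swapchain with
// N images allows N - 1 queued frames, so 2 images (double buffering) -> 1 and
// 3 images (triple buffering) -> 2. The result is clamped to [1, 3].
unsigned int max_frame_latency_for(int p_desired_framebuffer_count) {
	return static_cast<unsigned int>(std::clamp(p_desired_framebuffer_count - 1, 1, 3));
}
```

The current code passes frames.size() instead, which follows the CPU submission queue depth rather than the swapchain image count. That difference is what the reviewer flags.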
That sounds like our bug. I will check it out.
NVIDIA or Windows 11 bug. I'll have to talk to NV about it. It definitely stems from your multimonitor setup. Disabling one monitor will fix it, but this shouldn't happen at all.
Agreed!
OpenGL was left out from this PR because there were various issues with different driver/HW causing massive stutter.
That sounds weird, as if something's wrong with your installation or the way it was compiled. (I mean, DXGI.dll and pfn_DCompositionCreateDevice should be included with Win 11, and there were no significant deviations from master.)
I'll have to think about it. It's true that SetMaximumFrameLatency() and swapchain count are extremely related.
A very low swapchain count makes the GPU wait on presentation (i.e. the GPU waits on the GPU), while a very low
I think it's specific to the presentation mode being "Composed Flip". I don't have the problem with "Hardware Composed: Independent Flip". Aside: it's true that you can't really get tearing in this mode, but having an uncapped framerate is still useful for a few things (benchmarking, and it also kind of acts like "mailbox" V-Sync). It's not great for energy use but that's orthogonal to what we're doing here.
It's definitely a weird setup. The second monitor is a drawing tablet I repurposed as a second monitor for displaying widgets and watching videos while doing other things. Whenever I wake my system from sleep, there's usually some point where the graphics driver will freeze and then crash, killing Godot and most other game engines and causing issues in other apps due to the graphics reset. But even with this setup, "Hardware Composed: Independent Flip" used to work in windowed mode. It worked in Windows 11 23H2. Since the update to 24H2, though, if I have both monitors on, it only does "Composed Flip", as if it thinks the window is spread across both monitors even though it's only on the first display. If I turn off the second monitor, windowed optimizations work again. Having said that, I did try turning off the second monitor for this test and I still get the [...]

Well, okay, I did some more testing and the errors only happen if the driver is layering the swapchain on DXGI. If I force native presentation in the nVidia Control Panel, the errors go away. So it seems like nVidia hasn't hooked this extension up to their layered DXGI feature (that might be easier said than done, though, as Vulkan and DXGI handle this a bit differently).
Oh, by that comment I meant that I can still only get latency in RenderingDevice down to 2 frames, while sequential OpenGL can manage 1 frame. It's something I'd like to investigate. I was hoping that waitable swapchain would be the last part of that puzzle, but it looks like the solution sits elsewhere.
It doesn't always happen. And the crash isn't on the DLL load itself, but the DXGI initialization that happens as a result of loading the DLL. Almost like there's some memory corruption going on. I'll poke this further and see if I can figure out what's going on.
Even without [...] By the way, if you have a line of communication with nVidia, I'd like to hear what they think about using
Supersedes #105496
Supersedes #105435
It merges both PRs into one unified solution.
Fixes an unrelated bug when NAVIGATION_2D_DISABLED or NAVIGATION_3D_DISABLED are defined.
This is a heavily modified version of #100031
Reasons
Waitable Swapchains (#105496) were written after the AUTO code (#105435), but it became abundantly clear that waitable swapchains were vastly superior at doing the same job as AUTO.
However, not all GPUs support the necessary Vulkan extension on all platforms (notably, AMD doesn't support it on Windows), which is why the AUTO code was kept around as a fallback when waitable swapchains are not available.
The way the user interacts with these settings has been simplified.
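The fallback order described above can be sketched in standalone C++. The sketch mirrors the priority in the get_current_pacing_method() snippet quoted earlier in this conversation and adds the --pacing-mode-mask debug option. The bit values are hypothetical. The sketch also assumes that bits set in the mask mark methods Godot should not use. The description says the mask "prevents Godot from using certain pacing modes", but the parsing code isn't shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical bit values; the real PacingMethod enum may be laid out differently.
enum PacingMethod : uint32_t {
	PACING_METHOD_NONE = 0,
	PACING_METHOD_SEQUENTIAL_SYNC = 1u << 0, // Presumably the AUTO method from the description.
	PACING_METHOD_WAITABLE_SWAPCHAIN = 1u << 1,
	PACING_METHOD_ANDROID_SWAPPY = 1u << 2,
};

// Assumed to follow the order listed in the PR.
enum LatencyMode {
	LATENCY_MODE_LOW_EXTREME,
	LATENCY_MODE_LOW,
	LATENCY_MODE_MEDIUM,
	LATENCY_MODE_HIGH_THROUGHPUT,
};

// Parses a hex mask such as "0x2" (std::stoul throws on invalid input)
// and clears those bits from the set of available methods.
uint32_t apply_pacing_mode_mask(uint32_t p_available, const std::string &p_hex_mask) {
	const uint32_t mask = static_cast<uint32_t>(std::stoul(p_hex_mask, nullptr, 16));
	return p_available & ~mask;
}

// Same priority as the quoted get_current_pacing_method():
// Swappy, then waitable swapchain, then sequential sync, otherwise none.
PacingMethod select_pacing_method(uint32_t p_available, LatencyMode p_latency_mode, bool p_sequential_sync) {
	if (p_available & PACING_METHOD_ANDROID_SWAPPY) {
		return PACING_METHOD_ANDROID_SWAPPY;
	}
	if ((p_available & PACING_METHOD_WAITABLE_SWAPCHAIN) && p_latency_mode != LATENCY_MODE_HIGH_THROUGHPUT) {
		return PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if ((p_available & PACING_METHOD_SEQUENTIAL_SYNC) && p_sequential_sync && p_latency_mode <= LATENCY_MODE_LOW) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}
```

Masking out the waitable swapchain bit ("0x2" in this hypothetical layout) makes the same call fall back to sequential sync in the low latency mode and to none in medium. This matches the conditions in the quoted function.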
New Features
This PR adds:
- rendering/rendering_device/vsync/latency_mode, which supports 4 options:
  - low_extreme (only available through the GDScript API and command line interface; cannot be set by default)
  - low (default)
  - medium
  - high_throughput
- PacingMethod, which describes which method is being used (None, AUTO, Waitable Swapchains, Android Swappy).
- --latency-mode, a CLI parameter to override the latency mode (low_extreme, low, etc.).
- --pacing-mode-mask <mask>, a debug CLI parameter (mask is a hex number, where valid combinations are bits OR'ed from the PacingMethod enum). This prevents Godot from using certain pacing modes, which is useful for debugging (or troubleshooting a bug in pacing methods).

Changes from previous PRs
- [...] latency_mode setting.
- low_extreme will force CPU_GPU_SYNC_SEQUENTIAL. This behavior is consistent with Waitable Swapchains when using low_extreme. Because of how aggressive this setting can be, it's not possible to set it by default.

Documentation
I have written documentation about these new features and about troubleshooting pacing issues, which will be pushed to godot-docs, but it's not ready yet.
This has been a monumental testing task, and this PR should close most of the tickets complaining about pacing issues.
After thorough testing, I've come to the conclusion that most (if not all?) remaining pacing issues are not caused by Godot but by external SW or HW. That is why I'm writing documentation to troubleshoot all those issues.
Testing
The latency tester written by KeyboardDanni is very good.
The improvements are massive. Before this PR, latency on some systems could be as bad as 5 frames. With this new setting, it's anywhere between 0 and 3 frames depending on settings, OS, and hardware.
Documentation
See godotengine/godot-docs#10952