Improve pacing, latency, and add tweakable options #106221


Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions core/config/project_settings.cpp
@@ -1641,6 +1641,7 @@ ProjectSettings::ProjectSettings() {

GLOBAL_DEF_RST(PropertyInfo(Variant::INT, "rendering/rendering_device/vsync/frame_queue_size", PROPERTY_HINT_RANGE, "2,3,1"), 2);
GLOBAL_DEF_RST(PropertyInfo(Variant::INT, "rendering/rendering_device/vsync/swapchain_image_count", PROPERTY_HINT_RANGE, "2,4,1"), 3);
GLOBAL_DEF(PropertyInfo(Variant::INT, "rendering/rendering_device/vsync/latency_mode", PROPERTY_HINT_ENUM, "low,medium,high_throughput"), 0);
GLOBAL_DEF(PropertyInfo(Variant::INT, "rendering/rendering_device/staging_buffer/block_size_kb", PROPERTY_HINT_RANGE, "4,2048,1,or_greater"), 256);
GLOBAL_DEF(PropertyInfo(Variant::INT, "rendering/rendering_device/staging_buffer/max_size_mb", PROPERTY_HINT_RANGE, "1,1024,1,or_greater"), 128);
GLOBAL_DEF(PropertyInfo(Variant::INT, "rendering/rendering_device/staging_buffer/texture_upload_region_size_px", PROPERTY_HINT_RANGE, "1,256,1,or_greater"), 64);
29 changes: 28 additions & 1 deletion doc/classes/Performance.xml
@@ -299,7 +299,34 @@
<constant name="NAVIGATION_3D_OBSTACLE_COUNT" value="58" enum="Monitor">
Number of active navigation obstacles in the [NavigationServer3D].
</constant>
<constant name="MONITOR_MAX" value="59" enum="Monitor">
<constant name="FRAME_PACING_TOTAL_TIME" value="59" enum="Monitor">
Value used by Godot to determine whether it should be in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL] or [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL] mode. It is only used when PACING_METHOD_SEQUENTIAL_SYNC is available and no better latency-reduction method is. It is the sum of CPU Time + GPU Time. If the value is consistently high enough, Godot will switch to PARALLEL; otherwise it will prefer SEQUENTIAL.
[b]Note:[/b] This value attempts to exclude any additional time spent waiting for V-Sync, so it will not match any other timing value (e.g. actual FPS, time taken by physics, etc.). It is an estimate of how long the system would take if the CPU and GPU processed a frame serially, without the added delay of waiting for V-Sync.
[b]Note:[/b] When using these monitors, it's best to set the editor to a simple view like the Script tab, so that the 2D/3D view does not consume system resources that could interfere with readings. Better yet, run the editor profiler on another machine.
</constant>
<constant name="FRAME_PACING_CPU_TIME" value="60" enum="Monitor">
How long the CPU took to process the frame, excluding waiting delays caused by V-Sync. This value is an approximation and might not match any other timing value. If this value is added to GPU Time, you get Total Time. Useful to know where to focus optimization efforts.
</constant>
<constant name="FRAME_PACING_GPU_TIME" value="61" enum="Monitor">
How long the GPU took to process the frame, excluding waiting delays caused by V-Sync. This value is an approximation and might not match any other timing value. If this value is added to CPU Time, you get Total Time. Useful to know where to focus optimization efforts.
</constant>
<constant name="FRAME_PACING_EVALUATED_SYNC_MODE" value="62" enum="Monitor">
The mode decided by Godot that we should be in for each frame based on Total Time. "1" means we should be in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], "2" means we should be in [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL].
[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfect flat line of either 1s or 2s. If you see the game going back and forth between 1 and 2, then the system is not fast enough for a smooth low-latency experience; or the game should be optimized further until it is. The higher the monitor refresh rate, the higher the system requirements are for the line to be flat.
</constant>
<constant name="FRAME_PACING_ACTUAL_SYNC_MODE" value="63" enum="Monitor">
The [b]actual[/b] mode the game is currently in. "1" means we are in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], "2" means we are in [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL].
[b]Note:[/b] This value should be as flat as possible. Every time it switches between "1" and "2", the game may suffer a small stutter.
[b]Note:[/b] This value is ignored if PACING_METHOD_WAITABLE_SWAPCHAIN is available, or if [method RenderingDevice.get_latency_mode] is equal to or higher than [constant RenderingDevice.LATENCY_MODE_MEDIUM].
</constant>
<constant name="FRAME_PACING_MISSED_HARD_TARGET" value="64" enum="Monitor">
The number of frames where the "Total Time" has exceeded the frame interval implied by the monitor's refresh rate or the max FPS (whichever is lower). This does not necessarily mean the game has missed a V-Blank (if the game is running in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], then total frame time should be lower than the sum of CPU Time + GPU Time; thus in practice the app may not have missed any V-Blank), but it indicates V-Blanks would have been missed if executing in [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL]. The value is expressed in thousands.
For example, one missed Hard Target will be shown as 1000. Two missed Hard Targets will be shown as 2000. This value decreases quickly over time. Missed Hard Targets weigh heavily on Godot deciding to switch to PARALLEL to avoid degrading the experience further.
[b]Note:[/b] While in PARALLEL mode, this counter is always reset to 0 each new frame, thus while [constant FRAME_PACING_ACTUAL_SYNC_MODE] is 1, this value will be either 0 or 1000, where a flat 1000 line means the game is always failing to reach the target framerate.
[b]Note:[/b] Spikes in missed hard targets almost always mean very visible stutter and thus should be avoided at all costs during gameplay. Ideally, this value should stay at 0 at all times. If the system isn't fast enough to keep the target framerate, a constant 1000 is preferable to spikes, as it keeps pacing consistent.
[b]Note:[/b] Periodically failing this metric means you should optimize your content to run faster, avoid spikes, or set [member ProjectSettings.rendering/rendering_device/vsync/latency_mode] to a higher latency mode.
</constant>
<constant name="MONITOR_MAX" value="65" enum="Monitor">
Represents the size of the [enum Monitor] enum.
</constant>
</constants>
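To make the relationship between these monitors concrete, here is a minimal sketch of a single-frame evaluation. It assumes the decision threshold is the frame interval of the lower of the refresh rate and the FPS cap, which the docs above suggest but do not state outright. All function and type names are hypothetical and are not taken from the PR; the real decision is averaged over time, as the [constant FRAME_PACING_EVALUATED_SYNC_MODE] note says.

```cpp
#include <algorithm>
#include <cstdint>

// All names below are hypothetical; this is not the engine's implementation.
// Numeric values follow the monitor documentation ("1" = PARALLEL, "2" = SEQUENTIAL).
enum class EvaluatedSyncMode : int {
	Parallel = 1,
	Sequential = 2,
};

// Frame budget in microseconds for the lower of the refresh rate and the FPS cap.
// A max_fps of 0 or less is treated as "no cap" (assumption).
inline double frame_budget_usec(double refresh_rate_hz, double max_fps) {
	const double target_hz = (max_fps > 0.0) ? std::min(refresh_rate_hz, max_fps) : refresh_rate_hz;
	return 1000000.0 / target_hz;
}

// Total Time is documented as CPU Time + GPU Time, both excluding V-Sync waits.
inline double total_time_usec(double cpu_time_usec, double gpu_time_usec) {
	return cpu_time_usec + gpu_time_usec;
}

// A hard target is missed when running CPU and GPU serially would not fit in the budget.
inline bool missed_hard_target(double total_usec, double budget_usec) {
	return total_usec > budget_usec;
}

// Single-frame decision only; the documented behavior averages this over time.
inline EvaluatedSyncMode evaluate_sync_mode(double total_usec, double budget_usec) {
	return missed_hard_target(total_usec, budget_usec) ? EvaluatedSyncMode::Parallel : EvaluatedSyncMode::Sequential;
}

// The monitor reports missed hard targets in thousands (1 miss -> 1000).
inline uint32_t missed_hard_target_monitor_value(uint32_t missed_count) {
	return missed_count * 1000u;
}
```

At 60 Hz the budget is roughly 16.7 ms, so a frame with 4 ms of CPU time and 5 ms of GPU time would evaluate to SEQUENTIAL under this sketch, while 10 ms + 9 ms would evaluate to PARALLEL.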
5 changes: 5 additions & 0 deletions doc/classes/ProjectSettings.xml
@@ -3211,6 +3211,11 @@
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation to better understand how it is affected by different variables under various conditions.
[b]Note:[/b] This property is only read when the project starts. There is currently no way to change this value at run-time.
</member>
<member name="rendering/rendering_device/vsync/latency_mode" type="int" setter="" getter="" default="0">
Sets the default latency mode. Lower is better for input-to-display latency, but it will sacrifice FPS (frames per second) in return. This setting can be changed at runtime via [method RenderingDevice.set_latency_mode] on [method RenderingServer.get_rendering_device]. See documentation for [constant RenderingDevice.LATENCY_MODE_LOW], [constant RenderingDevice.LATENCY_MODE_MEDIUM], and [constant RenderingDevice.LATENCY_MODE_HIGH_THROUGHPUT] for what each individual setting entails.
[b]Note:[/b] The setting [constant RenderingDevice.LATENCY_MODE_LOW_EXTREME] is not available through this property as it is strongly ill-advised to ship with this value as the default.
[b]Note:[/b] This property may be overridden with the [code]--latency-mode[/code] command-line argument. When this argument is used, this project setting is ignored.
</member>
<member name="rendering/rendering_device/vsync/swapchain_image_count" type="int" setter="" getter="" default="3">
The number of images the swapchain will consist of (back buffers + front buffer).
[code]2[/code] corresponds to double-buffering and [code]3[/code] to triple-buffering.
29 changes: 29 additions & 0 deletions doc/classes/RenderingDevice.xml
@@ -646,6 +646,12 @@
Returns the frame count kept by the graphics API. Higher values result in higher input lag, but with more consistent throughput. For the main [RenderingDevice], frames are cycled (usually 3 with triple-buffered V-Sync enabled). However, local [RenderingDevice]s only have 1 frame.
</description>
</method>
<method name="get_latency_mode" qualifiers="const">
<return type="int" enum="RenderingDevice.LatencyMode" />
<description>
Returns the current latency mode used by this [RenderingDevice]. See [member ProjectSettings.rendering/rendering_device/vsync/latency_mode] for details.
</description>
</method>
<method name="get_memory_usage" qualifiers="const">
<return type="int" />
<param index="0" name="type" type="int" enum="RenderingDevice.MemoryType" />
@@ -782,6 +788,13 @@
[b]Note:[/b] Only the main [RenderingDevice] returned by [method RenderingServer.get_rendering_device] has a width. If called on a local [RenderingDevice], this method prints an error and returns [constant INVALID_ID].
</description>
</method>
<method name="set_latency_mode">
<return type="void" />
<param index="0" name="p_latency_mode" type="int" enum="RenderingDevice.LatencyMode" />
<description>
Sets the current latency mode for this [RenderingDevice]. See [member ProjectSettings.rendering/rendering_device/vsync/latency_mode] for details.
</description>
</method>
<method name="set_resource_name">
<return type="void" />
<param index="0" name="id" type="RID" />
@@ -2730,5 +2743,21 @@
<constant name="DRAW_IGNORE_ALL" value="720640" enum="DrawFlags" is_bitfield="true">
Ignore the previous contents of all attachments.
</constant>
<constant name="LATENCY_MODE_LOW_EXTREME" value="0" enum="LatencyMode">
The engine is willing to sacrifice a considerable amount of FPS (frames per second) to achieve the lowest possible latency. It's generally recommended to use [constant LATENCY_MODE_LOW] instead, as the FPS cost tends to be too high. It is strongly recommended that this mode only be selected by the end user in user settings, and not be shipped as the default.
[b]Note:[/b] Actually receiving low latency is not guaranteed, as it depends on various factors such as system speed, scene complexity and driver support.
[b]Note:[/b] Consider using [constant DisplayServer.VSYNC_ADAPTIVE] to reduce jitter and stutter while in this mode.
</constant>
<constant name="LATENCY_MODE_LOW" value="1" enum="LatencyMode">
The engine is willing to sacrifice some amount of FPS (frames per second) to achieve a generally enjoyable and acceptable low-latency experience. This is the recommended setting.
[b]Note:[/b] Actually receiving low latency is not guaranteed, as it depends on various factors such as system speed, scene complexity and driver support.
[b]Note:[/b] Consider using [constant DisplayServer.VSYNC_ADAPTIVE] to reduce jitter and stutter while in this mode.
</constant>
<constant name="LATENCY_MODE_MEDIUM" value="2" enum="LatencyMode">
The engine is not willing to sacrifice much FPS (frames per second), but still tries to keep latency reasonably low. This setting is best for slow systems, or scenes that are too complex to run at decent FPS in lower latency modes. It's also useful as a workaround if the user is experiencing pacing (jitter, stutter) problems with lower latency settings.
</constant>
<constant name="LATENCY_MODE_HIGH_THROUGHPUT" value="3" enum="LatencyMode">
The engine will prefer maximizing FPS (frames per second), with no consideration for latency. This setting is ideal for apps that have no user interaction, like servers or headless processes.
</constant>
</constants>
</class>
16 changes: 16 additions & 0 deletions doc/classes/RenderingServer.xml
@@ -1592,6 +1592,12 @@
Tries to free an object in the RenderingServer. To avoid memory leaks, this should be called after using an object as memory management does not occur automatically when using RenderingServer directly.
</description>
</method>
<method name="get_actual_cpu_gpu_sync_mode" qualifiers="const">
<return type="int" enum="RenderingServer.CPUGPUSyncMode" />
<description>
See [constant Performance.FRAME_PACING_ACTUAL_SYNC_MODE].
</description>
</method>
<method name="get_current_rendering_driver_name" qualifiers="const">
<return type="String" />
<description>
@@ -5870,6 +5876,16 @@
<constant name="GLOBAL_VAR_TYPE_MAX" value="29" enum="GlobalShaderParameterType">
Represents the size of the [enum GlobalShaderParameterType] enum.
</constant>
<constant name="CPU_GPU_SYNC_PARALLEL" value="0" enum="CPUGPUSyncMode">
Indicates the renderer is prioritizing higher framerate by allowing the CPU to queue up additional frames before they're rendered by the GPU. This allows the CPU and GPU to work in tandem, improving the framerate and frame pacing in complex scenes at the expense of input latency. This default setting is suitable for most 3D applications, especially on mobile and lower-performance desktop hardware.
[b]Note:[/b] This is part of a fallback mechanism to reduce latency when PACING_METHOD_WAITABLE_SWAPCHAIN is not available.
</constant>
<constant name="CPU_GPU_SYNC_SEQUENTIAL" value="1" enum="CPUGPUSyncMode">
Indicates the renderer is prioritizing lower display latency by severely limiting how far the CPU is allowed to get ahead of the GPU when queuing frames. This can greatly help with input lag, at the cost of significantly reduced framerate in most scenes. This setting is useful for games and applications with simple graphics where responsive input is important. Your results may vary based on platform, drivers, and scene contents.
[b]Note:[/b] This is part of a fallback mechanism to reduce latency when PACING_METHOD_WAITABLE_SWAPCHAIN is not available.
[b]Note:[/b] Significant FPS drops are expected while in this mode. It prioritizes low latency over framerate.
[b]Note:[/b] Stutter can be reduced by using [constant DisplayServer.VSYNC_ADAPTIVE], but this risks always degenerating to [constant DisplayServer.VSYNC_DISABLED] if the system is too slow.
</constant>
<constant name="RENDERING_INFO_TOTAL_OBJECTS_IN_FRAME" value="0" enum="RenderingInfo">
Number of objects rendered in the current 3D scene. This varies depending on camera position and rotation.
</constant>
55 changes: 55 additions & 0 deletions drivers/d3d12/rendering_device_driver_d3d12.cpp
@@ -2510,6 +2510,11 @@ void RenderingDeviceDriverD3D12::_swap_chain_release_buffers(SwapChain *p_swap_c
p_swap_chain->render_targets.clear();
p_swap_chain->render_targets_info.clear();

if (p_swap_chain->waitable_object) {
CloseHandle(p_swap_chain->waitable_object);
p_swap_chain->waitable_object = nullptr;
}

for (RDD::FramebufferID framebuffer : p_swap_chain->framebuffers) {
framebuffer_free(framebuffer);
}
@@ -2567,6 +2572,7 @@ Error RenderingDeviceDriverD3D12::swap_chain_resize(CommandQueueID p_cmd_queue,
case DisplayServer::VSYNC_ENABLED: {
sync_interval = 1;
present_flags = 0;
creation_flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
} break;
case DisplayServer::VSYNC_DISABLED: {
sync_interval = 0;
@@ -2577,6 +2583,7 @@ Error RenderingDeviceDriverD3D12::swap_chain_resize(CommandQueueID p_cmd_queue,
default:
sync_interval = 1;
present_flags = 0;
creation_flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
break;
}

@@ -2628,6 +2635,11 @@ Error RenderingDeviceDriverD3D12::swap_chain_resize(CommandQueueID p_cmd_queue,
ERR_FAIL_COND_V(!SUCCEEDED(res), ERR_CANT_CREATE);
}

if (creation_flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) {
swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(frames.size()));
Contributor

@KeyboardDanni KeyboardDanni Jun 26, 2025

I believe SetMaximumFrameLatency() is meant to control the effective swapchain size, i.e. how many backbuffers can be queued on the swapchain before present calls will block, with 1 being a double-buffered swapchain, and 2 being triple-buffered. This line is setting it to the CPU submission queue size, which doesn't seem correct. By changing this line to:

swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(CLAMP(p_desired_framebuffer_count - 1, 1, 3)));

I was able to get 2 frames of latency. Otherwise, 3 was the best I could manage, which is a regression compared to #100031.
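
For illustration, the mapping suggested in the comment above can be sketched as a standalone helper. The function name is hypothetical and not part of the PR; the sketch only reproduces the arithmetic (swapchain image count minus one, clamped to 1..3) and uses signed math so that a count of 0 does not wrap around.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of the PR: maps a desired swapchain image
// count to the value passed to SetMaximumFrameLatency(), following the
// reviewer's suggestion (2 images -> 1 queued frame, 3 images -> 2, ...).
inline uint32_t max_frame_latency_for(uint32_t desired_framebuffer_count) {
	// Use a signed intermediate so that a count of 0 yields -1 instead of
	// wrapping around to a huge unsigned value before clamping.
	const int64_t queued = int64_t(desired_framebuffer_count) - 1;
	return uint32_t(std::max<int64_t>(1, std::min<int64_t>(queued, 3)));
}
```

If `p_desired_framebuffer_count` is an unsigned type, the unsigned expression `p_desired_framebuffer_count - 1` would wrap when the count is 0 and then clamp to 3, which is why the sketch converts to a signed type first.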

swap_chain->waitable_object = swap_chain->d3d_swap_chain->GetFrameLatencyWaitableObject();
}

#ifdef DCOMP_ENABLED
if (surface->composition_device.Get() == nullptr) {
using PFN_DCompositionCreateDevice = HRESULT(WINAPI *)(IDXGIDevice *, REFIID, void **);
@@ -2746,6 +2758,49 @@ void RenderingDeviceDriverD3D12::swap_chain_free(SwapChainID p_swap_chain) {
memdelete(swap_chain);
}

Error RenderingDeviceDriverD3D12::swap_chain_wait_for_present(DisplayServer::WindowID p_window, SwapChainID p_swap_chain, uint32_t p_max_frame_delay) {
SwapChain *swap_chain = (SwapChain *)(p_swap_chain.id);
if (swap_chain->waitable_object != NULL) {
UINT timeout = 1000u;

HRESULT res;

{
UINT current_frame_latency = 0u;
res = swap_chain->d3d_swap_chain->GetMaximumFrameLatency(&current_frame_latency);

ERR_FAIL_COND_V_MSG(!SUCCEEDED(res), FAILED, "GetMaximumFrameLatency failed with error " + vformat("0x%08ux", (uint64_t)res) + ".");

if (p_max_frame_delay != current_frame_latency) {
swap_chain->d3d_swap_chain->SetMaximumFrameLatency(UINT(p_max_frame_delay));
}
}

// WaitForSingleObjectEx returns a DWORD status code, not an HRESULT,
// so SUCCEEDED() must not be used to test it (WAIT_TIMEOUT would pass).
DWORD wait_res;
do {
wait_res = WaitForSingleObjectEx(swap_chain->waitable_object, timeout, FALSE);
} while (wait_res == WAIT_IO_COMPLETION);

if (wait_res == WAIT_TIMEOUT) {
ERR_FAIL_V_MSG(ERR_TIMEOUT, "swap_chain_wait_for_present timeout exceeded.");
} else if (wait_res == WAIT_FAILED) {
DWORD error = GetLastError();
ERR_FAIL_V_MSG(FAILED, "WaitForSingleObjectEx failed with error " + vformat("0x%08ux", (uint64_t)error) + ".");
} else if (wait_res != WAIT_OBJECT_0) {
ERR_FAIL_V_MSG(FAILED, "WaitForSingleObjectEx returned " + vformat("0x%08ux", (uint64_t)wait_res) + ".");
}
return OK;
} else {
return ERR_UNAVAILABLE;
}
}

BitField<RDD::PacingMethod> RenderingDeviceDriverD3D12::get_available_pacing_methods() const {
BitField<PacingMethod> methods = 0;
methods.set_flag(PACING_METHOD_SEQUENTIAL_SYNC);
methods.set_flag(PACING_METHOD_WAITABLE_SWAPCHAIN);
return methods;
}

/*********************/
/**** FRAMEBUFFER ****/
/*********************/
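
As a rough illustration of how the bitfield returned by `get_available_pacing_methods()` might be consumed, the sketch below prefers the waitable swapchain and falls back to sequential sync, which matches the documentation notes that describe sequential sync as a fallback. The flag values, the `PACING_METHOD_NONE` entry, and the `choose_pacing_method` function are assumptions made for this example; the real `PacingMethod` enum definition is not shown in this diff.

```cpp
#include <cstdint>

// Hypothetical flag values; the real PacingMethod enum is not shown in this diff.
enum PacingMethod : uint32_t {
	PACING_METHOD_NONE = 0, // Hypothetical "nothing available" value.
	PACING_METHOD_SEQUENTIAL_SYNC = 1u << 0,
	PACING_METHOD_WAITABLE_SWAPCHAIN = 1u << 1,
};

// Hypothetical selection helper: prefer the waitable swapchain when the driver
// reports it, otherwise fall back to sequential CPU/GPU sync.
inline PacingMethod choose_pacing_method(uint32_t available_methods) {
	if (available_methods & PACING_METHOD_WAITABLE_SWAPCHAIN) {
		return PACING_METHOD_WAITABLE_SWAPCHAIN;
	}
	if (available_methods & PACING_METHOD_SEQUENTIAL_SYNC) {
		return PACING_METHOD_SEQUENTIAL_SYNC;
	}
	return PACING_METHOD_NONE;
}
```

With the D3D12 driver above reporting both flags, this sketch would pick the waitable swapchain.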
4 changes: 4 additions & 0 deletions drivers/d3d12/rendering_device_driver_d3d12.h
@@ -468,6 +468,7 @@ class RenderingDeviceDriverD3D12 : public RenderingDeviceDriver {

struct SwapChain {
ComPtr<IDXGISwapChain3> d3d_swap_chain;
HANDLE waitable_object = nullptr;
RenderingContextDriver::SurfaceID surface = RenderingContextDriver::SurfaceID();
UINT present_flags = 0;
UINT sync_interval = 1;
@@ -489,6 +490,9 @@ class RenderingDeviceDriverD3D12 : public RenderingDeviceDriver {
virtual RenderPassID swap_chain_get_render_pass(SwapChainID p_swap_chain) override;
virtual DataFormat swap_chain_get_format(SwapChainID p_swap_chain) override;
virtual void swap_chain_free(SwapChainID p_swap_chain) override;
virtual Error swap_chain_wait_for_present(DisplayServer::WindowID p_window, SwapChainID p_swap_chain, uint32_t p_max_frame_delay) override final;

virtual BitField<PacingMethod> get_available_pacing_methods() const override final;

/*********************/
/**** FRAMEBUFFER ****/
4 changes: 4 additions & 0 deletions drivers/gles3/storage/utilities.cpp
@@ -335,6 +335,10 @@ void Utilities::capture_timestamp(const String &p_name) {
frames[frame].timestamp_count++;
}

void Utilities::capture_timestamps_sync_mode_auto_end() {
// Not implemented for OpenGL.
}

void Utilities::_capture_timestamps_begin() {
// frame is incremented at the end of the frame so this gives us the queries for frame - 2. By then they should be ready.
if (frames[frame].timestamp_count) {
1 change: 1 addition & 0 deletions drivers/gles3/storage/utilities.h
@@ -201,6 +201,7 @@ class Utilities : public RendererUtilities {

virtual void capture_timestamps_begin() override;
virtual void capture_timestamp(const String &p_name) override;
virtual void capture_timestamps_sync_mode_auto_end() override;
virtual uint32_t get_captured_timestamps_count() const override;
virtual uint64_t get_captured_timestamps_frame() const override;
virtual uint64_t get_captured_timestamp_gpu_time(uint32_t p_index) const override;
3 changes: 3 additions & 0 deletions drivers/metal/rendering_device_driver_metal.h
@@ -224,6 +224,9 @@ class API_AVAILABLE(macos(11.0), ios(14.0), tvos(14.0)) RenderingDeviceDriverMet
virtual DataFormat swap_chain_get_format(SwapChainID p_swap_chain) override final;
virtual void swap_chain_set_max_fps(SwapChainID p_swap_chain, int p_max_fps) override final;
virtual void swap_chain_free(SwapChainID p_swap_chain) override final;
virtual Error swap_chain_wait_for_present(DisplayServer::WindowID p_window, SwapChainID p_swap_chain, uint32_t p_max_frame_delay) override final;

virtual BitField<PacingMethod> get_available_pacing_methods() const override final;

#pragma mark - Frame Buffer
