
Conversation


@james7132 james7132 commented Aug 22, 2025

Objective

Fixes #1907. Spiritual successor to #4740, #12090, and #18163. Third time's the charm!

Bevy currently creates a 50%/25%/25% split between the Compute, AsyncCompute, and IO task pools, meaning that any given operation can only be scheduled onto its subset of threads. This is suboptimal when an app makes little or no use of IO or async compute (or, conversely, is dominated by them): available parallelism goes underutilized because the split does not reflect the actual workload. This PR aims to fix that underutilization.
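To illustrate the problem, here is a minimal sketch of a fixed percentage-based split. The split_threads helper is purely illustrative and is not Bevy's actual task pool configuration API:

```rust
// Hypothetical illustration of a fixed 50%/25%/25% thread split.
// `split_threads` is an illustrative helper, not part of Bevy's API.
fn split_threads(total: usize) -> (usize, usize, usize) {
    let compute = (total / 2).max(1); // ~50% reserved for Compute
    let async_compute = (total / 4).max(1); // ~25% reserved for AsyncCompute
    let io = total.saturating_sub(compute + async_compute).max(1); // remainder for IO
    (compute, async_compute, io)
}

fn main() {
    // On a 16-thread machine, a purely compute-bound app is still
    // limited to the 8 threads assigned to the Compute pool.
    let (compute, async_compute, io) = split_threads(16);
    println!("compute: {compute}, async_compute: {async_compute}, io: {io}");
}
```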

Solution

This is based on #20649 and #20331. Unlike #12090 and #18163, this does not introduce any blocking thread pool (albeit those do still exist since we use async-fs / unblock in bevy_assets), and, in fact, does not make a major departure from the status quo:

  • Do away with the IO and Async task pools and allocate all of the threads to the now consolidated TaskPool.
  • Provide a set of task priorities:
    • RunNow - When woken, these tasks are scheduled to immediately run after the currently running task yields.
    • Compute - Replaces the ComputeTaskPool's purpose.
    • AsyncIO - New. Dedicated to IO tasks that yield regularly with fairly low amounts of compute when woken. Ideal for network IO.
    • BlockingCompute - Replaces the AsyncTaskPool's purpose.
    • BlockingIO - Replaces the IoTaskPool's purpose.
  • Priority groups are limited by semaphores inside the executor. New or freshly woken tasks in groups that are at their limit are pushed back onto the queue to run later (see the sketch after this list). These limits, by default, apply to all groups with priorities below Compute. This allows any task to freely use any thread in the task pool when scheduled, while preventing the lower-priority groups from starving out higher-priority tasks.
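For readers unfamiliar with this kind of gating, here is a deliberately simplified, single-queue sketch of semaphore-limited priority groups. The names (TaskPriority, GroupLimits, try_acquire, schedule) and the limit values are illustrative assumptions, not the executor API introduced by this PR:

```rust
use std::collections::VecDeque;

// Illustrative priority groups mirroring the list above.
#[derive(Clone, Copy)]
enum TaskPriority {
    RunNow,
    Compute,
    AsyncIO,
    BlockingCompute,
    BlockingIO,
}

struct GroupLimits {
    // Remaining permits per priority group; `None` means unlimited.
    // By default only groups below Compute would be limited.
    permits: [Option<usize>; 5],
}

impl GroupLimits {
    fn try_acquire(&mut self, priority: TaskPriority) -> bool {
        match &mut self.permits[priority as usize] {
            None => true,
            Some(0) => false,
            Some(p) => {
                *p -= 1;
                true
            }
        }
    }
}

struct Task {
    priority: TaskPriority,
    // ... future, waker, etc.
}

fn schedule(queue: &mut VecDeque<Task>, limits: &mut GroupLimits) -> Option<Task> {
    // Pop the next woken task; if its group is at the limit, push it back
    // onto the queue to run later so it cannot starve higher-priority work.
    for _ in 0..queue.len() {
        let task = queue.pop_front()?;
        if limits.try_acquire(task.priority) {
            return Some(task);
        }
        queue.push_back(task);
    }
    None
}

fn main() {
    // Illustrative limits: Compute and above unlimited, lower groups capped.
    let mut limits = GroupLimits {
        permits: [None, None, Some(2), Some(1), Some(1)],
    };
    let mut queue = VecDeque::from([
        Task { priority: TaskPriority::BlockingIO },
        Task { priority: TaskPriority::Compute },
    ]);
    while let Some(task) = schedule(&mut queue, &mut limits) {
        let _ = task; // run the task here
    }
}
```

A real executor is multi-threaded and releases permits when tasks complete or yield; the sketch only models the push-back-when-at-limit behavior described above.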

This isn't strictly reliant on #20649, but it does make it easier to implement. The diff with that PR can be seen here.
This PR will remain in draft until #20649 or some equivalent is merged.

Testing

Right now, just basic example testing.

Future Work

  • Find ways to prioritize scheduling latency sensitive tasks onto higher performance cores (e.g. Intel's P-Cores) and lower priority background tasks onto more power efficient CPU cores (e.g. Intel's E-Cores).

Co-Authored By: Mike Hsu [email protected]

@james7132 james7132 added this to the 0.18 milestone Aug 22, 2025
@james7132 james7132 added the C-Performance, S-Blocked, A-Tasks, D-Complex, and D-Unsafe labels Aug 22, 2025
Contributor

The generated examples/README.md is out of sync with the example metadata in Cargo.toml or the example readme template. Please run cargo run -p build-templated-pages -- update examples to update it, and commit the file change.

@james7132 james7132 added the M-Needs-Migration-Guide label Aug 22, 2025
@alice-i-cecile alice-i-cecile added the M-Needs-Release-Note label Aug 22, 2025
Development

Successfully merging this pull request may close these issues.

Using ComputeTaskPool for a par_for_each query only uses half of available logical cores