Remote entity reservation v9 #18670
free and pending are now more distinct
That's alright. I was asking because I didn't follow all of your discussions and design work leading up to this PR, and I wanted to be sure I had the justification right before approving. |
(My approval doesn't hinge on any of this stuff. I just wanted to address the other parts of your comment separately.)
Good to know, but IIRC components-as-entities had only been blocked on deciding whether or not to let Correct me if I'm wrong but this seems only loosely related. Asset loaders don't have access to the world, so they can't initialize queries or systems (and that shouldn't change in the future because queries and systems will be entities that need to be instantiated). In any case, handling component entities being despawned shouldn't be a tough problem. Bevy's ECS has immutable components and reactivity, so it'd be easy enough to create some Reflection can be significantly empowered (and simplified) if component registrations are normal entities and you can attach your own components to them, so I don't see it as something we want to avoid.
flecs' entity metadata is closer to how bevy currently is (last I checked anyway). Its equivalent of the (flecs has a I don't know if flecs directly supports "remote" reservation of entities or if Sander even considers it necessary. (I don't recall it being a feature but if he's said otherwise then ignore me lol.) |
Thanks, this is an amazing writeup! |
Yeah; I don't mean that remote reservation is a prerequisite for components as entities; just that it lets us start reworking the A few of those questions (that I still have anyway) are: 1) Is panicking the best way to handle a removed/despawned component entity? Why not yank the component from all archetypes and continue? 2) Can we make the resource singleton entity different from the component info singleton entity? I don't see why not and that would let us punt
Yeah, this is 100% a separate issue. I just want to do the id/fragmenting portion of components as entities soon. There's already prs for query by value, fragmenting components, etc that would really benefit from this.
That's a solid option. I don't mean to dig up old questions if we've already reached a consensus here. I just don't know what it is. Maybe worth laying out a plan explicitly somewhere. If one exists, I haven't been able to find it.
Very much looking forward to this!
Yeah, that makes sense. I only glanced at it, and yeah; doesn't seem to support remote reservation. |
Eh, I mean, it's not like today's Bevy lets you unregister a component. Configurable cleanup is a feature itself, so I wouldn't tie them together.
I think the matter of
I don't have a list, but low-hanging fruit would be components that represent "implements If we spawn entities to represent plugins and link them to their components with Things like that. |
For y'all who haven't been following this for a while, I want to draw attention to this comment from a while ago. It raises a potential performance concern about a benchmarking blindspot. My assessment is that it's fine, but I want this to be out in the open. There's also a doc out here that gives some more background if you're interested. |
I probably need to read all this a few more times before giving it a proper review.
@urben1680 Thanks for the review! But to save you time, this pr might not be the solution to remote reservation. It certainly could be, but I'm waiting on #19451 and a decision on entity paging. Depending on what happens there, we might do this pr or something more efficient that those things open up. I'll keep you posted though! |
I've found loom to be fairly incomplete, and verging on unmaintained. I would recommend shuttle instead. |
First part of what I suspect will be a very long review. I like what I see, but I think some of the safety comments could be more explicit about memory ordering requirements.
| @@ -0,0 +1,963 @@ | |||
| use bevy_platform::{ | |||
I do think module-level docs are the way to go. If I had my way, every module would have a module-doc.
Something short along these lines would suffice.
| use bevy_platform::{ | |
| //! This module contains the code of allocating entities across threads, and defines the `Allocator` type. | |
| use bevy_platform::{ |
| use super::{Entity, EntityRow, EntitySetIterator}; | ||
|
|
||
| /// This is the item we store in the free list. | ||
| /// Effectively, this is a `MaybeUninit<Entity>` where uninit is represented by `Entity::PLACEHOLDER`. |
I think these docs could be better. The `MaybeUninit` analogy isn't exactly applicable: it sounds like slots are always initialized, just not necessarily filled with a meaningful entity ID value. This shouldn't block merge though.
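To make the distinction in this comment concrete, here is a minimal sketch of a slot that is always initialized and uses a sentinel for "nothing stored". It is not the PR's code: `Slot`, `PLACEHOLDER_BITS`, `put`, and `take` are hypothetical names, and the sentinel value only stands in for `Entity::PLACEHOLDER`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the bits of `Entity::PLACEHOLDER`:
/// a pattern that no real entity id is assumed to use.
const PLACEHOLDER_BITS: u64 = u64::MAX;

/// One free-list slot. The memory is always initialized; "empty" is
/// simply the sentinel value, so reading it is never undefined behavior.
struct Slot(AtomicU64);

impl Slot {
    const fn empty() -> Self {
        Slot(AtomicU64::new(PLACEHOLDER_BITS))
    }

    /// Stores an id. `Release` pairs with the acquire half of `take`.
    fn put(&self, id_bits: u64) {
        debug_assert_ne!(id_bits, PLACEHOLDER_BITS);
        self.0.store(id_bits, Ordering::Release);
    }

    /// Takes the id out and leaves the sentinel behind.
    /// Returns `None` if the slot held no meaningful id.
    fn take(&self) -> Option<u64> {
        let bits = self.0.swap(PLACEHOLDER_BITS, Ordering::AcqRel);
        (bits != PLACEHOLDER_BITS).then_some(bits)
    }
}
```

Unlike `MaybeUninit`, nothing here needs `unsafe`: an empty slot is a valid value that happens to carry no meaning.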
| /// This is an atomic interface to [`FreeCountState`]. | ||
| struct FreeCount(AtomicU64); | ||
|
|
What do we do on platforms without 64-bit atomics?
This should fall back to portable atomics, as in bevy_platform.
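As a rough illustration of what such a fallback means, the sketch below picks the native `AtomicU64` when the target has 64-bit atomics and a lock-based stand-in otherwise. This is an assumption-laden toy, not how `bevy_platform` or any portable-atomics crate is implemented; `Counter`, `LockedU64`, and `FreeCountSketch` are hypothetical names.

```rust
use std::sync::atomic::Ordering;

// Native type where the target supports 64-bit atomics...
#[cfg(target_has_atomic = "64")]
type Counter = std::sync::atomic::AtomicU64;
// ...and a lock-based stand-in with the same method names otherwise.
#[cfg(not(target_has_atomic = "64"))]
type Counter = fallback::LockedU64;

#[cfg(not(target_has_atomic = "64"))]
mod fallback {
    use std::sync::{atomic::Ordering, Mutex};

    /// Hypothetical fallback: a mutex-guarded `u64`. The ordering
    /// argument is ignored because the lock already synchronizes.
    pub struct LockedU64(Mutex<u64>);

    impl LockedU64 {
        pub fn new(value: u64) -> Self {
            Self(Mutex::new(value))
        }
        pub fn load(&self, _order: Ordering) -> u64 {
            *self.0.lock().unwrap()
        }
        pub fn fetch_add(&self, n: u64, _order: Ordering) -> u64 {
            let mut guard = self.0.lock().unwrap();
            let old = *guard;
            *guard = old.wrapping_add(n);
            old
        }
    }
}

/// Hypothetical counter wrapper; only the storage type changes per target.
struct FreeCountSketch(Counter);

impl FreeCountSketch {
    fn new(len: u64) -> Self {
        Self(Counter::new(len))
    }
    fn len(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }
    /// Bumps the count and returns the previous value.
    fn push_one(&self) -> u64 {
        self.0.fetch_add(1, Ordering::AcqRel)
    }
}
```

The point of the alias is that callers such as `FreeCountSketch` compile unchanged on both kinds of target; whether the real fallback blocks is a separate question this sketch does not answer.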
…nd despawning (#19451)

# Objective

This is the next step for #19430 and is also convenient for #18670.

For context, the way entities work on main is as an "allocate and use" system. Entity ids are allocated, and given a location. The location can then be changed, etc. Entities that are free have an invalid location. To allocate an entity, one must also set its location. This introduced the need for pending entities, where an entity would be reserved, pending, and at some point flushed. Pending and free entities have an invalid location, and others are assumed to have a valid one.

This paradigm has a number of downsides: First, the entities metadata table is inseparable from the allocator, which makes remote reservation challenging. Second, the `World` must be flushed, even to do simple things, like allocate a temporary entity id. Third, users have little control over entity ids, only interacting with conceptual entities. This made things like `Entities::alloc_at` clunky and slow, leading to its removal, despite some users still having valid need of it.

So the goal of this PR is to:

- Decouple `Entities` from entity allocation to make room for other allocators and resolve `alloc_at` issues.
- Decouple entity allocation from spawning to make reservation a moot point.
- Introduce constructing and destructing entities, in addition to spawn/despawn.
- Change `reserve` and `flush` patterns to `alloc` and `construct` patterns.

It is possible to break this up into multiple prs, as I originally intended, but doing so would require lots of temporary scaffolding that would both hurt performance and make things harder to review.

## Solution

This solution builds on #19433, which changed the representation of invalid entity locations from a constant to `None`.

There's quite a few steps to this, each somewhat controversial:

### Entities with no location

This pr introduces the idea of entity rows both with and without locations.
This corresponds to entities that are constructed (the row has a location) and not constructed (the row has no location). When a row is free or pending, it is not constructed. When a row is outside the range of the meta list, it still exists; it's just not constructed.

This extends to conceptual entities; conceptual entities may now be in one of 3 states: empty (constructed; no components), normal (constructed; 1 or more components), or null (not constructed). This extends to entity pointers (`EntityWorldMut`, etc): These now can point to "null"/not constructed entities. Depending on the privilege of the pointer, these can also construct or destruct the entity.

This also changes how `Entity` ids relate to conceptual entities. An `Entity` now exists if its generation matches that of its row. An `Entity` that has the right generation for its row will claim to exist, even if it is not constructed. This means, for example, an `Entity` manually constructed with a large index and generation of 0 *will* exist if it has not been allocated yet.

### `Entities` is separate from the allocator

This pr separates entity allocation from `Entities`. `Entities` is now only focused on tracking entity metadata, etc. The new `EntitiesAllocator` on `World` manages all allocations. This forces `Entities` to not rely on allocator state to determine if entities exist, etc, which is convenient for remote reservation and needed for custom allocators. It also paves the way for allocators not housed within the `World`, makes some unsafe code easier since the allocator and metadata live under different pointers, etc.

This separation requires thinking about interactions with `Entities` in a new way. Previously, the `Entities` set the rules for what entities are valid and what entities are not. Now, it has no way of knowing. Instead, interactions with `Entities` are more like declaring some information for it to track than changing some information it was already tracking.
To reflect this, `set` has been split up into `declare` and `update`.

### Constructing and destructing

As mentioned, entities that have no location (not constructed) can be constructed at any time. This takes on exactly the same meaning as the previous `spawn_non_existent`. It creates/declares a location instead of updating an old one. As an example, this makes spawning an entity now literally just allocate a new id and construct it immediately.

Conversely, entities that are constructed may be destructed. This removes all components and despawns related entities, just like `despawn`. The only difference is that destructing does not free the entity id for reuse. Between constructing and destructing, all needs for `alloc_at` are resolved. If you want to keep the id for custom reuse, just destruct instead of despawn! Despawn now just destructs the entity and frees it.

Destructing a not constructed entity will do nothing. Constructing an already constructed entity will panic. This is to guard against users constructing a manually formed `Entity` that the allocator could later hand out. However, public construction methods have proper error handling for this. Despawning a not constructed entity just frees its id.

### No more flushing

All places that once needed to reserve and flush entity ids now allocate and construct them instead. This improves performance and simplifies things.

### Flow chart

(Thanks @ItsDoot)

## Testing

- CI
- Some new tests
- A few deleted (no longer applicable) tests
- If you see something you think should have a test case, I'll gladly add it.

## Showcase

Here's an example of constructing and destructing

```rust
let e4 = world.spawn_null();
world
    .entity_mut(e4)
    .construct((TableStored("junk"), A(0)))
    .unwrap()
    .destruct()
    .construct((TableStored("def"), A(456)))
    .unwrap();
```

## Future Work

- [x] More expansive docs. This definitely should be done, but I'd rather do that in a future pr to separate writing review from code review.
If you have more ideas for how to introduce users to these concepts, I'd like to see them. As it is, we don't do a very good job of explaining entities to users. Ex: `Entity` doesn't always correspond to a conceptual entity. - [ ] Try to remove panics from `EntityWorldMut`. There is (and was) a lot of assuming the entity is constructed there (was assuming it was not despawned). - [ ] A lot of names are still centered around spawn/despawn, which is more user-friendly than construct/destruct but less precise. Might be worth changing these over. - [ ] Making a centralized bundle despawner would make sense now. - [ ] Of course, build on this for remote reservation and, potentially, for paged entities. ## Performance <details> <summary>Benchmarks</summary> ```txt critcmp main pr19451 -t 1 group main pr19451 ----- ---- ------- add_remove/sparse_set 1.13 594.7±6.80µs ? ?/sec 1.00 527.4±8.01µs ? ?/sec add_remove/table 1.08 799.6±15.53µs ? ?/sec 1.00 739.7±15.10µs ? ?/sec add_remove_big/sparse_set 1.10 614.6±6.50µs ? ?/sec 1.00 557.0±19.04µs ? ?/sec add_remove_big/table 1.03 2.8±0.01ms ? ?/sec 1.00 2.7±0.02ms ? ?/sec added_archetypes/archetype_count/100 1.01 30.9±0.50µs ? ?/sec 1.00 30.5±0.44µs ? ?/sec added_archetypes/archetype_count/1000 1.00 638.0±19.77µs ? ?/sec 1.03 657.0±73.61µs ? ?/sec added_archetypes/archetype_count/10000 1.02 5.5±0.14ms ? ?/sec 1.00 5.4±0.09ms ? ?/sec all_added_detection/50000_entities_ecs::change_detection::Sparse 1.02 47.9±1.22µs ? ?/sec 1.00 46.8±0.40µs ? ?/sec all_added_detection/50000_entities_ecs::change_detection::Table 1.02 45.4±1.89µs ? ?/sec 1.00 44.6±0.78µs ? ?/sec build_schedule/1000_schedule 1.02 942.6±11.53ms ? ?/sec 1.00 925.2±10.35ms ? ?/sec build_schedule/100_schedule 1.01 5.8±0.12ms ? ?/sec 1.00 5.7±0.12ms ? ?/sec build_schedule/100_schedule_no_constraints 1.03 803.1±28.93µs ? ?/sec 1.00 781.1±50.11µs ? ?/sec build_schedule/500_schedule_no_constraints 1.00 5.6±0.31ms ? ?/sec 1.08 6.0±0.27ms ? 
?/sec busy_systems/01x_entities_03_systems 1.00 24.4±1.35µs ? ?/sec 1.01 24.7±1.35µs ? ?/sec busy_systems/03x_entities_03_systems 1.00 38.1±1.70µs ? ?/sec 1.04 39.7±1.49µs ? ?/sec busy_systems/03x_entities_09_systems 1.01 111.4±2.27µs ? ?/sec 1.00 109.9±2.46µs ? ?/sec busy_systems/03x_entities_15_systems 1.00 174.8±2.56µs ? ?/sec 1.01 176.6±4.22µs ? ?/sec contrived/03x_entities_09_systems 1.00 59.0±2.92µs ? ?/sec 1.01 59.8±3.03µs ? ?/sec contrived/03x_entities_15_systems 1.00 97.5±4.87µs ? ?/sec 1.01 98.8±4.69µs ? ?/sec contrived/05x_entities_09_systems 1.00 75.3±3.76µs ? ?/sec 1.01 76.4±4.11µs ? ?/sec despawn_world/10000_entities 1.32 344.8±4.47µs ? ?/sec 1.00 261.4±4.91µs ? ?/sec despawn_world/100_entities 1.22 4.3±0.04µs ? ?/sec 1.00 3.5±0.54µs ? ?/sec despawn_world/1_entities 1.01 169.6±7.88ns ? ?/sec 1.00 167.8±11.45ns ? ?/sec despawn_world_recursive/10000_entities 1.20 1723.0±53.82µs ? ?/sec 1.00 1437.0±26.11µs ? ?/sec despawn_world_recursive/100_entities 1.16 17.9±0.10µs ? ?/sec 1.00 15.5±0.16µs ? ?/sec despawn_world_recursive/1_entities 1.01 372.8±15.68ns ? ?/sec 1.00 367.7±16.90ns ? ?/sec ecs::entity_cloning::hierarchy_many/clone 1.03 227.9±24.67µs 1559.9 KElem/sec 1.00 221.1±29.74µs 1607.8 KElem/sec ecs::entity_cloning::hierarchy_many/reflect 1.00 406.2±23.46µs 875.2 KElem/sec 1.02 413.9±22.45µs 858.9 KElem/sec ecs::entity_cloning::hierarchy_tall/clone 1.01 12.2±0.34µs 4.0 MElem/sec 1.00 12.0±1.41µs 4.1 MElem/sec ecs::entity_cloning::hierarchy_tall/reflect 1.02 15.3±0.39µs 3.2 MElem/sec 1.00 15.0±2.14µs 3.2 MElem/sec ecs::entity_cloning::single/clone 1.02 659.0±100.01ns 1481.8 KElem/sec 1.00 643.3±101.49ns 1517.9 KElem/sec ecs::entity_cloning::single/reflect 1.03 1135.2±72.17ns 860.2 KElem/sec 1.00 1098.3±65.99ns 889.1 KElem/sec empty_archetypes/for_each/10 1.02 8.1±0.57µs ? ?/sec 1.00 8.0±0.37µs ? ?/sec empty_archetypes/for_each/100 1.01 8.1±0.34µs ? ?/sec 1.00 8.1±0.28µs ? ?/sec empty_archetypes/for_each/1000 1.03 8.4±0.25µs ? ?/sec 1.00 8.2±0.29µs ? 
?/sec empty_archetypes/iter/100 1.01 8.1±0.29µs ? ?/sec 1.00 8.0±0.34µs ? ?/sec empty_archetypes/iter/1000 1.02 8.5±0.31µs ? ?/sec 1.00 8.4±0.62µs ? ?/sec empty_archetypes/iter/10000 1.01 10.6±1.22µs ? ?/sec 1.00 10.5±0.49µs ? ?/sec empty_archetypes/par_for_each/10 1.01 8.8±0.49µs ? ?/sec 1.00 8.7±0.31µs ? ?/sec empty_archetypes/par_for_each/100 1.00 8.7±0.48µs ? ?/sec 1.04 9.0±0.34µs ? ?/sec empty_archetypes/par_for_each/10000 1.01 21.2±0.41µs ? ?/sec 1.00 20.9±0.44µs ? ?/sec empty_commands/0_entities 1.72 3.7±0.01ns ? ?/sec 1.00 2.1±0.02ns ? ?/sec empty_systems/100_systems 1.00 82.9±3.29µs ? ?/sec 1.07 88.3±3.77µs ? ?/sec empty_systems/2_systems 1.01 8.2±0.71µs ? ?/sec 1.00 8.2±0.38µs ? ?/sec empty_systems/4_systems 1.00 8.2±0.72µs ? ?/sec 1.03 8.4±0.71µs ? ?/sec entity_hash/entity_set_build/10000 1.10 45.9±1.60µs 207.7 MElem/sec 1.00 41.6±0.39µs 229.0 MElem/sec entity_hash/entity_set_build/3162 1.06 12.7±0.77µs 236.7 MElem/sec 1.00 12.0±0.75µs 250.6 MElem/sec entity_hash/entity_set_lookup_hit/10000 1.02 14.5±0.30µs 658.3 MElem/sec 1.00 14.2±0.07µs 672.6 MElem/sec entity_hash/entity_set_lookup_hit/3162 1.01 4.4±0.03µs 682.7 MElem/sec 1.00 4.4±0.01µs 691.3 MElem/sec entity_hash/entity_set_lookup_miss_gen/10000 1.01 61.3±4.12µs 155.6 MElem/sec 1.00 60.6±1.47µs 157.3 MElem/sec entity_hash/entity_set_lookup_miss_gen/3162 1.00 9.5±0.02µs 316.3 MElem/sec 1.01 9.7±0.88µs 311.7 MElem/sec entity_hash/entity_set_lookup_miss_id/100 1.00 145.5±1.49ns 655.4 MElem/sec 1.03 149.8±1.59ns 636.7 MElem/sec entity_hash/entity_set_lookup_miss_id/10000 1.85 63.9±3.57µs 149.3 MElem/sec 1.00 34.6±3.81µs 275.8 MElem/sec entity_hash/entity_set_lookup_miss_id/316 1.00 562.0±9.58ns 536.2 MElem/sec 1.02 573.9±1.27ns 525.1 MElem/sec entity_hash/entity_set_lookup_miss_id/3162 1.03 9.1±0.10µs 330.7 MElem/sec 1.00 8.9±0.24µs 339.0 MElem/sec event_propagation/four_event_types 1.12 541.5±3.84µs ? ?/sec 1.00 482.7±4.64µs ? ?/sec event_propagation/single_event_type 1.07 769.5±10.21µs ? 
?/sec 1.00 715.9±15.16µs ? ?/sec event_propagation/single_event_type_no_listeners 1.56 393.4±2.89µs ? ?/sec 1.00 251.4±3.68µs ? ?/sec events_iter/size_16_events_100 1.01 64.0±0.18ns ? ?/sec 1.00 63.4±0.23ns ? ?/sec events_iter/size_4_events_100 1.02 64.8±0.90ns ? ?/sec 1.00 63.4±0.24ns ? ?/sec events_iter/size_4_events_1000 1.01 586.5±8.00ns ? ?/sec 1.00 579.1±4.93ns ? ?/sec events_send/size_16_events_100 1.00 142.7±24.34ns ? ?/sec 1.03 147.1±28.36ns ? ?/sec events_send/size_16_events_10000 1.01 12.2±0.13µs ? ?/sec 1.00 12.1±0.12µs ? ?/sec fake_commands/10000_commands 1.43 63.3±8.21µs ? ?/sec 1.00 44.1±0.16µs ? ?/sec fake_commands/1000_commands 1.40 6.2±0.01µs ? ?/sec 1.00 4.4±0.02µs ? ?/sec fake_commands/100_commands 1.38 629.4±1.69ns ? ?/sec 1.00 457.1±0.84ns ? ?/sec few_changed_detection/50000_entities_ecs::change_detection::Table 1.00 57.7±0.86µs ? ?/sec 1.07 61.6±1.19µs ? ?/sec few_changed_detection/5000_entities_ecs::change_detection::Sparse 1.05 5.4±0.53µs ? ?/sec 1.00 5.1±0.56µs ? ?/sec few_changed_detection/5000_entities_ecs::change_detection::Table 1.00 4.3±0.30µs ? ?/sec 1.18 5.1±0.35µs ? ?/sec insert_commands/insert 1.11 402.5±10.75µs ? ?/sec 1.00 363.6±8.07µs ? ?/sec insert_commands/insert_batch 1.00 174.9±3.03µs ? ?/sec 1.02 177.9±5.74µs ? ?/sec insert_simple/base 1.04 564.1±23.01µs ? ?/sec 1.00 544.3±60.70µs ? ?/sec insert_simple/unbatched 1.32 929.3±180.10µs ? ?/sec 1.00 704.1±132.88µs ? ?/sec iter_fragmented/base 1.02 280.0±2.86ns ? ?/sec 1.00 274.0±4.85ns ? ?/sec iter_fragmented/foreach 1.00 97.3±0.42ns ? ?/sec 1.03 100.6±3.44ns ? ?/sec iter_fragmented/foreach_wide 1.04 2.7±0.04µs ? ?/sec 1.00 2.6±0.03µs ? ?/sec iter_fragmented_sparse/base 1.00 5.6±0.05ns ? ?/sec 1.04 5.8±0.06ns ? ?/sec multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Sparse 1.00 737.7±27.38µs ? ?/sec 1.01 747.5±30.01µs ? 
?/sec multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Table 1.02 678.3±25.13µs ? ?/sec 1.00 662.1±19.63µs ? ?/sec multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Sparse 1.09 76.0±9.35µs ? ?/sec 1.00 70.0±3.29µs ? ?/sec multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Table 1.03 64.7±3.40µs ? ?/sec 1.00 62.8±1.80µs ? ?/sec multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Table 1.02 7.6±0.12µs ? ?/sec 1.00 7.5±0.16µs ? ?/sec multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Sparse 1.00 1003.5±12.38ns ? ?/sec 1.01 1013.7±32.64ns ? ?/sec multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Sparse 1.03 187.1±21.18ns ? ?/sec 1.00 181.9±22.86ns ? ?/sec multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Sparse 1.00 52.8±8.19ns ? ?/sec 1.03 54.3±8.06ns ? ?/sec multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Table 1.00 46.8±2.23ns ? ?/sec 1.03 48.0±2.48ns ? ?/sec no_archetypes/system_count/0 1.00 16.3±0.17ns ? ?/sec 1.02 16.6±0.16ns ? ?/sec no_archetypes/system_count/100 1.02 851.5±9.32ns ? ?/sec 1.00 832.9±7.93ns ? ?/sec none_changed_detection/5000_entities_ecs::change_detection::Sparse 1.00 3.4±0.04µs ? ?/sec 1.02 3.5±0.05µs ? ?/sec nonempty_spawn_commands/10000_entities 1.89 261.1±6.99µs ? ?/sec 1.00 137.8±8.47µs ? ?/sec nonempty_spawn_commands/1000_entities 1.90 26.4±3.18µs ? ?/sec 1.00 13.9±2.38µs ? ?/sec nonempty_spawn_commands/100_entities 1.87 2.6±0.07µs ? ?/sec 1.00 1388.8±97.31ns ? ?/sec observe/trigger_simple 1.09 347.5±1.51µs ? ?/sec 1.00 317.7±2.62µs ? ?/sec observe/trigger_targets_simple/10000_entity 1.04 696.5±15.50µs ? ?/sec 1.00 672.0±13.88µs ? ?/sec par_iter_simple/with_0_fragment 1.01 34.4±0.51µs ? 
?/sec 1.00 33.9±0.53µs ? ?/sec par_iter_simple/with_1000_fragment 1.04 45.5±0.93µs ? ?/sec 1.00 43.9±1.85µs ? ?/sec par_iter_simple/with_100_fragment 1.03 36.2±0.50µs ? ?/sec 1.00 35.1±0.44µs ? ?/sec par_iter_simple/with_10_fragment 1.03 37.5±0.97µs ? ?/sec 1.00 36.5±0.74µs ? ?/sec param/combinator_system/8_dyn_params_system 1.00 10.4±0.73µs ? ?/sec 1.01 10.5±0.79µs ? ?/sec param/combinator_system/8_piped_systems 1.05 8.0±0.65µs ? ?/sec 1.00 7.6±0.57µs ? ?/sec query_get/50000_entities_sparse 1.06 136.7±0.35µs ? ?/sec 1.00 128.6±0.44µs ? ?/sec query_get_many_10/50000_calls_sparse 1.02 1649.4±77.80µs ? ?/sec 1.00 1614.4±78.91µs ? ?/sec query_get_many_2/50000_calls_sparse 1.00 191.3±3.66µs ? ?/sec 1.01 193.3±0.75µs ? ?/sec query_get_many_2/50000_calls_table 1.00 243.9±0.55µs ? ?/sec 1.05 257.2±8.62µs ? ?/sec query_get_many_5/50000_calls_sparse 1.00 585.9±7.70µs ? ?/sec 1.03 600.6±5.99µs ? ?/sec query_get_many_5/50000_calls_table 1.00 673.7±7.44µs ? ?/sec 1.07 722.3±10.77µs ? ?/sec run_condition/no/1000_systems 1.00 23.7±0.06µs ? ?/sec 1.06 25.1±0.07µs ? ?/sec run_condition/no/100_systems 1.00 1460.5±4.28ns ? ?/sec 1.03 1510.1±3.69ns ? ?/sec run_condition/no/10_systems 1.00 201.5±0.53ns ? ?/sec 1.04 209.1±2.37ns ? ?/sec run_condition/yes/1000_systems 1.00 1225.7±22.58µs ? ?/sec 1.02 1253.7±24.90µs ? ?/sec run_condition/yes/100_systems 1.02 89.4±3.43µs ? ?/sec 1.00 88.0±3.96µs ? ?/sec run_condition/yes_using_query/1000_systems 1.00 1288.3±26.57µs ? ?/sec 1.03 1323.0±24.73µs ? ?/sec run_condition/yes_using_query/100_systems 1.00 108.8±2.51µs ? ?/sec 1.03 112.3±3.09µs ? ?/sec run_condition/yes_using_resource/100_systems 1.03 99.0±3.37µs ? ?/sec 1.00 96.2±4.80µs ? ?/sec run_empty_schedule/MultiThreaded 1.03 15.3±0.10ns ? ?/sec 1.00 14.9±0.03ns ? ?/sec run_empty_schedule/Simple 1.01 15.2±0.15ns ? ?/sec 1.00 15.0±0.25ns ? ?/sec sized_commands_0_bytes/10000_commands 1.57 52.6±0.41µs ? ?/sec 1.00 33.5±0.10µs ? ?/sec sized_commands_0_bytes/1000_commands 1.57 5.3±0.01µs ? 
?/sec 1.00 3.4±0.00µs ? ?/sec sized_commands_0_bytes/100_commands 1.56 536.5±4.83ns ? ?/sec 1.00 343.6±1.12ns ? ?/sec sized_commands_12_bytes/10000_commands 1.22 63.0±0.53µs ? ?/sec 1.00 51.5±6.06µs ? ?/sec sized_commands_12_bytes/1000_commands 1.25 5.7±0.01µs ? ?/sec 1.00 4.6±0.05µs ? ?/sec sized_commands_12_bytes/100_commands 1.27 579.3±1.28ns ? ?/sec 1.00 455.4±0.85ns ? ?/sec sized_commands_512_bytes/10000_commands 1.11 248.4±85.81µs ? ?/sec 1.00 224.3±52.11µs ? ?/sec sized_commands_512_bytes/1000_commands 1.09 22.8±0.18µs ? ?/sec 1.00 21.0±0.17µs ? ?/sec sized_commands_512_bytes/100_commands 1.13 1852.2±11.21ns ? ?/sec 1.00 1635.3±4.91ns ? ?/sec spawn_commands/10000_entities 1.04 844.2±11.96µs ? ?/sec 1.00 811.5±13.25µs ? ?/sec spawn_commands/1000_entities 1.05 84.9±3.66µs ? ?/sec 1.00 80.5±4.13µs ? ?/sec spawn_commands/100_entities 1.06 8.6±0.12µs ? ?/sec 1.00 8.1±0.12µs ? ?/sec spawn_world/10000_entities 1.03 413.2±25.20µs ? ?/sec 1.00 400.9±49.97µs ? ?/sec spawn_world/100_entities 1.02 4.1±0.62µs ? ?/sec 1.00 4.1±0.69µs ? ?/sec spawn_world/1_entities 1.04 42.2±3.23ns ? ?/sec 1.00 40.6±6.81ns ? ?/sec world_entity/50000_entities 1.18 88.3±0.42µs ? ?/sec 1.00 74.7±0.16µs ? ?/sec world_get/50000_entities_sparse 1.02 182.2±0.32µs ? ?/sec 1.00 179.5±0.84µs ? ?/sec world_get/50000_entities_table 1.01 198.3±0.46µs ? ?/sec 1.00 196.2±0.63µs ? ?/sec world_query_for_each/50000_entities_sparse 1.00 32.7±0.12µs ? ?/sec 1.01 33.1±0.46µs ? ?/sec ``` </details> This roughly doubles command spawning speed! Despawning also sees a 20-30% improvement. Dummy commands improve by 10-50% (due to not needing an entity flush). Other benchmarks seem to be noise and are negligible. It looks to me like a massive performance win! 
--------- Co-authored-by: Alice Cecile <[email protected]> Co-authored-by: Christian Hughes <[email protected]> Co-authored-by: urben1680 <[email protected]> Co-authored-by: Chris Russell <[email protected]> Co-authored-by: Trashtalk217 <[email protected]> Co-authored-by: James Liu <[email protected]> Co-authored-by: Carter Anderson <[email protected]>
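The exists / constructed / destructed / despawned rules described in the commit message above can be modelled in a few lines. This is a toy under stated assumptions, not Bevy's implementation: `Toy`, `Id`, `Row`, and every method name are hypothetical, a `u32` stands in for an entity location, and `construct` returns an error where the commit message says the internal path panics.

```rust
/// Toy model of the id life cycle; every name here is hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Id {
    row: usize,
    generation: u32,
}

#[derive(Default)]
struct Row {
    generation: u32,
    location: Option<u32>, // `None` means "not constructed"
}

#[derive(Default)]
struct Toy {
    next_row: usize, // allocator state, kept apart from the metadata
    free: Vec<Id>,   // ids waiting for reuse
    rows: Vec<Row>,  // metadata only; it does not decide what is allocated
}

impl Toy {
    /// Hands out an id without giving it a location (a "null" entity).
    fn alloc(&mut self) -> Id {
        self.free.pop().unwrap_or_else(|| {
            self.next_row += 1;
            Id { row: self.next_row - 1, generation: 0 }
        })
    }

    fn row_mut(&mut self, row: usize) -> &mut Row {
        if self.rows.len() <= row {
            self.rows.resize_with(row + 1, Row::default);
        }
        &mut self.rows[row]
    }

    /// An id exists if its generation matches its row's generation.
    /// Rows beyond the metadata list count as generation 0.
    fn exists(&self, id: Id) -> bool {
        self.rows.get(id.row).map_or(0, |r| r.generation) == id.generation
    }

    fn is_constructed(&self, id: Id) -> bool {
        self.exists(id) && self.rows.get(id.row).is_some_and(|r| r.location.is_some())
    }

    /// Declares a location for a row that has none.
    fn construct(&mut self, id: Id, location: u32) -> Result<(), &'static str> {
        if !self.exists(id) {
            return Err("stale id");
        }
        let row = self.row_mut(id.row);
        if row.location.is_some() {
            return Err("already constructed");
        }
        row.location = Some(location);
        Ok(())
    }

    /// Drops the location but keeps the id; a no-op if not constructed.
    fn destruct(&mut self, id: Id) {
        if self.is_constructed(id) {
            self.rows[id.row].location = None;
        }
    }

    /// Destructs, then bumps the generation and frees the row for reuse.
    fn despawn(&mut self, id: Id) {
        if !self.exists(id) {
            return;
        }
        let row = self.row_mut(id.row);
        row.location = None;
        row.generation += 1;
        let generation = row.generation;
        self.free.push(Id { row: id.row, generation });
    }
}
```

Note how a hand-built `Id { row: 100, generation: 0 }` reports `exists` before the allocator has ever handed it out, which is the situation the commit message's guard on double construction is about.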
Much improved safety comments and reduced some overkill safety concerns.
I have revived this PR after #19451 as the last step in a basic remote reservation setup. Benches show roughly as expected. Here's the raw numbers on my M2 Max:
group pre post
----- ---------------------- ------------------------
added_archetypes/archetype_count/1000 1.17 639.5±20.10µs ? ?/sec 1.00 548.8±138.84µs ? ?/sec
build_schedule/100_schedule_no_constraints 1.28 1031.1±64.49µs ? ?/sec 1.00 806.4±63.57µs ? ?/sec
despawn_world/10000_entities 1.06 269.6±21.12µs ? ?/sec 1.00 254.5±3.47µs ? ?/sec
despawn_world/1_entities 1.00 170.4±11.21ns ? ?/sec 1.57 266.9±13.26ns ? ?/sec
despawn_world_recursive/1_entities 1.00 318.5±51.60ns ? ?/sec 1.43 456.5±56.81ns ? ?/sec
ecs::bundles::insert_many::insert_many/only_last 1.00 236.0±9.92µs ? ?/sec 1.12 263.7±10.63µs ? ?/sec
ecs::bundles::spawn_many_zst::spawn_many_zst/static 1.16 122.1±41.23µs ? ?/sec 1.00 105.6±29.18µs ? ?/sec
ecs::bundles::spawn_one_zst::spawn_one_zst/static 1.00 182.8±16.54µs ? ?/sec 1.40 256.7±44.68µs ? ?/sec
ecs::entity_cloning::hierarchy_tall/clone 1.13 14.3±1.36µs 3.4 MElem/sec 1.00 12.7±1.36µs 3.8 MElem/sec
ecs::entity_cloning::hierarchy_tall/reflect 1.14 17.6±1.85µs 2.8 MElem/sec 1.00 15.5±0.47µs 3.1 MElem/sec
ecs::entity_cloning::hierarchy_wide/clone 1.12 11.6±0.29µs 4.2 MElem/sec 1.00 10.3±0.92µs 4.7 MElem/sec
ecs::entity_cloning::hierarchy_wide/reflect 1.08 14.8±1.39µs 3.3 MElem/sec 1.00 13.7±2.40µs 3.5 MElem/sec
ecs::entity_cloning::single/clone 1.17 777.9±67.36ns 1255.4 KElem/sec 1.00 663.3±92.48ns 1472.3 KElem/sec
empty_archetypes/iter/10000 1.00 9.3±0.19µs ? ?/sec 1.06 9.9±0.27µs ? ?/sec
empty_archetypes/par_for_each/10 1.00 8.1±0.32µs ? ?/sec 1.10 8.9±1.03µs ? ?/sec
empty_archetypes/par_for_each/100 1.00 8.3±0.19µs ? ?/sec 1.06 8.9±0.26µs ? ?/sec
empty_archetypes/par_for_each/1000 1.09 12.3±0.18µs ? ?/sec 1.00 11.3±0.10µs ? ?/sec
empty_archetypes/par_for_each/10000 1.12 22.9±0.33µs ? ?/sec 1.00 20.5±0.23µs ? ?/sec
empty_commands/0_entities 1.13 2.4±0.02ns ? ?/sec 1.00 2.1±0.04ns ? ?/sec
entity_hash/entity_set_lookup_hit/1000 1.20 1686.1±26.00ns 565.6 MElem/sec 1.00 1403.6±11.46ns 679.4 MElem/sec
entity_hash/entity_set_lookup_hit/10000 1.18 23.5±0.41µs 405.4 MElem/sec 1.00 20.0±1.60µs 476.7 MElem/sec
entity_hash/entity_set_lookup_hit/3162 1.15 5.8±0.12µs 517.8 MElem/sec 1.00 5.1±0.06µs 595.0 MElem/sec
entity_hash/entity_set_lookup_miss_gen/3162 1.00 13.6±1.66µs 222.3 MElem/sec 1.15 15.6±3.83µs 193.6 MElem/sec
entity_hash/entity_set_lookup_miss_id/10000 1.00 40.9±5.18µs 233.0 MElem/sec 1.27 51.8±4.65µs 183.9 MElem/sec
events_send/size_16_events_100 1.08 158.4±22.22ns ? ?/sec 1.00 146.4±20.18ns ? ?/sec
fake_commands/10000_commands 1.00 49.0±0.58µs ? ?/sec 1.07 52.2±0.80µs ? ?/sec
fake_commands/1000_commands 1.00 4.9±0.06µs ? ?/sec 1.07 5.2±0.09µs ? ?/sec
fake_commands/100_commands 1.00 504.3±5.33ns ? ?/sec 1.05 531.9±7.36ns ? ?/sec
insert_simple/unbatched 1.00 885.9±234.43µs ? ?/sec 1.11 983.6±158.83µs ? ?/sec
iter_simple/foreach_wide 2.76 46.5±0.42µs ? ?/sec 1.00 16.8±0.14µs ? ?/sec
iter_simple/wide 1.07 39.6±0.26µs ? ?/sec 1.00 36.9±0.26µs ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Sparse 1.00 130.3±2.20µs ? ?/sec 1.15 150.4±17.32µs ? ?/sec
nonempty_spawn_commands/1000_entities 1.00 15.0±1.00µs ? ?/sec 1.05 15.8±2.58µs ? ?/sec
observe/observer_custom/10000_entity 1.07 606.2±16.28µs ? ?/sec 1.00 565.9±37.26µs ? ?/sec
observe/observer_lifecycle_insert 1.00 493.8±5.71µs ? ?/sec 1.10 541.5±19.95µs ? ?/sec
run_condition/yes/1000_systems 1.00 1186.4±45.52µs ? ?/sec 1.06 1262.9±34.60µs ? ?/sec
run_condition/yes_using_resource/1000_systems 1.05 1303.5±22.58µs ? ?/sec 1.00 1240.5±31.14µs ? ?/sec
sized_commands_0_bytes/10000_commands 1.00 36.9±0.47µs ? ?/sec 1.08 39.9±0.58µs ? ?/sec
sized_commands_0_bytes/1000_commands 1.00 3.7±0.05µs ? ?/sec 1.08 4.0±0.09µs ? ?/sec
sized_commands_0_bytes/100_commands 1.00 384.9±10.48ns ? ?/sec 1.06 406.6±5.75ns ? ?/sec
sized_commands_12_bytes/100_commands 1.00 449.8±6.41ns ? ?/sec 1.06 475.6±6.78ns ? ?/sec
There are certainly a few outliers in the benches. I've changed around a few things, so I think some fresh review is next.
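For anyone reading these tables: as I understand critcmp's output, the factor printed in front of each timing is that column's time divided by the fastest time in the row, so `1.00` marks the faster side. A small sketch of that arithmetic follows; `relative_factors` is a hypothetical helper, not part of critcmp.

```rust
/// Divides every mean by the smallest mean in the row, so the fastest
/// entry gets 1.0 and the others show how many times slower they are.
fn relative_factors(means: &[f64]) -> Vec<f64> {
    let best = means.iter().copied().fold(f64::INFINITY, f64::min);
    means.iter().map(|mean| mean / best).collect()
}
```

Feeding in the `despawn_world/10000_entities` row (269.6µs before, 254.5µs after) gives roughly 1.06 and 1.00. Recomputing from the printed, rounded timings can differ from the printed factor in the last digit.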
"despawn_world/10000_entities" is actually 6% faster with this PR?
fixes #18003
Objective
It's version 9 of the same objective lol. For assets as entities, we need entities to be able to be reserved from any thread. Ideally, this can be done without depending on an async context, blocking, or waiting. Any of these compromises could hurt asset performance or discontinue the completely non-blocking nature of the asset system.
As a bonus, this PR makes allocating entities only need `&Entities` instead of `&mut`. `Entities::flush` is now completely optional, meaning none of the `Entities` methods depends on the flush at all, and there is protection against flushing an entity twice.
(If you're curious, v9 actually branched from v8. v8 was focused on #18577 (never flush entities), but this still includes flushing.)
Solution
Organizationally, I split off the underlying `EntityAllocator` from `Entities`. This makes it easier to read, etc, now that it's more involved.
The basic problem is that we need to be able to allocate an entity from any thread at any time. We also need to be able to free an entity. So at the allocator level, that's 3 operations: `free`, `alloc` (for when you know `free` isn't being called), and `remote_alloc` (can be called any time). None of these can require mutable access.
The biggest challenge is having a list of entities that are `free` and waiting to be re-used. The list needs to be fully functional without mutable access, needs to be resizable, and needs to be pinned in memory. I ended up using a strategy similar to `SplitVec`. That dependency requires `std`, and knowing the max capacity ahead of time lets us simplify the implementation, so I made my own implementation here.
Testing
No new tests right now. It might be worth using loom at some point, but that requires an additional dependency, test-specific loom feature flags, and giving this treatment to multiple crates, especially bevy_platform.
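For readers unfamiliar with the pinned, resizable list described under Solution, here is a minimal sketch of one way to get those properties: a fixed array of lazily allocated chunks that double in size, so existing slots never move and growth needs only `&self`. This is my assumption of what "similar to `SplitVec`" means, not the PR's code; `ChunkedSlots`, `locate`, `chunk_len`, and `BASE_BITS` are hypothetical. It also takes the shortcut of `OnceLock`, which can make a racing thread wait while a chunk is first allocated, something a fully non-blocking version would have to avoid.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// The first chunk holds `1 << BASE_BITS` slots (an arbitrary choice).
const BASE_BITS: u32 = 4;
/// Enough chunks to cover every `u32` index, known ahead of time.
const CHUNKS: usize = (32 - BASE_BITS + 1) as usize;

/// Maps a flat index to (chunk, offset). Chunk 0 covers `0..16`,
/// chunk 1 covers `16..32`, chunk 2 covers `32..64`, and so on.
fn locate(index: u32) -> (usize, usize) {
    if index < (1 << BASE_BITS) {
        (0, index as usize)
    } else {
        let top = index.ilog2(); // position of the highest set bit
        ((top - BASE_BITS + 1) as usize, (index - (1 << top)) as usize)
    }
}

/// Number of slots in a given chunk.
fn chunk_len(chunk: usize) -> usize {
    if chunk == 0 {
        1 << BASE_BITS
    } else {
        1 << (BASE_BITS as usize + chunk - 1)
    }
}

/// Chunks are allocated on first touch and never moved or freed,
/// so a `&AtomicU64` handed out earlier stays valid as the list grows.
struct ChunkedSlots {
    chunks: [OnceLock<Box<[AtomicU64]>>; CHUNKS],
}

impl ChunkedSlots {
    fn new() -> Self {
        Self { chunks: std::array::from_fn(|_| OnceLock::new()) }
    }

    fn slot(&self, index: u32) -> &AtomicU64 {
        let (chunk, offset) = locate(index);
        let slots = self.chunks[chunk]
            .get_or_init(|| (0..chunk_len(chunk)).map(|_| AtomicU64::new(0)).collect());
        &slots[offset]
    }

    fn set(&self, index: u32, value: u64) {
        self.slot(index).store(value, Ordering::Release);
    }

    fn get(&self, index: u32) -> u64 {
        self.slot(index).load(Ordering::Acquire)
    }
}
```

Because the chunk count is fixed up front, the outer array never reallocates, which is the simplification that knowing the maximum capacity buys.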
Future work
#18577 is still a good end game here IMO. Ultimately, (just like @maniwani said would happen), I decided that doing this all at once would be both too challenging and add too much complexity. However, v9 makes "never flush" much, much more approachable for the future. The biggest issues I ran into were that lots of places hold a reference to an entity's `Archetype` (but the entity now might not have an archetype) and that checking archetypes everywhere might actually be less performant than flushing. Maybe.
We can also potentially speed up a lot of different processes now that `alloc` can be called without mutable access and `free` (etc.) can be called without needing to `flush` first.
Costs
Benchmarks
Interpreting benches:
In most places, v9 is on par with or even faster than main. Some notable exceptions are the "sized_commands" and "fake_commands" sections, but the regression there is purely due to `Entities::flush` being slower, and we make up for that elsewhere. These commands don't actually do anything though, so this is not relevant to actual use cases. The benchmarks just exist to stress test `CommandQueue`.
The only place where v9 causes a significant and real-world applicable regression is "spawn_commands", where v9 is roughly 15% slower than main. This is something that can be changed later now that `alloc` doesn't need mutable access. I expect we can change this 15% regression to a 15% improvement given that "spawn_world" is roughly 20% faster on v9 than on main. For users that need really fast spawn commands though, they are already using some form of batch spawning or direct world access.
Other regressions seem to be either minimal, unrealistic, easily corrected in the future, or wrong. I feel confident saying "wrong" since running them back to back can sometimes yield different results. I'm on a M2 Max, so there might be some things jumping from perf cores to efficiency cores or something. (I look forward to standardized benchmarking hardware.)
Wins: I was worried that, without "never flush", this would be an overall regression, but I am relieved that that is not the case. Some very common operations, "insert_simple/unbatched" for example, are way faster on this branch than on main. Basically, on main, `alloc` also adds `EntityMeta` for the entity immediately, but on this branch, we only do so in `set`. That seems to improve temporal cache locality and leads to this roughly 220% improvement. "added_archetypes" sees 20%-80% improvements too, etc. "iter_simple/foreach_wide" also sees a 270% improvement.
I think in practice, v9 will outperform main for real-world schedules. And I think moving towards "never flush" (even for only a few operations, like `Commands::spawn`) will improve performance even more.