Skip to content

Conversation

@Yarwin
Copy link
Contributor

@Yarwin Yarwin commented Dec 13, 2025

What problem does this PR solve?

Currently without experimental threads when user ask for a borrow/mut, we get a cell with user data ("rust part"), lock its state (state being, pretty much, a refcount) with mutex and check if it can be safely handled to the user (i.e. we make sure that if you get &T, it refers to safe &T – i.e. there isn't any active &mut T – and vice versa for&mut T. We allow for re-entrancy with base_mut(), but that's not important for now).

Without experimental-threads ("panicking" access) we don't allow to access binding from other threads than the main one at all (i.e. no data race is possible), while with experimental-threads ("blocking" access) enabled we handle locking/blocking via the thread tracker (i.e. no data race is possible because every thread needs to wait to acquire their borrow) thus this additional lock is just silly.

(and it makes rapier sad!)

Acknowledges etc

Rapier uses version of GdCell without mutex which has been working great despite violating stacked borrows (see: Unsafe Code Guidelines: Stacked Borrows) with MutGuard (both in lilyz and mine versions 😅) and probably data race for non-blocking version 😒.

Miri was Mad
test cell::test::allow_inaccessible_mut_mut ... error: Undefined Behavior: trying to retag from <146066> for Unique permission at alloc47455[0x0], but that tag does not exist in the borrow stack for this location
   --> godot-cell/src/cell.rs:324:14
    |
324 |         drop(guard1);
    |              ^^^^^^
    |              |
    |              this error occurs as part of retag (of a reference/box inside this compound value) at alloc47455[0x0..0x30]
    |              errors for retagging in fields are fairly new; please reach out to us (e.g. at <https://rust-lang.zulipchat.com/#narrow/stream/269128-miri>) if you find this error troubling
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <146066> was created by a Unique retag at offsets [0x0..0x30]
   --> godot-cell/src/cell.rs:306:26
    |
306 |         let mut guard1 = cell.borrow_mut().unwrap();
    |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <146066> was later invalidated at offsets [0x0..0x30] by a Unique retag
   --> godot-cell/src/guards.rs:220:35
    |
220 |         let cell_state = unsafe { state.as_mut() }.unwrap();
    |                                   ^^^^^^^^^^^^^^
    = note: BACKTRACE (of the first span) on thread `cell::test::all`:
    = note: inside `cell::test::allow_inaccessible_mut_mut` at godot-cell/src/cell.rs:324:14: 324:20

Said custom version can be found there: https://github.com/Yarwin/gdext/tree/custom-rapier-branch.

Version present in this PR obviously avoids this problem by using UnsafeCell for panicking Guards.

Other stuff

It took me a while, since I was experimenting with other approaches for cells (for example AtomicRefCell with atomic usize – allows for access from multiple threads, panics if multiple borrow_mut are requested. Solves some problems – for example editor accessing bindings from non-main thread to check properties – but might cause giant headache when it comes to re-entrancy).

We won't change our approach for a while I guess, so we can remove this silly mutex in a meanwhile 🤷.


  • Remove Mutex from GdCellInner since it doesn't serve any purpose – without experimental-threads we don't allow to access bindings from threads other than the main one, while blocking cell used with experimental-threads has its own blocking mechanism.
  • Use NonNull instead of *mut T for CellState.

@Yarwin Yarwin added c: core Core components performance Performance problems and optimizations labels Dec 13, 2025
@GodotRust
Copy link

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1442

@Bromeon
Copy link
Member

Bromeon commented Dec 13, 2025

Rapier uses version of GdCell without mutex which has been working great despite violating stacked borrows (see: Unsafe Code Guidelines: Stacked Borrows) with MutGuard (both in lilyz and mine versions 😅).

Using UnsafeCell obviously avoids this (theoretical?) problem.

Miri was Mad

This sounds a lot like "we might introduce UB but let's just pray" 😬

Are you sure this is sound? If it is, we should find a way to make it work with miri, no? That's pretty much its purpose, after all...

@Yarwin Yarwin marked this pull request as draft December 13, 2025 11:37
@Yarwin
Copy link
Contributor Author

Yarwin commented Dec 13, 2025

Draft – failing CI is expected but MutGuardBlocking was using CellState's Mutex for blocking (which can cause data races which is UB). I'll probably wrap it in Mutex

@Yarwin
Copy link
Contributor Author

Yarwin commented Dec 13, 2025

Are you sure this is sound? If it is, we should find a way to make it work with miri, no? That's pretty much its purpose, after all...

Yes, current version – presented in this PR – is sound and Miri doesn't report anything. The troublesome version is present on custom branch for rapier: https://github.com/Yarwin/gdext/tree/custom-rapier-branch.

@Yarwin
Copy link
Contributor Author

Yarwin commented Dec 13, 2025

GdCellInner no longer implements Sync and blocking guards (including a new one, InaccessibleGuardBlocking) use thread_tracker to provide thread safety when using GdCell.

I thought about using Pin<Arc<Mutex<T>>> for GdCellBlocking's inner but it is kinda pointless 🤔 – practically state is going to be guarded by thread_tracker anyway.

We might want to change name of panicking Cell module to single_threaded.

Output of running miri against current branch:

$ cargo +nightly miri test -p godot-cell --
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running unittests src/lib.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/godot_cell-5969cd65ba219090)

running 7 tests
test borrow_state::test::poisoned_unset_shared_ref ... ok
test cell::test::allow_inaccessible_mut_mut ... ok
test cell::test::allow_shared_shared ... ok
test cell::test::different_inaccessible ... ok
test cell::test::prevent_mut_mut ... ok
test cell::test::prevent_mut_shared ... ok
test cell::test::prevent_shared_mut ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.85s

     Running tests/mock/main.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/mock-8c8972fec9dd3efe)

running 7 tests
test blocking::calls_parallel ... ok
test blocking::calls_parallel_many_parallel ... ok
test blocking::calls_parallel_many_serial ... ok
test blocking::no_mut_panic_on_main ... ok
test blocking::non_blocking_reborrow ... ok
test panicking::all_calls_work ... ok
test panicking::call_works ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 15.82s

   Doc-tests godot_cell

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

@Yarwin Yarwin marked this pull request as ready for review December 13, 2025 14:49
Copy link
Member

@Bromeon Bromeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Gave it a preliminary review.

Lots of parts are not really clear to me -- either they rely on implicit assumptions or lack documentation such as safety statements. There are also several code constructs that look suspicious or overly complex. I'm not usually that pedantic about it, but godot-cell is such a central crate that it's important everything is as clear as possible.

}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were some decent tests, can we at least run them in experimental-threads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are still included in tests/mock/blocking.rs


#[doc(hidden)]
pub fn can_drop(&self) -> bool {
let state = unsafe { self.state.get().as_mut() }.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs safety statement. Please keep any safe operation outside this statement.

let mut guard = state.lock().unwrap();

let current_ptr = guard.get_ptr();
let cell_state = unsafe { state.get().as_mut() }.unwrap();
Copy link
Member

@Bromeon Bromeon Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Comment on lines 58 to 59
unsafe { self.state.get().as_mut() }
.unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3rd occurrence of this -- maybe extract to unsafe fn. Still needs safety statement for each call, and separate that from other operations instead of one mega chain combining safe/unsafe ops.

Comment on lines 96 to 105
unsafe {
(&raw mut (*ptr).value).write(UnsafeCell::new(value));
}

cell
// SAFETY: Box::pin(...) is equivalent to Box::into_pin(Box::...) therefore our freshly initialized
// `val` is and will stay valid (i.e. will refer to the same place after pinning).
unsafe {
let val = (&raw const (*ptr).value).as_ref().unwrap();
(&raw mut (*ptr).state).write(UnsafeCell::new(CellState::new(val)));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very convoluted, is there no more straightforward way? Why not at least keep a pointer around instead of 3 times obtaining via &raw?

Are you certain that write is needed here?

ptr could also have a clearer name, since there are so many pointers flying around 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix'd, splitting it into three separate parts was rather poor choice.

I still construct a GdCellInner out of uninitialized box – initialize_ptr function was a little too error prone (and it is easy to introduce UB due to refactor if one would construct fields before the box/pinning).

Comment on lines 114 to 119
{
let state = unsafe { &mut *self.state.get() };
state.borrow_state.increment_shared()?;
}

let state = unsafe { &*self.state.get() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the scope if you call get() immediately again?

Is there a chance of aliasing after the { ... } block, or is an exclusive reference otherwise problematic?

Comment on lines 171 to 174
// SAFETY: `T` is Sync, so we can return references to it on different threads.
// It is also Send, so we can return mutable references to it on different threads.
// Additionally, all internal state is synchronized via a mutex, so we won't have race conditions when trying to use it from multiple threads.
unsafe impl<T: Send + Sync> Sync for GdCellBlocking<T> {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For standard std::sync::Mutex, the Sync trait is implemented as follows:

  impl<T: ?Sized + Send> Sync for Mutex<T>

T must be Send for Mutex to be Sync. This ensures that the protected data can be accessed safely from multiple threads without causing data races or other unsafe behavior.

Doesn't the same reasoning apply here? I.e. T needs to be Send rather than Send + Sync?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I updated the comment – while it could be true for GdCell it certainly doesn't make sense for GdCellBlocking. Logically – T is send, while GdCellBlocking provides Sync – and T can't be sync, since it would allow messing with it outside the GdCellBlocking.

@Yarwin Yarwin force-pushed the feature/remove-mutex-from-gdcellinner branch from 3d5b385 to 3ae4d55 Compare December 14, 2025 08:34
Copy link
Member

@Bromeon Bromeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a bit of repetitiveness on accessing the inner reference, could become its own method(s). Might or might not reduce the number of unsafe blocks needed 🤔

Comment on lines 171 to 172
// SAFETY: `T` must be only Send and the only way to access underlying `T` from multiple thread is via `GdCellBlocking`.
// This ensures that the protected data can be accessed safely from multiple threads without causing data races or other unsafe behavior.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SAFETY clause should focus on information that makes this impl safe and avoid reduncancies.
I'm not quite sure which one it is here:

  • "the only way to access underlying T from multiple thread is via GdCellBlocking" -> is this something that needs to be externally upheld? Or is it guaranteed by the GdCellBlocking API? If the latter, it should maybe be specified how (e.g. through mut guard).

  • "T must be only Send" -> clear from signature. Unless you mean it must not be Sync, but then that should be stated explicitly (but I don't see why that would be a requirement).

  • " This ensures ... without causing data races or other unsafe behavior." -> that's general soundness in Rust and a consequence of implementing it correctly, not the requirement.

It's not technically wrong what you wrote, but keeping it focused on the exact reasons has less risk of watering down what exactly contributes to this being sound, and as such helps the overall clarity.

state_lock.decrement_current_thread_shared_count();

// SAFETY: guard is dropped exactly once, here.
// SAFETY: guard is dropped exactly once, here, while state is guarded by `_tracker_guard` preventing access from any other thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great safety clause, for example 👍 concise and to the point.

Comment on lines +118 to +119
///
/// See [`panicking::InaccessibleGuard`](crate::panicking::InaccessibleGuard) for more details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to omit this part since you just referenced it.

Comment on lines 134 to 151
/// Drop self if possible, otherwise returns self again.
///
/// Used currently in the mock-tests, as we need a thread safe way to drop self. Using the normal drop
/// logic may poison state, however it should not cause any UB either way.
///
/// See [`panicking::InaccessibleGuard::try_drop`](crate::panicking::InaccessibleGuard::try_drop) for more details.
pub fn try_drop(self) -> Result<(), ManuallyDrop<Self>> {
let mut manual = ManuallyDrop::new(self);
let can_drop = {
let _tracker_guard = manual.state.lock();
manual.inner.can_drop()
};

if !can_drop {
Err(manual)
} else {
// SAFETY: We ensured that guard can be safely dropped by `can_drop`.
unsafe { ManuallyDrop::drop(&mut manual) };
Ok(())
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docs it's not quite clear why ManuallyDrop<Self> is returned on error, and not Self directly 🤔

Comment on lines 94 to 103
unsafe {
let value_ptr = &raw mut (*uninitialized_cell.as_mut_ptr()).value;
// SAFETY: Box::pin(...) is equivalent to Box::into_pin(Box::...) therefore `value_ptr`
// is and will stay valid (i.e. will refer to the same place after pinning).
value_ptr.write(UnsafeCell::new(value));
(&raw mut (*uninitialized_cell.as_mut_ptr()).state)
.write(UnsafeCell::new(CellState::new(value_ptr.as_ref().unwrap())));

cell
Box::into_pin(uninitialized_cell.assume_init())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split into individual unsafe blocks and extract all safe operations outside. It will be more verbose, but with a per-unsafe // SAFETY statement also more clear.

For example, as_mut_ptr(), UnsafeCell::new(), Box::into_pin() etc. are all safe.

/// happen at the same time.
pub fn is_currently_bound(self: Pin<&Self>) -> bool {
let state = self.state.lock().unwrap();
let state = unsafe { &*self.state.get() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safety statement missing.

If it gets too repetitive and there's the option to make a safe abstraction, e.g. method, due to invariants of the struct... then that's also an option.

/// Similar to [`Self::is_currently_bound`] but only counts mutable references and ignores shared references.
pub(crate) fn is_currently_mutably_bound(self: Pin<&Self>) -> bool {
let state = self.state.lock().unwrap();
let state = unsafe { &*self.state.get() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

Comment on lines +239 to +274
#[allow(clippy::mut_from_ref)]
pub(crate) unsafe fn borrow_state(cell_state: &UnsafeCell<Self>) -> &mut BorrowState {
&mut cell_state.get().as_mut().unwrap().borrow_state
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another referencing of cell_state.get() happening, should probably be its own method at this point 🙂

@Yarwin Yarwin force-pushed the feature/remove-mutex-from-gdcellinner branch from 3ae4d55 to 83cef71 Compare December 15, 2025 18:03
---------

- Remove Mutex from `GdCellInner` since it doesn't serve any purpose – without `experimental-threads` we don't allow to access bindings from threads other than the main one, while blocking cell used with `experimental-threads` has its own blocking mechanism.
- Use NonNull<T> instead of *mut T for `CellState`.
---------

- Use `thread_tracker` to guard GdCellInner.
- Implement `InaccessibleGuardBlocking` which block thread upon dropping.
@Yarwin Yarwin force-pushed the feature/remove-mutex-from-gdcellinner branch from 83cef71 to e2c3610 Compare December 15, 2025 18:16
---------

- Post-CR fixes (to squash later).
- Properly lock cell state while dereferencing RefGuards.
- Safety comments.
@Yarwin Yarwin force-pushed the feature/remove-mutex-from-gdcellinner branch from e2c3610 to 05c2b16 Compare December 15, 2025 18:27
@Yarwin
Copy link
Contributor Author

Yarwin commented Dec 15, 2025

Don't merge it in a meanwhile, I need to give it good testing to make sure everything is alright (should be)

@Bromeon Bromeon changed the title Remove Mutex from GdCellInner. Remove Mutex from GdCellInner Dec 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c: core Core components performance Performance problems and optimizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants