Collaborator

@sbc100 sbc100 commented Aug 27, 2025

The checkMailbox callback can fire after the thread has terminated. In this case, calling into native code can trigger the makeAbortWrapper wrapper that is placed around each native function, which then throws a "program has already aborted!" error.

One solution to this is to make sure that the functions that are called do not have makeAbortWrapper applied to them.

This was the technique I used in #18754, but the list of functions became stale when emscripten_proxy_execute_task_queue was removed in #18852.

A better solution is to wrap the whole function in callUserCallback, which takes care of checking whether the runtime is alive before calling into native code.
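As a rough illustration of the failure mode and the fix described above, here is a minimal, self-contained JavaScript model. It is a sketch under stated assumptions, not Emscripten's actual code: `makeAbortWrapperSketch`, `callUserCallbackSketch`, `pthreadSelfSketch`, and `checkMailboxSketch` are hypothetical stand-ins, and the only runtime state modelled is a single `ABORT` flag.

```javascript
// Simplified model; not Emscripten's real implementation.
let ABORT = false; // becomes true once the program has aborted

// Stand-in for the wrapper placed around each native function:
// it refuses to run once the program has aborted.
function makeAbortWrapperSketch(nativeFn) {
  return (...args) => {
    if (ABORT) {
      throw new Error('program has already aborted!');
    }
    return nativeFn(...args);
  };
}

// Stand-in for callUserCallback: check liveness first, then call.
function callUserCallbackSketch(func) {
  if (ABORT) {
    return; // runtime is gone, so skip the callback instead of throwing
  }
  func();
}

// Hypothetical wrapped native function reached from the mailbox check.
const pthreadSelfSketch = makeAbortWrapperSketch(() => 1234);

let nativeCalls = 0;
function checkMailboxSketch() {
  // The whole body is guarded, so no list of "safe" functions is needed.
  callUserCallbackSketch(() => {
    pthreadSelfSketch();
    nativeCalls++;
  });
}

checkMailboxSketch(); // runtime alive: native code runs
ABORT = true;         // the thread aborts or terminates
checkMailboxSketch(); // late callback: skipped, nothing is thrown
```

Guarding the whole function means the set of native calls inside it can change without any separate list going stale.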

Fixes: #20067

@sbc100 sbc100 requested review from tlively and juj August 27, 2025 18:58
// MINIMAL_RUNTIME doesn't support the runtimeKeepalive stuff, but under
// some circumstances it supports `runtimeExited`
#if EXIT_RUNTIME
if (runtimeExited) {
Collaborator

What would happen if we are building in -sEXIT_RUNTIME=0 mode, and some code calls abort(), and then this async timeout for checkMailbox() triggers? Wouldn't that result in this safeguard passing right through into calling _pthread_self() again, since in EXIT_RUNTIME=0 mode this check was compiled out altogether?

_pthread_self() is a C compiled function, so if the Worker is not hosting a pthread and hence doesn't have an active program stack, then even entering that function will not be safe (even though it might return 0;), since it could e.g. do a stack bump into corrupted space.

Collaborator Author

@sbc100 sbc100 Aug 27, 2025

Yes, this could be a problem for MINIMAL_RUNTIME, since it doesn't track ABORT like the normal runtime does.

Do we have any other signal to know if the runtime is alive other than ABORT, runtimeExited and/or pthread_self() == null? i.e. could we do any better?

I'd love to consolidate those 3 myself into a single "is_runtime_ok_or_valid_right_now" thing, so adding yet another piece of state seems like the wrong direction.

Collaborator

> Yes, this could be a problem for MINIMAL_RUNTIME, since it doesn't track ABORT like the normal runtime does.

I mean just in the regular runtime: if one builds with -sEXIT_RUNTIME=0, then the above if (runtimeExited) check won't be compiled in, and if a pthread calls abort, that would set ABORT = true; in the Worker and shut down its runtime? But this code wouldn't catch it?

Collaborator Author

See the other implementation of callUserCallback for the regular runtime. It does include that check.
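The distinction discussed in this thread can be modelled roughly as follows. This is a sketch, not the Emscripten source: `makeGuardSketch` and its `exitRuntime` / `tracksAbort` options are hypothetical, standing in for the compile-time `EXIT_RUNTIME` setting and for whether a given runtime variant checks ABORT, as described in the comments above.

```javascript
// Simplified model of the two guard variants; all names are hypothetical.
function makeGuardSketch({ exitRuntime, tracksAbort }) {
  const state = { runtimeExited: false, aborted: false };

  function guardedCall(func) {
    // Models `#if EXIT_RUNTIME`: this check only exists when the option is on.
    if (exitRuntime && state.runtimeExited) return false;
    // Models a runtime variant that also checks the abort flag.
    if (tracksAbort && state.aborted) return false;
    func();
    return true;
  }

  return { state, guardedCall };
}

let calls = 0;

// Both variants model a build with -sEXIT_RUNTIME=0.
const minimalLike = makeGuardSketch({ exitRuntime: false, tracksAbort: false });
const regularLike = makeGuardSketch({ exitRuntime: false, tracksAbort: true });

// An abort happens in both.
minimalLike.state.aborted = true;
regularLike.state.aborted = true;

// The variant without abort tracking lets the late callback through;
// the variant with abort tracking skips it.
const minimalRan = minimalLike.guardedCall(() => { calls++; });
const regularRan = regularLike.guardedCall(() => { calls++; });
```

In this model only the variant without abort tracking still runs the callback after an abort, which is the gap raised for MINIMAL_RUNTIME earlier in the thread.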

Collaborator

Gotcha, right - missed that this was the MINIMAL_RUNTIME path. That makes sense. LGTM.

Collaborator

juj commented Aug 27, 2025

Thanks for looking into this. I'm happy with this as long as #25066 passes. Although if the checks are narrower, it might need to be expanded to cover both -sEXIT_RUNTIME=0/1 options to make sure both cases are ok.

@sbc100 sbc100 merged commit 7cd7ece into emscripten-core:main Aug 28, 2025
30 checks passed
@sbc100 sbc100 deleted the fix_checkMailbox branch August 28, 2025 18:40
Development

Successfully merging this pull request may close these issues.

test_abort_on_exceptions_pthreads is flaky
3 participants