Conversation

@cwoffenden cwoffenden commented Dec 4, 2025

This is a complete rethink of the lock test to do something locks would actually be used for: maintain atomicity of a complex data structure. We have this dummy struct with three members:

typedef struct {
  uint32_t val0;
  uint32_t val1;
  uint32_t val2;
} Dummy;

A series of calculations involving all the members is performed thousands of times from both the main and audio threads. Without locks the steps interleave, producing the wrong values at the end.

This simulates a queue, for example, or any other shared container. Define DISABLE_LOCKS to run without locks, and the test will fail (with fewer iterations it would sometimes get lucky, but running intensively for a few seconds ensures failure every time).

The lock acquire has a timeout and will assert on failure; this was originally a short 10ms but was changed to 1s, with 10ms looking too short for CI:

int have = emscripten_lock_busyspin_wait_acquire(&testLock, 1000);

Currently that's 4M calls.

dummy->val2 += dummy->val0 * dummy->val1;
dummy->val0 /= 4;
dummy->val1 /= 3;
dummy->val2 /= 2;
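
Putting those pieces together, a minimal sketch of the guarded update might look like the following. The update function and the #ifndef wiring are illustrative, assuming the emscripten_lock_* API from emscripten/wasm_worker.h, not the exact test code:

#include <assert.h>
#include <stdint.h>
#include <emscripten/wasm_worker.h> // assumed home of the emscripten_lock_* API

static emscripten_lock_t testLock = EMSCRIPTEN_LOCK_T_STATIC_INITIALIZER;

// One iteration of the contended work, called thousands of times from
// both the main and audio threads (Dummy as defined above)
static void update(Dummy* dummy) {
#ifndef DISABLE_LOCKS
  // Assert if the lock can't be taken within 1s (10ms was too short for CI)
  int have = emscripten_lock_busyspin_wait_acquire(&testLock, 1000);
  assert(have);
#endif
  dummy->val2 += dummy->val0 * dummy->val1; // read-modify-write across members
  dummy->val0 /= 4;
  dummy->val1 /= 3;
  dummy->val2 /= 2;
#ifndef DISABLE_LOCKS
  emscripten_lock_release(&testLock);
#endif
}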

@sbc100 sbc100 Dec 8, 2025


Can we avoid these dummy calculations completely, and avoid the use of magic numbers, by just doing something like this:

// This assertion will fail if the assignments on the main thread and
// the worker are interleaved
assert(dummy->val0 == dummy->val1 && dummy->val1 == dummy->val2);
int newval = dummy->val0 + 1;
dummy->val0 = newval;
dummy->val1 = newval;
dummy->val2 = newval;

Collaborator Author

I’ll run a test with assigning the same value multiple times (with enough variance each time; a simple increment isn’t enough). It needed to be intensive and to run for long enough before the clashes started.

Collaborator

How about a shared 1024-byte block:

char g_shared[1024];

With each thread doing:

memset(g_shared, THREAD_ID, 1024);
for (int i = 0; i < 1024; i++) assert(g_shared[i] == THREAD_ID);

Surely that would be enough to detect any kind of interleaving?
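
For reference, a self-contained sketch of this suggestion (stampAndVerify and threadId are illustrative names, not part of the proposal):

#include <assert.h>
#include <string.h>

static char g_shared[1024];

// Each thread stamps the whole block with its own id then re-reads it;
// any interleaving of the two threads' memsets trips the assert
static void stampAndVerify(char threadId) {
  memset(g_shared, threadId, sizeof g_shared);
  for (size_t i = 0; i < sizeof g_shared; i++) {
    assert(g_shared[i] == threadId);
  }
}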

Collaborator

Then you could just increase 1024 to a larger number to make the interleaving more likely?


@cwoffenden cwoffenden Dec 9, 2025


Rewrote the test without magic numbers.

It doesn't clash enough, reverting. Interestingly, doing this doesn't clash:

    int dummy_temp0  = dummy->val0;
    dummy->val0 += dummy->val1 * 7;
    dummy->val1 += dummy->val2 * 7;
    dummy->val2 += dummy_temp0 * 7;

I'll need to come back to it later to remove the magic numbers. Testing with --repeat=1000 I get 100% passes, and 100% failures when DISABLE_LOCKS is defined, on multiple machines.

Collaborator Author

Magic numbers removed by calculating the expected values in a single thread, then comparing them with the multi-threaded result.
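
In other words, something along these lines (SEED, TOTAL_STEPS, runSteps and verify are hypothetical names sketching the approach, reusing the update function from the earlier sketch):

#define SEED 48271u         // hypothetical starting value
#define TOTAL_STEPS 4000000 // "currently that's 4M calls"

static Dummy shared   = {SEED, SEED, SEED}; // hammered by both threads
static Dummy expected = {SEED, SEED, SEED}; // computed on one thread

static void runSteps(Dummy* dummy, int steps) {
  for (int i = 0; i < steps; i++) {
    update(dummy); // the guarded update from the earlier sketch
  }
}

// After the main and audio threads have each run TOTAL_STEPS / 2 steps
// on `shared`, replay all TOTAL_STEPS single-threaded: since each locked
// step is deterministic, the results must match, with no magic numbers
// baked into the test
static void verify(void) {
  runSteps(&expected, TOTAL_STEPS);
  assert(expected.val0 == shared.val0 &&
         expected.val1 == shared.val1 &&
         expected.val2 == shared.val2);
}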

@cwoffenden cwoffenden requested a review from sbc100 December 9, 2025 16:33

@sbc100 sbc100 left a comment


The test still seems somewhat convoluted to me, but perhaps that is just how it has to be.

@juj WDYT?

@cwoffenden
Collaborator Author

The test still seems somewhat convoluted to me, but perhaps that is just how it has to be.

Too few iterations can accidentally pass, as can running for too short a time. The same goes for the set-up that makes sure both threads are running: without synchronisation the two threads can end up running serially.
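
A minimal sketch of such a rendezvous, using plain C11 atomics rather than the test's actual state machine:

#include <stdatomic.h>

static atomic_int arrived;

// Each thread calls this before starting the contended loop; neither
// proceeds until both are running, so the work genuinely overlaps
// instead of running serially
static void awaitBothThreads(void) {
  atomic_fetch_add(&arrived, 1);
  while (atomic_load(&arrived) < 2) {
    // busy-wait; fine for a short-lived test rendezvous
  }
}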


juj commented Dec 9, 2025

The test definitely looks simpler than before; nice work on the PR. The previous test looked quite a bit like a kitchen-sink test.

Maybe most of the complexity in the test comes not from the dummy calculation itself, but rather from the synchronization that ensures the main thread and audio worklet thread are computing at the same time. If there were a way to remove some of the switch-case states, that might help the test read more cleanly.

LGTM though. Was the conclusion on the Chrome Audio Worklet deadlock hang that it was a browser issue, or a non-issue of some kind?

@cwoffenden
Collaborator Author

Maybe most of the complexity in the test comes not from the dummy calculation itself, but rather from the synchronization that ensures the main thread and audio worklet thread are computing at the same time.

Exactly that, and it's all too easy to run them sequentially. The timeout can end up delaying for a long period before starting (on Chrome), and the worklet won't start until audio playback begins, so this ensures both are ready.

Was the conclusion on the Chrome Audio Worklet deadlock hang that it was a browser issue, or a non-issue of some kind?

It seems to be deliberate on Chrome's side. My conclusion was that the test was creating the perfect conditions to trigger some timeout abuse mitigation: hammering the CPU whilst spinning for long periods, plus producing no sound output whilst spinning in the worklet, resulting in the callback delays being stretched out and the test eventually timing out.

This test scraps the long spinning and ping-ponging between threads in favour of a simpler, more realistic atomic update*. --repeat=1000 with DISABLE_LOCKS defined should fail every time (and pass every time otherwise).

*based on what we do with shipping code and a message queue, which is what kept me coming back to solve this.


cwoffenden commented Dec 9, 2025

Hmm, looks like the 10ms timeout is too short on a CI VM for a lock acquire (lengthened and left running for 5000 repeats without problems).

@cwoffenden cwoffenden force-pushed the cw-aw-rethink-locks branch 2 times, most recently from 1b71c91 to d2e6b57 Compare December 10, 2025 05:56
