-
-
Notifications
You must be signed in to change notification settings - Fork 33k
test: fix timeout in test-wasi-pthread for CI stability #59522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Review requested:
|
FWIW the failure was unrelated to timeouts, it was caused by
|
I thought it was time out my bad. |
@joyeecheung should i change the 5 seconds timeout to 1 second time out again? |
Did you mean that you want to just remove the flaky status in the PR? I don't think it's already proven that the flake is not flaky, so if you just remove it, it can fail the CI again. One passing CI does not mean that it would never reproduce again, that's the nature of flakes - it occasionally fail unrelated PRs but frequent enough to make the CI unusable because people keep seeing failures unrelated to their changes and have to run the CI many times to get a passing one. |
I understood what you I'll remove that change and just keep the change of 1 second sleep to 5 second if it helps |
@joyeecheung i think this could fix the wasi flaky error i added the retry logic |
test/fixtures/wasi-preview-1.js
Outdated
], | ||
workerData, | ||
}); | ||
let worker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file don't seem very relevant to me, ERR_WORKER_INIT_FAILED
is emitted when a Node.js worker thread can't be initialized - mostly when the V8 heap cannot be initialized, but the flake happens when a native pthread can't be created, the test is structured in a way to load that WASM module that spawns the pthread and it's the module that's crashing, not the driver worker. Those are two different initializations and we are not seeing failures of worker thread initializations in the CI, only the pthread creation failures, so there's no need to change anything in how the driver worker is spawn.
test/fixtures/wasi-preview-1.js
Outdated
@@ -106,7 +128,7 @@ assert.strictEqual(wasiPreview1.wasiImport, | |||
throw new Error(e); | |||
}); | |||
|
|||
const r = Atomics.wait(result, 0, 0, 1000); | |||
const r = Atomics.wait(result, 0, 0, 5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed - again the failures don't seem related to the driver worker.
break; // Success | ||
} | ||
|
||
// If it's a resource issue (EAGAIN/ENOMEM), retry with a small delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried to reproduce locally to see exactly what error this is?
@joyeecheung I was able regenerate the same error and was able to test it again and tried to fix it. I think you can review it now |
@joyeecheung can you please review this pr now pleasee |
Please refrain from continuous pinging. Looking into flakes is something that I only do at my spare time. |
I'm sorry I didn't know my bad🙏 |
FWIW I don't think I've seen this in the CI (https://ci.nodejs.org/job/node-test-commit-arm-debug/ looks mostly green with only other failures every now and then) so that might've already been fixed by the split. |
gotcha I can close this PR and try to see if there is anything else I can work. Thank you @joyeecheung for letting me know |
Summary
This PR fixes the root cause of the
test-wasi-pthread
flakiness by adding retry logic for resource exhaustion scenarios, rather than just increasing timeouts.Problem
As identified by @joyeecheung, the issue was not timeouts but
pthread_create
returning errors due to resource constraints in CI environmentsAssertion failed: r == 0 (c/pthread.c: main: 17)
Solution
pthread_create
failures (EAGAIN/ENOMEM)