CBL-7699: Flaky test, Database BackgroundDB torture test #2406
Branch updated from 85067ae to b120683.
There is a race between `_housekeeper = nullptr` in `CollectionImpl::stopHousekeeping` and the `enqueue` call in `Housekeeper::doExpirationAsync`. This PR attempts to fix it.
Branch updated from b120683 to b91fbc8.
This test is at the top of the list of random failures on GitHub and Jenkins. I have never seen it fail on my Mac. However, I can reproduce the failure, and see the new code being exercised, by adding a 30 ms sleep as the first statement in `Housekeeper::doExpirationAsync` (not 100% of the time, but after several runs).
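To illustrate why the injected 30 ms sleep makes the failure reproducible, here is a self-contained sketch. `NaiveTimer` and `callbackOutlivesStop` are hypothetical, not LiteCore code; the assumption is a timer whose `stop()` merely sets a flag and returns without waiting for a callback that has already started. The sleep at the top of the callback stands in for the sleep added to `doExpirationAsync`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical timer whose stop() only flips a flag: it does NOT wait for a
// callback that is already running.
class NaiveTimer {
public:
    explicit NaiveTimer(std::function<void()> cb) : _cb(std::move(cb)) {}
    ~NaiveTimer() { join(); }

    void fireNow() {
        _thread = std::thread([this] { if (!_stopped) _cb(); });
    }
    void stop() { _stopped = true; }  // returns immediately
    void join() { if (_thread.joinable()) _thread.join(); }

private:
    std::function<void()> _cb;
    std::atomic<bool> _stopped{false};
    std::thread _thread;
};

// Returns true if the callback was still running after stop() had returned.
bool callbackOutlivesStop() {
    std::atomic<bool> started{false}, stopReturned{false}, ranAfterStop{false};
    NaiveTimer timer([&] {
        started = true;
        // Injected delay, as in the repro: widens the window between the timer
        // firing and the callback touching shared state.
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
        if (stopReturned)
            ranAfterStop = true;  // in the real code, enqueue() would race here
    });
    timer.fireNow();
    while (!started) std::this_thread::yield();  // wait until the callback is in flight
    timer.stop();
    stopReturned = true;
    timer.join();
    return ranAfterStop.load();
}
```

Without the sleep the window is only a few instructions wide, which is consistent with the failure being rare on a fast local machine and more frequent on loaded CI hosts.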
snej left a comment:
This is too complicated.
The problem is that the timer can fire concurrently with being stopped: its callback (the call to `doExpirationAsync`) could still be running after `_expiryTimer.stop()` returns, by which point the Housekeeper could already have been freed.
I think this can be fixed by (a) making the timer a `unique_ptr`, and (b) deleting it, not just stopping it, in `Housekeeper::_stop`. The Timer's callback is guaranteed not to be running after its destructor returns.
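A minimal sketch of the suggested fix, under stated assumptions: `OneShotTimer` is a hypothetical stand-in for LiteCore's Timer (not its real API) that models only the property cited above, namely that once its destructor returns the callback is not running and never will. The `Housekeeper` is likewise a stripped-down hypothetical; `start(delay)`, `stop()` and `expirations()` are illustrative names, and `doExpirationAsync` just counts calls where the real one would enqueue work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical one-shot timer: fires `callback` once after a delay on a
// background thread. The destructor cancels a pending fire and waits for a
// callback that is already running.
class OneShotTimer {
public:
    explicit OneShotTimer(std::function<void()> callback) : _callback(std::move(callback)) {}

    void fireAfter(std::chrono::milliseconds delay) {
        assert(!_thread.joinable() && "fireAfter may only be called once");
        _thread = std::thread([this, delay] {
            std::unique_lock<std::mutex> lock(_mutex);
            if (_cv.wait_for(lock, delay, [this] { return _cancelled; }))
                return;           // cancelled before the deadline
            lock.unlock();
            _callback();          // runs outside the lock
        });
    }

    ~OneShotTimer() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _cancelled = true;
        }
        _cv.notify_all();
        if (_thread.joinable())
            _thread.join();       // waits for an in-flight callback to return
    }

private:
    std::function<void()> _callback;
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _cancelled = false;
    std::thread _thread;
};

// The fix: own the timer through a unique_ptr and destroy it in stop().
class Housekeeper {
public:
    ~Housekeeper() { stop(); }

    void start(std::chrono::milliseconds delay) {
        _expiryTimer = std::make_unique<OneShotTimer>([this] { doExpirationAsync(); });
        _expiryTimer->fireAfter(delay);
    }

    void stop() {
        _expiryTimer.reset();     // returns only after any running callback has finished
    }

    int expirations() const { return _expirations; }

private:
    void doExpirationAsync() { ++_expirations; }  // real code would enqueue work here

    std::atomic<int> _expirations{0};
    std::unique_ptr<OneShotTimer> _expiryTimer;   // declared last, so destroyed first
};
```

Because `start()` builds a fresh timer each time, this sketch also happens to be restartable after `stop()`; whether the real Housekeeper should be is a separate design decision. Destroying the timer from inside its own callback would make the timer thread try to join itself, so `stop()` must not be called from that thread.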
Makes sense. Do you think we should make the Housekeeper restartable after `stop()`, or simply document that it is not restartable?
Branch updated from 68758da to 015bdf1.
Branch updated from 015bdf1 to 9f00a4c.
```cpp
void _start();
void _stop();

bool _isStopped() const { return !_expiryTimer; }
```
This method is needed to deal with another race, between (A) releasing the Database pointer and (B) using the BackgroundDB in the Housekeeper's actor methods. The actor methods bail out if `_isStopped()` returns true.
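The bail-out can be modelled with a simplified, single-threaded sketch. Everything here is hypothetical: `SerialQueue` stands in for the actor's mailbox (messages run in FIFO order), `Timer` is an empty placeholder whose presence or absence is all `_isStopped()` looks at, and `_doExpiration` and `log` are illustrative names. The interesting case is a timer callback that enqueues its work after `_stop` is already in the mailbox, so that work runs after the timer (and, in the real code, possibly the database) is gone.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical actor mailbox: enqueued messages run one at a time, in order.
class SerialQueue {
public:
    void enqueue(std::function<void()> fn) { _queue.push_back(std::move(fn)); }
    void drain() {
        while (!_queue.empty()) {
            auto fn = std::move(_queue.front());
            _queue.pop_front();
            fn();
        }
    }
private:
    std::deque<std::function<void()>> _queue;
};

struct Timer {};  // placeholder: only whether it exists matters here

class Housekeeper {
public:
    explicit Housekeeper(SerialQueue& q) : _queue(q), _expiryTimer(std::make_unique<Timer>()) {}

    void stop() { _queue.enqueue([this] { _stop(); }); }
    // Called from the timer callback; the message may land after _stop.
    void doExpirationAsync() { _queue.enqueue([this] { _doExpiration(); }); }

    std::vector<std::string> log;  // records what each actor method did

private:
    bool _isStopped() const { return !_expiryTimer; }

    void _stop() {
        _expiryTimer.reset();
        log.push_back("stopped");
    }

    void _doExpiration() {
        if (_isStopped()) {        // database may already be released: bail out
            log.push_back("skipped");
            return;
        }
        log.push_back("expired");  // real code would use the BackgroundDB here
    }

    SerialQueue& _queue;
    std::unique_ptr<Timer> _expiryTimer;
};
```

Queuing one expiration before `stop()` and one after it yields `expired`, `stopped`, `skipped`: the late message sees the timer gone and never touches the database.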
Another approach is to retain the database at enqueue time. Maybe that is overkill. What do you think?
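For comparison, a sketch of the "retain at enqueue time" alternative. It assumes `std::shared_ptr`/`std::weak_ptr` as a stand-in for whatever reference counting the real Database uses; `Database`, `SerialQueue` and this `Housekeeper` are hypothetical. The Housekeeper holds only a weak reference, upgrades it when enqueuing, and moves the strong reference into the queued message, so the database cannot be freed while that message is pending.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>

struct Database {  // hypothetical stand-in for the BackgroundDB's database
    int expiredDocs = 0;
};

// Hypothetical actor mailbox: messages run in FIFO order; each message is
// destroyed (releasing its captures) right after it runs.
class SerialQueue {
public:
    void enqueue(std::function<void()> fn) { _queue.push_back(std::move(fn)); }
    void drain() {
        while (!_queue.empty()) {
            auto fn = std::move(_queue.front());
            _queue.pop_front();
            fn();
        }
    }
private:
    std::deque<std::function<void()>> _queue;
};

class Housekeeper {
public:
    Housekeeper(SerialQueue& q, std::weak_ptr<Database> db) : _queue(q), _db(std::move(db)) {}

    void doExpirationAsync() {
        std::shared_ptr<Database> db = _db.lock();  // take a strong reference now
        if (!db)
            return;                                 // already released: nothing to do
        _queue.enqueue([db] { ++db->expiredDocs; }); // db stays alive until this runs
    }

private:
    SerialQueue& _queue;
    std::weak_ptr<Database> _db;
};
```

The trade-off is that a pending message extends the database's lifetime past the owner's release, which may or may not be acceptable during close; the `_isStopped()` check avoids that by not touching the database at all.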