
Conversation

asfernandes (Member)

No description provided.


for (unsigned i = 0; i < ITERATION_COUNT; ++i)
{
	std::lock_guard globalMutexGuard(globalMutex);

asfernandes (Member Author):

Is this the correct pattern for using the lock manager with ASTs?
Without both globalMutex and localMutex, problems appear.
I can't see why globalMutex should be necessary.
On the other hand, localMutex seems necessary by design: the enqueue method acquires the lock, but the lock id is only saved a moment later in the lock object, which is the object registered as the parameter of the AST routine.
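
For context, a minimal, self-contained sketch of the race in question; all names here (TestLock, fakeEnqueue, astHandler) are hypothetical stand-ins rather than the real lock manager API. It shows why the id returned by enqueue() must be published under the same mutex the AST handler takes before using it:

#include <mutex>
#include <thread>

struct TestLock
{
	std::mutex localMutex;	// protects lockId (and the rest of the per-lock data)
	long lockId = 0;
};

// Stand-in for a blocking AST handler registered with enqueue().
static int astHandler(void* arg)
{
	TestLock* lock = static_cast<TestLock*>(arg);
	std::lock_guard guard(lock->localMutex);	// waits until lockId is published
	// ... dequeue here using lock->lockId, which is now guaranteed to be set ...
	return 0;
}

// Stand-in for enqueue(): the AST may fire from another thread immediately
// after the lock is granted.
static long fakeEnqueue(int (*ast)(void*), void* arg)
{
	std::thread(ast, arg).detach();
	return 42;	// the granted lock id
}

void requestLock(TestLock& lock)
{
	std::lock_guard guard(lock.localMutex);
	lock.lockId = fakeEnqueue(astHandler, &lock);	// id stored before the guard releases
}

Without the guard in requestLock(), the handler could observe lockId == 0.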

hvlad (Member):

I didn't run it yet, so just a few notes after reading the code:

  • so far I also see no need for globalMutex
  • localMutex is necessary, as it protects the threadData internals
  • localMutex should be unlocked/relocked in Callbacks::checkoutRun() (see the sketch below)
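
A rough sketch of that checkout idea, assuming a Callbacks class that merely wraps the thread's localMutex (the real interface may differ): the mutex is released while the lock manager blocks, so that ASTs needing it can be delivered, and re-acquired before returning to the caller.

#include <mutex>

class Callbacks
{
public:
	explicit Callbacks(std::mutex& localMutex)
		: localMutex(localMutex)
	{}

	template <typename F>
	void checkoutRun(F&& blockingWait)
	{
		localMutex.unlock();	// let ASTs that need localMutex run meanwhile
		blockingWait();			// blocking wait inside the lock manager
		localMutex.lock();		// restore the locked state expected by the caller
	}

private:
	std::mutex& localMutex;
};

If blockingWait() can throw, a scope guard would be needed to re-lock the mutex on the exception path as well.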

asfernandes (Member Author):

I added the localMutex checkout, but globalMutex is still required to avoid a crash.

hvlad (Member):

I found two problems.

The first one is the worker thread shutdown sequence:

  • the locks are deleted before shutdownOwner(), which leaves the possibility for ASTs to access an already freed object,
  • shutdownOwner() is called with callbacks and an unlocked mutex and, most important, it waits for AST delivery inside the lock manager (here the callback's mutex is used). This allows ASTs to dequeue an already dequeued lock request.

The correct sequence is to dequeue the locks, call shutdownOwner() with the mutex locked, and only then delete the locks.
Also, the AST handler should know that the lock is already dequeued and must not call dequeue() a second time.
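
A sketch of that order, where shutdownWorker, TestLock and the commented-out manager calls are assumptions rather than the actual test code:

#include <mutex>
#include <vector>

struct TestLock
{
	long lockId = 0;
	bool locked = false;	// cleared here so the AST handler does not dequeue again
};

void shutdownWorker(std::mutex& localMutex, std::vector<TestLock*>& locks)
{
	{
		std::lock_guard guard(localMutex);	// dequeue + shutdownOwner under the mutex

		for (TestLock* lock : locks)
		{
			if (lock->locked)
			{
				// lockManager->dequeue(lock->lockId);	// real call elided
				lock->locked = false;
			}
		}

		// lockManager->shutdownOwner(...);	// still under localMutex, so an
		// in-flight AST cannot race with the teardown
	}

	for (TestLock* lock : locks)	// only now is it safe to free the lock objects
		delete lock;

	locks.clear();
}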

The second issue is related to enqueue():

  • there is a possibility that the AST handler runs after the lock request is granted but before its id is known to (assigned in) the external lock object. In our case the AST handler calls dequeue() with lockId == 0 (see the sketch below).
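
The corresponding guard on the AST side could look like this (again, names and fields are assumptions): if the id is not yet published, or the request was already dequeued during shutdown, the handler only records the blocking event instead of calling dequeue() with lockId == 0 or dequeuing twice.

#include <mutex>

struct TestLock
{
	std::mutex localMutex;
	long lockId = 0;
	bool locked = false;	// set by the enqueue path once the id is known
	bool blocked = false;	// remembered so the owning thread can react later
};

static int astHandler(void* arg)
{
	TestLock* lock = static_cast<TestLock*>(arg);
	std::lock_guard guard(lock->localMutex);

	if (!lock->locked || lock->lockId == 0)
	{
		lock->blocked = true;	// nothing to dequeue (yet, or anymore)
		return 0;
	}

	// lockManager->dequeue(lock->lockId);	// real call elided
	lock->locked = false;
	return 0;
}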

The attached patch fixes all these issues: lm_ast_test.patch

PS: I've added threadId and ownerHandle to ThreadData to make troubleshooting easier.

hvlad (Member), Sep 2, 2025:

A fixed patch; the first one missed lock->locked = true;
lm_ast_test.patch

asfernandes (Member Author):

Thanks!
Now I have some doubts related to the LCK interfaces.
LCK_lock internals and its usage do not seem to have such treatment, i.e. checking lck_id in the AST or checking blocking after the lock is acquired.
So how are ASTs not missed when engine code uses LCK_lock?

hvlad (Member):

LCK is an (almost) simple wrapper around the lock manager API.
Similar checks and flags are implemented within the objects that own the locks; many objects use flags like blocking.
The race condition where a lock is acquired for the first time and released by an AST before lock_id is assigned is not common, and it was fixed in the past. It was with the monitoring lock, IIRC.
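
A rough illustration of that object-owned flag pattern, with hypothetical names (CachedObject, checkBlocking) rather than actual engine code: the AST only records that the lock is wanted elsewhere, and the owning object releases or downgrades it later at a safe point.

#include <atomic>

struct CachedObject
{
	std::atomic<bool> blocking{false};
	// Lock* lock;	// the engine lock owned by this object (elided)

	// Registered as the blocking AST for the object's lock.
	static int blockingAst(void* arg)
	{
		static_cast<CachedObject*>(arg)->blocking = true;	// just set the flag
		return 0;
	}

	// Called by engine code at points where it is safe to give the lock up.
	void checkBlocking()
	{
		if (blocking.exchange(false))
		{
			// LCK_release(tdbb, lock);	// release or downgrade here (call elided)
		}
	}
};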

#include <latch>
#include <memory>
#include <memory>

hvlad (Member):

Second include of the same file

asfernandes merged commit 51073d4 into master on Sep 3, 2025 (42 of 46 checks passed).
asfernandes deleted the work/lockmanager-ast-test branch on September 3, 2025 at 23:01.