-
Couldn't load subscription status.
- Fork 475
Store graph listener inside the context instead of the node graph #2952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store graph listener inside the context instead of the node graph #2952
Conversation
start_if_not_started in GraphListenerstart_if_not_started in GraphListener
|
Seems that the one failing test |
|
Why the new interface, would it not be sufficient to move the lock down in start_if_not_started() ? |
My original thinking was that the behavior I factored out seemed like it could be its own functionality and that If others agree that the check for |
start_if_not_started in GraphListenershutdown_mutex_ in GraphListener::start_if_not_started
|
I guess the shutdown handler needs to be registered after the start. If not, it could be called (In the super race from hell) before the listener thread is started and you'll end up in a weird state. |
|
ok, after speaking with @wjwwood at today's client library WG meeting, I think we've got a solid idea for how to move forward. Rather than having the |
Signed-off-by: Skyler Medeiros <[email protected]>
Signed-off-by: Skyler Medeiros <[email protected]>
Signed-off-by: Skyler Medeiros <[email protected]>
Signed-off-by: Skyler Medeiros <[email protected]>
Signed-off-by: Skyler Medeiros <[email protected]>
Signed-off-by: Skyler Medeiros <[email protected]>
Signed-off-by: Skyler Medeiros <[email protected]>
06b4b29 to
0440c31
Compare
shutdown_mutex_ in GraphListener::start_if_not_started|
Along with the known flaky test, the only other failing test in CI appears to be another extremely rare flake. from my clean rolling workspace: |
|
Pulls: #2952 |
Signed-off-by: Skyler Medeiros <[email protected]>
|
@skyegalaxy thanks for fixing this up. lgtm with just a nit, which is just comment block. i do not think we need to re-run the CI after the comment fix. |
Done! 0cbaecf |
Signed-off-by: Skyler Medeiros <[email protected]>
7e70733 to
0cbaecf
Compare
|
@fujitatomoya thanks for reviewing and merging my PR! Since we caught this in Jazzy, Is this worth backporting to older distros as well? |
|
@Mergifyio backport kilted jazzy |
✅ Backports have been created
|
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <[email protected]> * run ament_uncrustify Signed-off-by: Skyler Medeiros <[email protected]> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <[email protected]> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <[email protected]> * whitespace Signed-off-by: Skyler Medeiros <[email protected]> * fix typo Signed-off-by: Skyler Medeiros <[email protected]> * move graph listener into the context Signed-off-by: Skyler Medeiros <[email protected]> * make cpplint happy Signed-off-by: Skyler Medeiros <[email protected]> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <[email protected]> --------- Signed-off-by: Skyler Medeiros <[email protected]> (cherry picked from commit 715a983) # Conflicts: # rclcpp/src/rclcpp/context.cpp
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <[email protected]> * run ament_uncrustify Signed-off-by: Skyler Medeiros <[email protected]> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <[email protected]> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <[email protected]> * whitespace Signed-off-by: Skyler Medeiros <[email protected]> * fix typo Signed-off-by: Skyler Medeiros <[email protected]> * move graph listener into the context Signed-off-by: Skyler Medeiros <[email protected]> * make cpplint happy Signed-off-by: Skyler Medeiros <[email protected]> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <[email protected]> --------- Signed-off-by: Skyler Medeiros <[email protected]> (cherry picked from commit 715a983) # Conflicts: # rclcpp/src/rclcpp/context.cpp
…s2#2952) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <[email protected]> * run ament_uncrustify Signed-off-by: Skyler Medeiros <[email protected]> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <[email protected]> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <[email protected]> * whitespace Signed-off-by: Skyler Medeiros <[email protected]> * fix typo Signed-off-by: Skyler Medeiros <[email protected]> * move graph listener into the context Signed-off-by: Skyler Medeiros <[email protected]> * make cpplint happy Signed-off-by: Skyler Medeiros <[email protected]> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <[email protected]> --------- Signed-off-by: Skyler Medeiros <[email protected]>
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <[email protected]> * run ament_uncrustify Signed-off-by: Skyler Medeiros <[email protected]> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <[email protected]> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <[email protected]> * whitespace Signed-off-by: Skyler Medeiros <[email protected]> * fix typo Signed-off-by: Skyler Medeiros <[email protected]> * move graph listener into the context Signed-off-by: Skyler Medeiros <[email protected]> * make cpplint happy Signed-off-by: Skyler Medeiros <[email protected]> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <[email protected]> --------- Signed-off-by: Skyler Medeiros <[email protected]>
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <[email protected]> * run ament_uncrustify Signed-off-by: Skyler Medeiros <[email protected]> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <[email protected]> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <[email protected]> * whitespace Signed-off-by: Skyler Medeiros <[email protected]> * fix typo Signed-off-by: Skyler Medeiros <[email protected]> * move graph listener into the context Signed-off-by: Skyler Medeiros <[email protected]> * make cpplint happy Signed-off-by: Skyler Medeiros <[email protected]> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <[email protected]> --------- Signed-off-by: Skyler Medeiros <[email protected]>
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <[email protected]> * run ament_uncrustify Signed-off-by: Skyler Medeiros <[email protected]> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <[email protected]> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <[email protected]> * whitespace Signed-off-by: Skyler Medeiros <[email protected]> * fix typo Signed-off-by: Skyler Medeiros <[email protected]> * move graph listener into the context Signed-off-by: Skyler Medeiros <[email protected]> * make cpplint happy Signed-off-by: Skyler Medeiros <[email protected]> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <[email protected]> --------- Signed-off-by: Skyler Medeiros <[email protected]>
Description
This PR fixes a lock order inversion bug detected by Thread Sanitizer between
GraphListener::shutdown_mutex_andContext::on_shutdown_callbacks_mutex_, by storing and instantiating theGraphListenerinside of theContextinstead ofNodeGraph. With this change, we avoid locking the context weak pointer and inverting the order of lock acquisition on startup and shutdown.Usage of
is_shutdown_->load()was also replaced with the existingis_shutdown()method inGraphListenerfor consistency.Fixes # (2946)
Is this user-facing behavior change?
no
Did you use Generative AI?
no
Additional Information
This code appears to have remained unchanged for a long time, and I was able to reproduce this tsan error on iron, jazzy and rolling with this example. Tsan identified a lock order inversion as well as one potential data race in other parts of the code. With this change, the lock order inversion warning is gone, no additional tsan warnings were introduced, and all unit tests passed locally when these changes were built against both
jazzyandrolling.