Thread issue causing crashes on ROS2 Jazzy (but not Humble) #121

@Geibinger

Environment

  • OS: Ubuntu 24.04
  • ROS2: Jazzy (issue does NOT occur on Humble)
  • Branch: humble branch of this repo
  • Container: Reproduced in osrf/ros:jazzy-desktop

Problem

When running the official examples on ROS2 Jazzy, the executor crashes with exit code -4 (full output below). A negative exit code reported by launch normally means the process was killed by a signal, here signal 4 (SIGILL on Linux).

Using the jazzy-desktop container as a base, I installed ros-jazzy-generate-parameter-library and ros-jazzy-behaviortree-cpp, then cloned and built this repo. To try the samples, I ran:

ros2 launch btcpp_ros2_samples sample_bt_executor.launch.xml

in one terminal, then in a second terminal:

ros2 action send_goal /behavior_server btcpp_ros2_interfaces/action/ExecuteTree "{target_tree: SleepActionSample}"

Here is the output from the executor:

[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [sample_bt_executor-1]: process started with pid [1414157]
[INFO] [sleep_server-2]: process started with pid [1414158]
[sample_bt_executor-1] [INFO] [1763731488.804993226] [bt_action_server]: Starting Action Server: behavior_server
[sample_bt_executor-1] [INFO] [1763731488.809427037] [bt_action_server]: Loaded ROS Plugin: libdummy_nodes_dyn.so
[sample_bt_executor-1] [INFO] [1763731488.809579223] [bt_action_server]: Loaded ROS Plugin: libcrossdoor_nodes_dyn.so
[sample_bt_executor-1] [INFO] [1763731488.809748611] [bt_action_server]: Loaded ROS Plugin: libmovebase_node_dyn.so
[sample_bt_executor-1] [INFO] [1763731488.812012495] [bt_action_server]: Loaded ROS Plugin: libsleep_plugin.so
[sample_bt_executor-1] [INFO] [1763731488.812275659] [bt_action_server]: Loaded BehaviorTree: cross_door.xml
[sample_bt_executor-1] [INFO] [1763731488.812329190] [bt_action_server]: Loaded BehaviorTree: sleep_action.xml
[sample_bt_executor-1] [INFO] [1763731488.812367832] [bt_action_server]: Loaded BehaviorTree: door_closed.xml
[sample_bt_executor-1] [INFO] [1763731490.762410319] [bt_action_server]: Received goal request to execute Behavior Tree: SleepActionSample
[sample_bt_executor-1] [1763731490.766]: Sequence                  IDLE -> RUNNING
[sample_bt_executor-1] Robot says: start
[sample_bt_executor-1] [1763731490.766]: SaySomething              IDLE -> SUCCESS
[sample_bt_executor-1] [1763731490.766]: sleepA                    IDLE -> RUNNING
[sleep_server-2] [INFO] [1763731490.766555989] [sleep_action_server]: Received goal request with sleep time 2000
[sleep_server-2] [INFO] [1763731490.766761915] [sleep_action_server]: Executing goal
[sleep_server-2] [INFO] [1763731490.766889635] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731490.967053364] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731491.167007710] [sleep_action_server]: Publish feedback
[ERROR] [sample_bt_executor-1]: process has died [pid 1414157, exit code -4, cmd '/ws/install/btcpp_ros2_samples/lib/btcpp_ros2_samples/sample_bt_executor --ros-args --params-file /ws/btcpp_ros2_samples/share/btcpp_ros2_samples/config/sample_bt_executor.yaml'].
[sleep_server-2] [INFO] [1763731491.367092511] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731491.567558818] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731491.767298470] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731491.967016201] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731492.166999441] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731492.367449688] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731492.568751315] [sleep_action_server]: Publish feedback
[sleep_server-2] [INFO] [1763731492.768100093] [sleep_action_server]: Goal succeeded

The action client cannot be terminated and seems to be in a locked state:

Waiting for an action server to become available...
Sending goal:
     target_tree: SleepActionSample
payload: ''

Goal accepted with ID: ce38f34071e748eaa971ef41ff3bea27

^CCanceling goal...
^C^C^C^C^C^C

Note that the samples work without issue in the Humble container when following the same procedure.

Root Cause

After investigation, I found race conditions in the ROS2 action callbacks:

  • Callbacks (feedback, result, goal_response) run in separate threads without mutex protection
  • They access shared state (goal_handle_, result_, on_feedback_state_change_) concurrently with the tick loop
  • goal_handle_ can be dereferenced in callbacks after being reset in tick()

The issue appears on Jazzy but not Humble, probably due to changes in the ROS2 executor implementation between versions.

Proposed Fix

I have working changes that add:

  1. Mutex protection to all ROS2 action callbacks
  2. Null pointer checks before dereferencing goal_handle_
  3. Non-blocking behavior in tick() by moving spin_some() before mutex acquisition
  4. Similar fixes for topic subscriber callbacks

With these changes, the examples run successfully on Jazzy.

Files affected:

  • behaviortree_ros2/include/behaviortree_ros2/bt_action_node.hpp
  • behaviortree_ros2/include/behaviortree_ros2/bt_topic_sub_node.hpp

Question

Should I open a pull request with these changes? I am unsure what the current state of this repository is for the Jazzy release.

Here is the fork where the changes can be seen.
