3.13.7 Shovel's rabbit_shovel_dyn_worker_sup_sup can fail without being restarted (exceed supervisor restart intensity?) #14791
### Describe the bug

Environment

We observed that a dynamic shovel supervisor process ( …

`2025-10-23 05:31:49.714145+00:00 [info] <0.211347036.1> Waiting for Mnesia tables for 30000 ms, 9 retries left`

After this point:

`rabbitmqctl eval 'supervisor2:which_children(rabbit_shovel_dyn_worker_sup_sup).'`

`# => {:noproc, {:gen_server, :call, [:rabbit_shovel_dyn_worker_sup_sup, :which_children, :infinity]}}`

### Reproduction steps
1. Create a dynamic shovel in vhost / (e.g. “Move from Queque_error”).
2. Trigger a cluster sync event (e.g., restart one node or cause Mnesia table resync).
3. Attempt to delete and recreate the shovel on the affected node (rabbit@rabbit-01).
4. Observe noproc errors in logs and that the supervisor process no longer exists.
5. Apply the same shovel on another node (rabbit@rabbit-02) → shovel works correctly.
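
Creating and deleting the shovel (the first and third steps above) can be scripted. The following is a minimal sketch, assuming the shovel is managed as a runtime parameter through `rabbitmqctl set_parameter` and `rabbitmqctl clear_parameter`. The source queue name `Queque_error`, the destination queue name `retry_queue`, and the `amqp://` URIs are illustrative assumptions and do not come from the report.

```python
import json


def shovel_commands(name, src_queue, dest_queue, vhost="/", uri="amqp://"):
    """Build rabbitmqctl argument lists that declare and delete a dynamic shovel.

    The queue names and URI passed in are placeholders chosen by the caller;
    nothing here is taken from the reporter's actual definition.
    """
    definition = {
        "src-protocol": "amqp091",
        "src-uri": uri,
        "src-queue": src_queue,
        "dest-protocol": "amqp091",
        "dest-uri": uri,
        "dest-queue": dest_queue,
    }
    create = ["rabbitmqctl", "set_parameter", "-p", vhost,
              "shovel", name, json.dumps(definition)]
    delete = ["rabbitmqctl", "clear_parameter", "-p", vhost, "shovel", name]
    return create, delete


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    create, delete = shovel_commands(
        "Move from Queque_error", "Queque_error", "retry_queue")
    print(create)
    print(delete)
```

On a broker node, each list could be passed to `subprocess.run(cmd, check=True)`. Running the delete command followed by the create command on the affected node corresponds to the third step.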
### Expected behavior
The shovel supervisor should remain running or automatically recover after Mnesia sync or parameter update.
New shovels should start normally on all nodes without requiring a restart.
### Additional context
On one node, the shovel supervisor crashes and stays dead.
New shovel definitions cannot start workers.
The same definition works fine on a different cluster node.
Only restarting the rabbitmq_shovel application or node recovers functionality.
Replies: 2 comments
@kubrakaraman6 RabbitMQ 3.13.x is out of community support.

But the Same Fundamental Problem Applies to 4.x?

That's correct, 4.x still uses `mirrored_supervisor` and it hosts all shovels on a single node.

Like all Mnesia-based features, `mirrored_supervisor`'s fundamental limitations are considered to be unfixable by our team. You can only replace it with something else, like Mnesia was replaced with Khepri, and new `4.2.0` clusters will use Khepri by default.

This is one of the reasons why in Tanzu RabbitMQ `4.2.0`, we have a new plugin that offers distributed shovels that, as the name suggests, distributes shovels on all nodes. As you can imagine, the key corporate sponsor of RabbitMQ is not …

What that means for … Those who want distributed shovels will have to get a Tanzu RabbitMQ license. This is the collective price we pay for core RabbitMQ being open source and a very significant amount of effort being invested into the open source edition.

This means that in your specific case, it likely was only the local …

This could potentially be a matter of configuring the supervisor tree restart intensity settings around …

And it just happens to be the case that Tanzu RabbitMQ …

Be that as it may, a significant shovel or …
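
For readers unfamiliar with the restart intensity raised in the issue title: an OTP supervisor tolerates at most `intensity` child restarts within any `period` seconds, and once that limit is exceeded it terminates its children and then itself. Whether it comes back afterwards depends on its own parent supervisor. The following is a toy Python model of that rule, not RabbitMQ or `supervisor2` code. The class and function names are hypothetical, the window boundary handling is simplified, and the values used are the OTP defaults (1 restart in 5 seconds), not the settings the shovel supervisors actually use.

```python
from collections import deque


class SupervisorModel:
    """Toy model of OTP restart intensity (illustrative, not RabbitMQ code)."""

    def __init__(self, intensity, period):
        self.intensity = intensity  # max restarts tolerated per window
        self.period = period        # window length in seconds
        self.restarts = deque()     # timestamps of recent restarts
        self.alive = True

    def child_crashed(self, now):
        """Record a child crash at time `now`.

        Returns True if the child is restarted, False if the supervisor
        gives up and terminates because the limit was exceeded.
        """
        if not self.alive:
            return False
        self.restarts.append(now)
        # Forget restarts that have fallen out of the sliding window.
        while now - self.restarts[0] > self.period:
            self.restarts.popleft()
        if len(self.restarts) > self.intensity:
            self.alive = False
        return self.alive


def first_termination(crash_times, intensity=1, period=5):
    """Return the crash time at which the supervisor gives up, or None."""
    sup = SupervisorModel(intensity, period)
    for t in crash_times:
        if not sup.child_crashed(t):
            return t
    return None


if __name__ == "__main__":
    # Two crashes one second apart exceed 1 restart per 5 seconds.
    print(first_termination([0, 1, 2]))
    # Crashes spaced ten seconds apart never exceed the limit.
    print(first_termination([0, 10, 20]))
```

In this model a burst of child failures, such as several workers failing while Mnesia tables are unavailable, is enough to take the supervisor down, while the same number of failures spread out over time is not.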