Description
Before Creating the Bug Report
- I found a bug; this is not just a question (questions belong in GitHub Discussions).
- I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
- I have confirmed that this bug belongs to the current repository, not to other RocketMQ repositories.
Runtime platform environment
Windows 11
RocketMQ version
4.6.1; the latest version has the same bug.
JDK Version
jdk1.8.361
Describe the Bug
In the transaction message check service, if an exception of any kind is thrown, the current round of the timer task ends immediately without pushing the half queue and op queue offsets forward.
As a result, the next round starts at the same half offset and op offset. Call the current half offset X.
Meanwhile, the clients keep sending and committing transaction messages, so the broker generates so many op messages that the next round cannot reach the end of the op queue within MAX_PROCESS_TIME_LIMIT (default 60000 ms). The half message at X is therefore never checked back, and its TRANSACTION_CHECK_TIMES property never increases.
Every following round behaves the same way.
In the end, the check task loops on offset X round after round, forever, without making any progress.
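To make the failure mode concrete, here is a deliberately simplified Java model of the check loop. It is not RocketMQ code: `StuckCheckLoop`, `State`, and `runRound` are hypothetical names, and `opBudget` (the number of op messages one round can scan) stands in for the wall-clock MAX_PROCESS_TIME_LIMIT. As in the report, offsets are committed only on the happy path.

```java
/**
 * Hypothetical, simplified model of the stuck check loop (not RocketMQ code).
 * opBudget stands in for MAX_PROCESS_TIME_LIMIT: how many op messages
 * a single round can scan before it gives up.
 */
public class StuckCheckLoop {
    static final class State {
        long halfOffset; // committed half-queue offset (X)
        long opOffset;   // committed op-queue offset
        long opEnd;      // current end of the op queue
        int checkTimes;  // stand-in for TRANSACTION_CHECK_TIMES of the half message at X
    }

    /** One timer round of the buggy service: offsets move only on the happy path. */
    static void runRound(State s, long opGrowth, long opBudget, boolean injectFailure) {
        s.opEnd += opGrowth; // producers keep committing, so op messages keep arriving
        try {
            if (injectFailure) {
                throw new IllegalStateException("artificial failure in check service");
            }
            long backlog = s.opEnd - s.opOffset;
            if (backlog > opBudget) {
                return; // budget exhausted before the op queue end: X is never checked
            }
            s.checkTimes++;        // X is checked back
            s.halfOffset++;        // and both offsets move forward
            s.opOffset = s.opEnd;
        } catch (RuntimeException e) {
            // Only logged in this model; the round ends without committing any offset.
        }
    }

    public static void main(String[] args) {
        State s = new State();
        for (int round = 0; round < 10; round++) {
            runRound(s, 500, 1_000, round < 3); // fail only the first three rounds
        }
        System.out.println(s.halfOffset + " " + s.checkTimes + " " + (s.opEnd - s.opOffset));
    }
}
```

With three injected failures followed by error-free rounds, `main` prints `0 0 5000`: the half offset never leaves X, X is never checked back, and the op backlog keeps growing. With the same parameters and no injected failures, the half offset advances every round, which is why the exception is the trigger.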
Steps to Reproduce
Send transaction messages from multiple threads continuously.
Return COMMIT or, occasionally, UNKNOW from the local transaction.
To make the problem appear sooner, artificially throw an exception in the check service.
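The reproduction load can be approximated without a broker. The sketch below is a hypothetical model, not the RocketMQ client API: `ReproLoad` and `decide` are made-up names, `LocalState` only mirrors two names from RocketMQ's `LocalTransactionState`, and "occasionally" is made deterministic (every tenth transaction returns UNKNOW). It assumes that each send writes one half message and that only a commit (or rollback) writes an op message.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, broker-free model of the reproduction load: several threads keep
 * "sending" transaction messages, committing most and returning UNKNOW occasionally.
 */
public class ReproLoad {
    // Mirrors the names of two RocketMQ LocalTransactionState values; not the real enum.
    enum LocalState { COMMIT_MESSAGE, UNKNOW }

    final AtomicLong halfMessages = new AtomicLong(); // one per send
    final AtomicLong opMessages = new AtomicLong();   // one per commit

    // Deterministic stand-in for "occasionally": every unknowEvery-th transaction is UNKNOW.
    static LocalState decide(long seq, int unknowEvery) {
        return seq % unknowEvery == 0 ? LocalState.UNKNOW : LocalState.COMMIT_MESSAGE;
    }

    void run(int threads, int perThread, int unknowEvery) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 1; i <= perThread; i++) {
                    halfMessages.incrementAndGet();
                    if (decide(i, unknowEvery) == LocalState.COMMIT_MESSAGE) {
                        opMessages.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        ReproLoad load = new ReproLoad();
        load.run(8, 1_000, 10);
        long pending = load.halfMessages.get() - load.opMessages.get();
        System.out.println(load.halfMessages.get() + " half, " + load.opMessages.get()
                + " op, " + pending + " pending check");
    }
}
```

`main` prints `8000 half, 7200 op, 800 pending check`: the op queue grows with every commit while the UNKNOW half messages wait for a check back. The artificial exception in the check service is not modeled here.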
What Did You Expect to See?
- When an exception happens, the service retries the half message at X at most the maximum number of check times.
- The half and op queue offsets are pushed forward in a finally block.
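The expected behavior can be sketched on a simplified model of the check loop. This is hypothetical, not a patch against RocketMQ: `FixedCheckLoop` and its fields are illustrative, `opBudget` stands in for MAX_PROCESS_TIME_LIMIT, `maxAttempts` caps the failed rounds spent on X, and whatever progress a round has made is persisted in a `finally` block even when the round throws.

```java
/**
 * Hypothetical sketch of the expected behavior on a simplified model (not RocketMQ code).
 * Progress is persisted in finally, and X is retried at most maxAttempts times.
 */
public class FixedCheckLoop {
    static final class State {
        long halfOffset; // committed half-queue offset (X)
        long opOffset;   // committed op-queue offset
        long opEnd;      // current end of the op queue
        int attemptsOnX; // failed rounds spent on the half message at X
        int checkedBack; // half messages checked back so far
    }

    static void runRound(State s, long opGrowth, long opBudget, int maxAttempts, boolean injectFailure) {
        s.opEnd += opGrowth; // producers keep committing, so op messages keep arriving
        long newOpOffset = s.opOffset;
        long newHalfOffset = s.halfOffset;
        try {
            // Scan op messages until the op queue end or the budget is reached.
            newOpOffset = s.opOffset + Math.min(opBudget, s.opEnd - s.opOffset);
            if (injectFailure) {
                throw new IllegalStateException("artificial failure in check service");
            }
            if (newOpOffset == s.opEnd) {
                s.checkedBack++; // reached the op queue end, so X is checked back
                newHalfOffset++;
            }
        } catch (RuntimeException e) {
            // Logged only, but X is retried at most maxAttempts times.
            if (++s.attemptsOnX >= maxAttempts) {
                newHalfOffset++; // move past X (RocketMQ typically discards a message over its check limit)
            }
        } finally {
            // Always persist the progress made, even when the round failed.
            s.opOffset = newOpOffset;
            if (newHalfOffset != s.halfOffset) {
                s.halfOffset = newHalfOffset;
                s.attemptsOnX = 0;
            }
        }
    }

    public static void main(String[] args) {
        State s = new State();
        for (int round = 0; round < 10; round++) {
            runRound(s, 500, 1_000, 3, round < 3); // fail only the first three rounds
        }
        System.out.println(s.halfOffset + " " + s.checkedBack + " " + (s.opEnd - s.opOffset));
    }
}
```

With an op growth of 500 per round, a budget of 1000, and failures in the first three rounds, `main` prints `8 7 0`: X is abandoned after three failed attempts, the op offset keeps pace with the op queue, and every later round checks back one half message. A real fix would also have to respect which op messages can safely be marked done; the model ignores that.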
What Did You See Instead?
The check service loops on offset X round after round, forever, without making any progress.
Additional Context
No response