-
Notifications
You must be signed in to change notification settings - Fork 183
Closed
Milestone
Description
Background
In a scenario where 1 CN is streaming to 1 BN (only available), the BN starts to slowly fall behind and eventually the block being streamed is no longer in the block buffer (because writerMode = FILE_AND_GRPC and streamMode = BOTH).
The CN then gets into a repeating pattern of ending the stream, trying to stream the current block, receiving behind, sending too far behind, restarting stream with 0ms delay.
This leads to excessive log spam (will be fixed in separate issue with log statements at DEBUG level) and excessive exception throwing as streaming threads attempt to stream while stream is being closed.
For example:
2025-09-23 16:15:41.946 DEBUG 731 BlockNodeConnectionManager - [10.68.30.112:40840/ACTIVE] Processing block 249 (isBlockProofSent=false, totalBlockRequests=292, currentRequestIndex=221)
2025-09-23 16:15:41.952 DEBUG 731 BlockNodeConnectionManager - [10.68.30.112:40840/ACTIVE] Processing block 249 (isBlockProofSent=false, totalBlockRequests=292, currentRequestIndex=222)
2025-09-23 16:15:41.957 DEBUG 676 BlockBufferService - Block buffer status: idealMaxBufferSize=150, blocksChecked=132, blocksPruned=1, blocksPendingAck=130, blockRange=[250, 379], saturation=86.6667%
2025-09-23 16:15:41.958 DEBUG 708 BlockNodeConnectionManager - [10.68.30.112:40840/ACTIVE] Block 249 not found in buffer (latestBlock=380); connection will be closed
2025-09-23 16:15:41.958 DEBUG 503 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Closing connection...
2025-09-23 16:15:41.958 DEBUG 517 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Closing request pipeline for block node
2025-09-23 16:15:41.958 DEBUG 649 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Stream completed (stream close was in progress)
2025-09-23 16:15:41.958 DEBUG 523 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Request pipeline successfully closed
2025-09-23 16:15:41.958 DEBUG 172 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Connection state transitioned from ACTIVE to CLOSED
2025-09-23 16:15:41.958 DEBUG 210 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Cancelled periodic stream reset
2025-09-23 16:15:41.958 DEBUG 566 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Jumping to block -1
2025-09-23 16:15:41.958 DEBUG 793 BlockNodeConnectionManager - Marking request to jump to block -1
2025-09-23 16:15:41.958 DEBUG 509 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Connection successfully closed
2025-09-23 16:15:41.958 WARN 323 BlockNodeConnectionManager - [10.68.30.112:40840/CLOSED] Closing and rescheduling connection for reconnect attempt
2025-09-23 16:15:41.959 DEBUG 426 BlockNodeConnectionManager - [10.68.30.112:40840/CLOSED] Scheduling reconnection for node in 0 ms
2025-09-23 16:15:41.965 DEBUG 440 BlockNodeConnectionManager - [10.68.30.112:40840/CLOSED] Successfully scheduled reconnection task
2025-09-23 16:15:41.965 DEBUG 837 BlockNodeConnectionManager - [10.68.30.112:40840/UNINITIALIZED] Running connection task...
2025-09-23 16:15:41.967 DEBUG 172 BlockNodeConnection - [10.68.30.112:40840/PENDING] Connection state transitioned from UNINITIALIZED to PENDING
2025-09-23 16:15:41.967 DEBUG 172 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Connection state transitioned from PENDING to ACTIVE
2025-09-23 16:15:41.967 DEBUG 195 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Scheduled periodic stream reset every PT24H
2025-09-23 16:15:41.975 DEBUG 769 BlockNodeConnectionManager - Jumping to block 380
2025-09-23 16:15:42.004 DEBUG 644 BlockNodeConnection - [10.68.30.112:40840/CLOSED] onComplete invoked but connection is already closed
2025-09-23 16:15:42.102 DEBUG 328 FileBlockItemWriter - Started new block in FileBlockItemWriter 381
2025-09-23 16:15:42.102 DEBUG 51 GrpcBlockItemWriter - Started new block in GrpcBlockItemWriter 381
2025-09-23 16:15:42.108 DEBUG 383 FileBlockItemWriter - Closed block in FileBlockItemWriter 380
2025-09-23 16:15:42.108 DEBUG 225 BlockState - [Block 380] closed at 2025-09-23T16:15:42.108497011Z
2025-09-23 16:15:42.108 DEBUG 85 GrpcBlockItemWriter - Closed block in GrpcBlockItemWriter 380
2025-09-23 16:15:42.208 DEBUG 273 BlockBufferIO - Block 317 (items: 76866) written to file: /opt/hgcapp/blockStreams/buffer/1758644114033/block-317.bin (bytes: 83195689)
2025-09-23 16:15:42.302 DEBUG 731 BlockNodeConnectionManager - [10.68.30.112:40840/ACTIVE] Processing block 380 (isBlockProofSent=false, totalBlockRequests=305, currentRequestIndex=0)
2025-09-23 16:15:42.316 DEBUG 731 BlockNodeConnectionManager - [10.68.30.112:40840/ACTIVE] Processing block 380 (isBlockProofSent=false, totalBlockRequests=305, currentRequestIndex=1)
2025-09-23 16:15:42.323 DEBUG 326 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Received EndOfStream response (block=248, responseCode=BEHIND)
2025-09-23 16:15:42.324 DEBUG 393 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Block node is behind and block state is not available.
2025-09-23 16:15:42.326 DEBUG 503 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Closing connection...
2025-09-23 16:15:42.326 DEBUG 517 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Closing request pipeline for block node
2025-09-23 16:15:42.327 DEBUG 649 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Stream completed (stream close was in progress)
2025-09-23 16:15:42.327 DEBUG 523 BlockNodeConnection - [10.68.30.112:40840/ACTIVE] Request pipeline successfully closed
2025-09-23 16:15:42.327 DEBUG 172 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Connection state transitioned from ACTIVE to CLOSED
2025-09-23 16:15:42.327 DEBUG 210 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Cancelled periodic stream reset
2025-09-23 16:15:42.327 DEBUG 566 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Jumping to block -1
2025-09-23 16:15:42.327 DEBUG 793 BlockNodeConnectionManager - Marking request to jump to block -1
2025-09-23 16:15:42.327 DEBUG 509 BlockNodeConnection - [10.68.30.112:40840/CLOSED] Connection successfully closed
2025-09-23 16:15:42.327 WARN 323 BlockNodeConnectionManager - [10.68.30.112:40840/CLOSED] Closing and rescheduling connection for reconnect attempt
2025-09-23 16:15:42.327 DEBUG 426 BlockNodeConnectionManager - [10.68.30.112:40840/CLOSED] Scheduling reconnection for node in 0 ms
2025-09-23 16:15:42.327 DEBUG 682 BlockNodeConnectionManager - Exception caught in block stream worker loop Attempting to send data frame in invalid state: HALF_CLOSED_LOCAL
Acceptance Criteria
- Introduce exp backoff for the above cases and ensure all cases have backoff.
- Add HAPI tests to Block Node XTS suite exercising this behavior.
- Any additional Unit Tests
Dependencies
No response
Definition of Ready (DoR) Checklist
- Clear acceptance criteria
- Clear and detailed description
- Dependencies identified
- Links to documentation
- Should be completable in 2-3 Days
- Initial draft of Low-level design document
- At least high level test plan
- Groomed/Estimated
Definition of Done (DoD) Checklist
- Acceptance Criteria complete
- No Codacy issues greater than minor (in new code)
- JavaDocs updated/created
- Code commented
- Unit tests created/updated
- 80% test code coverage (in new code)
- Happy Path and major negative cases in HAPI tests as applicable