@SpyCheese
Member

No description provided.

@SpyCheese SpyCheese marked this pull request as draft October 22, 2025 20:10
@github-actions

Findings

  • validator/validator-group.cpp:799-854
    merge_collated_data() retries forever when the candidate body is unavailable. When o_block_data is empty it queries the DB; if that fails, it immediately schedules another validator-session query with try_disk = false. If the session no longer has the candidate (e.g. the node joined mid-round, or restarted after the session had already discarded the block), get_accepted_candidate keeps returning an error and merge_collated_data re-queues itself with no back-off or exit condition. The block's seqno is therefore never inserted into collated_data_merged_, collated_data_merged_upto_ never advances, and every later call to wait_collated_data_merged() for a higher seqno blocks forever. From that point on the validator can no longer collate new blocks and keeps hammering the session with failing RPCs. Consider either skipping dedup for such blocks or adding a terminal error or back-off path so the pipeline can continue.

  • validator/validator.h:255 & validator/manager-disk.hpp:137-207
    The new download_block_candidate RPC is left unimplemented for non-FullNode manager callbacks. The default implementation in ValidatorManagerInterface::Callback does nothing and never resolves the supplied promise. Components such as CollatorNodeSession::try_merge_collated_data_from_net() (which calls ValidatorManager::send_get_block_candidate_request) will therefore hang indefinitely on configurations that use the disk-based manager or any other callback that has not been updated. Even in test/integration builds, enabling merge_collated_data wedges the pipeline because the promise is never completed. Every callback implementation needs to either support the new RPC or explicitly fail the promise so that callers can degrade gracefully.
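
To illustrate the second finding, here is a minimal, self-contained sketch of a default callback that rejects the promise instead of dropping it. SimplePromise, CallbackSketch, FallbackResult and fetch_candidate_or_skip are hypothetical stand-ins invented for this example; they are not the promise type or the ValidatorManagerInterface::Callback signature used in the PR, and the int seqno parameter is a simplification.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a promise type (NOT the one used in the codebase).
// The handler receives either a value or an error message, at most once.
template <typename T>
class SimplePromise {
 public:
  using Handler = std::function<void(std::optional<T>, std::string)>;
  explicit SimplePromise(Handler h) : handler_(std::move(h)) {}

  void set_value(T v) { finish(std::move(v), ""); }
  void set_error(std::string err) { finish(std::nullopt, std::move(err)); }

 private:
  void finish(std::optional<T> v, std::string err) {
    if (handler_) {
      Handler h = std::move(handler_);
      handler_ = nullptr;  // guarantee the handler cannot fire twice
      h(std::move(v), std::move(err));
    }
  }
  Handler handler_;
};

// Hypothetical callback interface. The default body fails the promise
// right away, so callers are never left waiting on an unsupported RPC.
struct CallbackSketch {
  virtual ~CallbackSketch() = default;
  virtual void download_block_candidate(int /*seqno*/,
                                        SimplePromise<std::string> promise) {
    promise.set_error("download_block_candidate is not supported by this callback");
  }
};

struct FallbackResult {
  bool completed = false;      // the promise was resolved one way or another
  bool used_fallback = false;  // the caller had to degrade (e.g. skip dedup)
  std::string data;
};

// Caller side: because the promise always completes, the caller can pick
// a fallback path instead of hanging.
FallbackResult fetch_candidate_or_skip(CallbackSketch& cb, int seqno) {
  FallbackResult result;
  cb.download_block_candidate(
      seqno, SimplePromise<std::string>(
                 [&result](std::optional<std::string> v, std::string /*err*/) {
                   result.completed = true;
                   if (v) {
                     result.data = std::move(*v);
                   } else {
                     result.used_fallback = true;
                   }
                 }));
  return result;
}
```

With a default like this, a callback that has not been updated produces an immediate error at the call site rather than a silent hang; whether the real default should fail immediately or be made pure virtual is a design decision for the PR author.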

Open Questions

  • None.

Next Steps

  1. Decide how merge_collated_data should behave when block data cannot be retrieved (e.g. skip, delay with backoff, or disable the feature for that session).
  2. Implement download_block_candidate (or at least reject the promise) for every ValidatorManagerInterface::Callback implementation so callers aren’t left waiting forever.
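
One possible shape for step 1 is a bounded retry with capped exponential back-off that marks the seqno as handled once attempts are exhausted, so the "merged up to" watermark keeps advancing. The sketch below is a synchronous simplification under stated assumptions: MergeTrackerSketch, RetryPolicy, backoff_delay and merge_with_bounded_retry are hypothetical names, the fetch and wait callables stand in for the real asynchronous session/DB queries and timers, and the attempt count and delays are illustrative values, not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <set>

// Hypothetical tracker mirroring the idea of a "merged" set plus a
// contiguous watermark: every seqno <= merged_upto has been handled.
struct MergeTrackerSketch {
  std::set<int> merged;  // handled seqnos above the watermark
  int merged_upto = 0;

  void mark_done(int seqno) {
    if (seqno <= merged_upto) {
      return;  // already covered by the watermark
    }
    merged.insert(seqno);
    // Advance the watermark over any contiguous run of handled seqnos.
    while (!merged.empty() && *merged.begin() == merged_upto + 1) {
      merged.erase(merged.begin());
      ++merged_upto;
    }
  }
};

// Illustrative values only.
struct RetryPolicy {
  int max_attempts = 5;
  std::chrono::milliseconds initial_delay{100};
  std::chrono::milliseconds max_delay{2000};
};

// Delay before the retry that follows the given 1-based attempt:
// doubles each time and is capped at max_delay.
std::chrono::milliseconds backoff_delay(const RetryPolicy& p, int attempt) {
  auto d = p.initial_delay;
  for (int i = 1; i < attempt && d < p.max_delay; ++i) {
    d *= 2;
  }
  return std::min(d, p.max_delay);
}

// Tries to fetch the candidate at most max_attempts times. On failure the
// seqno is still marked as handled (dedup is skipped for that block), so
// waiters on higher seqnos are not blocked. Returns true if data was fetched.
bool merge_with_bounded_retry(
    MergeTrackerSketch& tracker, int seqno, const std::function<bool()>& fetch,
    const RetryPolicy& policy,
    const std::function<void(std::chrono::milliseconds)>& wait) {
  bool fetched = false;
  for (int attempt = 1; attempt <= policy.max_attempts; ++attempt) {
    if (fetch()) {
      fetched = true;
      break;
    }
    if (attempt < policy.max_attempts) {
      wait(backoff_delay(policy, attempt));
    }
  }
  tracker.mark_done(seqno);  // terminal path: the watermark always advances
  return fetched;
}
```

The key property is the unconditional mark_done at the end: a candidate that can never be retrieved costs a bounded number of failing queries and then stops blocking later seqnos. In an actor-based implementation the loop would presumably become a chain of delayed re-sends carrying an attempt counter.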
