
Conversation

@peterargue (Contributor)

Expands on #8108

The original implementation had two issues that were discovered when it was deployed to live nodes:

  1. When indexing from execution data, the syncer loads data faster than the indexer can write it back to storage. This caused the indexer's pending collection queue to overflow.
  2. The syncer's request-missing-collections method first scanned all blocks from the last full block to the latest finalized block to build a list of missing collections, then submitted them all to the network at once. When a node is very far behind and has a slow disk, this process can take a very long time, causing indexing to pause.

Addressed issue 1 by making the syncer block when indexing from execution data.
Addressed issue 2 by refactoring the syncer to process each block sequentially.
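A minimal sketch of how the two fixes could fit together, assuming hypothetical names (the syncer/indexer types and helper functions below are illustrative, not taken from this PR's diff):

```go
// Hypothetical sketch only: none of these names come from the PR diff.
package sketch

import (
	"context"
	"fmt"
)

// Collection stands in for a collection of transactions.
type Collection struct{ ID string }

// indexer persists collections; Index blocks until the write completes.
type indexer interface {
	Index(ctx context.Context, c *Collection) error
}

type syncer struct {
	indexer           indexer
	lastFullHeight    func() (uint64, error)
	finalizedHeight   func() (uint64, error)
	missingForHeight  func(height uint64) ([]*Collection, error)
	requestCollection func(c *Collection)
}

// onExecutionData is called when execution data for a block arrives. It blocks
// on every write, so the execution data syncer cannot load data faster than the
// indexer can persist it (fix for issue 1).
func (s *syncer) onExecutionData(ctx context.Context, colls []*Collection) error {
	for _, c := range colls {
		if err := s.indexer.Index(ctx, c); err != nil {
			return fmt.Errorf("failed to index collection %s: %w", c.ID, err)
		}
	}
	return nil
}

// requestMissingCollections walks heights one block at a time and requests that
// block's missing collections immediately, instead of scanning the whole range
// before submitting anything (fix for issue 2).
func (s *syncer) requestMissingCollections(ctx context.Context) error {
	last, err := s.lastFullHeight()
	if err != nil {
		return err
	}
	final, err := s.finalizedHeight()
	if err != nil {
		return err
	}
	for height := last + 1; height <= final; height++ {
		if ctx.Err() != nil {
			return ctx.Err()
		}
		missing, err := s.missingForHeight(height)
		if err != nil {
			return err
		}
		for _, c := range missing {
			s.requestCollection(c)
		}
	}
	return nil
}
```

The key points are that onExecutionData does not return until each collection is persisted, and requestMissingCollections issues requests height by height rather than building one large batch up front.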

@peterargue requested a review from a team as a code owner on November 4, 2025, 23:12
- DefaultMissingCollsRequestInterval = 1 * time.Minute
+ DefaultMissingCollectionRequestInterval = 1 * time.Minute

- // DefaultMissingCollsForBlockThreshold is the threshold number of blocks with missing collections
@peterargue (Contributor, Author) · Nov 4, 2025

Removed this because the checks were redundant: if there are MissingCollsForBlockThreshold blocks with missing collections, then the last full block is at least that many blocks behind the latest finalized block. Updated to only check the difference between the latest finalized and the last full block height.
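
A small sketch of what the remaining check reduces to (the function name and signature are illustrative, not from the diff):

```go
// Hypothetical illustration: trigger catch-up requests only when the last full
// block height has fallen more than a threshold behind the latest finalized
// height. This replaces the separate count of blocks with missing collections.
func shouldRequestCatchup(lastFullHeight, finalizedHeight, threshold uint64) bool {
	return finalizedHeight > lastFullHeight &&
		finalizedHeight-lastFullHeight > threshold
}
```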


initialCatchupComplete := false
for {
	err := s.requestMissingCollections(ctx, !initialCatchupComplete)
Member

Changes look good.

I have one concern about the syncer's requestMissingCollections:

The syncer asks the requester to fetch the missing collections. A few seconds later the execution data indexer indexes the execution data, the syncer is then able to index the collections, and the indexer moves the last full block height forward. However, the requester doesn't know that those collections have already been received, so over time it will build up a long list of zombie collections in its fetch list. We need to check how the requester deals with this; I'm afraid those zombie collections could prevent the actual missing collection requests from being sent.

@peterargue (Contributor, Author)

That's a good point. The requester should eventually download the collections, but it's wasted effort. It may be worth adding a timeout for AN collection syncing.
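
One shape the timeout idea could take, purely as a sketch; the Canceler interface and the cancelWhenRedundant helper are assumptions, not the real requester API:

```go
// Hypothetical sketch: drop a pending collection request once the collection has
// been indexed via execution data, or give up after a timeout, so zombie entries
// don't pile up in the requester's fetch list. Canceler is an assumed interface,
// not part of flow-go's requester engine.
package sketch

import (
	"context"
	"time"
)

type Canceler interface {
	Cancel(collectionID string)
}

func cancelWhenRedundant(ctx context.Context, r Canceler, collectionID string, timeout time.Duration, indexed <-chan struct{}) {
	select {
	case <-indexed:
		r.Cancel(collectionID) // already received via execution data indexing
	case <-time.After(timeout):
		r.Cancel(collectionID) // timed out; stop asking the network for it
	case <-ctx.Done():
	}
}
```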

@peterargue merged commit a1ab0db into peter/collection-indexer-v0.43 on Nov 5, 2025
@peterargue deleted the peter/collection-indexer-rework-v0.43 branch on November 5, 2025, 01:47
peterargue added a commit that referenced this pull request on Nov 5, 2025: …v0.43

[Access] Refactor the collection indexing and syncing