
Conversation

@peterargue (Contributor)

Expands on #8108

The original implementation had two issues that were discovered when it was deployed to live nodes:

  1. When indexing from execution data, the syncer loads data faster than the indexer can write it back to storage. This caused the indexer's pending collection queue to overflow.
  2. The syncer's request-missing-collections method first scanned all blocks from the last full block to the latest finalized block to build a list of missing collections, then submitted them all to the network at once. When a node is very far behind and has a slow disk, this process can take a very long time, causing indexing to pause.

Addressed issue 1 by making the syncer block when indexing from execution data.
Addressed issue 2 by refactoring the syncer to process each block sequentially.
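A minimal sketch of how the two fixes could fit together, assuming hypothetical names (the syncer/indexer types and helper functions below are illustrative, not taken from this PR's diff):

```go
// Hypothetical sketch only: none of these names come from the PR diff.
package sketch

import (
	"context"
	"fmt"
)

// Collection stands in for a collection of transactions.
type Collection struct{ ID string }

// indexer persists collections; Index blocks until the write completes.
type indexer interface {
	Index(ctx context.Context, c *Collection) error
}

type syncer struct {
	indexer           indexer
	lastFullHeight    func() (uint64, error)
	finalizedHeight   func() (uint64, error)
	missingForHeight  func(height uint64) ([]*Collection, error)
	requestCollection func(c *Collection)
}

// onExecutionData is called when execution data for a block arrives. It blocks
// on every write, so the execution data syncer cannot load data faster than the
// indexer can persist it (fix for issue 1).
func (s *syncer) onExecutionData(ctx context.Context, colls []*Collection) error {
	for _, c := range colls {
		if err := s.indexer.Index(ctx, c); err != nil {
			return fmt.Errorf("failed to index collection %s: %w", c.ID, err)
		}
	}
	return nil
}

// requestMissingCollections walks heights one block at a time and requests that
// block's missing collections immediately, instead of scanning the whole range
// before submitting anything (fix for issue 2).
func (s *syncer) requestMissingCollections(ctx context.Context) error {
	last, err := s.lastFullHeight()
	if err != nil {
		return err
	}
	final, err := s.finalizedHeight()
	if err != nil {
		return err
	}
	for height := last + 1; height <= final; height++ {
		if ctx.Err() != nil {
			return ctx.Err()
		}
		missing, err := s.missingForHeight(height)
		if err != nil {
			return err
		}
		for _, c := range missing {
			s.requestCollection(c)
		}
	}
	return nil
}
```

The key points are that onExecutionData does not return until each collection is persisted, and requestMissingCollections issues requests height by height rather than building one large batch up front.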

@peterargue requested a review from a team as a code owner on November 4, 2025, 23:12
- DefaultMissingCollsRequestInterval = 1 * time.Minute
+ DefaultMissingCollectionRequestInterval = 1 * time.Minute

- // DefaultMissingCollsForBlockThreshold is the threshold number of blocks with missing collections
@peterargue (Contributor, Author) · Nov 4, 2025

Removed this because the checks were redundant: if there are MissingCollsForBlockThreshold blocks with missing collections, then the last full block is at least that many blocks behind the latest finalized block. Updated to only check the difference between the latest finalized and the last full block height.
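
A small sketch of what the remaining check reduces to (the function name and signature are illustrative, not from the diff):

```go
// Hypothetical illustration: trigger catch-up requests only when the last full
// block height has fallen more than a threshold behind the latest finalized
// height. This replaces the separate count of blocks with missing collections.
func shouldRequestCatchup(lastFullHeight, finalizedHeight, threshold uint64) bool {
	return finalizedHeight > lastFullHeight &&
		finalizedHeight-lastFullHeight > threshold
}
```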


initialCatchupComplete := false
for {
	err := s.requestMissingCollections(ctx, !initialCatchupComplete)
Member

Changes look good.

I have one concern about the syncer's requestMissingCollections:

The syncer asks the requester to fetch the missing collections. A few seconds later the execution data indexer indexes the execution data, the syncer is then able to index the collections, and the indexer moves the last full block height forward. However, the requester doesn't know that those collections have already been received, so over time it will build up a long list of zombie collections in its fetch list. We need to check how the requester deals with this; I'm afraid those zombie collections could prevent the actual missing collection requests from being sent.

@peterargue (Contributor, Author)

That's a good point. The requester should eventually download the collections, but it's wasted effort. It may be worth adding a timeout for AN collection syncing.
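
One shape the timeout idea could take, purely as a sketch; the Canceler interface and the cancelWhenRedundant helper are assumptions, not the real requester API:

```go
// Hypothetical sketch: drop a pending collection request once the collection has
// been indexed via execution data, or give up after a timeout, so zombie entries
// don't pile up in the requester's fetch list. Canceler is an assumed interface,
// not part of flow-go's requester engine.
package sketch

import (
	"context"
	"time"
)

type Canceler interface {
	Cancel(collectionID string)
}

func cancelWhenRedundant(ctx context.Context, r Canceler, collectionID string, timeout time.Duration, indexed <-chan struct{}) {
	select {
	case <-indexed:
		r.Cancel(collectionID) // already received via execution data indexing
	case <-time.After(timeout):
		r.Cancel(collectionID) // timed out; stop asking the network for it
	case <-ctx.Done():
	}
}
```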

@peterargue merged commit a1ab0db into peter/collection-indexer-v0.43 on Nov 5, 2025
@peterargue deleted the peter/collection-indexer-rework-v0.43 branch on November 5, 2025, 01:47
peterargue added a commit that referenced this pull request on Nov 5, 2025: …v0.43

[Access] Refactor the collection indexing and syncing