Member

@zhangchiqing zhangchiqing commented Sep 30, 2025

Addressing #7939 (comment)

This PR:

  • Introduces a new StoredChunkDataPacks store and refactors NewChunkDataPacks to depend on it, wiring it through node startup and CLI tools (execution builder, read-badger, rollback cmd). This splits storage of chunk packs from the protocol DB.
  • Changes the write path: chunkDataPacks.Store(...) now returns a closure that writes the ChunkID→StoredChunkDataPack.ID mapping inside the protocol DB batch; only LockInsertOwnReceipt is held. Improves atomicity and clarifies failure modes.
  • Updates rollback to batch-remove multiple chunk data packs at once (BatchRemove(chunkIDs, writeBatch, chunkBatch)), simplifying error handling.
  • Verification requester API becomes more informative: RequestQualifierFunc now returns (bool, string) and MaxAttemptQualifier includes a reason when unqualified.
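The write path in the second bullet can be sketched roughly as follows. This is a minimal toy model, not the PR's implementation: `ID`, `IndexBatch`, `Commit`, and the in-memory maps are hypothetical stand-ins for `flow.Identifier`, the protocol DB batch, and the two databases, and the locking (`LockInsertOwnReceipt`) is omitted.

```go
package main

import "fmt"

// ID stands in for flow.Identifier (simplified for this sketch).
type ID string

// IndexBatch stands in for a pending protocol DB write batch (hypothetical type).
type IndexBatch struct {
	pending map[ID]ID // chunkID -> chunkDataPackID, not yet committed
}

// ChunkDataPacks is a toy model of the refactored store: payloads live in a
// separate store, while the protocol DB only holds the chunkID -> packID index.
type ChunkDataPacks struct {
	stored map[ID]string // separate chunk data pack storage
	index  map[ID]ID     // committed protocol DB index
}

func NewChunkDataPacks() *ChunkDataPacks {
	return &ChunkDataPacks{stored: map[ID]string{}, index: map[ID]ID{}}
}

// Store writes the payload to the separate store immediately and returns a
// closure that adds the index entry to a caller-supplied protocol DB batch.
func (c *ChunkDataPacks) Store(chunkID, packID ID, payload string) func(*IndexBatch) error {
	c.stored[packID] = payload
	return func(b *IndexBatch) error {
		if b == nil {
			return fmt.Errorf("nil protocol DB batch")
		}
		b.pending[chunkID] = packID
		return nil
	}
}

// Commit applies all pending index entries, modelling the batch commit.
func (c *ChunkDataPacks) Commit(b *IndexBatch) {
	for chunkID, packID := range b.pending {
		c.index[chunkID] = packID
	}
}

func main() {
	packs := NewChunkDataPacks()
	writeIndex := packs.Store("chunk-1", "pack-1", "payload")

	// The payload is stored, but the index entry does not exist until the batch commits.
	_, indexed := packs.index["chunk-1"]
	fmt.Println("indexed before commit:", indexed)

	batch := &IndexBatch{pending: map[ID]ID{}}
	if err := writeIndex(batch); err != nil {
		panic(err)
	}
	packs.Commit(batch)
	fmt.Println("indexed after commit:", packs.index["chunk-1"])
}
```

In this model the chunk only becomes resolvable once the caller's batch commits, which is the property the description attributes to the closure-based write path.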

Base automatically changed from leo/refactor-insert-chunk-data-pack to master October 1, 2025 00:14
@zhangchiqing zhangchiqing force-pushed the leo/refactor-stored-chunk-data-pack branch from e3a3b6b to d22661a Compare October 1, 2025 00:22
Member

@AlexHentschel AlexHentschel left a comment

Thanks for the iterations. PR looks great: very clear and well documented.

The only aspect that I am worried about being merged to master is the overlapping batch-writes for the chunk data pack removal (see my comment here). Any hotfix would do from my perspective that prevents accidental data corruption.

The remaining comments are largely just very minor suggestions to improve code clarity further.

Comment on lines 112 to 116
// Compared to the deprecated `codeChunkDataPack`, which stored chunkID -> storedChunkDataPack relationship:
// - `codeIndexChunkDataPackByChunkID` stores the chunkID->chunkDataPackID index, and
// - `codeChunkDataPack` stores chunkDataPackID -> storedChunkDataPack relationship.
// This breakup allows us to store chunk data packs in a different database in a concurrent safe way
codeIndexChunkDataPackByChunkID = 112
Member

[Leo] I decided to keep using the existing codeChunkDataPack prefix for storing the new chunk data pack. That’s fine, since during the rollout I’ll be removing all existing chunk data pack entries from the database.

sounds good.

Some suggestions:

  1. what do you think about using the prefix code 99 for codeIndexChunkDataPackByChunkID? That way, it would be listed right before codeChunkDataPack
    • Keeping codeChunkDataPack, codeIndexChunkDataPackByChunkID, and their combined documentation in one place would make the code easier to document and understand.
  2. the documentation still talks about "deprecated codeChunkDataPack", which no longer applies.

The resulting code could look something like the following (already updated documentation):

	// EXECUTION RESULTS: 
	//
	// The storage prefixes `codeChunkDataPack` and `codeIndexChunkDataPackByChunkID` are used primarily by execution nodes
	// to persist their own results for chunks they executed.
	//  - `codeIndexChunkDataPackByChunkID` stores the chunkID → chunkDataPackID index, and
	//  - `codeChunkDataPack` stores the chunk data pack by its own ID.
	// This breakup allows us to store chunk data packs in a different database in a concurrent safe way.
	codeIndexChunkDataPackByChunkID        = 99
	codeChunkDataPack                      = 100

	// legacy codes (should be cleaned up)
	codeCommit                             = 101
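To illustrate the two-level layout described in the suggested documentation, here is a minimal sketch of a lookup that goes chunkID → chunkDataPackID → chunk data pack. The prefix values 99 and 100 are the ones proposed in this comment; `makeKey`, `lookup`, and the map-backed databases are hypothetical and only approximate what `MakePrefix` and the real stores do.

```go
package main

import "fmt"

// Prefix codes as proposed in the review comment above.
const (
	codeIndexChunkDataPackByChunkID byte = 99
	codeChunkDataPack               byte = 100
)

// Identifier stands in for flow.Identifier.
type Identifier [32]byte

// makeKey is a hypothetical helper: one prefix byte followed by the ID bytes.
func makeKey(code byte, id Identifier) string {
	return string(append([]byte{code}, id[:]...))
}

// lookup resolves a chunk ID in two hops: first the index in the protocol DB,
// then the chunk data pack in the separate pack DB.
func lookup(protocolDB, packDB map[string][]byte, chunkID Identifier) ([]byte, error) {
	raw, ok := protocolDB[makeKey(codeIndexChunkDataPackByChunkID, chunkID)]
	if !ok {
		return nil, fmt.Errorf("no index entry for chunk %x", chunkID[:4])
	}
	var packID Identifier
	copy(packID[:], raw)

	pack, ok := packDB[makeKey(codeChunkDataPack, packID)]
	if !ok {
		return nil, fmt.Errorf("no stored chunk data pack %x", packID[:4])
	}
	return pack, nil
}

func main() {
	chunkID, packID := Identifier{1}, Identifier{2}
	protocolDB := map[string][]byte{
		makeKey(codeIndexChunkDataPackByChunkID, chunkID): packID[:],
	}
	packDB := map[string][]byte{
		makeKey(codeChunkDataPack, packID): []byte("chunk data pack"),
	}

	pack, err := lookup(protocolDB, packDB, chunkID)
	fmt.Println(string(pack), err)
}
```

Because the two prefixes address different databases in this sketch, the pack itself can be written or removed without touching the protocol DB, which is the separation the comment refers to.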

return RetrieveByKey(r, MakePrefix(codeChunkDataPack, chunkID), c)
// RetrieveStoredChunkDataPack retrieves a chunk data pack by stored chunk data pack ID.
// It returns [storage.ErrNotFound] if the chunk data pack is not found
func RetrieveStoredChunkDataPack(r storage.Reader, storeChunkDataPackID flow.Identifier, c *storage.StoredChunkDataPack) error {
Member

for simplicity, maybe we could rename those methods to InsertChunkDataPack and RetrieveChunkDataPack. The fact that we are dealing with the reduced data type StoredChunkDataPack for storage is in my opinion very well reflected by the method signature.

Comment on lines 21 to 22
// the actual chunk data pack is stored here, which is a separate storage from protocol DB
stored storage.StoredChunkDataPacks
Member

Couple suggestions:

  1. I would prefer a more descriptive name. How about: cdpStorage
  2. I think it would be helpful to document that we assume that cdpStorage has its own caching built in.
Suggested change
- // the actual chunk data pack is stored here, which is a separate storage from protocol DB
- stored storage.StoredChunkDataPacks
+ // cdpStorage persists the actual chunk data packs, which is a separate storage from protocol DB.
+ // We assume that `cdpStorage` has its own caching already built in.
+ cdpStorage storage.StoredChunkDataPacks

Member

I really like how this has turned out. I think from the business logic's perspective, it is really quite clear what happens at which state (with the help of some documentation). Well done, thanks for your iterations and patience. 👏

Comment on lines 146 to 147
// use badger instances directly instead of storage interfaces so that the interfaces don't
// need to include the Remove methods
Member

this doc could use an update, please.

Comment on lines 114 to 122
chunkDataPackIDs, err := chunkDataPacks.BatchRemove(chunkIDs, protocolDBBatch)
if err != nil {
	return fmt.Errorf("could not remove chunk data packs at %v: %w", flagHeight, err)
}

err = storedChunkDataPacks.Remove(chunkDataPackIDs)
if err != nil {
	return fmt.Errorf("could not commit chunk batch at %v: %w", flagHeight, err)
}
Member

⚠️ repeated removal (?)

chunkDataPacks.BatchRemove internally also calls storedChunkDataPacks.Remove
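One way to avoid the repeated removal is to give each layer a single responsibility. The sketch below is an assumption about a possible shape, not the fix that was merged: here `BatchRemove` only deletes index entries and returns the referenced pack IDs, and the caller removes the stored packs exactly once. All types are simplified, hypothetical stand-ins.

```go
package main

import "fmt"

// storedPacks models the separate chunk data pack store and counts Remove calls.
type storedPacks struct {
	data        map[string]string // packID -> payload
	removeCalls int
}

// Remove deletes the given packs from the separate store.
func (s *storedPacks) Remove(packIDs []string) {
	s.removeCalls++
	for _, id := range packIDs {
		delete(s.data, id)
	}
}

// chunkDataPacks models the protocol DB index (chunkID -> packID).
type chunkDataPacks struct {
	index map[string]string
}

// BatchRemove deletes the index entries and returns the pack IDs they referenced.
// In this sketch it deliberately does not touch storedPacks.
func (c *chunkDataPacks) BatchRemove(chunkIDs []string) []string {
	packIDs := make([]string, 0, len(chunkIDs))
	for _, chunkID := range chunkIDs {
		if packID, ok := c.index[chunkID]; ok {
			packIDs = append(packIDs, packID)
			delete(c.index, chunkID)
		}
	}
	return packIDs
}

// rollback removes the index entries first, then the stored packs, exactly once.
func rollback(c *chunkDataPacks, s *storedPacks, chunkIDs []string) {
	s.Remove(c.BatchRemove(chunkIDs))
}

func main() {
	c := &chunkDataPacks{index: map[string]string{"chunk-1": "pack-1", "chunk-2": "pack-2"}}
	s := &storedPacks{data: map[string]string{"pack-1": "a", "pack-2": "b"}}

	rollback(c, s, []string{"chunk-1", "chunk-2"})
	fmt.Println("index entries left:", len(c.index))
	fmt.Println("stored packs left:", len(s.data))
	fmt.Println("Remove calls:", s.removeCalls)
}
```

The alternative, keeping the removal inside `BatchRemove` and dropping the caller's second call, would work equally well in this model; what matters is that only one code path issues the write.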

Contributor

@tim-barry tim-barry left a comment

Overall looks good; mostly focused on documentation.
I believe since all usages of it have been removed, we can completely remove storage.LockInsertChunkDataPack.

Comment on lines 131 to 132

type ChunkDataPackHeader struct {
Contributor

Even if we are only using this type to generate the ID for ChunkDataPack, I think we can still mark this type as structwrite:immutable as well, and add a short comment about its current use.

Contributor

This has already been covered in Alex's documentation PR.

// to chunk data pack ID in the protocol database. This mapping persists that the Execution Node committed to the result
// represented by this chunk data pack. This function returns [storage.ErrDataMismatch] when a _different_ chunk data pack
// ID for the same chunk ID has already been stored (changing which result an execution Node committed to would be a
// slashable protocol violation). The caller must acquire [storage.LockInsertChunkDataPack] and hold it until the database
Contributor

As far as I can tell, all usages of storage.LockInsertChunkDataPack were removed - should this be storage.LockInsertOwnReceipt?

// to chunk data pack ID in the protocol database. This mapping persists that the Execution Node committed to the result
// represented by this chunk data pack. This function returns [storage.ErrDataMismatch] when a _different_ chunk data pack
// ID for the same chunk ID has already been stored (changing which result an execution Node committed to would be a
// slashable protocol violation). The caller must acquire [storage.LockInsertChunkDataPack] and hold it until the database
Contributor

Again here I believe storage.LockInsertChunkDataPack should instead be storage.LockInsertOwnReceipt

Member Author

Good catch, I think we should still use that lock, and I renamed it to storage.LockIndexChunkDataPackByID
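The documented contract (reject a different chunk data pack ID for an already indexed chunk ID, with the check and write protected by a lock) might look like the sketch below. It is a simplified model under stated assumptions: a `sync.Mutex` inside the type stands in for the storage lock that the real caller acquires, the local `ErrDataMismatch` stands in for `storage.ErrDataMismatch`, and treating a repeated insert of the same ID as a no-op is an assumption inferred from the word "different" in the doc comment.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrDataMismatch stands in for storage.ErrDataMismatch.
var ErrDataMismatch = errors.New("data mismatch")

// chunkIndex models the chunkID -> chunkDataPackID index in the protocol DB.
type chunkIndex struct {
	mu        sync.Mutex // stand-in for the storage lock held by the caller
	byChunkID map[string]string
}

func newChunkIndex() *chunkIndex {
	return &chunkIndex{byChunkID: map[string]string{}}
}

// Insert indexes packID under chunkID. The existence check and the write happen
// under the same lock, so two concurrent inserts cannot both pass the check.
func (x *chunkIndex) Insert(chunkID, packID string) error {
	x.mu.Lock()
	defer x.mu.Unlock()

	if existing, ok := x.byChunkID[chunkID]; ok {
		if existing != packID {
			return fmt.Errorf("chunk %s already indexed to %s, refusing %s: %w",
				chunkID, existing, packID, ErrDataMismatch)
		}
		return nil // same ID again: assumed to be a no-op
	}
	x.byChunkID[chunkID] = packID
	return nil
}

func main() {
	idx := newChunkIndex()
	fmt.Println("first insert:", idx.Insert("chunk-1", "pack-1"))
	fmt.Println("repeat insert:", idx.Insert("chunk-1", "pack-1"))

	err := idx.Insert("chunk-1", "pack-2")
	fmt.Println("conflicting insert is a mismatch:", errors.Is(err, ErrDataMismatch))
}
```

Without the lock, the read-then-write sequence would be racy: two writers could each observe "no entry" and then commit different IDs for the same chunk.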

Comment on lines +126 to +128
// Verify chunk data packs are removed from both protocol and chunk data pack DBs
for _, chunkID := range chunkIDs {
_, err := chunkDataPackStore.ByChunkID(chunkID)
Contributor

I believe this only tests that they are removed from the protocol DB - the stored chunk data pack may still be present in stored. To verify both "protocol DB mappings and chunk data pack DB content" are removed as documented, we probably want to record the chunkDataPack IDs and directly query the stored DB for them after the removal.
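A check along the lines suggested here could record the chunk data pack IDs before removal and then query both layers afterwards. The following is a minimal sketch with hypothetical in-memory stores; `stores`, `StoredByID`, and `verifyRemoved` are illustrative names, and the local `ErrNotFound` stands in for `storage.ErrNotFound`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound stands in for storage.ErrNotFound.
var ErrNotFound = errors.New("not found")

// stores models both layers: the protocol DB index and the chunk data pack DB.
type stores struct {
	index  map[string]string // protocol DB: chunkID -> packID
	stored map[string]string // chunk data pack DB: packID -> payload
}

// ByChunkID resolves via the index first, so it fails as soon as the index entry is gone.
func (s *stores) ByChunkID(chunkID string) (string, error) {
	packID, ok := s.index[chunkID]
	if !ok {
		return "", ErrNotFound
	}
	return s.StoredByID(packID)
}

// StoredByID queries the chunk data pack DB directly, bypassing the index.
func (s *stores) StoredByID(packID string) (string, error) {
	payload, ok := s.stored[packID]
	if !ok {
		return "", ErrNotFound
	}
	return payload, nil
}

// verifyRemoved checks both layers: lookups by chunk ID and direct lookups by pack ID.
func verifyRemoved(s *stores, chunkIDs, packIDs []string) error {
	for _, id := range chunkIDs {
		if _, err := s.ByChunkID(id); !errors.Is(err, ErrNotFound) {
			return fmt.Errorf("chunk %s still resolvable via protocol DB", id)
		}
	}
	for _, id := range packIDs {
		if _, err := s.StoredByID(id); !errors.Is(err, ErrNotFound) {
			return fmt.Errorf("pack %s still present in chunk data pack DB", id)
		}
	}
	return nil
}

func main() {
	s := &stores{
		index:  map[string]string{"chunk-1": "pack-1"},
		stored: map[string]string{"pack-1": "payload"},
	}
	// Record the pack IDs before removal; afterwards the index can no longer provide them.
	chunkIDs, packIDs := []string{"chunk-1"}, []string{"pack-1"}

	delete(s.index, "chunk-1") // incomplete removal: index only
	fmt.Println("index-only removal:", verifyRemoved(s, chunkIDs, packIDs))

	delete(s.stored, "pack-1") // complete removal
	fmt.Println("full removal:", verifyRemoved(s, chunkIDs, packIDs))
}
```

In this model a `ByChunkID`-only assertion passes after the index-only removal, which is exactly the gap the comment points out.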

@zhangchiqing zhangchiqing added this pull request to the merge queue Oct 15, 2025
Merged via the queue into master with commit fc546a5 Oct 15, 2025
57 checks passed
@zhangchiqing zhangchiqing deleted the leo/refactor-stored-chunk-data-pack branch October 15, 2025 18:27