[Storage] Refactor stored chunk data pack #7983
Conversation
Force-pushed the …stored-chunk-data-pack branch from e3a3b6b to d22661a
Thanks for the iterations. PR looks great: very clear and well documented.
The only aspect I am worried about merging to master is the overlapping batch-writes for the chunk data pack removal (see my comment here). From my perspective, any hotfix that prevents accidental data corruption would do.
The remaining comments are largely just very minor suggestions to improve code clarity further.
storage/operation/prefix.go
// Compared to the deprecated `codeChunkDataPack`, which stored chunkID -> storedChunkDataPack relationship:
// - `codeIndexChunkDataPackByChunkID` stores the chunkID -> chunkDataPackID index, and
// - `codeChunkDataPack` stores chunkDataPackID -> storedChunkDataPack relationship.
// This breakup allows us to store chunk data packs in a different database in a concurrent safe way
codeIndexChunkDataPackByChunkID = 112
[Leo] I decided to keep using the existing codeChunkDataPack prefix for storing the new chunk data pack. That’s fine, since during the rollout I’ll be removing all existing chunk data pack entries from the database.
sounds good.
Some suggestions:
- What do you think about using the prefix code 99 for `codeIndexChunkDataPackByChunkID`? That way it would be listed right before `codeChunkDataPack`. I think that would be beneficial for ease of documentation and understanding the code, if `codeChunkDataPack` and `codeIndexChunkDataPackByChunkID`, as well as their combined documentation, were all together.
- The documentation still talks about the "deprecated `codeChunkDataPack`", which no longer applies.
The resulting code could look something like the following (with updated documentation):
// EXECUTION RESULTS:
//
// The storage prefixes `codeChunkDataPack` and `codeIndexChunkDataPackByChunkID` are used primarily by execution nodes
// to persist their own results for chunks they executed.
// - `codeIndexChunkDataPackByChunkID` stores the chunkID → chunkDataPackID index, and
// - `codeChunkDataPack` stores the chunk data pack by its own ID.
// This breakup allows us to store chunk data packs in a different database in a concurrent safe way.
codeIndexChunkDataPackByChunkID = 99
codeChunkDataPack = 100
// legacy codes (should be cleaned up)
codeCommit = 101
⋮
return RetrieveByKey(r, MakePrefix(codeChunkDataPack, chunkID), c)

// RetrieveStoredChunkDataPack retrieves a chunk data pack by stored chunk data pack ID.
// It returns [storage.ErrNotFound] if the chunk data pack is not found.
func RetrieveStoredChunkDataPack(r storage.Reader, storeChunkDataPackID flow.Identifier, c *storage.StoredChunkDataPack) error {
For simplicity, maybe we could rename those methods to `InsertChunkDataPack` and `RetrieveChunkDataPack`. The fact that we are dealing with the reduced data type `StoredChunkDataPack` for storage is, in my opinion, very well reflected by the method signature.
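A minimal sketch of what the rename could look like, reusing the parameters and the `RetrieveByKey`/`MakePrefix` helpers quoted above; only the exported name changes, the body is assumed to stay as it is:

// RetrieveChunkDataPack retrieves a stored chunk data pack by its ID.
// It returns [storage.ErrNotFound] if no chunk data pack is stored under that ID.
func RetrieveChunkDataPack(r storage.Reader, chunkDataPackID flow.Identifier, c *storage.StoredChunkDataPack) error {
	// identical lookup as before, only the exported name is shortened
	return RetrieveByKey(r, MakePrefix(codeChunkDataPack, chunkDataPackID), c)
}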
storage/store/chunk_data_packs.go
// the actual chunk data pack is stored here, which is a separate storage from protocol DB
stored storage.StoredChunkDataPacks
A couple of suggestions:
- I would prefer a more descriptive name. How about `cdpStorage`?
- I think it would be helpful to document that we assume `cdpStorage` has its own caching built in.
Suggested change:
// cdpStorage persists the actual chunk data packs, which is a separate storage from protocol DB.
// We assume that `cdpStorage` has its own caching already built in.
cdpStorage storage.StoredChunkDataPacks
I really like how this has turned out. I think from the business logic's perspective, it is really quite clear what happens at which stage (with the help of some documentation). Well done, thanks for your iterations and patience. 👏
cmd/util/cmd/rollback-executed-height/cmd/rollback_executed_height.go
// use badger instances directly instead of stroage interfaces so that the interface don't
// need to include the Remove methods
this doc could use an update, please.
chunkDataPackIDs, err := chunkDataPacks.BatchRemove(chunkIDs, protocolDBBatch)
if err != nil {
	return fmt.Errorf("could not remove chunk data packs at %v: %w", flagHeight, err)
}

err = storedChunkDataPacks.Remove(chunkDataPackIDs)
if err != nil {
	return fmt.Errorf("could not commit chunk batch at %v: %w", flagHeight, err)
}
`chunkDataPacks.BatchRemove` internally also calls `storedChunkDataPacks.Remove`.
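If that is the case, the explicit `storedChunkDataPacks.Remove` call above is redundant and the call site could shrink to something like this sketch (error wrapping kept from the quoted code):

// Sketch: BatchRemove is assumed to also schedule removal of the stored chunk data
// packs, so no separate storedChunkDataPacks.Remove call is needed here.
_, err := chunkDataPacks.BatchRemove(chunkIDs, protocolDBBatch)
if err != nil {
	return fmt.Errorf("could not remove chunk data packs at %v: %w", flagHeight, err)
}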
Overall looks good; mostly focused on documentation.
I believe since all usages of it have been removed, we can completely remove storage.LockInsertChunkDataPack.
type ChunkDataPackHeader struct {
Even if we are only using this type to generate the ID for ChunkDataPack, I think we can still mark this type as structwrite:immutable as well, and add a short comment about its current use.
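A hedged sketch of what that could look like; the exact form of the `structwrite:immutable` directive and the struct's fields are assumptions here, not taken from the PR:

// ChunkDataPackHeader is currently only used to compute the ID that commits to a ChunkDataPack.
//structwrite:immutable
type ChunkDataPackHeader struct {
	// fields omitted
}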
This has already been covered in Alex's documentation PR.
// to chunk data pack ID in the protocol database. This mapping persists that the Execution Node committed to the result
// represented by this chunk data pack. This function returns [storage.ErrDataMismatch] when a _different_ chunk data pack
// ID for the same chunk ID has already been stored (changing which result an execution Node committed to would be a
// slashable protocol violation). The caller must acquire [storage.LockInsertChunkDataPack] and hold it until the database
As far as I can tell, all usages of storage.LockInsertChunkDataPack were removed - should this be storage.LockInsertOwnReceipt?
// to chunk data pack ID in the protocol database. This mapping persists that the Execution Node committed to the result
// represented by this chunk data pack. This function returns [storage.ErrDataMismatch] when a _different_ chunk data pack
// ID for the same chunk ID has already been stored (changing which result an execution Node committed to would be a
// slashable protocol violation). The caller must acquire [storage.LockInsertChunkDataPack] and hold it until the database
Again here I believe storage.LockInsertChunkDataPack should instead be storage.LockInsertOwnReceipt
Good catch. I think we should still use that lock, and I renamed it to `storage.LockIndexChunkDataPackByID`.
// Verify chunk data packs are removed from both protocol and chunk data pack DBs
for _, chunkID := range chunkIDs {
	_, err := chunkDataPackStore.ByChunkID(chunkID)
I believe this only tests that they are removed from the protocol DB; the stored chunk data pack may still be present in `stored`. To verify that both "protocol DB mappings and chunk data pack DB content" are removed as documented, we probably want to record the chunk data pack IDs and directly query the `stored` DB for them after the removal.
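A rough sketch of that stronger check, assuming the chunk data pack IDs are recorded before the removal and that the stored-chunk-data-pack store offers a by-ID lookup (the `storedChunkDataPacks.ByID` name is an assumption):

// protocol DB: the chunkID -> chunkDataPackID mapping must be gone
for _, chunkID := range chunkIDs {
	_, err := chunkDataPackStore.ByChunkID(chunkID)
	require.ErrorIs(t, err, storage.ErrNotFound)
}

// chunk data pack DB: the stored chunk data packs themselves must be gone
for _, chunkDataPackID := range chunkDataPackIDs { // recorded before the removal
	_, err := storedChunkDataPacks.ByID(chunkDataPackID)
	require.ErrorIs(t, err, storage.ErrNotFound)
}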
…ata-pack Suggested documentation extensions for Chunk Data Pack PR #7983
Co-authored-by: Alexander Hentschel <[email protected]>
…ight.go Co-authored-by: Alexander Hentschel <[email protected]>
Co-authored-by: Alexander Hentschel <[email protected]>
Co-authored-by: Alexander Hentschel <[email protected]>
Addressing #7939 (comment)
This PR:
- Adds a `StoredChunkDataPacks` store and refactors `NewChunkDataPacks` to depend on it, wiring it through node startup and CLI tools (execution builder, read-badger, rollback cmd). This splits storage of chunk data packs from the protocol DB.
- Writes the `ChunkID` → `StoredChunkDataPack.ID` mapping inside the protocol DB batch; only `LockInsertOwnReceipt` is held. This improves atomicity and clarifies failure modes.
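For illustration only, a hedged sketch of how the wiring described above might look at a call site; the constructor names and parameters below are assumptions, not the PR's actual signatures:

// storedChunkDataPacks is the separate chunk data pack storage with its own DB and caching.
// chunkDataPacks wraps it and additionally maintains the ChunkID -> StoredChunkDataPack.ID
// index in the protocol DB under LockInsertOwnReceipt.
storedChunkDataPacks := store.NewStoredChunkDataPacks(chunkDataPackDB)      // constructor name assumed
chunkDataPacks := store.NewChunkDataPacks(protocolDB, storedChunkDataPacks) // parameters assumed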