Snapshot parser with TODOs for handling specific state modules #333
base: main
Conversation
Pull Request Overview
This PR refactors the snapshot bootstrapper to use a streaming snapshot parser instead of loading the entire ledger state into memory. The bootstrapper now processes snapshot data through callbacks and publishes completion messages to trigger downstream processing.
- Replaces in-memory `LedgerState` loading with streaming `StreamingSnapshotParser`
- Introduces `SnapshotHandler` to accumulate and process snapshot data via callbacks
- Adds new `Startup` message type to `SnapshotMessage` enum for signaling snapshot processing start
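For orientation, here is a minimal sketch of the intended flow, pieced together from the reviewed code below. The `parse()` and `publish_completion()` calls are assumptions about the module's API, not confirmed signatures from this PR:

```rust
// Sketch only: wiring inferred from the reviewed changes; the exact
// signatures of parse() and the completion-publish step may differ.
async fn process_snapshot(
    context: Arc<Context>,
    file_path: &str,
    completion_topic: &str,
) -> Result<()> {
    let parser = StreamingSnapshotParser::new(file_path);

    // The handler implements PoolCallback, StakeCallback, DRepCallback, and
    // ProposalCallback, accumulating each streamed batch as it arrives.
    let mut callbacks = SnapshotHandler::new(context.clone(), completion_topic.to_string());

    // Streaming parse: the parser reads the file incrementally and invokes the
    // handler's callbacks, so the full ledger state never sits in memory at once.
    parser.parse(&mut callbacks)?;

    // Signal downstream modules that snapshot data is ready (assumed helper).
    callbacks.publish_completion().await?;
    Ok(())
}
```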
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs | Refactors from loading full ledger state to streaming snapshot parsing with callback-based accumulation and publishing |
| common/src/messages.rs | Adds Startup() variant to SnapshotMessage enum |
Force-pushed from d0b22c9 to 6865616
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
```rust
// TODO: These values should ideally come from the snapshot or configuration
// For now, using defaults for Conway era
Ok(GenesisValues {
    byron_timestamp: 1506203091, // Byron mainnet genesis timestamp
    shelley_epoch: 208,          // Shelley started at epoch 208 on mainnet
    shelley_epoch_len: 432000,   // 5 days in seconds
    shelley_genesis_hash: [
        // Shelley mainnet genesis hash (placeholder - should be from config)
        0x1a, 0x3d, 0x98, 0x7a, 0x95, 0xad, 0xd2, 0x3e, 0x4f, 0x4d, 0x2d, 0x78, 0x74, 0x9f,
        0x96, 0x65, 0xd4, 0x1e, 0x48, 0x3e, 0xf2, 0xa2, 0x22, 0x9c, 0x4b, 0x0b, 0xf3, 0x9f,
        0xad, 0x7d, 0x5e, 0x27,
    ],
})
```
Copilot AI · Nov 3, 2025
The hardcoded Shelley genesis hash is incorrect. Based on common/src/genesis_values.rs, the correct mainnet hash is 1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81 (hex). The current bytes decode to 1a3d987a95add23e4f4d2d78749f9665d41e483ef2a2229c4b0bf39fad7d5e27, which doesn't match. Consider using GenesisValues::mainnet() instead of hardcoding.
Suggested change (replacing the hardcoded block above):

```rust
// Use canonical mainnet genesis values to avoid hardcoding and ensure correctness
Ok(GenesisValues::mainnet())
```
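For illustration (not part of the PR), the mismatch is easy to confirm by hex-encoding the hardcoded array; the `hex` crate and the `values` binding here are assumptions:

```rust
// Hypothetical verification, not in this PR: encode the hardcoded bytes and
// compare them to the canonical mainnet hash from common/src/genesis_values.rs.
let hardcoded = hex::encode(values.shelley_genesis_hash);
assert_eq!(hardcoded, "1a3d987a95add23e4f4d2d78749f9665d41e483ef2a2229c4b0bf39fad7d5e27");
// Canonical: "1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81"
// The two diverge from the second byte onward (0x3d vs 0x3b).
```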
```rust
    completion_topic: &str,
) -> Result<()> {
    let parser = StreamingSnapshotParser::new(file_path);
    let mut callbacks = SnapshotHandler::new(context.clone(), completion_topic.to_string());
```
Copilot AI · Nov 3, 2025
The snapshot_topic field in SnapshotHandler is being initialized with completion_topic instead of snapshot_topic. This causes publish_start() to publish to the wrong topic. The handler should receive the snapshot_topic value, not completion_topic.
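A sketch of the fix, under the assumption that the handler should carry both topics; the three-argument constructor here is hypothetical, not the PR's actual signature:

```rust
// Hypothetical constructor shape: pass both topics explicitly so that
// publish_start() targets snapshot_topic and completion goes to completion_topic.
let mut callbacks = SnapshotHandler::new(
    context.clone(),
    snapshot_topic.to_string(),   // consumed by publish_start()
    completion_topic.to_string(), // consumed by the completion publish
);
```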
common/src/messages.rs (outdated)

```rust
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub enum SnapshotMessage {
    Startup(), // subscribers should listen for incremental snapshot data
```
Copilot AI · Nov 3, 2025
Unit-like enum variants should not have empty parentheses. Change Startup() to Startup following Rust convention for variants without data.
Suggested change:

```rust
    Startup, // subscribers should listen for incremental snapshot data
```
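The convention also affects match ergonomics: a zero-field tuple variant still requires parentheses in patterns, whereas the unit variant matches bare. A tiny illustration:

```rust
// With `Startup()` the pattern must be written `SnapshotMessage::Startup()`;
// after the suggested change, the idiomatic unit-variant pattern applies:
match msg {
    SnapshotMessage::Startup => info!("snapshot stream starting"),
    _ => {}
}
```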
```rust
impl PoolCallback for SnapshotHandler {
    fn on_pools(&mut self, pools: Vec<PoolInfo>) -> Result<()> {
        info!("Received {} pools", pools.len());
```
Copilot AI · Nov 3, 2025
Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.
Suggested change:

```rust
        info!("Received {} pools", pools.len());
        // Pre-allocate capacity if metadata is available and we haven't already reserved enough.
        if let Some(metadata) = &self.metadata {
            let expected = metadata.pool_count as usize;
            if self.pools.capacity() < expected {
                self.pools.reserve_exact(expected - self.pools.capacity());
            }
        }
        // If metadata is not available, pools vector will grow as needed (may cause reallocations).
```
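One caveat with the suggestion as written: `Vec::reserve_exact` takes the additional capacity beyond the current length, not a target total, so subtracting `capacity()` can reserve the wrong amount. A corrected sketch (still assuming the hypothetical `metadata.pool_count` field):

```rust
// reserve_exact(n) guarantees room for at least len() + n elements, so the
// shortfall should be computed against len(), not capacity().
if let Some(metadata) = &self.metadata {
    let expected = metadata.pool_count as usize;
    self.pools.reserve_exact(expected.saturating_sub(self.pools.len()));
}
```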
```rust
impl StakeCallback for SnapshotHandler {
    fn on_accounts(&mut self, accounts: Vec<AccountState>) -> Result<()> {
        info!("Received {} accounts", accounts.len());
        self.accounts.extend(accounts);
```
Copilot AI · Nov 3, 2025
Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.
```rust
impl DRepCallback for SnapshotHandler {
    fn on_dreps(&mut self, dreps: Vec<DRepInfo>) -> Result<()> {
        info!("Received {} DReps", dreps.len());
        self.dreps.extend(dreps);
```
Copilot AI · Nov 3, 2025
Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.
```rust
impl ProposalCallback for SnapshotHandler {
    fn on_proposals(&mut self, proposals: Vec<GovernanceProposal>) -> Result<()> {
        info!("Received {} proposals", proposals.len());
        self.proposals.extend(proposals);
```
Copilot AI · Nov 3, 2025
Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.
This looks good to me. A few things are missing that I believe will be needed:
- epoch nonce (metadata?)
- protocol parameters
- constitutional committee members
- active stake snapshots for preceding two epochs
I concur with your feedback that we need a more complete wrapper for the bootstrapping process that includes the steps you refer to. I am not sure whether we need the protocol parameters and constitutional committee members, but maybe so. What's missing here is a design document that outlines the needs: the process does require three snapshots, nonces, tip, headers, and possibly more. How would you feel about accepting this PR as a phase 1 that shows the callback implementations and the general start/completion sequence, and then we'll write a design document and implement it as the next two tickets? I have added a NOTES.md file in the snapshot module's directory (I could probably split out the bootstrapper part, but for now I'll leave it all together). It documents the needed additions of the data you mentioned, except for protocol parameters and constitutional committee members, as I'm not sure about those yet.
…nization with consumer modules
Force-pushed from 0a045fb to fe4a971
@lowhung please approve if you're satisfied. I'd like to avoid this becoming too huge a PR. The next phase, which we can collaborate on, would be to add the missing files to the project and then start passing data around. I think those can each be separate PRs, and we can divide and conquer them concurrently.
This PR updates the snapshot_bootstrapper module to read and parse an Amaru/Haskell node snapshot file, providing the initial structure of the process. It includes the parser and placeholder callback implementation for the streaming snapshot parser and also the pub/sub top-level message sequence.
Some features:
TESTING
cargo test shouldn't be broken, but I have tried in vain to create a mini snapshot file that is 5 MB or less; it seems not to be possible (after several hours of trying). I can use the integration test, but I can't think of a unit test yet.
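One interim option, given that a small fixture seems infeasible: an `#[ignore]`d integration test that only runs when pointed at a real snapshot. A sketch; the env var name, the `parse()` call, and the `test_context()` helper are assumptions, not the module's actual API:

```rust
// tests/snapshot_integration.rs -- hypothetical sketch.
#[test]
#[ignore] // requires a full (multi-GB) snapshot; run with: cargo test -- --ignored
fn parses_real_snapshot() {
    // Skip gracefully unless the caller supplies a snapshot path.
    let Ok(path) = std::env::var("SNAPSHOT_PATH") else {
        eprintln!("SNAPSHOT_PATH not set; skipping");
        return;
    };
    let parser = StreamingSnapshotParser::new(&path);
    // Any handler implementing the callback traits works here; we only assert
    // that the file streams through the parser without error.
    // test_context() is a hypothetical test-only helper.
    let mut handler = SnapshotHandler::new(test_context(), "test.completion".to_string());
    parser.parse(&mut handler).expect("snapshot should parse end to end");
}
```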