
Conversation

@buddhisthead (Collaborator)

This PR updates the snapshot_bootstrapper module to read and parse an Amaru/Haskell node snapshot file, providing the initial structure of the process. It includes the streaming snapshot parser with placeholder callback implementations, as well as the top-level pub/sub message sequence.

Some features:

  • Progress Tracking: logs every million UTXOs processed
  • Comprehensive Logging: detailed statistics and metadata logging
  • Error Handling: proper Result types and error propagation
  • Async Architecture: non-blocking operation with proper context management
  • Message Publishing: publishes a GenesisComplete message to trigger the next phase (a rough sketch of this sequence is shown below)
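
A rough sketch of the intended top-level sequence (publish a startup message, stream the snapshot through the callback handler, then publish completion). The context/publish API, the parser and handler call signatures, and the GenesisComplete variant name are assumptions here, not the exact code in this PR:

    // Sketch only: `context.publish(topic, message)`, `parser.parse(&mut handler)`,
    // and the GenesisComplete variant are assumed, not taken from the module code.
    async fn run_bootstrap(
        context: Arc<Context>,
        file_path: &str,
        snapshot_topic: &str,
        completion_topic: &str,
    ) -> Result<()> {
        // Announce that incremental snapshot data is about to be published.
        context.publish(snapshot_topic, SnapshotMessage::Startup()).await?;

        // Stream the snapshot file; the handler accumulates UTXOs, pools,
        // accounts, DReps, and proposals through its callback impls.
        let parser = StreamingSnapshotParser::new(file_path);
        let mut handler = SnapshotHandler::new(context.clone(), snapshot_topic.to_string());
        parser.parse(&mut handler)?;

        // Signal downstream modules that the bootstrap data is available.
        context.publish(completion_topic, SnapshotMessage::GenesisComplete()).await?;
        Ok(())
    }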

TESTING
cargo test shouldn't be broken, but I have tried in vain to create a mini snapshot file that is 5 MB or less. It seems not to be possible (after several hours of trying). I can use the integration test, but I can't think of a unit test yet.

Copilot AI review requested due to automatic review settings (November 3, 2025, 05:21).
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors the snapshot bootstrapper to use a streaming snapshot parser instead of loading the entire ledger state into memory. The bootstrapper now processes snapshot data through callbacks and publishes completion messages to trigger downstream processing.

  • Replaces in-memory LedgerState loading with streaming StreamingSnapshotParser
  • Introduces SnapshotHandler to accumulate and process snapshot data via callbacks (a rough sketch of the handler shape follows below)
  • Adds new Startup message type to SnapshotMessage enum for signaling snapshot processing start
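
For orientation, a minimal sketch of the handler shape implied by the diff; field names and types beyond those visible in the review comments below are assumptions:

    // Sketch of the accumulating handler; exact fields and types are
    // assumptions beyond what the review comments below show.
    struct SnapshotHandler {
        context: Arc<Context>,              // module context used for publishing
        snapshot_topic: String,             // topic for incremental snapshot messages
        metadata: Option<SnapshotMetadata>, // snapshot header/metadata, if seen
        utxo_count: u64,                    // running count for progress logging
        pools: Vec<PoolInfo>,
        accounts: Vec<AccountState>,
        dreps: Vec<DRepInfo>,
        proposals: Vec<GovernanceProposal>,
    }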

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs: refactors from loading the full ledger state to streaming snapshot parsing with callback-based accumulation and publishing
  • common/src/messages.rs: adds a Startup() variant to the SnapshotMessage enum


Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.



Comment on lines +83 to +95
// TODO: These values should ideally come from the snapshot or configuration
// For now, using defaults for Conway era
Ok(GenesisValues {
    byron_timestamp: 1506203091, // Byron mainnet genesis timestamp
    shelley_epoch: 208,          // Shelley started at epoch 208 on mainnet
    shelley_epoch_len: 432000,   // 5 days in seconds
    shelley_genesis_hash: [
        // Shelley mainnet genesis hash (placeholder - should be from config)
        0x1a, 0x3d, 0x98, 0x7a, 0x95, 0xad, 0xd2, 0x3e, 0x4f, 0x4d, 0x2d, 0x78, 0x74, 0x9f,
        0x96, 0x65, 0xd4, 0x1e, 0x48, 0x3e, 0xf2, 0xa2, 0x22, 0x9c, 0x4b, 0x0b, 0xf3, 0x9f,
        0xad, 0x7d, 0x5e, 0x27,
    ],
})
Copilot AI commented on Nov 3, 2025

The hardcoded Shelley genesis hash is incorrect. Based on common/src/genesis_values.rs, the correct mainnet hash is 1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81 (hex). The current bytes decode to 1a3d987a95add23e4f4d2d78749f9665d41e483ef2a2229c4b0bf39fad7d5e27, which doesn't match. Consider using GenesisValues::mainnet() instead of hardcoding.

Suggested change (replacing the hardcoded block above):

    // Use canonical mainnet genesis values to avoid hardcoding and ensure correctness
    Ok(GenesisValues::mainnet())

    completion_topic: &str,
) -> Result<()> {
    let parser = StreamingSnapshotParser::new(file_path);
    let mut callbacks = SnapshotHandler::new(context.clone(), completion_topic.to_string());
Copilot AI commented on Nov 3, 2025

The snapshot_topic field in SnapshotHandler is being initialized with completion_topic instead of snapshot_topic. This causes publish_start() to publish to the wrong topic. The handler should receive the snapshot_topic value, not completion_topic.
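
A sketch of the fix, assuming a snapshot_topic value is in scope and that the second argument of SnapshotHandler::new populates the handler's snapshot_topic field:

    let parser = StreamingSnapshotParser::new(file_path);
    // Pass the snapshot topic (not the completion topic) to the handler.
    let mut callbacks = SnapshotHandler::new(context.clone(), snapshot_topic.to_string());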


#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub enum SnapshotMessage {
    Startup(), // subscribers should listen for incremental snapshot data
Copilot AI commented on Nov 3, 2025

Unit-like enum variants should not have empty parentheses. Change Startup() to Startup following Rust convention for variants without data.

Suggested change:

    Startup, // subscribers should listen for incremental snapshot data


impl PoolCallback for SnapshotHandler {
    fn on_pools(&mut self, pools: Vec<PoolInfo>) -> Result<()> {
        info!("Received {} pools", pools.len());
Copilot AI commented on Nov 3, 2025

Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.

Suggested change:

        info!("Received {} pools", pools.len());
        // Pre-allocate capacity if metadata is available and we haven't already reserved enough.
        if let Some(metadata) = &self.metadata {
            let expected = metadata.pool_count as usize;
            if self.pools.capacity() < expected {
                self.pools.reserve_exact(expected - self.pools.capacity());
            }
        }
        // If metadata is not available, pools vector will grow as needed (may cause reallocations).

impl StakeCallback for SnapshotHandler {
    fn on_accounts(&mut self, accounts: Vec<AccountState>) -> Result<()> {
        info!("Received {} accounts", accounts.len());
        self.accounts.extend(accounts);
Copilot AI commented on Nov 3, 2025

Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.
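
One possible pattern, mirroring the suggestion on on_pools above; the account_count metadata field used here is an assumption:

    fn on_accounts(&mut self, accounts: Vec<AccountState>) -> Result<()> {
        info!("Received {} accounts", accounts.len());
        // Reserve once up-front if metadata advertises an expected total
        // (the `account_count` field name is assumed, not confirmed).
        if self.accounts.is_empty() {
            if let Some(metadata) = &self.metadata {
                self.accounts.reserve(metadata.account_count as usize);
            }
        }
        self.accounts.extend(accounts);
        Ok(())
    }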

impl DRepCallback for SnapshotHandler {
    fn on_dreps(&mut self, dreps: Vec<DRepInfo>) -> Result<()> {
        info!("Received {} DReps", dreps.len());
        self.dreps.extend(dreps);
Copilot AI commented on Nov 3, 2025

Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.

impl ProposalCallback for SnapshotHandler {
    fn on_proposals(&mut self, proposals: Vec<GovernanceProposal>) -> Result<()> {
        info!("Received {} proposals", proposals.len());
        self.proposals.extend(proposals);
Copilot AI commented on Nov 3, 2025

Continuously extending vectors without capacity pre-allocation can lead to multiple reallocations. Consider pre-allocating capacity based on metadata.utxo_count or other metadata fields when available, or document the expected memory growth pattern for large snapshots.

@whankinsiv (Collaborator) left a comment

This looks good to me. A few things we are missing which I believe will be needed:

  • epoch nonce (metadata?)
  • protocol parameters
  • constitutional committee members
  • active stake snapshots for preceding two epochs

@buddhisthead (Collaborator, Author) commented Nov 5, 2025

> This looks good to me. A few things we are missing which I believe will be needed:
>
>   • epoch nonce (metadata?)
>   • protocol parameters
>   • constitutional committee members
>   • active stake snapshots for preceding two epochs

I concur with your feedback that we need a more complete wrapper for the bootstrapping process, including the steps you refer to. I am not sure whether we need the protocol parameters and constitutional committee members, but maybe so. I think what's missing here is a design document that outlines the needs. It does require three snapshots, nonces, tip, headers, and possibly more.

How would you feel about accepting this PR as a phase 1 that shows the callback implementations and the general start/completion sequence, and then we'll write a design document and implement it as the next two tickets?

I have added a NOTES.md file in the snapshot module's directory; I could probably split out the bootstrapper part, but for now I'll leave it all together. It documents the needed additions of the data you mentioned, except for protocol parameters and constitutional committee members, as I'm not sure about those yet.

@buddhisthead (Collaborator, Author)

@lowhung please approve if you're satisfied. I'd like to avoid this becoming too huge a PR. The next phase, which we can collaborate on, would be to add the missing files to the project and then start passing data around. I think those can each be separate PRs, and we can divide and conquer them concurrently.
