Open
22 commits
56fcee6
badgerdb-sampler: Add badgerdb-sampler for BadgerDB v2
gw0 Nov 18, 2025
cb31e1c
badgerdb-sampler: Refactor and add support for BadgerDB v3
gw0 Nov 18, 2025
91c34a7
badgerdb-sampler: Add helper scripts
gw0 Nov 18, 2025
b7c29e6
badgerdb-sampler: Change JSON output parameters
gw0 Nov 18, 2025
dbbea60
badgerdb-sampler: Implement local symlink mirror fallback mode and un…
gw0 Nov 19, 2025
dc4d7ed
badgerdb-sampler: Improve logging
gw0 Nov 22, 2025
e3cfdab
badgerdb-sampler: Add decoding of runtime keys and CBOR values
gw0 Nov 22, 2025
1f17c2a
badgerdb-sampler: Update dependencies and helper scripts
gw0 Nov 22, 2025
b38fba2
badgerdb-sampler: Split code into consensus and runtime analysis
gw0 Nov 22, 2025
7c7ea0f
badgerdb-sampler: Extend runtime decoding
gw0 Nov 24, 2025
aa7131c
badgerdb-sampler: Refactor decoding functionality and add EVM parsing
gw0 Nov 24, 2025
024a3d5
badgerdb-sampler: Add decoding of EVM events, transactions and calls
gw0 Nov 24, 2025
2d75cdb
badgerdb-sampler: Add decoding of consensus transactions and events
gw0 Nov 25, 2025
8dc750a
badgerdb-sampler: Refactor consistent naming and improve error handling
gw0 Nov 25, 2025
38eecf2
badgerdb-sampler: Add decoding of consensus events and improve de/ser…
gw0 Nov 27, 2025
f459646
badgerdb-sampler: Improve database opening
gw0 Nov 27, 2025
53ff44e
badgerdb-sampler: Refactor raw fields, error handling, and decoding i…
gw0 Nov 29, 2025
f69055d
badgerdb-sampler: Add exact decoding of MKVS leafs and add support fo…
gw0 Dec 1, 2025
5ff6acc
badgerdb-sampler: Improve consistency and error details
gw0 Dec 3, 2025
27031d6
badgerdb-sampler: Improve building for multiple BadgerDB versions
gw0 Dec 3, 2025
a59c24b
badgerdb-sampler: Update README and improve helper scripts
gw0 Dec 3, 2025
23165bf
badgerdb-sampler: Various fixes
gw0 Dec 22, 2025
43 changes: 43 additions & 0 deletions badgerdb-sampler/Makefile
@@ -0,0 +1,43 @@
# Makefile for badgerdb-sampler

.PHONY: all build-all build-v2 build-v3 build-v4 clean tidy-all

# Default target
all: build-all

# Build all versions
build-all: build-v2 build-v3 build-v4

# Build BadgerDB v2 version
build-v2:
	@echo "Building badgerdb-sampler-v2..."
	go build -modfile=go_v2.mod -tags badgerv2 -o bin/badgerdb-sampler-v2
	@echo "✓ Built: bin/badgerdb-sampler-v2"

# Build BadgerDB v3 version
build-v3:
	@echo "Building badgerdb-sampler-v3..."
	go build -modfile=go_v3.mod -tags badgerv3 -o bin/badgerdb-sampler-v3
	@echo "✓ Built: bin/badgerdb-sampler-v3"

# Build BadgerDB v4 version
build-v4:
	@echo "Building badgerdb-sampler-v4..."
	go build -modfile=go_v4.mod -tags badgerv4 -o bin/badgerdb-sampler-v4
	@echo "✓ Built: bin/badgerdb-sampler-v4"

# Clean build artifacts
clean:
	@echo "Cleaning bin/..."
	rm -f bin/badgerdb-sampler-v2 bin/badgerdb-sampler-v3 bin/badgerdb-sampler-v4
	@echo "Cleaning outputs/..."
	rm -rf outputs/*
	@echo "✓ Cleaned"

# Tidy all module files
tidy-all:
	@echo "Tidying all module files..."
	GOFLAGS="-tags=badgerv2" go mod tidy -modfile=go_v2.mod
	GOFLAGS="-tags=badgerv3" go mod tidy -modfile=go_v3.mod
	GOFLAGS="-tags=badgerv4" go mod tidy -modfile=go_v4.mod
	@echo "✓ All module files tidied"
123 changes: 123 additions & 0 deletions badgerdb-sampler/README.md
@@ -0,0 +1,123 @@
# BadgerDB Sampler

The **badgerdb-sampler** tool extracts and decodes data from BadgerDB databases found in Oasis node snapshots. It can decode most data structures in consensus and runtime databases, including EVM data. It supports multiple BadgerDB versions (v2-v4) and the data representations used by Oasis nodes (v20.x-v25.x). The decoding logic handles various data formats and reports decoding issues.

**Warning:** For experimental purposes only. Decoded data might be incorrect.

## Features

- **Multi-version BadgerDB Support**: Compatible with BadgerDB v2, v3, and v4
- **Comprehensive Decoding**: Handles consensus state, runtime state, and EVM-specific data
- **Error Tracking**: Collects and reports decoding errors with detailed error counts
- **Statistics Generation**: Provides key type distributions, database size, and sample counts
- **Module-aware Parsing**: Recognizes and routes data to appropriate decoders (evm, accounts, contracts, core)
- **EVM Event Decoding**: Includes event signature database for human-readable EVM event names
- **FUSE Filesystem Support**: Works with databases on FUSE mounts by falling back to a local mirror (small metadata files copied, large data files symlinked)
- **Read-only Access**: Can analyze databases currently in use by nodes via `BypassLockGuard`
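
The error tracking and statistics features can be pictured as a small accumulator that counts samples per key type and groups decoding failures by message. The following is a minimal sketch only: `sampleStats`, `newSampleStats`, `record`, and `errUnknownPrefix` are hypothetical names chosen for illustration, and the tool's actual types may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// sampleStats is a hypothetical accumulator; the tool's real types may differ.
type sampleStats struct {
	SampleCount   int
	KeyTypeCounts map[string]int // samples seen per key type
	ErrorCounts   map[string]int // decoding failures grouped by error message
}

func newSampleStats() *sampleStats {
	return &sampleStats{
		KeyTypeCounts: make(map[string]int),
		ErrorCounts:   make(map[string]int),
	}
}

// record counts one sample and, if decoding failed, counts the failure
// under its error message.
func (s *sampleStats) record(keyType string, decodeErr error) {
	s.SampleCount++
	s.KeyTypeCounts[keyType]++
	if decodeErr != nil {
		s.ErrorCounts[decodeErr.Error()]++
	}
}

// errUnknownPrefix is an illustrative error value, not taken from the tool.
var errUnknownPrefix = errors.New("unknown module prefix")

func main() {
	stats := newSampleStats()
	stats.record("mkvs:node", nil)
	stats.record("mkvs:node", errUnknownPrefix)
	stats.record("mkvs:root", nil)
	fmt.Println(stats.SampleCount, stats.KeyTypeCounts, stats.ErrorCounts)
}
```

Grouping by message keeps the report compact when the same failure repeats across many samples, at the cost of losing which sample produced it.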

## Supported Database Types

- `consensus-blockstore` - Block metadata and commit info
- `consensus-evidence` - Byzantine validator evidence
- `consensus-mkvs` - Consensus state Merkle tree
- `consensus-state` - Tendermint/CometBFT consensus state
- `runtime-history` - Runtime block history with CBOR-encoded data (includes EVM events/transactions)
- `runtime-mkvs` - Runtime state Merkle tree (includes EVM storage)

## Building

```bash
# Build all versions (recommended)
make build-all

# Clean build artifacts
make clean
```

## Usage

### Prerequisites

- Go 1.21 or higher
- BadgerDB databases from Oasis nodes (see https://snapshots.oasis.io/)

### Command Syntax

```bash
./bin/badgerdb-sampler-v{2,3,4} <database-type> <path-to-db> [output-json] [max-samples]
```

**Parameters:**
- `database-type`: One of the supported database types (see above)
- `path-to-db`: Path to the BadgerDB database directory
- `output-json`: Optional path to save JSON results (default: stdout only)
- `max-samples`: Optional maximum number of samples to collect (default: 1000)
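
The positional parameters above could be parsed along these lines. This is a sketch under assumptions, not the tool's actual argument handling: `cliArgs`, `parseArgs`, and `supportedTypes` are hypothetical names, and rejecting non-positive `max-samples` values is a guess. The database types and the default of 1000 come from this README.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// cliArgs is a hypothetical container for the four positional parameters.
type cliArgs struct {
	DBType     string
	DBPath     string
	OutputJSON string // empty means stdout only
	MaxSamples int
}

const defaultMaxSamples = 1000

// supportedTypes lists the database types named in this README.
var supportedTypes = map[string]bool{
	"consensus-blockstore": true,
	"consensus-evidence":   true,
	"consensus-mkvs":       true,
	"consensus-state":      true,
	"runtime-history":      true,
	"runtime-mkvs":         true,
}

// parseArgs validates the positional arguments (without the program name)
// and fills in defaults for the optional ones.
func parseArgs(args []string) (cliArgs, error) {
	if len(args) < 2 || len(args) > 4 {
		return cliArgs{}, errors.New("usage: <database-type> <path-to-db> [output-json] [max-samples]")
	}
	if !supportedTypes[args[0]] {
		return cliArgs{}, fmt.Errorf("unsupported database type %q", args[0])
	}
	parsed := cliArgs{DBType: args[0], DBPath: args[1], MaxSamples: defaultMaxSamples}
	if len(args) >= 3 {
		parsed.OutputJSON = args[2]
	}
	if len(args) == 4 {
		n, err := strconv.Atoi(args[3])
		if err != nil || n <= 0 {
			return cliArgs{}, fmt.Errorf("max-samples must be a positive integer, got %q", args[3])
		}
		parsed.MaxSamples = n
	}
	return parsed, nil
}

func main() {
	// Fixed example arguments; a real entry point would pass os.Args[1:].
	parsed, err := parseArgs([]string{"runtime-history", "/path/to/history.badger.db", "./outputs/runtime-history.json", "500"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", parsed)
}
```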

### Examples

```bash
# Analyze v2 consensus blockstore (default 1000 samples, stdout only)
./bin/badgerdb-sampler-v2 consensus-blockstore /path/to/blockstore.badger.db

# Analyze v3 runtime MKVS with JSON output
./bin/badgerdb-sampler-v3 runtime-mkvs /path/to/mkvs_storage.badger.db ./outputs/testnet-20220303/emerald-runtime-mkvs.json

# Analyze runtime history with custom sample limit
./bin/badgerdb-sampler-v3 runtime-history /path/to/history.badger.db ./outputs/runtime-history.json 500

# Analyze currently running node's database (read-only)
./bin/badgerdb-sampler-v3 consensus-state /var/lib/oasis/node/consensus/state.badger.db ./outputs/consensus-state.json
```

## Output Format

The tool outputs JSON with comprehensive statistics and samples:

```json
{
  "database_path": "/path/to/mkvs_storage.badger.db",
  "database_type": "runtime-mkvs",
  "badgerdb_version": "v3",
  "database_size_bytes": 1234567890,
  "sample_count": 1000,
  "key_type_counts": {
    "mkvs:node": 850,
    "mkvs:root": 150
  },
  "error_counts": {
    "failed to decode CBOR value: unexpected EOF": 5,
    "unknown module prefix": 2
  },
  "samples": [
    {
      "key_type": "mkvs:node",
      "key": { /* decoded key structure */ },
      "value": { /* decoded value structure */ }
    }
  ]
}
```

**Key Fields:**
- `key_type_counts`: Distribution of different key types in the database
- `error_counts`: Aggregated decoding errors across all samples
- `samples`: Array of individual key-value pairs with full decoding details
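
A consumer of the JSON output might load it into a struct and summarise the error counts. The sketch below mirrors the field names shown in the example output; the Go type names (`report`, `sample`), the helper names (`decodeReport`, `totalErrors`), and the choice of `any` for the decoded key and value are assumptions, since their real structure depends on the key type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample mirrors one entry of the "samples" array. Key and Value are left
// as `any` because their shape varies by key type (an assumption).
type sample struct {
	KeyType string `json:"key_type"`
	Key     any    `json:"key"`
	Value   any    `json:"value"`
}

// report mirrors the top-level fields of the example output.
type report struct {
	DatabasePath      string         `json:"database_path"`
	DatabaseType      string         `json:"database_type"`
	BadgerDBVersion   string         `json:"badgerdb_version"`
	DatabaseSizeBytes int64          `json:"database_size_bytes"`
	SampleCount       int            `json:"sample_count"`
	KeyTypeCounts     map[string]int `json:"key_type_counts"`
	ErrorCounts       map[string]int `json:"error_counts"`
	Samples           []sample       `json:"samples"`
}

// decodeReport parses a JSON document produced in the format above.
func decodeReport(data string) (report, error) {
	var r report
	if err := json.Unmarshal([]byte(data), &r); err != nil {
		return report{}, fmt.Errorf("failed to decode report: %w", err)
	}
	return r, nil
}

// totalErrors sums the aggregated error counts across all messages.
func totalErrors(r report) int {
	total := 0
	for _, n := range r.ErrorCounts {
		total += n
	}
	return total
}

func main() {
	r, err := decodeReport(`{
		"database_type": "runtime-mkvs",
		"sample_count": 1000,
		"key_type_counts": {"mkvs:node": 850, "mkvs:root": 150},
		"error_counts": {"unknown module prefix": 2}
	}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %d errors recorded over %d samples\n", r.DatabaseType, totalErrors(r), r.SampleCount)
}
```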

## Architecture

The tool uses a three-layer design:

1. **Database Access** (`db_v*.go`, `db_common.go`)
   - Version-specific BadgerDB initialization via build tags
   - FUSE workaround with temp directory and symlinks
   - Read-only mode with lock bypass for in-use databases

2. **Decoding Logic** (`decode_*.go`)
   - `decode_consensus.go`: Tendermint protobuf parsing
   - `decode_runtime.go`: CBOR/MKVS parsing with module routing
   - `decode_evm.go`: EVM-specific parsing (contracts, events, transactions)

3. **Type System** (`types_*.go`)
   - Separated deserialization and output types
   - EVM-specific output structures
   - Event signature database for human-readable names
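
The FUSE workaround in the database access layer could be wired up roughly as follows. This is a sketch under assumptions: `openWithFallback`, `openFunc`, `mirrorFunc`, and `errMmapUnsupported` are hypothetical names, `openFunc` stands in for the version-specific BadgerDB open call, and only the `(path, cleanup, error)` return shape is taken from `createLocalMirror` in `db_common.go`. The order of attempts (direct open first, then the mirror) is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// openFunc stands in for a version-specific BadgerDB open call (hypothetical).
type openFunc func(dir string) error

// mirrorFunc has the same return shape as createLocalMirror in db_common.go.
type mirrorFunc func(dir string) (string, func(), error)

// errMmapUnsupported is an illustrative failure used by the demo below.
var errMmapUnsupported = errors.New("mmap not supported on this filesystem")

// openWithFallback tries the original directory first. If that fails, it
// builds a local mirror and retries there. The returned cleanup function is
// always safe to call.
func openWithFallback(dir string, open openFunc, mirror mirrorFunc) (string, func(), error) {
	noop := func() {}
	directErr := open(dir)
	if directErr == nil {
		return dir, noop, nil
	}
	mirrorDir, cleanup, err := mirror(dir)
	if err != nil {
		return "", noop, fmt.Errorf("direct open failed (%v); mirror failed: %w", directErr, err)
	}
	if err := open(mirrorDir); err != nil {
		cleanup() // remove the unused mirror before reporting the failure
		return "", noop, fmt.Errorf("direct open failed (%v); mirror open failed: %w", directErr, err)
	}
	return mirrorDir, cleanup, nil
}

func main() {
	// Fake open: pretend the FUSE path cannot be opened directly.
	open := func(dir string) error {
		if dir == "/fuse/db" {
			return errMmapUnsupported
		}
		return nil
	}
	// Fake mirror: a real one would copy and symlink files into a temp directory.
	mirror := func(dir string) (string, func(), error) {
		return "/tmp/mirror", func() { fmt.Println("mirror removed") }, nil
	}
	used, cleanup, err := openWithFallback("/fuse/db", open, mirror)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	fmt.Println("opened from:", used)
}
```

Passing the open and mirror steps as functions keeps the fallback logic independent of the BadgerDB version selected at build time.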
144 changes: 144 additions & 0 deletions badgerdb-sampler/db_common.go
@@ -0,0 +1,144 @@
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// createLocalMirror creates a temporary local directory with database files copied/symlinked
// to avoid mmap issues on FUSE filesystems. Returns temp path and cleanup function.
func createLocalMirror(fusePath string) (string, func(), error) {
	// Resolve to an absolute path: symlink targets are resolved relative to the
	// link's own directory, so a relative fusePath would produce dangling links.
	absPath, err := filepath.Abs(fusePath)
	if err != nil {
		return "", nil, fmt.Errorf("failed to resolve database path: %w", err)
	}
	fusePath = absPath

	// Create temp directory on local filesystem (not FUSE)
	tmpDir, err := os.MkdirTemp("", "badgerdb-temp-*")
	if err != nil {
		return "", nil, fmt.Errorf("failed to create temp directory: %w", err)
	}

	cleanup := func() {
		if err := os.RemoveAll(tmpDir); err != nil {
			fmt.Fprintf(os.Stderr, "Warning: Failed to clean up temp directory %s: %v\n", tmpDir, err)
		}
	}

	fmt.Fprintf(os.Stderr, "  Creating local mirror in: %s\n", tmpDir)

	entries, err := os.ReadDir(fusePath)
	if err != nil {
		cleanup()
		return "", nil, fmt.Errorf("failed to read database directory: %w", err)
	}

	copiedCount := 0
	linkedCount := 0
	skippedCount := 0

	for _, entry := range entries {
		if entry.IsDir() {
			continue
		}
		name := entry.Name()
		srcPath := filepath.Join(fusePath, name)
		dstPath := filepath.Join(tmpDir, name)

		// Skip memtable files - these will be created fresh
		if strings.HasSuffix(name, ".mem") {
			fmt.Fprintf(os.Stderr, "  Skipping memtable file: %s\n", name)
			skippedCount++
			continue
		}

		if strings.HasSuffix(name, ".sst") || strings.HasSuffix(name, ".vlog") {
			// Symlink large data files (*.sst, *.vlog)
			if err := os.Symlink(srcPath, dstPath); err != nil {
				cleanup()
				return "", nil, fmt.Errorf("failed to symlink %s: %w", name, err)
			}
			linkedCount++
		} else {
			// Copy small metadata files (MANIFEST, DISCARD, etc.)
			if err := copyFile(srcPath, dstPath); err != nil {
				cleanup()
				return "", nil, fmt.Errorf("failed to copy %s: %w", name, err)
			}
			copiedCount++
		}
	}

	fmt.Fprintf(os.Stderr, "  Mirror created: %d files copied, %d files symlinked, %d files skipped\n",
		copiedCount, linkedCount, skippedCount)

	return tmpDir, cleanup, nil
}

// copyFile copies a file from src to dst
func copyFile(src, dst string) error {
	srcFile, err := os.Open(src)
	if err != nil {
		return err
	}
	defer srcFile.Close()

	dstFile, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer dstFile.Close()

	if _, err := io.Copy(dstFile, srcFile); err != nil {
		return err
	}

	return dstFile.Sync()
}

// removeMemtableFiles renames .mem files by appending .bak suffix and returns a restore function.
// This allows opening databases with corrupted memtables by letting BadgerDB create fresh memtable files.
func removeMemtableFiles(dbPath string) (backupDir string, restore func() error, err error) {
	entries, err := os.ReadDir(dbPath)
	if err != nil {
		return "", nil, fmt.Errorf("failed to read database directory: %w", err)
	}

	// Rename all .mem files to .mem.bak
	count := 0
	for _, entry := range entries {
		if !entry.IsDir() && strings.HasSuffix(entry.Name(), ".mem") {
			oldPath := filepath.Join(dbPath, entry.Name())
			newPath := oldPath + ".bak"
			if err := os.Rename(oldPath, newPath); err != nil {
				return "", nil, fmt.Errorf("failed to rename %s: %w", entry.Name(), err)
			}
			count++
		}
	}

	fmt.Fprintf(os.Stderr, "  Renamed %d memtable file(s) to *.mem.bak\n", count)

	// Restore function renames .mem.bak files back to .mem
	restore = func() error {
		entries, err := os.ReadDir(dbPath)
		if err != nil {
			return fmt.Errorf("failed to read database directory: %w", err)
		}
		count := 0
		for _, entry := range entries {
			if !entry.IsDir() && strings.HasSuffix(entry.Name(), ".mem.bak") {
				oldPath := filepath.Join(dbPath, entry.Name())
				newPath := strings.TrimSuffix(oldPath, ".bak")
				if err := os.Rename(oldPath, newPath); err != nil {
					return fmt.Errorf("failed to restore %s: %w", entry.Name(), err)
				}
				count++
			}
		}
		fmt.Fprintf(os.Stderr, "  Restored %d memtable file(s) from *.mem.bak\n", count)
		return nil
	}

	return dbPath, restore, nil
}