🚧 Work In Progress
This project is still under active development. The following documentation is AI-generated and requires future cleanup and validation.
This is a Rust rewrite of datafusion-sqlancer, originally implemented in Java. The rewrite aims to simplify implementation, enable better integration with existing DataFusion tooling, and make test oracles applicable to sqllogictests. See this issue for more details on the motivation behind the Rust rewrite.
A comprehensive fuzzing tool for Apache DataFusion, designed to test SQL query execution and find potential bugs, crashes, or inconsistencies in the query engine.
To run the fuzzer with default settings:
cargo run --release
To run with a custom configuration:
cargo run --release -- --config datafusion-fuzzer.toml
To run with command-line options:
cargo run --release -- --config datafusion-fuzzer.toml --rounds 5 --queries-per-round 20
The fuzzer supports extensive configuration options to customize the fuzzing process.
You can configure DataFusion Fuzzer in two ways:
- Configuration file: Use a TOML file to specify detailed settings
- Command-line arguments: Override settings from the configuration file, or use them on their own without a configuration file
See datafusion-fuzzer.toml for an example configuration file:
# Fuzzing execution settings
seed = 42
rounds = 3
queries_per_round = 10
timeout_seconds = 30
# Logging settings
display_logs = true
enable_tui = false
# log_path = "logs/datafusion-fuzzer.log"
# Table generation parameters
max_column_count = 5
max_row_count = 100
max_expr_level = 3
max_table_count = 3
Options:
-c, --config <FILE> Path to config file
-s, --seed <SEED> Random seed [default: 42]
-r, --rounds <ROUNDS> Number of rounds to run
-q, --queries-per-round <QUERIES> Number of queries per round
-t, --timeout <TIMEOUT> Query timeout in seconds
-l, --log-path <LOG_PATH> Path to log file
-h, --help Print help
-V, --version Print version
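The precedence between the configuration file and the command-line flags can be sketched as follows. This is a minimal illustration, not the fuzzer's actual code: `FuzzerConfig`, `CliOverrides`, `from_example_file`, and `with_overrides` are hypothetical names, the starting values are copied from the example configuration file above, and the sketch assumes that a flag left unset keeps the value from the file.

```rust
/// Hypothetical settings struct; field names mirror keys in the example TOML file.
#[derive(Debug, Clone, PartialEq)]
struct FuzzerConfig {
    seed: u64,
    rounds: u32,
    queries_per_round: u32,
    timeout_seconds: u64,
    log_path: Option<String>,
}

/// Hypothetical CLI overrides; `None` means the flag was not passed.
#[derive(Debug, Default)]
struct CliOverrides {
    seed: Option<u64>,
    rounds: Option<u32>,
    queries_per_round: Option<u32>,
    timeout: Option<u64>,
    log_path: Option<String>,
}

impl FuzzerConfig {
    /// Values copied from the example configuration file (log_path is commented out there).
    fn from_example_file() -> Self {
        FuzzerConfig {
            seed: 42,
            rounds: 3,
            queries_per_round: 10,
            timeout_seconds: 30,
            log_path: None,
        }
    }

    /// Assumed precedence: a flag given on the command line replaces the file value.
    fn with_overrides(mut self, cli: CliOverrides) -> Self {
        if let Some(v) = cli.seed {
            self.seed = v;
        }
        if let Some(v) = cli.rounds {
            self.rounds = v;
        }
        if let Some(v) = cli.queries_per_round {
            self.queries_per_round = v;
        }
        if let Some(v) = cli.timeout {
            self.timeout_seconds = v;
        }
        if cli.log_path.is_some() {
            self.log_path = cli.log_path;
        }
        self
    }
}

fn main() {
    // Mirrors `--rounds 5 --queries-per-round 20` on top of the example file.
    let cli = CliOverrides {
        rounds: Some(5),
        queries_per_round: Some(20),
        ..Default::default()
    };
    let config = FuzzerConfig::from_example_file().with_overrides(cli);
    println!("{config:?}");
}
```

Modelling each flag as an `Option` keeps "not passed" distinct from "passed with the default value", which matters if a flag such as `--seed` carries its own default.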
- max_table_count: Maximum number of tables that can be selected in a single query (default: 3)
- max_column_count: Maximum number of columns per generated table (default: 5)
- max_row_count: Maximum number of rows per generated table (default: 100)
- max_expr_level: Maximum expression nesting level (default: 3)
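To illustrate what a nesting limit such as max_expr_level could mean in practice, here is a minimal sketch of a depth-bounded random expression generator. It is an assumption about how such a limit might be enforced, not the fuzzer's implementation: `Expr`, `Lcg`, `gen_expr`, and `depth` are hypothetical names, and the small linear congruential generator only stands in for a seeded random source so the example needs no external crates.

```rust
/// Hypothetical expression tree for the sketch.
#[derive(Debug)]
enum Expr {
    Column(usize),
    Literal(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

/// Tiny seeded linear congruential generator (stand-in for a real RNG).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Generates an expression whose nesting never exceeds `max_expr_level`.
/// `level` is the 1-based level of the node being generated;
/// `column_count` must be greater than zero.
fn gen_expr(rng: &mut Lcg, level: u32, max_expr_level: u32, column_count: usize) -> Expr {
    // At the limit only leaves are allowed; otherwise pick a leaf one time in three.
    if level >= max_expr_level || rng.next_u64() % 3 == 0 {
        if rng.next_u64() % 2 == 0 {
            Expr::Column((rng.next_u64() as usize) % column_count)
        } else {
            Expr::Literal((rng.next_u64() % 100) as i64)
        }
    } else {
        let ops = ['+', '-', '*'];
        let op = ops[(rng.next_u64() % 3) as usize];
        let left = gen_expr(rng, level + 1, max_expr_level, column_count);
        let right = gen_expr(rng, level + 1, max_expr_level, column_count);
        Expr::Binary(Box::new(left), op, Box::new(right))
    }
}

/// Number of nodes on the longest root-to-leaf path.
fn depth(expr: &Expr) -> u32 {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => 1,
        Expr::Binary(left, _, right) => 1 + depth(left).max(depth(right)),
    }
}

fn main() {
    // Same seed, same expression: the basis for reproducible fuzzing runs.
    let mut rng = Lcg(42);
    let expr = gen_expr(&mut rng, 1, 3, 5);
    println!("depth = {}, expr = {:?}", depth(&expr), expr);
}
```

Because the generator is driven only by the seed, rerunning with the same seed and limits reproduces the same expression, which is the property a fixed `seed` setting relies on.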
- where
- sort + limit, offset
- aggregate
- having
- join
- union/union all/intersect/except
- views
- scalar subquery
- 'relation-like' subquery
- Operators
- Scalar functions
- Aggregate Functions
- Window Functions
- Complete Primitive types
- Time-related types
- Array types
- Struct/Json
- CLI
- Oracle interface