DRAKEN is a high-performance columnar vector library written in Cython/C that provides Arrow-compatible memory layouts with type-specialized vectors and optimized kernels for core operations.
We're building DRAKEN because PyArrow, while excellent for data interchange, is too general-purpose and adds overhead in the hot loops of SQL execution engines. DRAKEN strips that away to provide:
- Leaner buffers with predictable memory layouts
- Type-specialized vectors optimized for specific data types
- Tighter control over performance-critical kernels
- Zero-copy interoperability with Apache Arrow
- Purpose-built design for Python database kernels
DRAKEN serves as the internal container format for Opteryx, replacing PyArrow in execution paths while maintaining Arrow compatibility for I/O operations.
What makes DRAKEN unique: it's not a dataframe library like Polars or an embedded query engine like DuckDB, nor a general-purpose API like PyArrow. It's a purpose-built execution container designed specifically for high-performance columnar data processing in Python database engines.
- Type-specialized vectors: `Int64Vector`, `Float64Vector`, `StringVector`, `BoolVector`
- Morsel-based processing: Efficient batch data processing containers
- Arrow interoperability: Zero-copy conversion to/from Apache Arrow
- Compiled expression evaluators: High-performance evaluation of expression trees
- SIMD optimizations: Platform-specific performance optimizations
- Memory efficiency: Optimized memory layouts and null handling
- C/Cython implementation: High-performance core written in Cython/C
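The "Memory efficiency" bullet mentions null handling. DRAKEN's buffer internals are not documented in this README, but because it advertises Arrow-compatible layouts, a reasonable mental model is Arrow's validity bitmap: one bit per slot, least-significant bit first, with 1 meaning "valid". The pure-Python sketch below packs such a bitmap next to a fixed-width data buffer and uses it for a null-aware sum. `build_validity_bitmap`, `is_valid`, `null_count`, and `null_aware_sum` are hypothetical helpers, not DRAKEN functions.

```python
from typing import List, Optional


def build_validity_bitmap(values: List[Optional[int]]) -> bytearray:
    """Pack an Arrow-style validity bitmap: bit i is 1 when values[i] is not None."""
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per slot, rounded up to whole bytes
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first
    return bitmap


def is_valid(bitmap: bytes, i: int) -> bool:
    return ((bitmap[i // 8] >> (i % 8)) & 1) == 1


def null_count(bitmap: bytes, length: int) -> int:
    # Count cleared bits among the first `length` slots; padding bits are ignored.
    return sum(1 for i in range(length) if not is_valid(bitmap, i))


def null_aware_sum(data: List[int], bitmap: bytes) -> int:
    # Slots whose validity bit is 0 are skipped, mirroring SQL SUM semantics.
    return sum(x for i, x in enumerate(data) if is_valid(bitmap, i))


values = [1, None, 3, None, 5]
bitmap = build_validity_bitmap(values)
# Null slots still occupy a position in the fixed-width data buffer; here they hold 0.
data = [v if v is not None else 0 for v in values]
print(bin(bitmap[0]))                   # 0b10101
print(null_count(bitmap, len(values)))  # 2
print(null_aware_sum(data, bitmap))     # 9
```

Keeping validity in a separate bitmap lets the data buffer stay densely packed and fixed-width, which is what makes tight, branch-light kernels possible.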
From PyPI:

```bash
pip install draken
```

From source:

```bash
git clone https://github.com/mabel-dev/draken.git
cd draken
pip install -e .
```

For development:

```bash
git clone https://github.com/mabel-dev/draken.git
cd draken
pip install -e ".[dev]"
make compile  # Build Cython extensions
```

Quick start:

```python
import draken
import pyarrow as pa

# Create a vector from an Arrow array (zero-copy)
arrow_array = pa.array([1, 2, 3, 4, 5], type=pa.int64())
vector = draken.Vector.from_arrow(arrow_array)
print(f"Vector length: {vector.length}")
print(f"Vector sum: {vector.sum()}")

# Working with different data types
bool_array = pa.array([True, False, None, True])
bool_vector = draken.Vector.from_arrow(bool_array)

float_array = pa.array([1.5, 2.5, None, 4.2])
float_vector = draken.Vector.from_arrow(float_array)
print(f"Float sum: {float_vector.sum()}")

# String vectors
string_array = pa.array(['hello', 'world', None, 'draken'])
string_vector = draken.Vector.from_arrow(string_array)

# Convert back to Arrow (zero-copy)
arrow_result = vector.to_arrow()
print(f"Round-trip successful: {arrow_result.equals(arrow_array)}")
```

DRAKEN provides type-specialized vector implementations:
- `Int64Vector`: 64-bit integer values
- `Float64Vector`: 64-bit floating-point values
- `StringVector`: Variable-length string values
- `BoolVector`: Boolean values
- `Vector`: Base vector class with generic operations
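Assuming `StringVector` follows Arrow's standard variable-length string layout (a reasonable reading of "Arrow-compatible", but not confirmed in this README), a vector of n strings holds n + 1 integer offsets plus one contiguous UTF-8 data buffer, and element i is the byte range `data[offsets[i]:offsets[i + 1]]`. The pure-Python sketch below builds that layout for the quick-start strings. `encode_strings` and `get_string` are hypothetical helpers, and validity is kept as a plain list of booleans for brevity where Arrow would use a bitmap.

```python
from array import array
from typing import List, Optional, Tuple


def encode_strings(values: List[Optional[str]]) -> Tuple[array, bytes, List[bool]]:
    """Encode strings as (offsets, data, validity) in an Arrow-style variable-length layout."""
    offsets = array("i", [0])  # n + 1 C-int offsets (32-bit on mainstream platforms); first is 0
    chunks = []
    validity = []
    for s in values:
        if s is None:
            validity.append(False)
            offsets.append(offsets[-1])  # null slot: zero-length span in the data buffer
        else:
            encoded = s.encode("utf-8")
            chunks.append(encoded)
            validity.append(True)
            offsets.append(offsets[-1] + len(encoded))
    return offsets, b"".join(chunks), validity


def get_string(offsets: array, data: bytes, validity: List[bool], i: int) -> Optional[str]:
    if not validity[i]:
        return None
    # Element i occupies the bytes between consecutive offsets.
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


offsets, data, validity = encode_strings(["hello", "world", None, "draken"])
print(list(offsets))                           # [0, 5, 10, 10, 16]
print(data)                                    # b'helloworlddraken'
print(get_string(offsets, data, validity, 3))  # draken
```

Because every string lives in one buffer, slicing or scanning a string column touches contiguous memory instead of chasing one Python object per value.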
```python
import pyarrow as pa
import draken

# Vector creation from Arrow
arrow_array = pa.array([1, 2, 3, 4, 5], type=pa.int64())
vector = draken.Vector.from_arrow(arrow_array)

# Basic operations
print(vector.length)  # Length
print(vector.sum())   # Sum aggregation
print(vector.min())   # Minimum value
print(vector.max())   # Maximum value

# Null handling
null_array = pa.array([1, None, 3, None, 5])
vector_with_nulls = draken.Vector.from_arrow(null_array)
print(vector_with_nulls.null_count)  # Count of null values

# Comparison operations
result = vector.less_than(3)  # Returns a boolean vector
result = vector.equals(2)     # Element-wise equality

# Convert back to Arrow (zero-copy)
arrow_result = vector.to_arrow()
```

DRAKEN provides a high-performance compiled expression evaluator for efficiently evaluating complex predicates over morsels:
```python
import draken
import pyarrow as pa
from draken.evaluators import (
    BinaryExpression,
    ColumnExpression,
    LiteralExpression,
    evaluate,
)

# Create a morsel
table = pa.table({
    'x': [1, 2, 3, 4, 5],
    'y': ['england', 'france', 'england', 'spain', 'england'],
})
morsel = draken.Morsel.from_arrow(table)

# Build expression: x == 1 AND y == 'england'
expr1 = BinaryExpression('equals', ColumnExpression('x'), LiteralExpression(1))
expr2 = BinaryExpression('equals', ColumnExpression('y'), LiteralExpression('england'))
expr = BinaryExpression('and', expr1, expr2)

# Evaluate: returns a boolean vector
result = evaluate(morsel, expr)
print(list(result))  # [True, False, False, False, False]
```

The compiled evaluator:
- Recognizes common expression patterns
- Generates optimized single-pass evaluation code
- Automatically caches compiled evaluators
- Provides clean API for SQL engine integration
See Compiled Evaluators Documentation for details.
DRAKEN is designed for high-performance scenarios where PyArrow's generality becomes a bottleneck:
- Memory efficiency: 20-40% lower memory usage vs PyArrow for typical workloads
- Processing speed: 2-5x faster for type-specific operations
- SIMD support: Automatic vectorization on x86_64 and ARM platforms
- Zero-copy operations: Minimal data copying between operations
Benchmarks supporting these figures are coming soon.
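Zero-copy means a consumer reads the producer's existing buffer instead of duplicating it. DRAKEN's Cython buffer handling is not shown in this README, so the sketch below only illustrates the general idea with Python's buffer protocol: a `memoryview` slice shares memory with its source column, while an ordinary `array` slice allocates a copy.

```python
from array import array

# A fixed-width 64-bit integer column ("q" = signed long long).
column = array("q", [10, 20, 30, 40, 50])

# memoryview exposes the existing buffer; slicing it creates a view, not a copy.
view = memoryview(column)
window = view[1:4]

column[2] = 99            # mutate the underlying buffer in place
print(window.tolist())    # [20, 99, 40]: the view sees the change

# An array slice, by contrast, copies the elements into new memory.
copied = column[1:4]
column[3] = -1
print(copied.tolist(), window.tolist())  # [20, 99, 40] [20, 99, -1]
```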
- Python 3.11+
- Cython 3.1.3+
- C++17 compatible compiler
- PyArrow (for interoperability)
```bash
# Install development dependencies
pip install -e ".[dev]"

# Compile Cython extensions
make compile

# Run tests
make test

# Run linting
make lint

# Generate coverage report
make coverage
```

```
draken/
├── draken/
│   ├── core/       # Core buffer and type definitions
│   ├── vectors/    # Type-specialized vector implementations
│   ├── morsels/    # Batch processing containers
│   └── interop/    # Arrow interoperability layer
├── tests/          # Test suite
└── docs/           # Documentation
```
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes and add tests
- Run the test suite (`make test`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Linux: x86_64, ARM64
- macOS: x86_64, ARM64 (Apple Silicon)
- Windows: x86_64
- Python 3.11+
- PyArrow
- Cython (for building from source)
Licensed under the Apache License 2.0.
- Opteryx - Distributed SQL query engine using DRAKEN
- Apache Arrow - Cross-language development platform for in-memory data
DRAKEN builds upon the excellent work of the Apache Arrow project and is inspired by the need for specialized, high-performance columnar containers in analytical database engines.