DRAKEN

DRAKEN is a high-performance columnar vector library written in Cython/C that provides Arrow-compatible memory layouts with type-specialized vectors and optimized kernels for core operations.

Why DRAKEN?

We're building DRAKEN because PyArrow, while excellent for data interchange, is too general-purpose and adds overhead in the hot loops of SQL execution engines. DRAKEN strips that away to provide:

Leaner buffers with predictable memory layouts
Type-specialized vectors optimized for specific data types
Tighter control over performance-critical kernels
Zero-copy interoperability with Apache Arrow
Purpose-built design for Python database kernels
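
To make "predictable memory layouts" concrete, here is a minimal pure-Python sketch of the Arrow fixed-width layout that an int64 vector follows. It uses one contiguous buffer of 64-bit values plus a validity bitmap, where bit i (least-significant-bit first) is 1 when slot i is non-null. The class name `Int64VectorSketch` is hypothetical. This is an illustration of the layout only, not DRAKEN's Cython/C implementation.

```python
from array import array
from typing import Iterable, Optional


class Int64VectorSketch:
    """Hypothetical illustration of an Arrow-style int64 column (not DRAKEN's API)."""

    def __init__(self, values: Iterable[Optional[int]]) -> None:
        items = list(values)
        self.length = len(items)
        # Values buffer: one signed 64-bit slot per element; null slots hold 0.
        self.data = array("q", (0 if v is None else v for v in items))
        # Validity bitmap: ceil(length / 8) bytes, LSB-first, 1 = valid.
        self.validity = bytearray((self.length + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i: int) -> bool:
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    @property
    def null_count(self) -> int:
        return sum(1 for i in range(self.length) if not self.is_valid(i))

    def sum(self) -> int:
        # One pass over the values buffer, skipping slots marked null in the bitmap.
        return sum(self.data[i] for i in range(self.length) if self.is_valid(i))


vec = Int64VectorSketch([1, None, 3, None, 5])
print(vec.null_count)        # 2
print(vec.sum())             # 9
print(bin(vec.validity[0]))  # 0b10101 (slots 0, 2 and 4 are valid)
```

Because the values sit in one typed buffer and nullness lives in a separate bitmap, kernels can scan memory linearly without per-element Python objects. That is the property a specialized vector is built around.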

DRAKEN serves as the internal container format for Opteryx, replacing PyArrow in execution paths while maintaining Arrow compatibility for I/O operations.

What makes DRAKEN unique: it is not a dataframe library like Polars, an in-process SQL database like DuckDB, or a general-purpose API like PyArrow. It is a purpose-built execution container designed specifically for high-performance columnar data processing in Python database engines.

Features

Type-specialized vectors: Int64Vector, Float64Vector, StringVector, BoolVector
Morsel-based processing: Efficient batch data processing containers
Arrow interoperability: Zero-copy conversion to/from Apache Arrow
Compiled expression evaluators: High-performance evaluation of expression trees
SIMD optimizations: Platform-specific performance optimizations
Memory efficiency: Optimized memory layouts and null handling
C/Cython implementation: High-performance core written in Cython/C
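
Morsel-based processing means an engine works through a table in fixed-size batches ("morsels") of row-aligned column slices, rather than row by row or all at once. The sketch below shows that idea in plain Python. `split_into_morsels` and the dict-of-lists table are hypothetical stand-ins, not `draken.Morsel`.

```python
from typing import Dict, Iterator, List, Sequence


def split_into_morsels(
    columns: Dict[str, Sequence], morsel_size: int
) -> Iterator[Dict[str, List]]:
    """Yield batches of at most `morsel_size` rows, keeping columns row-aligned."""
    if morsel_size <= 0:
        raise ValueError("morsel_size must be positive")
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    num_rows = lengths.pop() if lengths else 0
    for start in range(0, num_rows, morsel_size):
        stop = start + morsel_size
        # Each morsel holds the same row range from every column.
        yield {name: list(col[start:stop]) for name, col in columns.items()}


table = {"x": [1, 2, 3, 4, 5], "y": ["a", "b", "c", "d", "e"]}
for morsel in split_into_morsels(table, morsel_size=2):
    print(morsel)
# {'x': [1, 2], 'y': ['a', 'b']}
# {'x': [3, 4], 'y': ['c', 'd']}
# {'x': [5], 'y': ['e']}
```

List slicing copies data here for simplicity. A columnar engine would instead hand out views (offset plus length) over the existing buffers.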

Installation

From PyPI (Recommended)

pip install draken

From Source

git clone https://github.com/mabel-dev/draken.git
cd draken
pip install -e .

Development Installation

git clone https://github.com/mabel-dev/draken.git
cd draken
pip install -e ".[dev]"
make compile  # Build Cython extensions

Quick Start

import draken
import pyarrow as pa

# Create a vector from Arrow array (zero-copy)
arrow_array = pa.array([1, 2, 3, 4, 5], type=pa.int64())
vector = draken.Vector.from_arrow(arrow_array)
print(f"Vector length: {vector.length}")
print(f"Vector sum: {vector.sum()}")

# Working with different data types
bool_array = pa.array([True, False, None, True])
bool_vector = draken.Vector.from_arrow(bool_array)

float_array = pa.array([1.5, 2.5, None, 4.2])
float_vector = draken.Vector.from_arrow(float_array)
print(f"Float sum: {float_vector.sum()}")

# String operations
string_array = pa.array(['hello', 'world', None, 'draken'])
string_vector = draken.Vector.from_arrow(string_array)

# Convert back to Arrow (zero-copy)
arrow_result = vector.to_arrow()
print(f"Round-trip successful: {arrow_result.equals(arrow_array)}")

API Documentation

Vector Classes

DRAKEN provides type-specialized vector implementations:

Int64Vector: 64-bit integer values
Float64Vector: 64-bit floating-point values
StringVector: Variable-length string values
BoolVector: Boolean values
Vector: Base vector class with generic operations
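
One way a base-class entry point could route values to the type-specialized classes above is a registry keyed by type name. The sketch is hypothetical. The registry, the `*Vec` classes and `from_values` are illustrations, and DRAKEN's own dispatch inside `Vector.from_arrow` may work differently. The keys are the strings that `str()` returns for the corresponding PyArrow types (`int64`, `double`, `string`, `bool`).

```python
from typing import Callable, Dict, Iterable, Type

# Hypothetical registry mapping an Arrow type name to a specialized vector class.
_REGISTRY: Dict[str, Type["VectorBase"]] = {}


def register(type_name: str) -> Callable[[Type["VectorBase"]], Type["VectorBase"]]:
    """Class decorator that records `cls` as the vector for `type_name`."""
    def decorator(cls: Type["VectorBase"]) -> Type["VectorBase"]:
        _REGISTRY[type_name] = cls
        return cls
    return decorator


class VectorBase:
    """Shared behaviour; subclasses add type-specific kernels."""

    def __init__(self, values: Iterable) -> None:
        self.values = list(values)

    @property
    def length(self) -> int:
        return len(self.values)

    @classmethod
    def from_values(cls, type_name: str, values: Iterable) -> "VectorBase":
        """Dispatch to the specialized subclass registered for `type_name`."""
        try:
            specialized = _REGISTRY[type_name]
        except KeyError:
            raise TypeError(f"no specialized vector for type {type_name!r}") from None
        return specialized(values)


@register("int64")
class Int64Vec(VectorBase):
    def sum(self) -> int:
        # Type-specific kernel: integer sum that skips nulls.
        return sum(v for v in self.values if v is not None)


@register("double")
class Float64Vec(VectorBase):
    pass


@register("string")
class StringVec(VectorBase):
    pass


@register("bool")
class BoolVec(VectorBase):
    pass


vec = VectorBase.from_values("int64", [1, None, 3])
print(type(vec).__name__, vec.length, vec.sum())  # Int64Vec 3 4
```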

Core Operations

import pyarrow as pa
import draken

# Vector creation from Arrow
arrow_array = pa.array([1, 2, 3, 4, 5], type=pa.int64())
vector = draken.Vector.from_arrow(arrow_array)

# Basic operations
print(vector.length)        # Length
print(vector.sum())         # Sum aggregation  
print(vector.min())         # Minimum value
print(vector.max())         # Maximum value

# Null handling
null_array = pa.array([1, None, 3, None, 5])
vector_with_nulls = draken.Vector.from_arrow(null_array)
print(vector_with_nulls.null_count)  # Count of null values

# Comparison operations
lt_mask = vector.less_than(3)    # Returns boolean vector
eq_mask = vector.equals(2)       # Element-wise equality

# Convert back to Arrow (zero-copy)
arrow_result = vector.to_arrow()
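
Comparison kernels such as `less_than` return a boolean vector, and under SQL three-valued logic a null input produces a null result rather than `False`. The plain-Python sketch below shows that contract on lists. The names `less_than_kernel` and `count_true` are hypothetical, and it is an assumption that DRAKEN's `less_than` propagates nulls in exactly this way.

```python
from typing import List, Optional, Sequence


def less_than_kernel(
    values: Sequence[Optional[int]], scalar: int
) -> List[Optional[bool]]:
    """Element-wise `values < scalar`; a null input yields a null result."""
    return [None if v is None else v < scalar for v in values]


def count_true(mask: Sequence[Optional[bool]]) -> int:
    """Count rows a filter would keep: only True passes; False and None are dropped."""
    return sum(1 for m in mask if m is True)


mask = less_than_kernel([1, None, 3, None, 5], 3)
print(mask)              # [True, None, False, None, False]
print(count_true(mask))  # 1
```

A filter step then keeps only rows whose mask is `True`. That is why `count_true` treats `None` the same as `False`, even though the mask itself keeps the two apart.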

Compiled Expression Evaluation

DRAKEN provides a high-performance compiled expression evaluator for efficiently evaluating complex predicates over morsels:

import draken
import pyarrow as pa
from draken.evaluators import (
    BinaryExpression,
    ColumnExpression,
    LiteralExpression,
    evaluate
)

# Create a morsel
table = pa.table({
    'x': [1, 2, 3, 4, 5],
    'y': ['england', 'france', 'england', 'spain', 'england']
})
morsel = draken.Morsel.from_arrow(table)

# Build expression: x == 1 AND y == 'england'
expr1 = BinaryExpression('equals', ColumnExpression('x'), LiteralExpression(1))
expr2 = BinaryExpression('equals', ColumnExpression('y'), LiteralExpression('england'))
expr = BinaryExpression('and', expr1, expr2)

# Evaluate - returns boolean vector
result = evaluate(morsel, expr)
print(list(result))  # [True, False, False, False, False]

The compiled evaluator:

Recognizes common expression patterns
Generates optimized single-pass evaluation code
Automatically caches compiled evaluators
Provides a clean API for SQL engine integration

See Compiled Evaluators Documentation for details.
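
The core ideas behind a compiled evaluator can be sketched in plain Python with closures and a cache. An expression tree is translated once into a single callable, that callable is reused for later morsels with the same expression, and evaluation is one pass over the rows. Everything below is hypothetical. The `Column`, `Literal` and `Binary` dataclasses only mirror the roles of `ColumnExpression`, `LiteralExpression` and `BinaryExpression`, the morsel is a dict of lists, and DRAKEN's evaluator generates compiled code rather than Python closures.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, Binary]
RowFn = Callable[[Dict[str, List[Any]], int], Any]

# Hypothetical operator table; a real engine would support many more operators.
_OPS = {
    "equals": operator.eq,
    "less_than": operator.lt,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}

# Cache keyed on repr: Literal(1) and Literal(True) compare and hash equal as
# dataclasses, but must not share a compiled function.
_CACHE: Dict[str, RowFn] = {}


def compile_expr(expr: Expr) -> RowFn:
    """Return the cached row function for `expr`, building it on first use."""
    key = repr(expr)
    fn = _CACHE.get(key)
    if fn is None:
        fn = _CACHE[key] = _build(expr)
    return fn


def _build(expr: Expr) -> RowFn:
    """Translate one tree node into a closure; children are compiled (and cached) too."""
    if isinstance(expr, Column):
        name = expr.name
        return lambda morsel, i: morsel[name][i]
    if isinstance(expr, Literal):
        value = expr.value
        return lambda morsel, i: value
    if isinstance(expr, Binary):
        op = _OPS[expr.op]
        left, right = compile_expr(expr.left), compile_expr(expr.right)
        return lambda morsel, i: op(left(morsel, i), right(morsel, i))
    raise TypeError(f"unsupported expression: {expr!r}")


def evaluate(morsel: Dict[str, List[Any]], expr: Expr) -> List[Any]:
    """Evaluate the whole tree per row in a single pass over the morsel."""
    fn = compile_expr(expr)
    num_rows = len(next(iter(morsel.values()), []))
    return [fn(morsel, i) for i in range(num_rows)]


morsel = {"x": [1, 2, 3, 4, 5], "y": ["england", "france", "england", "spain", "england"]}
expr = Binary(
    "and",
    Binary("equals", Column("x"), Literal(1)),
    Binary("equals", Column("y"), Literal("england")),
)
print(evaluate(morsel, expr))  # [True, False, False, False, False]
```

The frozen dataclasses keep expressions immutable, so a cached entry can never go stale because someone mutated the tree after it was compiled.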

Performance

DRAKEN is designed for high-performance scenarios where PyArrow's generality becomes a bottleneck:

Memory efficiency: 20-40% lower memory usage vs PyArrow for typical workloads
Processing speed: 2-5x faster than PyArrow for type-specific operations
SIMD support: Automatic vectorization on x86_64 and ARM platforms
Zero-copy operations: Minimal data copying between operations

Benchmarks coming soon
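
Zero-copy means that slicing or reinterpreting a column produces a new view over the same memory instead of a new buffer. Python's built-in `memoryview` shows the principle on an `array` of 64-bit integers. This illustrates the concept only. It is not how DRAKEN or Arrow manage their buffers internally.

```python
from array import array

# A contiguous buffer of signed 64-bit integers.
values = array("q", [10, 20, 30, 40, 50])

# Reinterpret as raw bytes and back without copying: both are views on `values`.
raw = memoryview(values).cast("B")
as_int64 = raw.cast("q")

# Slicing a memoryview is also a view, similar in spirit to an Arrow offset/length slice.
window = as_int64[1:4]
print(window.tolist())  # [20, 30, 40]

# An in-place write to the original buffer shows up through every view: nothing was copied.
values[2] = 99
print(window.tolist())  # [20, 99, 40]
print(raw.nbytes == len(values) * values.itemsize)  # True
```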

Development

Prerequisites

Python 3.11+
Cython 3.1.3+
C++17-compatible compiler
PyArrow (for interoperability)
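
Before running `make compile`, a small script can confirm that the interpreter, Cython and PyArrow meet the prerequisites above. This is a hypothetical helper, not part of the repository. It does not check for a C++17 compiler, since detecting one varies by platform. Its version parsing keeps only the leading numeric components, so a pre-release such as `3.1.3rc1` is treated as `3.1.3`.

```python
import re
import sys
from importlib import metadata
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '3.1.3' -> (3, 1, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def check_prerequisites() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    # (distribution name, minimum version); () means any installed version is fine.
    for package, minimum in (("Cython", (3, 1, 3)), ("pyarrow", ())):
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if parse_version(found) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{package} {wanted}+ required, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All checked prerequisites are satisfied.")
```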

Building

# Install development dependencies
pip install -e ".[dev]"

# Compile Cython extensions
make compile

# Run tests
make test

# Run linting
make lint

# Generate coverage report
make coverage

Project Structure

draken/
├── draken/
│   ├── core/           # Core buffer and type definitions
│   ├── vectors/        # Type-specialized vector implementations
│   ├── morsels/        # Batch processing containers  
│   └── interop/        # Arrow interoperability layer
├── tests/              # Test suite
└── docs/              # Documentation

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes and add tests
Run the test suite (make test)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Supported Platforms

Linux: x86_64, ARM64
macOS: x86_64, ARM64 (Apple Silicon)
Windows: x86_64

Requirements

Python 3.11+
PyArrow
Cython (for building from source)

License

Licensed under the Apache License 2.0.

Related Projects

Opteryx - Distributed SQL query engine using DRAKEN
Apache Arrow - Cross-language development platform for in-memory data

Acknowledgments

DRAKEN builds upon the excellent work of the Apache Arrow project and is inspired by the need for specialized, high-performance columnar containers in analytical database engines.
