When building high-performance, full-stack, data-driven applications, I repeatedly hit the same barriers.
- Protobuf/gRPC: convenient, but introduces memory copies.
- FlightRPC: powerful, but heavyweight and overengineered for most cases.
I wanted a crate that would:
- Integrate Arrow IPC memory format seamlessly into any async context
- Provide low-level control
- Establish composable patterns for arbitrary wire formats
- Require no heavy infrastructure to get started
- Offer a true zero-copy path from wire -> SIMD kernels without re-allocation
- Maintain reasonable compile times
Lightstream is the result: an extension to Minarrow that adds native streaming and I/O. It enables raw bytestream construction, reading and writing IPC and TLV formats, CSV/Parquet writers, and a high-performance 64-byte SIMD memory-mapped IPC reader.
- Bypassing library machinery and going straight to the wire
- Streaming Arrow IPC tables over network sockets with zero-copy buffers
- Defining custom binary transport protocols
- Zero-copy ingestion with mmap for ultra-fast analytics
- Control data alignment at source - SIMD-aligned Arrow IPC writers/readers
- Async data pipelines with backpressure-aware sinks and streams
- Building custom data transport layers and transfer protocols
Lightstream provides composable building blocks for high-performance data I/O in Rust:
- Asynchronous Arrow IPC streaming and file writing
- Framed decoders/sinks for IPC, TLV, CSV, and optional Parquet
- Zero-copy, memory-mapped Arrow file reads (~4.5ms for 100M rows × 4 columns on a consumer laptop)
- Direct integration with Tokio and futures using zero-copy buffers
- 64-byte SIMD aligned readers/writers (the only Arrow-compatible crate providing this in 2025)
- Customisable – You own the buffer. Pull-based or sink-driven streaming.
- Composable – Layerable codecs for encoders, decoders, sinks, and stream adapters.
- Control – Wire-level framing: IPC, TLV, CSV, and Parquet are handled at the transport boundary.
- Compatible – Native async support for futures and Tokio.
- Power – 64-byte aligned memory via
Vec64
ensures deterministic SIMD without hot-loop re-allocations. - Extensible – Primitive building blocks to implement custom wire formats. Contributions welcome.
- Efficient – Minimal dependencies and fast compile times.
Layer | Provided by Lightstream | Replaceable |
---|---|---|
Framing | TlvFrame , IpcMessage |
✅ |
Buffering | StreamBuffer |
✅ |
Encoding/Decoding | FrameEncoder , FrameDecoder |
✅ |
Streaming | GenByteStream , Sink |
✅ |
Formats | IPC, Parquet, CSV, TLV | ✅ |
Each layer is trait-based with a reference implementation. Swap in your own framing, buffering, or encoding logic without re-implementing the stack.
- Arrow IPC – Full support for SIMD-aligned File and Stream protocols, schema + dictionaries, streaming or random access.
- TLV – Minimal type-length-value framing for telemetry, control, or lightweight transport.
- Parquet (feature-gated) – Compact, columnar, compression-aware writer (Zstd, Snappy) with minimal dependencies.
- CSV – Streaming Arrow/Minarrow table readers/writers with headers, nulls, and custom delimiter/null handling.
- Memory Maps – Ultra-fast, zero-copy ingestion: millions of rows in microseconds, SIMD-ready.
use lightstream::models::streams::framed_byte_stream::FramedByteStream;
use lightstream::models::decoders::ipc::table_stream::TableStreamDecoder;
use lightstream::models::readers::ipc::table_stream_reader::TableStreamReader;
let framed = FramedByteStream::new(socket, TableStreamDecoder::default());
let mut reader = TableStreamReader::new(framed);
while let Some(table) = reader.next_table().await? {
println!("Received table: {:?}", table.name);
}
pub struct MyFramer;
impl FrameDecoder for MyFramer {
type Frame = Vec<u8>;
fn decode(&mut self, buf: &[u8]) -> DecodeResult<Self::Frame> {
// Custom framing logic
}
}
let stream = FramedByteStream::new(socket, MyFramer);
use minarrow::{arr_i32, arr_str32, FieldArray, Table};
use lightstream::io::table_writer::TableWriter;
use lightstream::enums::IPCMessageProtocol;
use tokio::fs::File;
#[tokio::main]
async fn main() -> std::io::Result<()> {
let col1 = FieldArray::from_inner("numbers", arr_i32![1, 2, 3]);
let col2 = FieldArray::from_inner("letters", arr_str32!["x", "y", "z"]);
let table = Table::new("demo".into(), vec![col1, col2].into());
let file = File::create("demo.arrow").await?;
let schema = table.schema().to_vec();
let mut writer = TableWriter::new(file, schema, IPCMessageProtocol::File)?;
writer.write_table(table).await?;
writer.finish().await?;
Ok(())
}
parquet
– Parquet writermmap
– Memory-mapped fileszstd
– Zstd compression (IPC + Parquet)snappy
– Snappy compression (IPC + Parquet)
This project is licensed under the MIT Licence. See the LICENCE
file for full terms, and THIRD_PARTY_LICENSES
for Apache-licensed dependencies.
This project is not affiliated with Apache Arrow or the Apache Software Foundation.
It serialises the public Arrow format via a custom implementation (Minarrow), while reusing Flatbuffers schemas from Arrow-RS for schema type generation.