
mmap-ipc


High-performance inter-process communication for Python using memory-mapped files. Sidestep the GIL by using separate processes with fast shared memory.

Why?

Python's Global Interpreter Lock (GIL) prevents CPU-bound threads from running in parallel. This library enables true parallelism by:

  • Using separate processes (each with its own GIL)
  • Sharing memory via mmap (no pickling overhead)
  • Providing 10-50x better throughput than multiprocessing.Queue (depending on workload)

Installation

git clone https://github.com/Cusp-AI/mmap-ipc && cd mmap-ipc
uv venv && source .venv/bin/activate
uv pip install -e .

Quick Start

Ring Buffer (Single Producer/Consumer)

from mmapipc import SharedMemoryRing
import multiprocessing as mp

def producer():
    with SharedMemoryRing.open("/tmp/ring") as ring:
        for i in range(1000):
            ring.write(f"Message {i}".encode(), block=True)

def consumer():
    with SharedMemoryRing.open("/tmp/ring") as ring:
        while True:
            data = ring.read(block=True, timeout=1.0)
            if data is None:       # timed out: assume the producer is done
                break
            process(data)          # process() is your own handler

if __name__ == "__main__":  # guard required on spawn-based platforms such as macOS
    # Create the buffer file; it persists after the context manager closes it
    with SharedMemoryRing.create("/tmp/ring", capacity=1024*1024):
        pass

    # Start processes
    mp.Process(target=consumer).start()
    mp.Process(target=producer).start()

Message Queue (Multiple Producers/Consumers)

from mmapipc import SharedMemoryQueue
import multiprocessing as mp

def worker(worker_id):
    with SharedMemoryQueue.open("/tmp/queue") as queue:
        while True:
            task = queue.get(block=True, timeout=5.0)
            if task is None:       # timed out: no more work
                break
            process_task(task)     # process_task() is your own handler

if __name__ == "__main__":
    # Create and populate the queue; the file outlives the context manager
    with SharedMemoryQueue.create("/tmp/queue", capacity=1024*1024) as queue:
        for i in range(100):
            queue.put(f"Task {i}".encode())

    # Start workers
    for i in range(4):
        mp.Process(target=worker, args=(i,)).start()

Low-Level Shared Memory

from mmapipc import SharedMemory

# Process 1: Write
with SharedMemory.create("/tmp/shm", size=1024) as shm:
    shm.write(0, b"Hello")
    shm.write_uint(100, 42)

# Process 2: Read
with SharedMemory.open("/tmp/shm") as shm:
    data = shm.read(0, 5)
    num = shm.read_uint(100)

API

SharedMemoryRing

Lock-free ring buffer for streaming data (single producer, single consumer).

  • create(path, capacity) / open(path) - Create or open buffer
  • write(data, block=False, timeout=None) - Write bytes (returns bool)
  • read(block=False, timeout=None) - Read bytes (returns bytes or None)
  • available_write() / available_read() - Get available space/data
  • clear() - Reset buffer
  • close() / unlink() - Close or delete
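
A minimal sketch of non-blocking use, based on the signatures above. It assumes available_write() reports free space in bytes; any per-message framing overhead is not accounted for, which is why the write result should still be checked:

from mmapipc import SharedMemoryRing

with SharedMemoryRing.open("/tmp/ring") as ring:
    payload = b"frame-0001"
    # Check for space first; write() also reports failure by
    # returning False when block=False (the default).
    if ring.available_write() >= len(payload):
        ring.write(payload)
    # Drain whatever is currently buffered without blocking.
    while ring.available_read() > 0:
        data = ring.read()         # returns bytes, or None if empty
        if data is None:
            break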

Performance & GIL-Free IPC

This library is designed to bypass the Global Interpreter Lock (GIL) for high-throughput data transfer between Python processes.

Why it's useful:

  • True Parallelism: Processes run independently, utilizing multiple CPU cores.
  • Zero Serialization: Unlike multiprocessing.Queue (which pickles objects), this library copies raw bytes (or avoids copying entirely with get_view), resulting in massive throughput gains.
  • Zero-Copy Reads: With get_view(), you can process data directly from shared memory without creating intermediate Python bytes objects, as sketched below.
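
A minimal zero-copy consumer sketch, assuming get_view() (documented under SharedMemoryQueue below) returns memoryviews that alias the mapped region and remain valid only until the next queue operation, so consume or copy them before then:

from mmapipc import SharedMemoryQueue

with SharedMemoryQueue.open("/tmp/queue") as queue:
    views = queue.get_view(block=True, timeout=1.0)
    if views:
        for view in views:
            # Work directly on shared memory; bytes(view) would copy.
            # Pass the memoryview to anything that accepts the buffer
            # protocol, e.g. numpy.frombuffer(view, dtype="u1").
            total = sum(view)      # touches every byte in place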

Benchmarks (MacBook Pro M3):

  • Throughput: Up to 19 GB/s (Zero-Copy Read).
  • Message Rate: Up to 800,000 msg/sec (Batch Write).
  • Latency: Sub-millisecond overhead.

Best For:

  • Video processing pipelines (passing raw frames).
  • High-frequency trading / financial data feeds.
  • Machine Learning data loading (passing tensors/arrays).

SharedMemoryQueue

Process-safe queue for discrete messages (multiple producers/consumers).

  • create(path, capacity) / open(path) - Create or open queue
  • put(data, block=True, timeout=None) - Enqueue message (returns bool)
  • put_batch(messages, block=True, timeout=None) - Enqueue multiple messages (returns count)
  • get(block=True, timeout=None) - Dequeue message (returns bytes or None)
  • get_batch(max_count, block=True, timeout=None) - Dequeue multiple messages (returns list[bytes])
  • get_view(block=True, timeout=None) - Zero-copy dequeue (returns list[memoryview] or None)
  • qsize() / empty() - Get size or check if empty
  • clear() - Clear queue
  • close() / unlink() - Close or delete
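
When messages are small, the batch calls amortize locking overhead across many messages. A short sketch using the signatures above; handle() is a hypothetical application callback, and the exact behavior when the queue fills mid-batch is an assumption:

from mmapipc import SharedMemoryQueue

def handle(msg: bytes) -> None:    # hypothetical application handler
    ...

with SharedMemoryQueue.open("/tmp/queue") as queue:
    # One locked operation for 100 messages; returns the number enqueued.
    sent = queue.put_batch([f"task {i}".encode() for i in range(100)])
    # Dequeue up to 32 messages per call.
    for msg in queue.get_batch(32, block=True, timeout=1.0):
        handle(msg)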

SharedMemory

  • create(path, size) / open(path) - Create or open mapping
  • write(offset, data) / read(offset, length) - Read/write bytes
  • write_int(offset, val) / read_int(offset) - Read/write signed int64
  • write_uint(offset, val) / read_uint(offset) - Read/write unsigned int64
  • get_view(offset, size) - Get zero-copy memoryview
  • close() / unlink() - Close or delete
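
The int64 helpers cover counters; for richer records, get_view() pairs naturally with the standard struct module. A minimal sketch, assuming the returned memoryview is writable (mmap-backed views normally are):

import struct
from mmapipc import SharedMemory

RECORD = struct.Struct("<qd16s")   # int64 seq, float64 value, 16-byte tag

with SharedMemory.create("/tmp/records", size=4096) as shm:
    view = shm.get_view(0, RECORD.size)
    # Pack a record straight into the mapping: no intermediate bytes object.
    RECORD.pack_into(view, 0, 42, 3.14, b"sensor-a")
    seq, value, tag = RECORD.unpack_from(view, 0)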

Examples

python examples/ring_buffer_example.py  # Streaming demo
python examples/queue_example.py        # Worker pool demo
python examples/benchmark.py            # Basic performance comparison
python examples/benchmark_advanced.py   # Batching & Throughput benchmark

Testing

uv pip install -e ".[dev]"
pytest tests/ -v

Alternatives Comparison

When to use mmap-ipc

  • ✅ Same-machine Python processes
  • ✅ High throughput required (>1 GB/s)
  • ✅ Low latency critical (<100μs)
  • ✅ Unix/Linux environment

Alternatives

Solution                        Throughput   Use Case
mmap-ipc                        19 GB/s      High-performance local IPC
multiprocessing.shared_memory   ~19 GB/s*    Cross-platform, DIY sync
NumPy + shared memory           ~19 GB/s*    Numerical arrays only
Redis (localhost)               1-2 GB/s     Distributed systems

*Theoretical maximum if you implement synchronization correctly

Bottom line: For high-performance, same-machine Python IPC, mmap-ipc provides the best balance of performance and ease-of-use.

Production Readiness

Maturity

  • Version: 1.0.0 (Beta)
  • Test Coverage: 27 comprehensive tests
  • CI/CD: Automated testing on Python 3.8-3.12, Ubuntu/macOS
  • Stability: Built on battle-tested primitives (mmap, fcntl)

What's Tested

  • ✅ Functional correctness (all operations)
  • ✅ Cross-process communication
  • ✅ Crash safety (process dies while holding lock)
  • ✅ Performance benchmarks
  • ✅ Edge cases (wrap-around, buffer full, etc.)

Known Limitations

  • Unix-only: Requires fcntl (Linux, macOS). Windows not supported.
  • File-based: Uses /tmp/ for shared memory files.

Recommended Usage

  1. Development: Use freely, report issues
  2. Production: Thoroughly test your specific use case first
  3. Fallback: Have a plan to use multiprocessing.Queue if needed
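
The fallback in step 3 is easier if the queue choice is hidden behind a small factory. A hedged sketch; make_queue() is a hypothetical helper that papers over the main API difference, namely that multiprocessing.Queue.get raises queue.Empty on timeout while this library returns None:

import multiprocessing as mp
import queue

def make_queue(use_mmap: bool):
    """Return (put, get) with one contract: get yields an item or None."""
    if use_mmap:
        from mmapipc import SharedMemoryQueue
        q = SharedMemoryQueue.open("/tmp/queue")
        return q.put, q.get
    q = mp.Queue()
    def get(block=True, timeout=None):
        try:
            return q.get(block=block, timeout=timeout)
        except queue.Empty:
            return None
    return q.put, get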

Notes

  • All primitives support context managers (with statements)
  • Use close() to release resources (automatic with context managers)
  • Use unlink() to delete the file (only call once when completely done)
  • Files are typically in /tmp/ and cached in RAM by the OS
  • Ring buffer: lock-free, ~48K msg/sec (optimized with slice assignment)
  • Queue: robust fcntl locking (crash-safe), ~38K msg/sec
  • Works on Unix-like systems (Linux, macOS) - requires fcntl
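
One workable ownership pattern for unlink(): the creating process deletes the file exactly once, after every peer has finished. This sketch assumes create() returns a handle usable outside a with block and that, as the Quick Start suggests, the file persists after close() until unlink() is called:

import multiprocessing as mp
from mmapipc import SharedMemoryQueue

def worker():
    # Peers only open and close; they never call unlink().
    with SharedMemoryQueue.open("/tmp/queue") as q:
        while (task := q.get(timeout=1.0)) is not None:
            ...                    # handle task

if __name__ == "__main__":
    ipc = SharedMemoryQueue.create("/tmp/queue", capacity=1024 * 1024)
    workers = [mp.Process(target=worker) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    ipc.close()
    ipc.unlink()                   # the single, final delete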

License

MIT
