High-performance inter-process communication for Python using memory-mapped files. Sidestep the GIL by using separate processes with fast shared memory.
Python's GIL prevents parallel execution in threads. This library enables true parallelism by:
- Using separate processes (each with its own GIL)
- Sharing memory via mmap (no pickling overhead)
- Providing 10-50x better throughput than `multiprocessing.Queue` (depending on workload)
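The core mechanism can be sketched with nothing but the standard library: map the same file into two processes and copy raw bytes through it, skipping pickling entirely. This is an illustrative sketch only — the actual library adds ring-buffer framing, blocking, and locking on top.

```python
import mmap
import os
import struct
import tempfile
from multiprocessing import Process

HEADER = struct.Struct("<Q")  # 8-byte length prefix

def write_message(mm, payload):
    """Length-prefix a message at offset 0 -- raw bytes, no pickling."""
    HEADER.pack_into(mm, 0, len(payload))
    mm[HEADER.size:HEADER.size + len(payload)] = payload

def read_message(mm):
    """Read the message back; works from any process mapping the same file."""
    (n,) = HEADER.unpack_from(mm, 0)
    return mm[HEADER.size:HEADER.size + n]

def reader(path):
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        assert read_message(mm) == b"hello from the parent"

if __name__ == "__main__":
    path = tempfile.mktemp()
    with open(path, "wb") as f:
        f.truncate(4096)  # reserve the shared region on disk
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        write_message(mm, b"hello from the parent")
    p = Process(target=reader, args=(path,))
    p.start()
    p.join()
    os.unlink(path)
```

Because the kernel backs both mappings with the same physical pages, the child reads exactly the bytes the parent wrote, with no serialization step in between.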
Installation:

```shell
uv venv && source .venv/bin/activate
uv pip install -e .
```

Ring buffer (streaming):

```python
from mmapipc import SharedMemoryRing
import multiprocessing as mp

def producer():
    with SharedMemoryRing.open("/tmp/ring") as ring:
        for i in range(1000):
            ring.write(f"Message {i}".encode(), block=True)

def consumer():
    with SharedMemoryRing.open("/tmp/ring") as ring:
        while True:
            data = ring.read(block=True, timeout=1.0)
            if data:
                process(data)

# Create buffer
with SharedMemoryRing.create("/tmp/ring", capacity=1024*1024):
    pass

# Start processes
mp.Process(target=consumer).start()
mp.Process(target=producer).start()
```

Queue (worker pool):

```python
from mmapipc import SharedMemoryQueue
import multiprocessing as mp

def worker(worker_id):
    with SharedMemoryQueue.open("/tmp/queue") as queue:
        while True:
            task = queue.get(block=True, timeout=5.0)
            if task:
                process_task(task)

# Create and populate queue
with SharedMemoryQueue.create("/tmp/queue", capacity=1024*1024) as queue:
    for i in range(100):
        queue.put(f"Task {i}".encode())

# Start workers
for i in range(4):
    mp.Process(target=worker, args=(i,)).start()
```

Shared memory (raw access):

```python
from mmapipc import SharedMemory

# Process 1: Write
with SharedMemory.create("/tmp/shm", size=1024) as shm:
    shm.write(0, b"Hello")
    shm.write_uint(100, 42)

# Process 2: Read
with SharedMemory.open("/tmp/shm") as shm:
    data = shm.read(0, 5)
    num = shm.read_uint(100)
```

SharedMemoryRing: lock-free ring buffer for streaming data (single producer, single consumer).
- `create(path, capacity)` / `open(path)` - Create or open buffer
- `write(data, block=False, timeout=None)` - Write bytes (returns bool)
- `read(block=False, timeout=None)` - Read bytes (returns bytes or None)
- `available_write()` / `available_read()` - Get available space/data
- `clear()` - Reset buffer
- `close()` / `unlink()` - Close or delete
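The single-producer/single-consumer design can be illustrated with a toy stdlib ring: the producer only ever advances `head`, the consumer only ever advances `tail`, so neither side needs a lock. This is a sketch, not the library's implementation — the real `SharedMemoryRing` adds message framing, blocking semantics, and proper memory ordering.

```python
import mmap
import struct

class ToyRing:
    """Toy SPSC byte ring over an mmap region (first 16 bytes: head, tail)."""
    BASE = 16  # data starts after the two 8-byte indices

    def __init__(self, mm, capacity):
        self.mm, self.cap = mm, capacity

    def _head(self):
        return struct.unpack_from("<Q", self.mm, 0)[0]

    def _tail(self):
        return struct.unpack_from("<Q", self.mm, 8)[0]

    def write(self, data):  # producer side: only ever writes head
        head, tail = self._head(), self._tail()
        if self.cap - (head - tail) < len(data):
            return False  # full -- the library would block here
        pos = head % self.cap
        first = min(len(data), self.cap - pos)  # bytes before wrap-around
        self.mm[self.BASE + pos:self.BASE + pos + first] = data[:first]
        self.mm[self.BASE:self.BASE + len(data) - first] = data[first:]
        struct.pack_into("<Q", self.mm, 0, head + len(data))
        return True

    def read(self, n):  # consumer side: only ever writes tail
        head, tail = self._head(), self._tail()
        n = min(n, head - tail)
        pos = tail % self.cap
        first = min(n, self.cap - pos)
        out = (self.mm[self.BASE + pos:self.BASE + pos + first]
               + self.mm[self.BASE:self.BASE + (n - first)])
        struct.pack_into("<Q", self.mm, 8, tail + n)
        return out
```

The slice assignments copy whole chunks at a time, which is the same trick the library's notes credit for its ring-buffer throughput.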
This library is designed to bypass the Global Interpreter Lock (GIL) for high-throughput data transfer between Python processes.
Why it's useful:
- True Parallelism: Processes run independently, utilizing multiple CPU cores.
- Zero Serialization: Unlike `multiprocessing.Queue` (which pickles objects), this library copies raw bytes (or avoids copying entirely with `get_view`), resulting in massive throughput gains.
- Zero-Copy Reads: With `get_view()`, you can process data directly from shared memory without creating intermediate Python `bytes` objects.
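The principle behind zero-copy reads can be shown with a plain `memoryview` over an mmap region: slicing the view allocates nothing, and a copy happens only if you explicitly ask for one. (Stdlib demo of the mechanism, not the library's `get_view()` itself.)

```python
import mmap

mm = mmap.mmap(-1, 1 << 20)   # anonymous 1 MiB mapping for the demo
mm[0:11] = b"hello world"

view = memoryview(mm)         # zero-copy window over the mapping
frame = view[0:11]            # slicing a memoryview copies nothing
assert frame.tobytes() == b"hello world"   # copy happens only on request

# Writes through the view land directly in shared memory:
frame[0:5] = b"HELLO"
assert mm[0:11] == b"HELLO world"

frame.release()               # release all views before closing the mmap
view.release()
mm.close()
```

This is why zero-copy reads dominate the benchmarks: the consumer parses data where it already sits instead of materializing a `bytes` object per message.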
Benchmarks (MacBook Pro M3):
- Throughput: Up to 19 GB/s (Zero-Copy Read).
- Message Rate: Up to 800,000 msg/sec (Batch Write).
- Latency: Sub-millisecond overhead.
Best For:
- Video processing pipelines (passing raw frames).
- High-frequency trading / financial data feeds.
- Machine Learning data loading (passing tensors/arrays).
SharedMemoryQueue: thread-safe queue for discrete messages (multiple producers/consumers).
- `create(path, capacity)` / `open(path)` - Create or open queue
- `put(data, block=True, timeout=None)` - Enqueue message (returns bool)
- `put_batch(messages, block=True, timeout=None)` - Enqueue multiple messages (returns count)
- `get(block=True, timeout=None)` - Dequeue message (returns bytes or None)
- `get_batch(max_count, block=True, timeout=None)` - Dequeue multiple messages (returns list[bytes])
- `get_view(block=True, timeout=None)` - Zero-copy dequeue (returns list[memoryview] or None)
- `qsize()` / `empty()` - Get size or check if empty
- `clear()` - Clear queue
- `close()` / `unlink()` - Close or delete
SharedMemory: raw memory-mapped region.
- `create(path, size)` / `open(path)` - Create or open mapping
- `write(offset, data)` / `read(offset, length)` - Read/write bytes
- `write_int(offset, val)` / `read_int(offset)` - Read/write signed int64
- `write_uint(offset, val)` / `read_uint(offset)` - Read/write unsigned int64
- `get_view(offset, size)` - Get zero-copy memoryview
- `close()` / `unlink()` - Close or delete
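Typed accessors like `write_uint()`/`read_uint()` reduce to `struct` packing at a fixed offset. A stdlib sketch of the same pattern (the little-endian byte order here is an assumption, not documented by the library):

```python
import mmap
import struct

I64 = struct.Struct("<q")   # signed 64-bit, little-endian (assumed)
U64 = struct.Struct("<Q")   # unsigned 64-bit, little-endian (assumed)

def write_uint(mm, offset, value):
    U64.pack_into(mm, offset, value)

def read_uint(mm, offset):
    return U64.unpack_from(mm, offset)[0]

def write_int(mm, offset, value):
    I64.pack_into(mm, offset, value)

def read_int(mm, offset):
    return I64.unpack_from(mm, offset)[0]

mm = mmap.mmap(-1, 1024)    # anonymous mapping for the demo
write_uint(mm, 100, 42)
write_int(mm, 108, -7)
assert read_uint(mm, 100) == 42
assert read_int(mm, 108) == -7
mm.close()
```

Keeping integers at known offsets like this is how two processes can share counters or headers without framing overhead.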
Examples:

```shell
python examples/ring_buffer_example.py   # Streaming demo
python examples/queue_example.py         # Worker pool demo
python examples/benchmark.py             # Basic performance comparison
python examples/benchmark_advanced.py    # Batching & throughput benchmark
```

Development install and tests:

```shell
uv pip install -e ".[dev]"
pytest tests/ -v
```

Use mmap-ipc when:

- ✅ Same-machine Python processes
- ✅ High throughput required (>1 GB/s)
- ✅ Low latency critical (<100μs)
- ✅ Unix/Linux environment
| Solution | Throughput | Use Case |
|---|---|---|
| mmap-ipc | 19 GB/s | High-performance local IPC |
| `multiprocessing.shared_memory` | ~19 GB/s* | Cross-platform, DIY sync |
| NumPy + shared memory | ~19 GB/s* | Numerical arrays only |
| Redis (localhost) | 1-2 GB/s | Distributed systems |
*Theoretical maximum if you implement synchronization correctly
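"DIY sync" means the stdlib segment is just raw bytes: nothing coordinates concurrent access for you, so you must pair it with your own lock to avoid torn reads and writes. A minimal stdlib sketch of what that entails:

```python
from multiprocessing import Lock, shared_memory

lock = Lock()  # you supply the synchronization yourself
shm = shared_memory.SharedMemory(create=True, size=1024)
try:
    with lock:                       # guard every write...
        shm.buf[0:5] = b"hello"
    with lock:                       # ...and every read
        data = bytes(shm.buf[0:5])
    assert data == b"hello"
finally:
    shm.close()
    shm.unlink()
```

The throughput ceiling is the same because the bytes move through the same kernel pages; the difference is how much correctness machinery you have to build around them.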
Bottom line: For high-performance, same-machine Python IPC, mmap-ipc provides the best balance of performance and ease-of-use.
- Version: 1.0.0 (Beta)
- Test Coverage: 27 comprehensive tests
- CI/CD: Automated testing on Python 3.8-3.12, Ubuntu/macOS
- Stability: Built on battle-tested primitives (mmap, fcntl)
- ✅ Functional correctness (all operations)
- ✅ Cross-process communication
- ✅ Crash safety (process dies while holding lock)
- ✅ Performance benchmarks
- ✅ Edge cases (wrap-around, buffer full, etc.)
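The crash-safety property rests on `fcntl`-style locks being kernel-managed: when the holding process dies, the kernel releases the lock automatically, so survivors are never wedged. A Unix-only sketch of that behavior using `flock` from the stdlib `fcntl` module:

```python
import fcntl
import os
import tempfile
from multiprocessing import Process

def crash_while_locked(path):
    f = open(path, "r+b")
    fcntl.flock(f, fcntl.LOCK_EX)   # take the lock...
    os._exit(1)                     # ...then die without releasing it

if __name__ == "__main__":
    path = tempfile.mktemp()
    open(path, "wb").close()
    p = Process(target=crash_while_locked, args=(path,))
    p.start()
    p.join()
    with open(path, "r+b") as f:
        # Non-blocking acquire succeeds: the kernel freed the dead
        # process's lock on exit.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(f, fcntl.LOCK_UN)
    os.unlink(path)
```

A user-space lock (e.g. a flag in shared memory) would stay stuck forever in the same scenario, which is why file locks are the safer primitive here.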
- Unix-only: Requires `fcntl` (Linux, macOS). Windows not supported.
- File-based: Uses `/tmp/` for shared memory files.
- Development: Use freely, report issues
- Production: Thoroughly test your specific use case first
- Fallback: Have a plan to fall back to `multiprocessing.Queue` if needed
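One way to keep such a fallback is to select the transport at startup and code against the common put/get surface. A hedged sketch — the `make_queue` helper is hypothetical, though `SharedMemoryQueue.create` matches the API above:

```python
import multiprocessing as mp

def make_queue(prefer_shared_memory=True, path="/tmp/queue", capacity=1 << 20):
    """Return a shared-memory queue if available, else a stdlib queue."""
    if prefer_shared_memory:
        try:
            from mmapipc import SharedMemoryQueue  # fast path, Unix only
            return SharedMemoryQueue.create(path, capacity=capacity)
        except (ImportError, OSError):
            pass  # fall through to the portable stdlib queue
    return mp.Queue()  # slower (pickles objects) but always available
```

Both backends accept `put(data)` and `get(timeout=...)`, so the rest of the application doesn't need to know which one it got.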
- All primitives support context managers (`with` statements)
- Use `close()` to release resources (automatic with context managers)
- Use `unlink()` to delete the file (only call once, when completely done)
- Files are typically in `/tmp/` and cached in RAM by the OS
- Ring buffer: lock-free, ~48K msg/sec (optimized with slice assignment)
- Queue: robust `fcntl` locking (crash-safe), ~38K msg/sec
- Works on Unix-like systems (Linux, macOS) - requires `fcntl`
License: MIT