Description
Problem Overview
The relay experiences nonce ordering failures when multiple `eth_sendRawTransaction` requests from the same sender address are submitted in rapid succession (hundreds of microseconds apart). Async calls to mirror nodes and other precheck operations can reorder the processing sequence, causing transactions to reach consensus nodes out of order and trigger "wrong nonce" errors.
Proposed Solution
Implement a two-tier mutex system to ensure proper nonce ordering:
- Local mutex: Handle same-process concurrency
- Distributed mutex: Handle cross-instance coordination
POC Scope: Start with the local mutex implementation, then add the distributed mutex in subsequent phases.
Why Two-Tier Architecture
- Local mutex: Fast in-memory coordination within single relay instance (~microseconds)
- Distributed mutex: Global ordering across multiple relay instances (~1-5ms via Redis)
- Combined benefit: Optimal performance for both single and multi-instance deployments
- Graceful scaling: Start simple, add distributed coordination when needed
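A minimal sketch of how the two tiers compose, assuming hypothetical `localMutexFor()` and `acquireDistributedLock()` helpers (neither exists in the relay yet):

```typescript
import { Mutex } from 'async-mutex';

// Hypothetical helpers -- neither exists in the relay today.
declare function localMutexFor(sender: string): Mutex;
declare function acquireDistributedLock(sender: string): Promise<() => Promise<void>>;

// Tier 1 (in-process) is taken first, so at most one request per relay
// instance ever waits on tier 2 (Redis) for a given sender.
async function withSenderLock<T>(sender: string, fn: () => Promise<T>): Promise<T> {
  return localMutexFor(sender).runExclusive(async () => {
    const releaseDistributed = await acquireDistributedLock(sender);
    try {
      return await fn();
    } finally {
      await releaseDistributed();
    }
  });
}
```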
Lock & Unlock Placement Strategy
Lock Placement: At ingress time in Koa middleware
- Position: Before any side effect or async operation
- Ideally before `await this.getRequestResult(body, ctx.ip, requestId);` in `KoaJsonRpc.handleSingleRequest()`
- Rationale: Acquire lock immediately when request enters the system
- Ensures consistent execution order from the earliest point
Unlock Placement: After SDK execution
- Position: Immediately after `transaction.execute()` returns
- Rationale: The transaction is committed to consensus at this point (both placements are sketched below)
- Performance benefit: Reduces lock hold time by unlocking before subsequent operations (receipt checks, HBAR rate limiting, etc.), improving throughput
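A rough sketch of both placements, with heavy caveats: `mutexForSender()` is a hypothetical lookup, `ctx.state` is just one possible way to hand the releaser to the execution path, and only the `getRequestResult` / `handleSingleRequest` / `transaction.execute()` names come from the description above.

```typescript
import { Mutex } from 'async-mutex';
import { Transaction } from 'ethers';

// Hypothetical per-sender lookup (see the per-sender locking section below).
declare function mutexForSender(sender: string): Mutex;

// Ingress side: take the lock before any async precheck can run.
async function onIngress(ctx: any, body: any): Promise<void> {
  if (body.method === 'eth_sendRawTransaction') {
    const sender = Transaction.from(body.params[0]).from!; // recover signer from the raw tx
    // acquire() resolves to a releaser; stash it for the execution path.
    ctx.state.releaseSenderLock = await mutexForSender(sender).acquire();
  }
  // ...existing flow continues, e.g.:
  // await this.getRequestResult(body, ctx.ip, requestId);
}

// Execution side, immediately after SDK submission:
//   const response = await transaction.execute(client);
//   ctx.state.releaseSenderLock?.(); // unlock before receipt checks, rate limiting, etc.
```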
Library Selection
Local Mutex: async-mutex
- 4M+ weekly downloads, actively maintained
- Clean async/await integration with the `runExclusive()` method (usage sketched below)
- Production-proven with minimal overhead
Distributed Mutex: Redis with node-redis
- Industry standard SET NX EX pattern for distributed locking
- Official Node.js Redis client
- Single Redis instance approach (matches current relay infrastructure)
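A sketch of the SET NX EX pattern with node-redis v4, assuming a `tx-lock:<sender>` key naming scheme (an assumption, not a stated convention) and an already-connected client; the Lua compare-and-delete guards against releasing a lock another instance has since acquired.

```typescript
import { randomUUID } from 'node:crypto';
import { createClient } from 'redis';

// Assumes redis.connect() has been called at startup.
const redis = createClient({ url: process.env.REDIS_URL });

// Acquire: SET key token NX EX ttl -- succeeds only if no one holds the lock.
async function acquireLock(sender: string, ttlSec = 30): Promise<string | null> {
  const token = randomUUID();
  const ok = await redis.set(`tx-lock:${sender}`, token, { NX: true, EX: ttlSec });
  return ok === 'OK' ? token : null; // null means another instance holds it
}

// Release: compare-and-delete via Lua so we never delete a lock we lost
// (e.g. after a TTL expiry and re-acquisition by another instance).
const RELEASE_SCRIPT =
  'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end';

async function releaseLock(sender: string, token: string): Promise<void> {
  await redis.eval(RELEASE_SCRIPT, { keys: [`tx-lock:${sender}`], arguments: [token] });
}
```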
Per-Sender Locking Strategy
- Lock granularity: One mutex per sender address (wallet address)
- Different wallet addresses can process transactions concurrently
- Automatic cleanup using LRU cache to prevent memory leaks
- Maximizes throughput while ensuring correctness per sender
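A possible shape for the per-sender registry, assuming the `lru-cache` package for the bounded storage (an assumption, not a stated dependency):

```typescript
import { Mutex } from 'async-mutex';
import { LRUCache } from 'lru-cache';

// Bounded map of per-sender mutexes; idle senders get evicted over time.
const senderMutexes = new LRUCache<string, Mutex>({ max: 10_000, ttl: 5 * 60_000 });

function mutexForSender(sender: string): Mutex {
  const key = sender.toLowerCase(); // normalize address casing
  let mutex = senderMutexes.get(key);
  if (!mutex) {
    mutex = new Mutex();
    senderMutexes.set(key, mutex);
  }
  return mutex;
}
```

One caveat worth covering in tests: if a mutex is evicted while still held, a new caller for the same sender would get a fresh, unlocked mutex, so the cache bounds should comfortably exceed the lock hold time.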
Minimal Changes Principle
- Scope: Only affects the `eth_sendRawTransaction` method
- No impact: All other RPC methods remain unchanged
- Clean integration: New synchronizer class with clear boundaries
- Risk reduction: Isolated changes for easier testing and rollback
Success Criteria
POC Phase Goals:
- Fully working local mutex implementation (no distributed mutex yet)
- Build tests that submit transactions nearly instantly: an acceptance test that mimics the scenario with a Forge script, signing a list of transactions and submitting them all at once or spacing them a few hundred microseconds apart (a TypeScript equivalent is sketched after this list)
- Eliminate nonce ordering failures under rapid succession load
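Since new code blocks here are kept in one language, this is an equivalent of the Forge scenario sketched in TypeScript with ethers v6 (all names, parameters, and env vars are illustrative): pre-sign a batch with sequential nonces, then submit the raw transactions concurrently.

```typescript
import { ethers } from 'ethers';

// Sign a burst of transactions with sequential nonces, then submit the raw
// payloads concurrently so the relay sees them microseconds apart.
async function burstSend(count: number): Promise<void> {
  const provider = new ethers.JsonRpcProvider(process.env.RELAY_URL);
  const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!, provider);
  const { chainId } = await provider.getNetwork();
  const gasPrice = (await provider.getFeeData()).gasPrice!;
  const startNonce = await provider.getTransactionCount(wallet.address);

  // Pre-sign everything first so submission itself is back-to-back.
  const rawTxs: string[] = [];
  for (let i = 0; i < count; i++) {
    rawTxs.push(
      await wallet.signTransaction({
        to: wallet.address,
        value: 0n,
        nonce: startNonce + i,
        gasLimit: 21_000n,
        gasPrice,
        chainId,
      }),
    );
  }

  // All of these should succeed and land in nonce order.
  const results = await Promise.allSettled(
    rawTxs.map((raw) => provider.send('eth_sendRawTransaction', [raw])),
  );
  results.forEach((r, i) => console.log(`tx ${i}: ${r.status}`));
}
```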
Performance & Quality Goals:
- Maintain or improve overall system performance
- Zero impact on non-eth_sendRawTransaction operations
- Clean, testable implementation ready for distributed upgrade