Commit c3ba914
flub, matheus23, and “ramfox” authored
Blog post: Multipath Will Fix This (#367)
* Draft: Multipath Will Fix This

  This doesn't really explain the title, probably should be a different title.
  I'm also not really sure who the audience is.
  It's written from a very high level, hinting at the mechanisms involved without going into technical details.
  It's also a first draft.
  But still, please comment.
* PR review, typos etc
* Change everything into line-per-sentence style

  Makes more sense after the first draft.
* Finish the sentence
* better wording of this paragraph
* Move footnotes to after the full stops

  This looks much better.
* explain this needs to be on the same socket
* link to quinn
* typo
* emphasis this is a path a bit earlier
* mention tunneling explicitly
* typo
* cleanup whitespace
* make nat traversal a link
* small wording fixes
* introduce the fake ip addr
* change title and tag line
* make this less bold
* consistency
* Add clarifying sentence
* more sentence
* try fix links
* capitalisation
* avoid bold
* Fix typos
* chore: update date & url slug

---------

Co-authored-by: Philipp Krüger <[email protected]>
Co-authored-by: “ramfox” <“[email protected]”>
1 parent 640abca commit c3ba914

1 file changed, +328 -0 lines changed
  • src/app/blog/iroh-on-QUIC-multipath
Lines changed: 328 additions & 0 deletions
@@ -0,0 +1,328 @@
1+
import { BlogPostLayout } from '@/components/BlogPostLayout'
2+
import {ThemeImage} from '@/components/ThemeImage'
3+
4+
export const post = {
5+
draft: false,
6+
author: 'Floris Bruynooghe',
7+
date: '2025-08-05',
8+
title: 'iroh on QUIC Multipath',
9+
description:
10+
"Why we're upgrading iroh's networking engine",
11+
}
12+
13+
export const metadata = {
14+
title: post.title,
15+
description: post.description,
16+
openGraph: {
17+
title: post.title,
18+
description: post.description,
19+
images: [{
20+
url: `/api/og?title=Blog&subtitle=${post.title}`,
21+
width: 1200,
22+
height: 630,
23+
alt: post.title,
24+
type: 'image/png',
25+
}],
26+
type: 'article'
27+
}
28+
}
29+
30+
export default (props) => <BlogPostLayout article={post} {...props} />
31+
32+
33+
Iroh is a library to establish direct peer-to-peer QUIC connections.
34+
This means iroh does [NAT traversal],
35+
colloquially known as holepunching.
36+
37+
[NAT traversal]: https://en.wikipedia.org/wiki/NAT_traversal
38+
39+
The basic idea is that two endpoints, both behind a NAT, establish a connection via a relay server.
40+
Once the connection to the relay server is established they can do two things:
41+
42+
- Exchange QUIC datagrams via the relay connection.
43+
- Coordinate holepunching to establish a direct connection.
44+
45+
And once you have holepunched,
46+
you can move the QUIC datagrams to the direct connection and stop relying on the relay server.
47+
Simple.
48+
49+
<Note>
50+
This post is generally going to simplify the world a lot.
51+
Of course there are many more network situations other than two endpoints both connected to the internet via a NAT router.
52+
And iroh has to work with all of them.
53+
But you would get bored reading this and I would get lost writing it.
54+
So I'm keeping this narrative simple.
55+
</Note>
56+
57+
# Relay Servers
58+
59+
An iroh relay server is a classical piece of server software,
60+
running in a datacenter.
61+
It exists even though we want p2p connections,
62+
because in today's internet we cannot have direct connections without holepunching.
63+
And you cannot have holepunching without being able to coordinate.
64+
Thus, the relay server.
65+
66+
Because we would like this relay server to essentially *always* work,
67+
it uses the most common protocol on the internet:
68+
HTTP/1.1 inside a TLS stream.
69+
Endpoints establish an entirely normal HTTPS connection to the relay server and then upgrade it to a WebSocket connection.[^websocket]
70+
This works even in many places where the TLS connection is Machine-In-The-Middled by inserting new "trusted" root certs because of "security".
71+
As long as an endpoint keeps this WebSocket connection open it can use the relay server.
72+
73+
[^websocket]: What's that?
74+
You're still using iroh < 0.91?
75+
Ok fine, maybe your relay server still uses a custom upgrade protocol instead of WebSockets.
76+
77+
The relay server itself is the simplest thing we can get away with.
78+
It forwards UDP datagrams from one endpoint to another,
79+
tunneling them inside the HTTP connections.
80+
Since iroh endpoints are identified by a [`NodeId`], you send the relay server a destination `NodeId` together with each datagram.
81+
The relay server might now either:
82+
83+
[`NodeId`]: https://docs.rs/iroh/0.90.0/iroh/type.NodeId.html
84+
85+
- Drop the datagram on the floor,
86+
because the destination endpoint is not connected to this relay server.
87+
88+
- Forward the datagram to the destination.
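
The forward-or-drop decision above can be sketched in a few lines.
This is a toy model, not iroh's relay code: the `Relay` struct, the `Outcome` enum and the per-endpoint queue are names made up for illustration, and `NodeId` is reduced to a plain 32-byte array.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for iroh's `NodeId`, reduced to raw bytes.
type NodeId = [u8; 32];

/// What the toy relay did with a datagram.
#[derive(Debug, PartialEq)]
enum Outcome {
    Forwarded,
    Dropped,
}

/// Toy relay: each connected endpoint has a queue of datagrams waiting to be
/// written to that endpoint's connection (assumed design, for illustration).
#[derive(Default)]
struct Relay {
    connected: HashMap<NodeId, Vec<Vec<u8>>>,
}

impl Relay {
    /// Registers an endpoint that has opened its connection to the relay.
    fn connect(&mut self, node: NodeId) {
        self.connected.entry(node).or_default();
    }

    /// Forwards an opaque datagram, or drops it when the destination is not
    /// connected to this relay. The payload is never inspected.
    fn forward(&mut self, dst: NodeId, datagram: Vec<u8>) -> Outcome {
        match self.connected.get_mut(&dst) {
            Some(queue) => {
                queue.push(datagram);
                Outcome::Forwarded
            }
            None => Outcome::Dropped,
        }
    }
}
```

Note that `forward` only looks at the destination, never at the bytes it carries.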
89+
90+
The relay server does not need to know what is in the datagram.
91+
In fact, iroh makes sure it **does not** know what is inside:
92+
the payload is always encrypted to the destination endpoint.[^1]
93+
The relay server is nothing more than another network path along which UDP datagrams can travel between iroh nodes.
94+
95+
[^1]: Almost: The QUIC handshake has to establish a TLS connection.
96+
This means it has to send the TLS `ClientHello` message in clear text like any other TLS connection on the internet.
97+
Yes, we know about ECH.
98+
One thing at a time.
99+
100+
101+
# Holepunching
102+
103+
UDP holepunching is simple really.[^simplehp]
104+
All you need is for each endpoint to send a UDP datagram to the other at the same time.
105+
The NAT routers will think the incoming datagrams are a response to the outgoing ones and treat them as part of a connection.
106+
Now you have a holepunched, direct connection.
107+
108+
[^simplehp]: Of course it isn't.
109+
But as already said,
110+
the word count of this post is finite.
111+
112+
To do this an endpoint needs to:
113+
114+
- Know which IP addresses it might be reachable on.
115+
Sometime we'll write this up in its own blog post,
116+
for now I'll just assume the endpoints know this.
117+
118+
- Send these IP address candidates to the remote endpoint via the relay server.
119+
120+
- Once both endpoints have the peer's candidate addresses,
121+
send "ping" datagrams to each candidate address of the peer.
122+
Both at the same time.
123+
124+
- If a "ping" datagram is received,
125+
respond with "yay, we holepunched!".
126+
Typically this will succeed on only one IP path out of all the candidates.
127+
Though more and more these days it succeeds for both an IPv4 and an IPv6 path.
128+
129+
If you followed carefully you'll have counted 3 special messages that need to be sent to the peer endpoint:
130+
131+
1. IP address candidates. These are sent via the relay server.
132+
133+
2. Pings. These are sent on the non-relayed IP paths.
134+
135+
3. Pongs. These are also sent on the non-relayed IP paths.
136+
137+
They need to be sent as UDP datagrams.
138+
And they travel over the same *paths* as the QUIC datagrams:
139+
the *relay path* and any *direct paths*.
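
To make the three messages and their paths concrete, here is a small sketch.
The `HolepunchMsg` and `PathKind` types and the `pings_for` helper are hypothetical names for this post, not iroh's actual wire format.

```rust
use std::net::SocketAddr;

/// The two kinds of path a datagram can travel over.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PathKind {
    Relay,
    Direct,
}

/// Hypothetical holepunching messages, mirroring the list above.
#[derive(Debug)]
enum HolepunchMsg {
    /// IP address candidates the sender might be reachable on.
    Candidates(Vec<SocketAddr>),
    Ping,
    Pong,
}

impl HolepunchMsg {
    /// Candidates go via the relay; pings and pongs go on direct IP paths.
    fn path_kind(&self) -> PathKind {
        match self {
            HolepunchMsg::Candidates(_) => PathKind::Relay,
            HolepunchMsg::Ping | HolepunchMsg::Pong => PathKind::Direct,
        }
    }
}

/// Builds one ping per candidate address received from the peer.
fn pings_for(candidates: &[SocketAddr]) -> Vec<(SocketAddr, HolepunchMsg)> {
    candidates
        .iter()
        .map(|addr| (*addr, HolepunchMsg::Ping))
        .collect()
}
```

Both endpoints would run `pings_for` on the peer's candidates at roughly the same time, which is what opens the NAT mappings.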
140+
141+
142+
# Multiplexing UDP datagrams
143+
144+
Iroh stands on the shoulders of giants,
145+
and it looked carefully at ZeroTier and Tailscale.
146+
In particular it borrowed a lot from the DERP design from Tailscale.
147+
From the above holepunching description we get two kinds of packets:
148+
149+
- Application payload.
150+
For iroh these are QUIC datagrams.
151+
- Holepunching datagrams.
152+
153+
Both of these need to be sent and received on the same socket,
154+
because holepunching a different socket than your application data uses is not that helpful.
155+
So when an iroh endpoint receives a packet it needs to first figure out which kind of packet this is:
156+
a QUIC datagram, or a holepunching datagram?
157+
If it is a QUIC datagram it is passed on to the QUIC stack.[^quicstack]
158+
If it is a holepunching datagram it needs to be handled by iroh itself,
159+
by a component we call the *magic socket*.
160+
The distinction is made using the "QUIC bit",
161+
a bit in the first byte of the datagram which is defined as always set to 1 in QUIC version 1.[^greasing]
162+
For holepunching datagrams we set this bit to 0.
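
As a rough sketch, the demultiplexing check looks something like this.
The mask `0x40` is the second most significant bit of the first byte, which RFC 9000 calls the Fixed Bit; the `DatagramKind` enum and the `classify` function are names invented for this example and not the magic socket's real code.

```rust
/// Mask of the "QUIC bit" in the first byte of a datagram.
const QUIC_BIT: u8 = 0x40;

/// The two kinds of datagram arriving on the shared socket.
#[derive(Debug, PartialEq)]
enum DatagramKind {
    /// Hand this to the QUIC stack.
    Quic,
    /// Handle this in the holepunching logic.
    Holepunch,
}

/// Classifies a received datagram by looking only at its first byte.
/// Returns `None` for an empty datagram, which is neither kind.
fn classify(datagram: &[u8]) -> Option<DatagramKind> {
    let first = *datagram.first()?;
    if first & QUIC_BIT != 0 {
        Some(DatagramKind::Quic)
    } else {
        Some(DatagramKind::Holepunch)
    }
}
```

This works for both long-header packets (first byte `0b11xx_xxxx`) and short-header packets (first byte `0b01xx_xxxx`), since only the one bit is inspected.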
163+
164+
165+
[^quicstack]: iroh uses [Quinn] for the QUIC stack, an excellent project.
166+
167+
[Quinn]: https://crates.io/crates/quinn
168+
169+
[^greasing]: Since then the QUIC working group has published RFC 9287, which advocates "greasing" this bit:
170+
effectively toggling it randomly.
171+
This is an attempt to stop middleboxes from ossifying the protocol by starting to recognize this bit.
172+
Iroh not being able to grease this bit right now is not ideal either.
173+
174+
175+
# IP Congestion Control
176+
177+
This system works great and is what powers iroh today.
178+
However, it also has its limitations.
179+
One interesting aspect of the internet is *congestion control*.
180+
Basically, IP packets get sent around the internet from router to router,
181+
and each hop has its own speed and capacity.
182+
If you send too many packets the pipes will clog up and start to slow down.
183+
If you send yet more packets,
184+
routers will start dropping them.
185+
186+
Congestion control is tasked with walking the fine line of sending as many packets as fast as possible between two endpoints,
187+
without adversely affecting the latency and packet loss.
188+
This is difficult because there are many independent endpoints using all those links between routers at the same time.
189+
But it also has had a few decades of research,
190+
so we achieve reasonably decent results by now.
191+
192+
Each TCP connection has its own congestion controllers,
193+
one per endpoint.
194+
And the same goes for each QUIC connection.
195+
Unfortunately, our holepunching packets live outside of the QUIC connection,
196+
so they do not know about the QUIC congestion controller.
197+
What is worse:
198+
when holepunching succeeds,
199+
the iroh endpoint will route the QUIC datagrams via a different path than before;
200+
they will stop flowing over the relay connection and start using the direct path.
201+
202+
But the QUIC stack is entirely unaware of this!
203+
It has no idea about what destination packets get sent to.
204+
Iroh completely lies to the QUIC stack and tells it to send packets to some private IPv6 range.[^ipv6ula]
205+
It then routes them to the correct path on the way out and rewrites received packets so they appear to come from this address.
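
A minimal sketch of this address trick, under a pile of assumptions: the `AddrMap` type, its methods, and the `fd00:1234:5678::/48` prefix are all made up for illustration, and `NodeId` is again just 32 bytes.
The only real-world fact used is that Unique Local Addresses live in `fd00::/8`.

```rust
use std::collections::HashMap;
use std::net::{Ipv6Addr, SocketAddr};

/// Hypothetical stand-in for iroh's `NodeId`.
type NodeId = [u8; 32];

/// Where datagrams for a remote endpoint really go.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Path {
    Relay,
    Direct(SocketAddr),
}

/// Maps the fake addresses shown to the QUIC stack onto real paths.
#[derive(Default)]
struct AddrMap {
    next_id: u64,
    fake_by_node: HashMap<NodeId, Ipv6Addr>,
    path_by_fake: HashMap<Ipv6Addr, Path>,
}

impl AddrMap {
    /// Returns the stable fake address for a node, allocating one on first
    /// use. New nodes start out on the relay path.
    fn fake_addr(&mut self, node: NodeId) -> Ipv6Addr {
        if let Some(addr) = self.fake_by_node.get(&node) {
            return *addr;
        }
        let id = self.next_id;
        self.next_id += 1;
        // Illustrative ULA prefix; the low 64 bits carry a counter.
        let addr = Ipv6Addr::new(
            0xfd00, 0x1234, 0x5678, 0,
            (id >> 48) as u16, (id >> 32) as u16, (id >> 16) as u16, id as u16,
        );
        self.fake_by_node.insert(node, addr);
        self.path_by_fake.insert(addr, Path::Relay);
        addr
    }

    /// Switches a known fake address to a new path, e.g. after holepunching.
    fn set_path(&mut self, fake: Ipv6Addr, path: Path) {
        if let Some(current) = self.path_by_fake.get_mut(&fake) {
            *current = path;
        }
    }

    /// Looks up the real path for an outgoing datagram.
    fn route(&self, fake: Ipv6Addr) -> Option<Path> {
        self.path_by_fake.get(&fake).copied()
    }
}
```

The QUIC stack only ever sees the value returned by `fake_addr`, so a call to `set_path` changes where packets go without QUIC noticing anything.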
206+
207+
[^ipv6ula]: Using our own IPv6 [Unique Local Address] Global ID.
208+
209+
[Unique Local Address]: https://en.wikipedia.org/wiki/Unique_local_address
210+
211+
This is not great for the congestion controller,
212+
so iroh somehow coerces the QUIC congestion controller to restart whenever iroh chooses a new path.
213+
214+
215+
# Multiple Paths
216+
217+
By now I've talked several times about a "relay path" and a "direct path".
218+
A typical iroh connection probably has quite a few possible paths available between the two endpoints.
219+
A typical set would be:
220+
221+
- The path via the relay server.[^relaypath]
222+
- An IPv4 path over the WiFi interface.
223+
- An IPv6 path over the WiFi interface.
224+
- An IPv4 path over the mobile data interface.
225+
- An IPv6 path over the mobile data interface.
226+
227+
[^relaypath]: While this is currently a single relay path,
228+
you can easily imagine how you could expand this to a number of relay server paths.
229+
Patience. The future.
230+
231+
The entire point of the relay path is to be able to start communicating without needing holepunching.
232+
So that path just works.
233+
But generally you'd expect the bottom 4 paths to need holepunching.
234+
And currently iroh chooses the path with the lowest latency after holepunching.
235+
But what if iroh was aware of all those paths all the time?
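
The current "lowest latency wins" rule can be sketched as follows.
The `PathId` and `Candidate` types are hypothetical, and a path that has not been holepunched (yet) is modelled simply as having no RTT measurement.

```rust
use std::time::Duration;

/// The typical set of paths from the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PathId {
    Relay,
    WifiV4,
    WifiV6,
    MobileV4,
    MobileV6,
}

/// A path plus its measured latency; `None` means the path is not usable,
/// for example because holepunching has not succeeded on it.
struct Candidate {
    id: PathId,
    rtt: Option<Duration>,
}

/// Picks the usable path with the lowest measured latency, if any.
fn pick_path(candidates: &[Candidate]) -> Option<PathId> {
    candidates
        .iter()
        .filter_map(|c| c.rtt.map(|rtt| (rtt, c.id)))
        .min_by_key(|(rtt, _)| *rtt)
        .map(|(_, id)| id)
}
```

Everything that did not win is forgotten by this rule, which is exactly the limitation the question above is poking at.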
236+
237+
238+
# QUIC Multipath
239+
240+
Let's forget holepunching for a minute,
241+
and assume we can establish all those paths without any firewall getting in the way.
242+
Would it not be great if our QUIC stack was aware of these multiple paths?
243+
For example, it could keep a congestion controller for each path separately.
244+
Each path would also have its own Round Trip Time (RTT).
245+
So you can make an educated guess about which path to send new packets on without them being blocked, dropped or slowed down.[^mpcongestion]
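
Here is a deliberately naive sketch of what per-path state buys you.
The `PathState` fields and the `schedule` function are assumptions for illustration; a real multipath scheduler, including whatever ends up in Quinn, is considerably more subtle than "lowest RTT with room in its window".

```rust
use std::time::Duration;

/// Per-path state a multipath-aware QUIC stack could track (simplified).
struct PathState {
    /// Round trip time measured on this path.
    rtt: Duration,
    /// Congestion window of this path's own controller, in bytes.
    cwnd: usize,
    /// Bytes sent on this path that are not yet acknowledged.
    in_flight: usize,
}

impl PathState {
    /// Bytes this path's congestion controller still allows us to send.
    fn available(&self) -> usize {
        self.cwnd.saturating_sub(self.in_flight)
    }
}

/// Picks the lowest-RTT path whose congestion window has room for the
/// packet, accounts for the bytes, and returns the index of that path.
fn schedule(paths: &mut [PathState], packet_len: usize) -> Option<usize> {
    let idx = paths
        .iter()
        .enumerate()
        .filter(|(_, p)| p.available() >= packet_len)
        .min_by_key(|(_, p)| p.rtt)
        .map(|(i, _)| i)?;
    paths[idx].in_flight += packet_len;
    Some(idx)
}
```

With a slow relay path and a fast direct path, this sends on the direct path until its window is full and only then spills over to the relay.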
246+
247+
This is exactly what the [QUIC-MULTIPATH] IETF draft has been figuring out:
248+
allow QUIC endpoints to use multiple paths at the same time.
249+
And we totally want to use this in iroh.
250+
We can have a world where we have several possible paths,
251+
select one as primary and others as backup paths and seamlessly transition between them as your endpoint moves through the network and paths appear and disappear.[^irohmove]
252+
253+
There are *a lot* of details in QUIC-MULTIPATH about how to make it work.
254+
And adding this functionality to Quinn has been a major undertaking.
255+
But the branch is becoming functional at last.
256+
257+
[^mpcongestion]: But hey!
258+
Some of these paths share at least the first and last hop.
259+
So they are not independent!
260+
Indeed, they are not.
261+
Congestion control is still a research area,
262+
especially for multiple paths with shared bottlenecks.
263+
Though you should note that this already happens a lot on the internet:
264+
your laptop or phone probably has many TCP and/or QUIC connections to several servers right now.
265+
And these definitely share hops.
266+
Yet the congestion controllers do somehow figure out how to make this work,
267+
at least to some reasonable degree.
268+
269+
[^irohmove]: Wait, doesn't iroh already say it can do this?
270+
Indeed, indeed.
271+
Though if you've tried this you'd have noticed your application did experience some hiccups for a few seconds as iroh was figuring out where traffic needs to go.
272+
In theory we can do better with multipath,
273+
though it'll take some tweaking and tuning.
274+
275+
[QUIC-MULTIPATH]: https://datatracker.ietf.org/doc/draft-ietf-quic-multipath/
276+
277+
# Multipath Holepunching
278+
279+
If you've paid attention you'll have noticed that so far this still doesn't solve some of our issues:
280+
the holepunching datagrams still live outside of the QUIC stack.
281+
This means we send them at whatever time,
282+
not paying attention to the congestion controller.
283+
That's fine under light load,
284+
but under heavy load often results in lost packets.
285+
That in turn leads to having to re-try sending those.
286+
But preferably without DoSing an innocent UDP socket just quietly hanging out on the internet,
287+
one that happens to use an IP address you thought might belong to the remote endpoint.
288+
289+
So the next step we would like to take with the iroh multipath project is to move holepunching logic itself into QUIC.
290+
We're also not the first to consider this:
291+
Marten Seemann and Christian Huitema have been thinking about this as well and wrote down [some thoughts in a blog post].
292+
More importantly, they started the [QUIC-NAT-TRAVERSAL] draft, which conceptually does a simple thing: move the holepunching packets *into* QUIC packets.
293+
294+
[some thoughts in a blog post]: https://seemann.io/posts/2024-10-26---p2p-quic/
295+
[QUIC-NAT-TRAVERSAL]: https://datatracker.ietf.org/doc/draft-seemann-quic-nat-traversal/
296+
297+
While QUIC-NAT-TRAVERSAL is highly experimental, and at the time of writing we don't expect to follow it exactly,
298+
this approach does have a number of benefits:
299+
300+
- The QUIC packets are already encrypted,
301+
so we no longer need to manage our own encryption layer separately.
302+
303+
- QUIC already has very advanced packet acknowledgement and loss recovery mechanisms.
304+
This includes its congestion control mechanisms.
305+
Essentially QUIC is a reliable transport,
306+
which holepunching gets to benefit from.
307+
308+
- QUIC already has robust protection against sending too much data to unsuspecting hosts on the internet.
309+
310+
- In combination with QUIC-MULTIPATH we get a very robust and flexible system,
311+
allowing us to always schedule packets on the best possible path.
312+
It can also react to path changes in a timely manner and restart holepunching.
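
One example of the protection against flooding unsuspecting hosts is QUIC's anti-amplification limit from RFC 9000, section 8: until an address is validated, an endpoint may send it at most three times the number of bytes it has received from it.
The sketch below shows just that rule; the `UnvalidatedAddr` type is made up for this post, and how exactly such limits end up applying to holepunching probes is an assumption on my part, not something the drafts have settled.

```rust
/// RFC 9000 limits sending to an unvalidated address to 3x the bytes received.
const AMPLIFICATION_FACTOR: usize = 3;

/// Byte accounting for one remote address (hypothetical helper type).
#[derive(Default)]
struct UnvalidatedAddr {
    received: usize,
    sent: usize,
    validated: bool,
}

impl UnvalidatedAddr {
    /// Records bytes received from this address.
    fn on_receive(&mut self, len: usize) {
        self.received += len;
    }

    /// Marks the address as validated, lifting the limit.
    fn validate(&mut self) {
        self.validated = true;
    }

    /// Returns true and accounts for the bytes if sending `len` more bytes
    /// stays within the limit; otherwise returns false and sends nothing.
    fn try_send(&mut self, len: usize) -> bool {
        let budget = self.received * AMPLIFICATION_FACTOR;
        if self.validated || self.sent + len <= budget {
            self.sent += len;
            true
        } else {
            false
        }
    }
}
```

A host that never answers therefore never receives more than a trickle, no matter how eager the sender is.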
313+
314+
Another consideration is that QUIC is already extensible.
315+
Notice that both QUIC-MULTIPATH and QUIC-NAT-TRAVERSAL are negotiated at connection setup.
316+
This is a robust mechanism that allows us to be confident that in the future we'll be able to improve on these mechanisms.
317+
318+
319+
# Work In Progress
320+
321+
Integrating QUIC-MULTIPATH and QUIC-NAT-TRAVERSAL into iroh changes the wire-protocol.
322+
That is part of the reason we want this done before our 1.0 release:
323+
once we release 1.0 we promise to keep our wire-protocol backwards compatible.
324+
Right now we're hard at work building the pieces needed to make these improvements.
325+
And sometime soon-ish they will start landing in the 0.9x releases.
326+
327+
We aim for iroh to become even more reliable for folks who push the limits,
328+
thanks to moving all the holepunching logic right into the QUIC stack.
