import { BlogPostLayout } from '@/components/BlogPostLayout'
import { ThemeImage } from '@/components/ThemeImage'

export const post = {
  draft: false,
  author: 'Floris Bruynooghe',
  date: '2025-08-05',
  title: 'iroh on QUIC Multipath',
  description:
    "Why we're upgrading iroh's networking engine",
}

export const metadata = {
  title: post.title,
  description: post.description,
  openGraph: {
    title: post.title,
    description: post.description,
    images: [{
      url: `/api/og?title=Blog&subtitle=${post.title}`,
      width: 1200,
      height: 630,
      alt: post.title,
      type: 'image/png',
    }],
    type: 'article',
  },
}

export default (props) => <BlogPostLayout article={post} {...props} />


Iroh is a library for establishing direct peer-to-peer QUIC connections.
This means iroh does [NAT traversal],
colloquially known as holepunching.

[NAT traversal]: https://en.wikipedia.org/wiki/NAT_traversal

The basic idea is that two endpoints, both behind a NAT, establish a connection via a relay server.
Once the connection to the relay server is established they can do two things:

- Exchange QUIC datagrams via the relay connection.
- Coordinate holepunching to establish a direct connection.

And once you have holepunched,
you can move the QUIC datagrams to the direct connection and stop relying on the relay server.
Simple.

<Note>
This post is generally going to simplify the world a lot.
Of course there are many more network situations than two endpoints both connected to the internet via a NAT router.
And iroh has to work with all of them.
But you would get bored reading this and I would get lost writing it.
So I'm keeping this narrative simple.
</Note>

# Relay Servers

An iroh relay server is a classical piece of server software,
running in a datacenter.
It exists even though we want p2p connections,
because in today's internet we cannot have direct connections without holepunching.
And you cannot have holepunching without being able to coordinate.
Thus, the relay server.

Because we would like this relay server to essentially *always* work,
it uses the most common protocol on the internet:
HTTP/1.1 inside a TLS stream.
Endpoints establish an entirely normal HTTPS connection to the relay server and then upgrade it to a WebSocket connection.[^websocket]
This works even in many places where the TLS connection is Machine-In-The-Middled by inserting new "trusted" root certs because of "security".
As long as an endpoint keeps this WebSocket connection open it can use the relay server.

[^websocket]: What's that?
    You're still using iroh < 0.91?
    Ok fine, maybe your relay server still uses a custom upgrade protocol instead of WebSockets.

The relay server itself is the simplest thing we can get away with.
It forwards UDP datagrams from one endpoint to another,
tunneling them inside the HTTP connections.
Since iroh endpoints are identified by a [`NodeId`],
you send the relay server a destination `NodeId` together with each datagram.
The relay server will then either:

[`NodeId`]: https://docs.rs/iroh/0.90.0/iroh/type.NodeId.html

- Drop the datagram on the floor,
  because the destination endpoint is not connected to this relay server.

- Forward the datagram to the destination.

The relay server does not need to know what is in the datagram.
In fact, iroh makes sure it **does not** know what is inside:
the payload is always encrypted to the destination endpoint.[^1]
The relay server is nothing more than another network path along which UDP datagrams can travel between iroh nodes.
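That forwarding rule can be sketched in a few lines of Rust. This is an illustrative toy, not iroh's actual relay: the `NodeId` alias and the in-memory queues standing in for WebSocket connections are assumptions made purely for the sketch.

```rust
use std::collections::HashMap;

// Stand-in for iroh's 32-byte public-key NodeId.
type NodeId = [u8; 32];

/// A toy relay: datagrams addressed to a connected `NodeId` are queued
/// for that endpoint, everything else is dropped on the floor.
struct Relay {
    // NodeId -> datagrams queued for delivery over its connection.
    clients: HashMap<NodeId, Vec<Vec<u8>>>,
}

impl Relay {
    fn new() -> Self {
        Relay { clients: HashMap::new() }
    }

    /// An endpoint opened its WebSocket connection to this relay.
    fn connect(&mut self, id: NodeId) {
        self.clients.entry(id).or_default();
    }

    /// Forward an (encrypted, opaque) datagram to `dst`.
    /// Returns `false` if `dst` is unknown here and the datagram was dropped.
    fn forward(&mut self, dst: NodeId, datagram: Vec<u8>) -> bool {
        match self.clients.get_mut(&dst) {
            Some(queue) => {
                queue.push(datagram);
                true
            }
            None => false, // destination not connected: drop on the floor
        }
    }
}
```

Note that the relay never inspects the payload: it only ever looks at the destination `NodeId`.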

[^1]: Almost: The QUIC handshake has to establish a TLS connection.
    This means it has to send the TLS `ClientHello` message in clear text like any other TLS connection on the internet.
    Yes, we know about ECH.
    One thing at a time.


# Holepunching

UDP holepunching is simple really.[^simplehp]
All you need is for each endpoint to send a UDP datagram to the other at the same time.
The NAT routers will think the incoming datagrams are responses to the outgoing ones and treat them as part of an established connection.
Now you have a holepunched, direct connection.

[^simplehp]: Of course it isn't.
    But as already said,
    the word count of this post is finite.

To do this an endpoint needs to:

- Know which IP addresses it might be reachable on.
  Someday we'll write this up in its own blog post;
  for now I'll just assume the endpoints know this.

- Send these IP address candidates to the remote endpoint via the relay server.

- Once both endpoints have the peer's candidate addresses,
  send "ping" datagrams to each candidate address of the peer.
  Both at the same time.

- If a "ping" datagram is received,
  respond with "yay, we holepunched!".
  Typically this will succeed on only 1 IP path out of all the candidates.
  Or, more and more these days, it'll succeed for both an IPv4 and an IPv6 path.

If you followed carefully you'll have counted 3 special messages that need to be sent to the peer endpoint:

1. IP address candidates. These are sent via the relay server.

2. Pings. These are sent on the non-relayed IP paths.

3. Pongs. These are also sent on the non-relayed IP paths.

They need to be sent as UDP datagrams,
over the same *paths* as the QUIC datagrams are being sent:
the *relay path* and any *direct paths*.
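These three messages can be sketched as a small Rust enum. The names here are illustrative, not iroh's actual internal types:

```rust
use std::net::SocketAddr;

// Illustrative names; iroh's actual holepunching message types differ.
enum HolepunchMsg {
    /// The sender's candidate addresses, delivered via the relay path.
    Candidates(Vec<SocketAddr>),
    /// Sent on every candidate direct path, by both sides at once.
    Ping,
    /// Reply confirming a ping arrived, sent back on that direct path.
    Pong,
}

/// Only the candidate exchange travels via the relay server;
/// pings and pongs go over the candidate direct paths themselves.
fn sent_via_relay(msg: &HolepunchMsg) -> bool {
    matches!(msg, HolepunchMsg::Candidates(_))
}
```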


# Multiplexing UDP datagrams

Iroh stands on the shoulders of giants,
and it looked carefully at ZeroTier and Tailscale.
In particular it borrowed a lot from the DERP design from Tailscale.
From the above holepunching description we get two kinds of packets:

- Application payload.
  For iroh these are QUIC datagrams.
- Holepunching datagrams.

Both of these need to be sent and received from the same socket,
because holepunching a different socket than the one your application data uses is not that helpful.
So when an iroh endpoint receives a packet it needs to first figure out which kind of packet this is:
a QUIC datagram, or a holepunching datagram?
If it is a QUIC datagram it is passed on to the QUIC stack.[^quicstack]
If it is a holepunching datagram it needs to be handled by iroh itself,
by a component we call the *magic socket*.
This is done using the "QUIC bit",
a bit in the first byte of the datagram which is defined as always set to 1 in QUIC version 1.[^greasing]
For holepunching datagrams we set this bit to 0.
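In QUIC version 1 this is the "fixed bit", `0x40` in the first byte. A minimal demultiplexer sketch (not iroh's actual code) looks like this:

```rust
/// Illustrative demultiplexing on the "QUIC bit" (0x40 in the first byte):
/// set to 1 in every QUIC v1 packet, cleared for holepunching datagrams.
enum Packet<'a> {
    Quic(&'a [u8]),
    Holepunch(&'a [u8]),
}

fn demux(datagram: &[u8]) -> Option<Packet<'_>> {
    let first = *datagram.first()?; // empty datagrams carry nothing useful
    if first & 0x40 != 0 {
        Some(Packet::Quic(datagram)) // hand to the QUIC stack
    } else {
        Some(Packet::Holepunch(datagram)) // handled by the magic socket
    }
}
```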


[^quicstack]: iroh uses [Quinn] for the QUIC stack, an excellent project.

[Quinn]: https://crates.io/crates/quinn

[^greasing]: Since then the QUIC working group has released RFC 9287, which advocates "greasing" this bit:
    effectively toggling it randomly.
    This is an attempt to stop middleboxes from ossifying the protocol by starting to recognize this bit.
    Iroh not being able to grease this bit right now is not ideal either.


# IP Congestion Control

This system works great and is what powers iroh today.
However, it also has its limitations.
One interesting aspect of the internet is *congestion control*.
Basically, IP packets get sent around the internet from router to router,
and each hop has its own speed and capacity.
If you send too many packets the pipes will clog up and start to slow down.
If you send yet more packets,
routers will start dropping them.

Congestion control is tasked with walking the fine line of sending as many packets as fast as possible between two endpoints,
without adversely affecting latency and packet loss.
This is difficult because there are many independent endpoints using all those links between routers at the same time.
But it has also had a few decades of research,
so we achieve reasonably decent results by now.

Each TCP connection has its own congestion controllers,
one per endpoint.
And the same goes for each QUIC connection.
Unfortunately, our holepunching packets live outside of the QUIC connection,
so they do not know about the QUIC congestion controller.
What is worse:
when holepunching succeeds,
the iroh endpoint will route the QUIC datagrams via a different path than before;
they will stop flowing over the relay connection and start using the direct path.

But the QUIC stack is entirely unaware of this!
It has no idea which destination packets actually get sent to.
Iroh completely lies to the QUIC stack and tells it to send packets to some private IPv6 range,[^ipv6ula]
routing them to the correct path on the way out and rewriting received packets to come from this address.

[^ipv6ula]: Using our own IPv6 [Unique Local Address] Global ID.

[Unique Local Address]: https://en.wikipedia.org/wiki/Unique_local_address

This is not great for the congestion controller,
so iroh coerces the QUIC congestion controller to restart whenever iroh chooses a new path.
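The address trick can be sketched like this. The `fd`-prefixed mapping below is a made-up illustration of the idea, not the scheme iroh actually uses:

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Made-up illustration: derive a stable fake IPv6 Unique Local Address
/// (fd00::/8) from a small peer index. The QUIC stack only ever sees this
/// constant "destination", while the real path is chosen underneath it.
fn fake_peer_addr(peer_index: u16) -> SocketAddrV6 {
    // The fd00::/8 prefix marks a Unique Local Address;
    // the last segment encodes which peer this address stands for.
    let ip = Ipv6Addr::new(0xfd00, 0, 0, 0, 0, 0, 0, peer_index);
    SocketAddrV6::new(ip, 7842, 0, 0) // port is arbitrary in this sketch
}
```

The important property is stability: the QUIC stack keeps addressing the same fake destination no matter which real path the datagrams take.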


# Multiple Paths

By now I've talked several times about a "relay path" and a "direct path".
A typical iroh connection probably has quite a few possible paths available between the two endpoints.
A typical set would be:

- The path via the relay server.[^relaypath]
- An IPv4 path over the WiFi interface.
- An IPv6 path over the WiFi interface.
- An IPv4 path over the mobile data interface.
- An IPv6 path over the mobile data interface.

[^relaypath]: While this is currently a single relay path,
    you can easily imagine how you could expand this to a number of relay server paths.
    Patience. The future.

The entire point of the relay path is to be able to start communicating without needing holepunching.
So that path just works.
But generally you'd expect the last four paths to need holepunching.
And currently iroh chooses the path with the lowest latency after holepunching.
But what if iroh was aware of all those paths all the time?


# QUIC Multipath

Let's forget holepunching for a minute,
and assume we can establish all those paths without any firewall getting in the way.
Would it not be great if our QUIC stack was aware of these multiple paths?
For example, it could keep a separate congestion controller for each path.
Each path would also have its own Round Trip Time (RTT).
So you can make an educated guess about which path you'd like to send new packets on without them being blocked, dropped or slowed down.[^mpcongestion]
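Such a per-path view could look roughly like the following sketch: pick the lowest-RTT path among those that have been validated. The struct and selection rule are assumptions for illustration, not Quinn's or iroh's API:

```rust
use std::time::Duration;

// Hypothetical per-path state, for illustration only.
struct PathState {
    name: &'static str,
    validated: bool, // holepunching (or the relay) confirmed this path works
    rtt: Duration,   // round-trip time measured on this path
}

/// Pick the validated path with the lowest RTT, if any exists.
fn best_path(paths: &[PathState]) -> Option<&PathState> {
    paths.iter().filter(|p| p.validated).min_by_key(|p| p.rtt)
}
```

A real implementation would of course also weigh congestion state and path stability, not RTT alone.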

This is exactly what the [QUIC-MULTIPATH] IETF draft has been figuring out:
allow QUIC endpoints to use multiple paths at the same time.
And we totally want to use this in iroh.
We can have a world where we have several possible paths,
select one as primary and the others as backups,
and seamlessly transition between them as your endpoint moves through the network and paths appear and disappear.[^irohmove]

There are *a lot* of details in QUIC-MULTIPATH about how to make this work.
And adding this functionality to Quinn has been a major undertaking.
But the branch is becoming functional at last.

[^mpcongestion]: But hey!
    Some of these paths share at least the first and last hop.
    So they are not independent!
    Indeed, they are not.
    Congestion control is still a research area,
    especially for multiple paths with shared bottlenecks.
    Though you should note that this already happens a lot on the internet:
    your laptop or phone probably has many TCP and/or QUIC connections to several servers right now.
    And these definitely share hops.
    Yet the congestion controllers do somehow figure out how to make this work,
    at least to some reasonable degree.

[^irohmove]: Wait, doesn't iroh already say it can do this?
    Indeed, indeed.
    Though if you've tried this you'd have noticed your application did experience some hiccups for a few seconds as iroh was figuring out where traffic needs to go.
    In theory we can do better with multipath,
    though it'll take some tweaking and tuning.

[QUIC-MULTIPATH]: https://datatracker.ietf.org/doc/draft-ietf-quic-multipath/

# Multipath Holepunching

If you've paid attention you'll have noticed that so far this still doesn't solve some of our issues:
the holepunching datagrams still live outside of the QUIC stack.
This means we send them whenever,
paying no attention to the congestion controller.
That's fine under light load,
but under heavy load it often results in lost packets.
That in turn leads to having to retry sending them.
Preferably without accidentally DoSing some innocent UDP socket just quietly hanging out on the internet,
at an IP address you thought might belong to the remote endpoint.

So the next step we would like to take with the iroh multipath project is to move the holepunching logic itself into QUIC.
We're also not the first to consider this:
Marten Seemann and Christian Huitema have been thinking about this as well and wrote down [some thoughts in a blog post].
More importantly they started the [QUIC-NAT-TRAVERSAL] draft, which conceptually does a simple thing: move the holepunching packets *into* QUIC packets.

[some thoughts in a blog post]: https://seemann.io/posts/2024-10-26---p2p-quic/
[QUIC-NAT-TRAVERSAL]: https://datatracker.ietf.org/doc/draft-seemann-quic-nat-traversal/

While QUIC-NAT-TRAVERSAL is highly experimental and, as of the time of writing, we don't expect to follow it exactly,
this approach has a number of benefits:

- The QUIC packets are already encrypted,
  so we no longer need to manage our own encryption layer separately.

- QUIC already has very advanced packet acknowledgement and loss recovery mechanisms,
  including the congestion control mechanisms.
  Essentially QUIC provides a reliable transport,
  which holepunching now gets to benefit from.

- QUIC already has robust protection against sending too much data to unsuspecting hosts on the internet.

- In combination with QUIC-MULTIPATH we get a very robust and flexible system,
  allowing us to always schedule packets on the best possible path,
  reacting to path changes in a timely manner and restarting holepunching when needed.

Another consideration is that QUIC is already extensible.
Notice that both QUIC-MULTIPATH and QUIC-NAT-TRAVERSAL are negotiated at connection setup.
This is a robust mechanism that allows us to be confident that in the future we'll be able to improve on these mechanisms.


# Work In Progress

Integrating QUIC-MULTIPATH and QUIC-NAT-TRAVERSAL into iroh changes the wire-protocol.
That is part of the reason we want this done before our 1.0 release:
once we release this we promise to keep our wire-protocol backwards compatible.
Right now we're hard at work building the pieces needed to make these improvements.
And sometime soon-ish they will start landing in the 0.9x releases.

We aim for iroh to become even more reliable for folks who push the limits,
thanks to moving all the holepunching logic right into the QUIC stack.