Skip to content

Conversation

@philipch07
Copy link
Contributor

@philipch07 philipch07 commented Oct 4, 2025

Description

Adds RACK to SCTP (see the paper that this is based on here: https://icnp20.cs.ucr.edu/proceedings/nipaa/RACK%20for%20SCTP.pdf)

I would also like to add the variable burst mitigation as described in the above paper (section 5a), but that will be done in a separate PR as it doesn't require RACK to be implemented.

Reference issue

Closes #389

Upd: expanded version of the paper above: https://duepublico2.uni-due.de/servlets/MCRFileNodeServlet/duepublico_derivate_00073893/Diss_Weinrank.pdf#page=97

@codecov
Copy link

codecov bot commented Oct 4, 2025

Codecov Report

❌ Patch coverage is 84.72222% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.83%. Comparing base (b1a66a4) to head (b2c3746).

Files with missing lines Patch % Lines
association.go 83.38% 37 Missing and 18 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #390      +/-   ##
==========================================
+ Coverage   83.74%   83.83%   +0.08%     
==========================================
  Files          51       52       +1     
  Lines        4527     3779     -748     
==========================================
- Hits         3791     3168     -623     
+ Misses        597      453     -144     
- Partials      139      158      +19     
Flag Coverage Δ
go 83.83% <84.72%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@philipch07
Copy link
Contributor Author

@enobufs Hi, I think you've previously tried to implement RACK for SCTP, would you happen to have any pointers for whether my current work looks like it's on the right track, or any notes/comments on things that you had in mind? If you also have a branch for me to use as reference if you've worked on it before then that would also be very appreciated!

@philipch07 philipch07 force-pushed the pch07/rack-sctp branch 2 times, most recently from eb3bd88 to 686e5ec Compare October 7, 2025 01:23
@philipch07 philipch07 marked this pull request as ready for review October 7, 2025 01:50
@philipch07
Copy link
Contributor Author

To test this I used this branch's SCTP version in the following examples:

Each of the examples (and the listed versions) worked as the behavior matched the expected behavior, so I think it's looking good so far :)

@Sean-Der
Copy link
Member

@philipch07 this is beyond amazing. You are doing something phenomenal and making it look easy :)

Anything I can do to help? In the back of your head think about writing a blogpost.

  • You get the credit you deserve
  • Its gets people excited/teaches the next generation about why this stuff is fun :)

@JoeTurki JoeTurki requested review from at-wat and sirzooro October 22, 2025 19:58
@philipch07 philipch07 force-pushed the pch07/rack-sctp branch 4 times, most recently from 6c3e8ba to 3914804 Compare October 22, 2025 21:10
@philipch07
Copy link
Contributor Author

Thank you @Sean-Der! I think the only thing I need right now regarding this PR is more reviews and testing (ideally in real applications with some latency). I'm excited but also cautious about this since it seems to be a pretty desired feature!

The next thing I have in mind at the moment for RACK would be #394 (adaptive burst mitigation) which only further improves RACK's performance. I hope that the two features would make for a compelling upgrade for users :)

I'm also brainstorming some ideas for the blog post as you've mentioned but it may be tied closer to open source contributing as a whole instead of just about RACK (especially since this type of feature is a bit less flashy). With some time and luck, I think I can create an interesting narrative about my motives, interests, and journey through the contributions that I've made so far!

@philipch07
Copy link
Contributor Author

Also I performed a short test with Linux's traffic control in my WSL setup to simulate 500ms latency for outbound traffic (so anything going from server -> client would be delayed) using the data-channels example from webrtc (latest version, so v4.1.6) and it didn't drop any messages in either direction, even when I sent a burst of messages to the server (all of which were instant), and with intermittent messages to the server.

@ValorZard
Copy link

@ValorZard
Copy link

also this is off topic, but @philipch07 are you in the discord server? I'd like to talk you more casually about SCTP/datachannel stuff.

@Sean-Der
Copy link
Member

Sean-Der commented Oct 23, 2025

this type of feature is a bit less flashy

This is the flashiest thing that could happen. If you can put up a graph that shows 'Under these network conditions file transfers happen X faster' and call out stuff that use DataChannels + Go (WebTorrent and Tor Snowflake) that is a huge thing.

I would love to introduce you to companies with a flashy intro of 'This is the person that made WebRTC Datachannels n% faster'. This and ICE Renomination that @kevmo314 working on I think have biggest change of just instantly making peoples WebRTC sessions better with no effort on their side.

@philipch07
Copy link
Contributor Author

@philipch07 Could you test this with https://github.com/pion/example-webrtc-applications/tree/master/ebiten-game

Just finished testing this and confirmed that this works! I didn't test it with added latency though.

This is the flashiest thing that could happen. If you can put up a graph that shows 'Under these network conditions file transfers happen X faster' and call out stuff that use DataChannels + Go (WebTorrent and Tor Snowflake) that is a huge thing.

That's a fantastic idea, that makes me think of creating some sort of visual simulation to display the impact of RACK vs non-RACK implementation in practice!

I would love to introduce you to companies with a flashy intro of 'This is the person that made WebRTC Datachannels n% faster'. This and ICE Renomination that @kevmo314 working on I think have biggest change of just instantly making peoples WebRTC sessions better with no effort on their side.

That makes sense to me, and I agree that "free" performance with 0 changes/settings toggled is always a win for a user :)

@philipch07
Copy link
Contributor Author

Whoops forgot to @Sean-Der in the above comment^

@kevmo314
Copy link

Chiming in to say that this is really cool! The paper is really interesting, so cool to see this in pion.

Copy link
Member

@JoeTurki JoeTurki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good, I reviewed it over days, there are no obvious issues in the code logic.
Really excited to see the benchmark.
Thank you a lot <3

@philipch07 philipch07 force-pushed the pch07/rack-sctp branch 4 times, most recently from 8fc1911 to d7f2b58 Compare October 23, 2025 22:53
@JoeTurki
Copy link
Member

Going from low RTT to high RTT causes the packets to drop, but that's fixed after commit #244b97f

It needed to go to extreme values tho, and it was more of a manual test, I'll try to get it as a CI test, but It's very hard to make in a way that's not flaky and under ~20s

before:

--- FAIL: TestRACK_RTTSwitch_Regression (41.00s)
    vnet_test.go:887: 
                Error Trace:    /home/joe/work/pion/sctp/vnet_test.go:887
                Error:          regression
                Test:           TestRACK_RTTSwitch_Regression
                Messages:       missing echoes: got=61 want=200 missing=[61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199]
FAIL
FAIL    github.com/pion/sctp    41.006s
FAIL

After:

[joe@nixos:~/work/pion/sctp]$ go test -run ^TestRACK_RTTSwitch_Regression$ github.com/pion/sctp -count 1
ok  	github.com/pion/sctp	36.082s

Thank you.

@JoeTurki
Copy link
Member

Also there are many network conditions that makes the current sctp implementation drop packets but this implementation handles it so well. really amazing work.

Copy link
Member

@at-wat at-wat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me! Added some minor comments

@philipch07 philipch07 force-pushed the pch07/rack-sctp branch 2 times, most recently from 570c8e0 to eee6720 Compare October 28, 2025 20:49
Copy link
Member

@at-wat at-wat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@philipch07
Copy link
Contributor Author

@sirzooro Let me know if you're interested in reviewing this PR before I merge it and if you have any eta if so!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements RACK (Recent ACKnowledgement) loss detection and Tail Loss Probe (TLP) mechanisms for SCTP, providing more efficient and responsive loss recovery compared to traditional timeout-based retransmission. RACK is enabled by default but can be disabled via configuration.

Key changes:

  • Adds windowedMin data structure to track minimum RTT over a sliding window
  • Implements RACK state tracking and loss detection logic in Association
  • Introduces unified timer system for RACK and PTO deadlines
  • Updates test assertions to accommodate RACK-based recovery patterns

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
windowedmin.go New data structure implementing sliding window minimum for RTT tracking
windowedmin_test.go Comprehensive tests for windowedMin implementation
association.go Core RACK/TLP implementation including state variables, timer management, and loss detection logic
association_test.go New unit tests for RACK behavior and updated congestion control test expectations
vnet_test.go Uncommented assertions to verify RACK reduces fast retransmits

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@philipch07
Copy link
Contributor Author

During some SCP testing, @JoeTurki found that the CPU utilization was a little higher than expected. Luckily there's in RFC 8985 sec 6.2 there's a section that I originally overlooked which addresses the concern:

As an optimization, an implementation can choose to check only segments that have been sent before RACK.xmit_ts. This can be more efficient than scanning the entire SACK scoreboard, especially when there are many segments in flight. The implementation can use a separate doubly linked list ordered by Segment.xmit_ts, insert a segment at the tail of the list when it is (re)transmitted, and remove a segment from the list when it is delivered or marked as lost. In Linux TCP, this optimization improved CPU usage by orders of magnitude during some fast recovery episodes on high-speed WAN networks.

The new commits address the issue and will be tested to see whether the more involved fix is worth it over the simple fix regarding CPU usage among other stats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add RACK to SCTP

8 participants