device: Allow buffer memory growth to be limited at run time #69
Conversation
The infinite memory growth allowed by the default PreallocatedBuffersPerPool setting causes processes to be OOM-killed on low-memory devices, even when a soft limit is set with GOMEMLIMIT. Specifically, running Tailscale on a Linux device (OpenWrt, MIPS, 128 MB RAM) will exhaust all memory and be OOM-killed when put under heavy load. Allowing this value to be overridden, as is already done in the iOS build, makes it possible to cap memory growth and prevent the OOM kill. See the Tailscale issue thread for further info: tailscale/tailscale#7272

Signed-off-by: Seth Lankford <[email protected]>
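For illustration, a minimal sketch of how a consumer might cap the pool size at run time, assuming this PR's change of turning the PreallocatedBuffersPerPool constant into a settable package-level variable; the surrounding program is hypothetical and only shows where the value would have to be set (before any Device is created):

```go
package main

import (
	"golang.zx2c4.com/wireguard/conn"
	"golang.zx2c4.com/wireguard/device"
	"golang.zx2c4.com/wireguard/tun"
)

func main() {
	// Hypothetical, assuming the PR exposes this as a variable: cap each
	// sync.Pool at 4096 reusable buffers instead of letting it grow without
	// bound. Must be set before any Device is constructed.
	device.PreallocatedBuffersPerPool = 4096

	tunDev, err := tun.CreateTUN("wg0", device.DefaultMTU)
	if err != nil {
		panic(err)
	}
	dev := device.NewDevice(tunDev, conn.NewDefaultBind(),
		device.NewLogger(device.LogLevelError, "wg0 "))
	defer dev.Close()
	// ... IpcSet configuration, dev.Up(), etc.
}
```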
Thank you for your patience. I'm sure there are other ways to solve this issue (specific low-mem builds, etc.).
Not sure it's a good idea to add random knobs like this for third parties to twiddle and trip over. I wonder if there's some heuristic that could be used instead, which would always work and dynamically scale accordingly? ratelimiter.c in the kernel does this, for example, with something pretty kludgy, but it does work:

```c
int wg_ratelimiter_init(void)
{
	mutex_lock(&init_lock);
	if (++init_refcnt != 1)
		goto out;

	entry_cache = KMEM_CACHE(ratelimiter_entry, 0);
	if (!entry_cache)
		goto err;

	/* xt_hashlimit.c uses a slightly different algorithm for ratelimiting,
	 * but what it shares in common is that it uses a massive hashtable. So,
	 * we borrow their wisdom about good table sizes on different systems
	 * dependent on RAM. This calculation here comes from there.
	 */
	table_size = (totalram_pages() > (1U << 30) / PAGE_SIZE) ? 8192 :
		max_t(unsigned long, 16, roundup_pow_of_two(
			(totalram_pages() << PAGE_SHIFT) /
			(1U << 14) / sizeof(struct hlist_head)));
	max_entries = table_size * 8;

	table_v4 = kvcalloc(table_size, sizeof(*table_v4), GFP_KERNEL);
	if (unlikely(!table_v4))
		goto err_kmemcache;

#if IS_ENABLED(CONFIG_IPV6)
	table_v6 = kvcalloc(table_size, sizeof(*table_v6), GFP_KERNEL);
	if (unlikely(!table_v6)) {
		kvfree(table_v4);
		goto err_kmemcache;
	}
#endif
```
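Ported to Go, a RAM-proportional heuristic along those lines might look roughly like the sketch below. This is only an illustration of the kernel calculation quoted above, not anything in wireguard-go: `poolSizeForRAM` is a hypothetical helper, the total-RAM figure would have to come from somewhere like /proc/meminfo on Linux, and the constants simply mirror the ratelimiter.c math.

```go
package main

import "fmt"

// poolSizeForRAM picks a per-pool buffer cap proportional to total system
// memory, loosely mirroring the table-size calculation from ratelimiter.c:
// a fixed cap above 1 GiB of RAM, otherwise total RAM divided down and
// rounded up to a power of two, with a small floor.
func poolSizeForRAM(totalRAMBytes uint64) int {
	const (
		minEntries    = 16
		maxEntries    = 8192          // kernel cap for machines with more than 1 GiB of RAM
		bytesPerEntry = (1 << 14) * 8 // 2^14 * sizeof(struct hlist_head) in the kernel code
	)
	if totalRAMBytes > 1<<30 {
		return maxEntries
	}
	target := totalRAMBytes / bytesPerEntry
	if target <= minEntries {
		return minEntries
	}
	// Round up to the next power of two, as roundup_pow_of_two() does.
	n := uint64(minEntries)
	for n < target {
		n <<= 1
	}
	return int(n)
}

func main() {
	fmt.Println(poolSizeForRAM(128 << 20)) // 128 MiB router -> 1024
	fmt.Println(poolSizeForRAM(8 << 30))   // 8 GiB server   -> 8192
}
```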
Thanks @zx2c4 - I am happy to experiment with that sort of dynamic scaling, but I'm less confident about doing it up to your standards given my currently limited Go experience. Perhaps that could be a future workstream? In my experiments with different values, I found that when the

The other thing to note is that the router I'm running this on is dedicated to running this VPN, so the memory usage is otherwise extremely stable. That is probably the most common setup for devices of this size (also: no swap).
Any news on this? It looks like someone else has hit the issue as well over here: qdm12/gluetun#2036
Any ETA on when this will be merged, or the other solution mentioned by @zx2c4 implemented?
Branch updated from 5ba9663 to c92064f
Hitting this as well. Is there a mitigation? Has anyone used compile flags to set that value at compile time?
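Without an upstream change the value can only be adjusted at build time, since it is a constant. One possible approach, sketched here under the assumption of a local fork or a go.mod replace directive, is to add a build-tag-selected constants file, similar in spirit to how the iOS build already lowers these values via its own build-tag-selected file; the `lowmem` tag name and file name are made up for this example:

```go
//go:build lowmem

// Hypothetical file added to a fork of the device package (e.g. pulled in
// via a go.mod replace directive) and selected with `go build -tags lowmem`.
// The existing default constants file would also need its build constraint
// extended with `&& !lowmem` so the two definitions don't clash.
package device

// Cap the per-pool buffer count instead of the unbounded default of 0.
const PreallocatedBuffersPerPool = 1024
```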
@zx2c4 this is actually a quite significant performance issue, not just in memory but in CPU usage, due to the GC cycles that highly concurrent code triggers. Essentially, without the limit the code generates giant spikes in allocations, causing significant GC activity. I suggest the default be changed to at least 4096 for everything currently at 0. The files below are pprof profiles; I just named them .txt to get past GitHub's silly file filter. Updated to = 4096:
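For anyone who wants to collect comparable profiles, a minimal way to expose heap and allocation profiles from a running Go process uses only the standard library; the listen address here is arbitrary and the snippet is a sketch, not part of wireguard-go:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof handlers on the default mux
)

func main() {
	// Expose pprof endpoints; then compare runs with different pool caps using:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	//   go tool pprof http://localhost:6060/debug/pprof/allocs
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the real application workload
}
```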
For what it's worth, I agree knobs are annoying and a heuristic would be better, but leaving this as-is is painful. This is a very simple change that would allow developers to tune usage until a heuristic is implemented.
I think @bradfitz was looking at some major refactoring that would make this more natural?
@zx2c4 which is fine, but this is over two years old. Shrug. A fix this minor is better than the library torching any highly concurrent application with no generally applicable way for developers to resolve it. We're not talking peanuts; we're talking orders of magnitude in performance differences.
To be clear, from my perspective this is an easy resolution now that makes the problem adjustable by developers at run time or compile time, with no real impact on any future refactoring, versus an unknown future date for a 'major refactor'. As much as I respect @bradfitz as a developer, a bird in the hand is worth two in the bush.
For reference, numbers on a 32-core / 124 GB machine:

PreallocatedBuffersPerPool = 0:

PreallocatedBuffersPerPool = 4096:

with no observable change in the application's network throughput (though I assume that's mostly due to a bottleneck in the application itself).
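If anyone wants to reproduce this kind of comparison, one lightweight option besides full pprof profiles is to sample runtime.MemStats before and after a fixed traffic run for each setting (or run with GODEBUG=gctrace=1). The sketch below is illustrative only and not taken from this thread:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// reportGC prints cumulative allocation and GC counters. Call it once before
// and once after a fixed traffic run, then diff the two outputs for each
// PreallocatedBuffersPerPool setting.
func reportGC(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: NumGC=%d TotalAlloc=%dMiB PauseTotal=%s\n",
		label, m.NumGC, m.TotalAlloc>>20, time.Duration(m.PauseTotalNs))
}

func main() {
	reportGC("before")
	// ... drive traffic through the tunnel here ...
	reportGC("after")
}
```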
Patch wireguard-go to work around an ongoing issue where pods with low memory limits OOM due to wireguard-go buffer/GC issues. Related to WireGuard/wireguard-go#69