device: Allow buffer memory growth to be limited at run time #69
Conversation
The infinite memory growth allowed by the default PreallocatedBuffersPerPool setting causes processes to be OOM-killed on low-memory devices, even when a soft limit is set with GOMEMLIMIT. Specifically, running Tailscale on a Linux device (OpenWrt, MIPS, 128 MB RAM) will exhaust all memory and be OOM-killed when put under heavy load. Allowing this value to be overridden, as is already done in the iOS build, makes it possible to cap memory growth and prevent the OOM kill. See the Tailscale issue thread for further info: tailscale/tailscale#7272

Signed-off-by: Seth Lankford <[email protected]>
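For illustration, a minimal sketch of how a consumer might cap the pool size at run time, assuming this PR's change of turning the PreallocatedBuffersPerPool constant into a settable package-level variable; the surrounding program is hypothetical and only shows where the value would have to be set (before any Device is created):

```go
package main

import (
	"golang.zx2c4.com/wireguard/conn"
	"golang.zx2c4.com/wireguard/device"
	"golang.zx2c4.com/wireguard/tun"
)

func main() {
	// Hypothetical, assuming the PR exposes this as a variable: cap each
	// sync.Pool at 4096 reusable buffers instead of letting it grow without
	// bound. Must be set before any Device is constructed.
	device.PreallocatedBuffersPerPool = 4096

	tunDev, err := tun.CreateTUN("wg0", device.DefaultMTU)
	if err != nil {
		panic(err)
	}
	dev := device.NewDevice(tunDev, conn.NewDefaultBind(),
		device.NewLogger(device.LogLevelError, "wg0 "))
	defer dev.Close()
	// ... IpcSet configuration, dev.Up(), etc.
}
```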
Thank you for your patience. I'm sure there are other ways to solve this issue (specific low-mem builds, etc.).
Not sure it's a good idea to add random knobs like this for third parties to twiddle and trip over. I wonder if there's some heuristic that could be used instead, which would always work and dynamically scale accordingly? ratelimiter.c in the kernel does this, for example, with something pretty kludgy, but it does work:

```c
int wg_ratelimiter_init(void)
{
	mutex_lock(&init_lock);
	if (++init_refcnt != 1)
		goto out;

	entry_cache = KMEM_CACHE(ratelimiter_entry, 0);
	if (!entry_cache)
		goto err;

	/* xt_hashlimit.c uses a slightly different algorithm for ratelimiting,
	 * but what it shares in common is that it uses a massive hashtable. So,
	 * we borrow their wisdom about good table sizes on different systems
	 * dependent on RAM. This calculation here comes from there.
	 */
	table_size = (totalram_pages() > (1U << 30) / PAGE_SIZE) ? 8192 :
		max_t(unsigned long, 16, roundup_pow_of_two(
			(totalram_pages() << PAGE_SHIFT) /
			(1U << 14) / sizeof(struct hlist_head)));
	max_entries = table_size * 8;

	table_v4 = kvcalloc(table_size, sizeof(*table_v4), GFP_KERNEL);
	if (unlikely(!table_v4))
		goto err_kmemcache;

#if IS_ENABLED(CONFIG_IPV6)
	table_v6 = kvcalloc(table_size, sizeof(*table_v6), GFP_KERNEL);
	if (unlikely(!table_v6)) {
		kvfree(table_v4);
		goto err_kmemcache;
	}
#endif
```
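Ported to Go, a RAM-proportional heuristic along those lines might look roughly like the sketch below. This is only an illustration of the kernel calculation quoted above, not anything in wireguard-go: `poolSizeForRAM` is a hypothetical helper, the total-RAM figure would have to come from somewhere like /proc/meminfo on Linux, and the constants simply mirror the ratelimiter.c math.

```go
package main

import "fmt"

// poolSizeForRAM picks a per-pool buffer cap proportional to total system
// memory, loosely mirroring the table-size calculation from ratelimiter.c:
// a fixed cap above 1 GiB of RAM, otherwise total RAM divided down and
// rounded up to a power of two, with a small floor.
func poolSizeForRAM(totalRAMBytes uint64) int {
	const (
		minEntries    = 16
		maxEntries    = 8192          // kernel cap for machines with more than 1 GiB of RAM
		bytesPerEntry = (1 << 14) * 8 // 2^14 * sizeof(struct hlist_head) in the kernel code
	)
	if totalRAMBytes > 1<<30 {
		return maxEntries
	}
	target := totalRAMBytes / bytesPerEntry
	if target <= minEntries {
		return minEntries
	}
	// Round up to the next power of two, as roundup_pow_of_two() does.
	n := uint64(minEntries)
	for n < target {
		n <<= 1
	}
	return int(n)
}

func main() {
	fmt.Println(poolSizeForRAM(128 << 20)) // 128 MiB router -> 1024
	fmt.Println(poolSizeForRAM(8 << 30))   // 8 GiB server   -> 8192
}
```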
Thanks @zx2c4 - I am happy to experiment with that sort of dynamic scaling, but I'm less confident about doing it up to your standards given my currently limited Go experience. Perhaps that could be a future workstream? In my experiments with different values, I found that when the

The other thing to note is that the router I'm running this on is dedicated to running this VPN, so the memory usage is otherwise extremely stable. That is probably the most common setup for devices of this size (also: no swap).
Any news on this? It looks like someone else has hit the issue as well over here: qdm12/gluetun#2036
Any ETA on when this will be merged, or the other solution mentioned by @zx2c4 implemented?
Branch updated from 5ba9663 to c92064f
Hitting this as well. Is there a mitigation? Has anyone used compile flags to set that value at compile time?
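Without an upstream change the value can only be adjusted at build time, since it is a constant. One possible approach, sketched here under the assumption of a local fork or a go.mod replace directive, is to add a build-tag-selected constants file, similar in spirit to how the iOS build already lowers these values via its own build-tag-selected file; the `lowmem` tag name and file name are made up for this example:

```go
//go:build lowmem

// Hypothetical file added to a fork of the device package (e.g. pulled in
// via a go.mod replace directive) and selected with `go build -tags lowmem`.
// The existing default constants file would also need its build constraint
// extended with `&& !lowmem` so the two definitions don't clash.
package device

// Cap the per-pool buffer count instead of the unbounded default of 0.
const PreallocatedBuffersPerPool = 1024
```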
@zx2c4 this is actually a quite significant performance issue, not just in memory but in CPU usage, due to the GC cycles that highly concurrent code triggers. Essentially, without the limit the code generates giant spikes in allocations, causing significant GC activity. I suggest the default be changed to at least 4096 for everything currently at 0. The files below are pprof profiles; I just named them .txt to get past GitHub's silly file filter. Updated to = 4096:
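For anyone who wants to collect comparable profiles, a minimal way to expose heap and allocation profiles from a running Go process uses only the standard library; the listen address here is arbitrary and the snippet is a sketch, not part of wireguard-go:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof handlers on the default mux
)

func main() {
	// Expose pprof endpoints; then compare runs with different pool caps using:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	//   go tool pprof http://localhost:6060/debug/pprof/allocs
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the real application workload
}
```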
For what it's worth, I agree knobs are annoying and a heuristic would be better, but leaving this as-is is painful. This is a very simple change that would allow developers to tune usage until a heuristic is implemented.
I think @bradfitz was looking at some major refactoring that would make this more natural?
@zx2c4 which is fine, but this is over two years old. Shrug. A fix this minor is better than the library torching any highly concurrent application with no generally applicable way for developers to resolve it. We're not talking peanuts; we're talking orders of magnitude in performance differences.
To be clear, from my perspective this is an easy resolution now that makes the problem adjustable by developers at run time or compile time, with no real impact on any future refactoring, versus an unknown future date for a 'major refactor'. As much as I respect @bradfitz as a developer, a bird in the hand is worth two in the bush.
For reference, numbers on a 32-core / 124 GB machine:

PreallocatedBuffersPerPool = 0:

PreallocatedBuffersPerPool = 4096:

with no observable change in the application's network throughput (though I assume that's mostly due to a bottleneck in the application itself).
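If anyone wants to reproduce this kind of comparison, one lightweight option besides full pprof profiles is to sample runtime.MemStats before and after a fixed traffic run for each setting (or run with GODEBUG=gctrace=1). The sketch below is illustrative only and not taken from this thread:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// reportGC prints cumulative allocation and GC counters. Call it once before
// and once after a fixed traffic run, then diff the two outputs for each
// PreallocatedBuffersPerPool setting.
func reportGC(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: NumGC=%d TotalAlloc=%dMiB PauseTotal=%s\n",
		label, m.NumGC, m.TotalAlloc>>20, time.Duration(m.PauseTotalNs))
}

func main() {
	reportGC("before")
	// ... drive traffic through the tunnel here ...
	reportGC("after")
}
```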
Patch wireguard-go to work around an ongoing issue where pods with low memory limits OOM due to wireguard-go buffer/GC issues. Related to WireGuard/wireguard-go#69