Memory leaks during SSG
I investigated SSG performance and memory usage on a large 11k-docs site (see #11140) with SSG worker threads enabled (#10826).
To remember everything, here are some relevant notes.
Related issue for memory leaks on i18n sites: #10944
About Worker Threads
- It seems that you can't easily specify a custom amount of memory for your Node.js workers: if the main process is given 5 GB, all workers will be given that same amount (see also this question: https://x.com/sebastienlorber/status/1920746219434828281).
- The memoryLimits option of our thread pool (Tinypool, a fork of Piscina) won't work. However, it has an interesting maxMemoryLimitBeforeRecycle: 1_000_000_000 option that lets a thread respawn when it takes too much memory. Even though it looks more like a workaround, it's still convenient and helps contain memory that would otherwise leak.
EDIT: option exposed in #11166
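To make the recycling behavior concrete, here is a minimal sketch of the general idea behind a recycle-on-memory-limit option. It is not Tinypool's implementation: WorkerLike, RecyclingRunner, createLeakyWorker and the memoryUsed() method are hypothetical names, and the "worker" is a plain object with a fake memory counter so that the sketch runs without real threads.

```typescript
// Hypothetical worker abstraction: a real pool would wrap a Node.js worker thread.
interface WorkerLike<T, R> {
  run(task: T): R;
  // Approximate memory retained by the worker, in bytes (assumed to be measurable).
  memoryUsed(): number;
  terminate(): void;
}

class RecyclingRunner<T, R> {
  private worker: WorkerLike<T, R>;
  private readonly createWorker: () => WorkerLike<T, R>;
  private readonly maxMemoryBeforeRecycle: number;
  spawnCount = 1;

  constructor(createWorker: () => WorkerLike<T, R>, maxMemoryBeforeRecycle: number) {
    this.createWorker = createWorker;
    this.maxMemoryBeforeRecycle = maxMemoryBeforeRecycle;
    this.worker = createWorker();
  }

  run(task: T): R {
    const result = this.worker.run(task);
    // After each task, replace the worker if it retains more memory than allowed.
    if (this.worker.memoryUsed() > this.maxMemoryBeforeRecycle) {
      this.worker.terminate();
      this.worker = this.createWorker();
      this.spawnCount += 1;
    }
    return result;
  }
}

// Fake worker that "leaks" 400 bytes per task, only to exercise the runner.
function createLeakyWorker(): WorkerLike<string, string> {
  let retained = 0;
  return {
    run(page) {
      retained += 400;
      return `rendered:${page}`;
    },
    memoryUsed: () => retained,
    terminate: () => {
      retained = 0;
    },
  };
}

const runner = new RecyclingRunner(createLeakyWorker, 1_000);
const outputs = ["a", "b", "c", "d", "e", "f"].map((page) => runner.run(page));
```

With these numbers, each fake worker crosses the limit on its third task, so memory per worker stays bounded no matter how many pages are rendered. The cost is the respawn itself, which is why this looks more like a workaround than a fix.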
About SSG memory leaks
When using workers, the current implementation doesn't clear the Node.js SSG require() cache. When using the current process, it does clear it, but only at the very end of the process.
For very large sites with thousands of pages to SSG, this means that memory is likely to keep increasing over time, and we should also try to clear it periodically.
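Clearing periodically rather than only at the end could look roughly like the sketch below. renderPagesWithPeriodicCleanup, clearCaches and clearEvery are hypothetical names, not Docusaurus APIs, and the Map is a toy stand-in for a module cache. In a real Node.js process, the cleanup callback might delete entries from require.cache instead.

```typescript
// Render pages in order and invoke a cleanup callback every `clearEvery` pages,
// instead of only once at the very end of the process.
function renderPagesWithPeriodicCleanup(
  pages: string[],
  renderPage: (page: string) => string,
  clearCaches: () => void,
  clearEvery: number,
): string[] {
  const results: string[] = [];
  pages.forEach((page, index) => {
    results.push(renderPage(page));
    const renderedCount = index + 1;
    if (renderedCount % clearEvery === 0) clearCaches();
  });
  // Final cleanup for the pages rendered since the last periodic cleanup.
  if (pages.length % clearEvery !== 0) clearCaches();
  return results;
}

// Toy cache standing in for a module cache that grows with each rendered page.
const cache = new Map<string, string>();
let peakCacheSize = 0;
let clearCount = 0;

const html = renderPagesWithPeriodicCleanup(
  Array.from({ length: 10 }, (_, i) => `/docs/page-${i}`),
  (page) => {
    cache.set(page, `<html>${page}</html>`);
    peakCacheSize = Math.max(peakCacheSize, cache.size);
    return cache.get(page) ?? "";
  },
  () => {
    cache.clear();
    clearCount += 1;
  },
  4,
);
```

Here the cache never holds more than 4 entries even though 10 pages are rendered. The right value for clearEvery is a trade-off: clearing too often means re-evaluating modules that the next pages need again.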
When attempting to do so, I noticed that despite clearing the require cache often, memory still increases with each SSG render task. It turns out that the webpack runtime also has its own module cache system: it holds the __webpack_modules__ and __webpack_module_cache__ caches, which keep growing over time. We should find a way to free that memory periodically if we want Docusaurus SSG to complete under constrained memory limits. Note: this is not really a leak because this memory will likely be released at the end of the SSG process, but we'd still like to reduce the memory spike.
Until we solve that, using the thread pool's maxMemoryLimitBeforeRecycle option is a good workaround.
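To illustrate why a runtime-level module cache keeps growing independently of the Node.js require cache, here is a simplified model of a webpack-style require function. It is only a sketch, not webpack's generated code: createRuntime, requireModule, resetModuleCache and the module shapes are assumptions made for illustration, and the real runtime does not necessarily expose a reset hook.

```typescript
type ModuleExports = Record<string, unknown>;
type ModuleFactory = (exports: ModuleExports) => void;

function createRuntime(modules: Record<string, ModuleFactory>) {
  // Plays the role of __webpack_module_cache__: one entry per module ever required.
  let moduleCache: Record<string, { exports: ModuleExports }> = {};

  function requireModule(id: string): ModuleExports {
    const cached = moduleCache[id];
    if (cached !== undefined) return cached.exports;
    const factory = modules[id];
    if (factory === undefined) throw new Error(`Unknown module: ${id}`);
    // Register the module before running its factory, then keep it forever.
    const mod = { exports: {} as ModuleExports };
    moduleCache[id] = mod;
    factory(mod.exports);
    return mod.exports;
  }

  return {
    requireModule,
    cacheSize: () => Object.keys(moduleCache).length,
    // Hypothetical hook: drop all instantiated modules so they can be garbage collected.
    resetModuleCache: () => {
      moduleCache = {};
    },
  };
}

// One module per page, as a stand-in for per-route content.
const pageModules: Record<string, ModuleFactory> = {};
for (let i = 0; i < 100; i++) {
  pageModules[`page-${i}`] = (exports) => {
    exports.content = `content of page ${i}`;
  };
}

const runtime = createRuntime(pageModules);
for (let i = 0; i < 100; i++) runtime.requireModule(`page-${i}`);
const sizeBeforeReset = runtime.cacheSize();
runtime.resetModuleCache();
const sizeAfterReset = runtime.cacheSize();
```

In this model the cache holds one entry per page rendered so far, which matches the growth observed during SSG. Clearing a different cache has no effect on it, because the entries live inside the runtime's own closure.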
About SSG collected data memory usage
It is not really a leak, but we aggregate too much data during the SSG process, and memory of the main thread keeps growing during the SSG phase.
It stops growing significantly if we do:
appRenderResult.collectedData.anchors = [];
appRenderResult.collectedData.links = [];
appRenderResult.collectedData.modules = [];
// @ts-expect-error: test
appRenderResult.collectedData.metadata.internal = null;
appRenderResult.collectedData.metadata.helmet = null;

There is some data that we definitely want to keep (otherwise the broken link checker wouldn't work), but modules and metadata.internal are heavy data structures, and we don't use this data anywhere after the SSG completes, so it's useless to collect them.
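A more targeted version of the experiment above could be wrapped in a small helper. This is a sketch under assumptions: the CollectedData shape is guessed only from the field names used in the snippet and is not the actual Docusaurus type, and trimCollectedData is a hypothetical name.

```typescript
// Assumed shape, inferred only from the field names used in the snippet above.
interface CollectedData {
  anchors: string[];
  links: string[];
  modules: string[];
  metadata: {
    internal: unknown;
    helmet: unknown;
  };
}

// Keep what is still needed after SSG (anchors and links for the broken link checker)
// and drop the heavy structures that nothing reads afterwards.
function trimCollectedData(data: CollectedData): CollectedData {
  return {
    anchors: data.anchors,
    links: data.links,
    modules: [],
    metadata: { ...data.metadata, internal: null },
  };
}

const trimmed = trimCollectedData({
  anchors: ["intro"],
  links: ["/docs/intro"],
  modules: ["@theme/DocItem", "@theme/Layout"],
  metadata: { internal: { big: "payload" }, helmet: { title: "Intro" } },
});
```

Returning a new object instead of mutating the render result keeps the heavy fields from being referenced by whatever the main thread accumulates, while anchors and links remain available.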
Edit: addressed in #11162