Description
Describe your feature request
We've noticed that mounting very large memory segments (e.g., at the TB scale) can take several minutes, which hinders practical deployment.
The overhead comes mainly from two parts:
- Physical allocation, where memset (or an equivalent zero-fill) could be parallelized to speed it up.
- Registration with RDMA, which could be accelerated by using pre-allocated large pages (e.g., 2 MB or 1 GB pages). Large pages might also improve runtime performance.
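To put rough numbers on the registration point, the short calculation below counts how many pages a 1 TiB segment spans at each page size. The 1 TiB figure is only an example of "TB scale". Treating the cost of pinning and address translation during registration as roughly proportional to the page count is a simplification, not a measured result.

```python
# Rough page counts for one TB-scale segment at different page sizes.
# Assumption: registration work scales roughly with the number of pages
# that must be pinned and translated; this is illustrative, not a benchmark.
SEGMENT_BYTES = 1 << 40  # 1 TiB, an example segment size

PAGE_SIZES = {
    "4 KiB": 4 << 10,
    "2 MiB": 2 << 20,
    "1 GiB": 1 << 30,
}


def pages_needed(segment_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to back `segment_bytes`, rounded up."""
    return -(-segment_bytes // page_bytes)


for name, size in PAGE_SIZES.items():
    print(f"{name:>6}: {pages_needed(SEGMENT_BYTES, size):>12,} pages")
```

Going from 4 KiB to 2 MiB pages cuts the count by a factor of 512, and 1 GiB pages cut it by a further 512.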
Using large pages and parallel init could significantly reduce mounting time for such large segments.
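A minimal sketch of parallel initialization, assuming the segment is an anonymous `mmap` and that zero-fill is done from Python. For anonymous mappings the kernel already hands out zeroed pages, so the useful effect is to fault the pages in (first touch) from several threads instead of one. `parallel_zero`, its chunk size and its thread count are hypothetical choices, not an existing API. It relies on `ctypes.memset` being a foreign-function call, which releases the GIL for the duration of each call.

```python
import ctypes
import mmap
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def parallel_zero(buf: mmap.mmap, num_threads: Optional[int] = None,
                  chunk_bytes: int = 64 << 20) -> None:
    """Zero `buf` in fixed-size chunks using a thread pool (hypothetical helper)."""
    size = len(buf)
    if size == 0:
        return
    # Expose the mmap's memory as a ctypes array to obtain its base address.
    c_buf = (ctypes.c_char * size).from_buffer(buf)
    base = ctypes.addressof(c_buf)
    try:
        def zero_range(offset: int) -> None:
            # The last chunk may be shorter than chunk_bytes.
            length = min(chunk_bytes, size - offset)
            ctypes.memset(base + offset, 0, length)

        workers = num_threads or os.cpu_count() or 1
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() drains the iterator so any worker exception is re-raised here.
            list(pool.map(zero_range, range(0, size, chunk_bytes)))
    finally:
        # Release the exported buffer so the caller can close the mmap later.
        del c_buf
```

Fixed-size chunks keep the work evenly spread even when the segment size is not a multiple of the thread count. The right chunk size and thread count would need to be measured on real hardware, and in a C++ allocator the same idea maps directly onto a thread pool running `memset` on disjoint ranges.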
Suggestions:
- Consider supporting libhugetlbfs or explicit mmap with hugepage hints.
- Maybe add a config flag to enable parallel zero-initialization for big segments.
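As an illustration of the "mmap with hugepage hints" option, here is a sketch that maps an anonymous region and optionally asks for transparent huge pages via `madvise(MADV_HUGEPAGE)`. This swaps in transparent huge pages for the pre-allocated hugetlbfs pages mentioned above, because the hint needs no reserved page pool. The hint is advisory: the kernel may still use 4 KiB pages, for example if THP is disabled or the region is not suitably aligned. `allocate_segment` and its `hugepage_hint` parameter are hypothetical. The boolean parameter stands in for the proposed config flag.

```python
import mmap
from typing import Tuple


def allocate_segment(size: int, hugepage_hint: bool = True) -> Tuple[mmap.mmap, bool]:
    """Map an anonymous private region; optionally request transparent huge pages.

    Returns the mapping and whether the MADV_HUGEPAGE hint was accepted by madvise.
    Linux-only sketch (uses the Unix mmap signature).
    """
    # With fileno=-1, CPython adds MAP_ANONYMOUS to the flags automatically.
    mm = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE,
                   prot=mmap.PROT_READ | mmap.PROT_WRITE)
    applied = False
    # MADV_HUGEPAGE is only defined on Linux builds of Python 3.8+.
    if hugepage_hint and hasattr(mmap, "MADV_HUGEPAGE"):
        try:
            mm.madvise(mmap.MADV_HUGEPAGE)
            applied = True
        except OSError:
            # e.g. a kernel built without THP support; keep regular pages.
            pass
    return mm, applied
```

Explicit hugetlbfs pages (`MAP_HUGETLB` or libhugetlbfs) would give stronger guarantees but require a pre-reserved pool on each host, which is an operational cost worth weighing against the advisory THP route.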
Also, if this means adding more environment variables, note that we already have quite a few, which makes the user experience clunky. Maybe it's time to support a builder pattern in the pyclient for more readable and maintainable config construction?
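A minimal sketch of what such a builder could look like. `SegmentConfig`, `SegmentConfigBuilder` and every method and field name here are hypothetical and not part of the existing pyclient API. The point is that options are discoverable as methods, and invalid combinations fail at `build()` instead of being silently ignored as a misspelled environment variable would be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentConfig:
    """Immutable segment settings (hypothetical field names)."""
    segment_size: int
    use_hugepages: bool = False
    parallel_init_threads: int = 1


class SegmentConfigBuilder:
    """Fluent builder; each setter validates and returns self for chaining."""

    def __init__(self) -> None:
        self._segment_size: Optional[int] = None
        self._use_hugepages = False
        self._threads = 1

    def segment_size(self, size: int) -> "SegmentConfigBuilder":
        if size <= 0:
            raise ValueError("segment_size must be positive")
        self._segment_size = size
        return self

    def hugepages(self, enabled: bool = True) -> "SegmentConfigBuilder":
        self._use_hugepages = enabled
        return self

    def parallel_init(self, threads: int) -> "SegmentConfigBuilder":
        if threads < 1:
            raise ValueError("threads must be >= 1")
        self._threads = threads
        return self

    def build(self) -> SegmentConfig:
        # Required fields are checked once, in one place.
        if self._segment_size is None:
            raise ValueError("segment_size is required")
        return SegmentConfig(self._segment_size, self._use_hugepages, self._threads)


cfg = (SegmentConfigBuilder()
       .segment_size(1 << 40)
       .hugepages()
       .parallel_init(32)
       .build())
```

Existing environment variables could still be read as defaults inside the builder, so current deployments keep working while new options are only exposed as methods.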