docs: Move memory tuning to an advanced topic #970
base: master
❌ Deploy Preview for seqera-docs failed.
Signed-off-by: Justine Geffen <[email protected]>
Added creation date and tags to the JVM memory tuning documentation. Signed-off-by: Justine Geffen <[email protected]>
Add creation date and tags for JVM memory tuning documentation Signed-off-by: Justine Geffen <[email protected]>
Add creation date and tags to JVM memory tuning documentation Signed-off-by: Justine Geffen <[email protected]>
Added creation date and tags to JVM memory tuning documentation. Signed-off-by: Justine Geffen <[email protected]>
gwright99 left a comment
Comments and first thoughts within. Happy to discuss further async.
```
"enterprise/advanced-topics/custom-launch-container",
"enterprise/advanced-topics/firewall-configuration",
"enterprise/advanced-topics/seqera-container-images",
"enterprise/advanced-topics/content-security-policy"
```
Is there a reason the CSP link only starts in v25.2 docs?
CSP was not configurable before then and was added for Studios support.
```
@@ -0,0 +1,66 @@
---
```
Rather than have multiple copies of the same text in various versions, is it possible to make these pages DRY and link back to platform-enterprise_docs/enterprise/advanced-topics/jvm-memory-tuning.md?
Sadly that is a limitation of Docusaurus.
platform-enterprise_docs/enterprise/advanced-topics/jvm-memory-tuning.md is not published. When you cut a version, each versioned docs set needs its own copy of the page.
The duplication here is mostly due to backporting.
platform-enterprise_docs/enterprise/advanced-topics/jvm-memory-tuning.md
```
JVM memory tuning is an advanced topic that may cause instability and performance issues.
:::

Seqera Platform scales memory allocation based on resources allocated to the application. To best inform available memory, set memory requests and limits on your deployments. We recommend increasing memory allocation before manually configuring JVM settings.
```
"increasing memory allocation" -- I assume this means requests / limits in the K8s manifests? Vertical scaling on a docker compose node?
This applies to Docker Compose as well.
We should be setting these values:
```yaml
backend:
  image: cr.seqera.io/private/nf-tower-enterprise/backend:v25.3.0
  platform: linux/amd64
  command: -c '/wait-for-it.sh db:3306 -t 60; /tower.sh'
  networks:
    - frontend
    - backend
  expose:
    - 8080
  deploy:
    resources:
      limits:
        memory: 4G # <---- Limit
      reservations:
        memory: 2G # <---- Reservations
  restart: always
  depends_on:
    - db
    - redis
    - cron
```
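To confirm what the JVM actually picked up from a limit such as `deploy.resources.limits.memory` (or a Kubernetes memory limit), a minimal Java sketch can print the maximum heap and processor count the JVM detected. This assumes a HotSpot-based JVM with container support enabled (the default since JDK 10), where the default maximum heap is derived from the container memory limit via `-XX:MaxRAMPercentage` unless `-Xmx` is set explicitly. `ContainerResources` is a hypothetical class for illustration, not part of Seqera Platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper: reports the resources the JVM detected at startup.
final class ContainerResources {

    // Maximum heap the JVM will attempt to use, in MiB
    // (reflects -Xmx, or the container-derived default when -Xmx is not set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Processor count the JVM believes it has
    // (reflects -XX:ActiveProcessorCount, or the container CPU limit).
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        // Explicit JVM flags, if any were passed at launch.
        System.out.println("JVM arguments: " + runtime.getInputArguments());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Available processors: " + cpus());
    }
}
```

Comparing the printed maximum heap before and after changing the container limit shows whether the JVM is sizing itself from that limit.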
Not currently defined in the docker-compose template (maybe we should add it so things are aligned?)
```yaml
# https://docs.seqera.io/assets/files/docker-compose-0655848af8f21b6e6211d1a9c8ebc702.yml
backend:
  image: cr.seqera.io/private/nf-tower-enterprise/backend:v25.3.0
  platform: linux/amd64
  command: -c '/wait-for-it.sh db:3306 -t 60; /tower.sh'
  networks:
    - frontend
    - backend
  expose:
    - 8080
  volumes:
    - $PWD/tower.yml:/tower.yml
    # Data studios RSA key is required for the data studios functionality. Uncomment the line below to mount the key.
    #- $PWD/data-studios-rsa.pem:/data-studios-rsa.pem
  env_file:
    # Seqera environment variables — see https://docs.seqera.io/platform-enterprise/enterprise/configuration/overview for details
    - tower.env
  environment:
    # Micronaut environments are required. Do not edit these values
    - MICRONAUT_ENVIRONMENTS=prod,redis,ha
  restart: always
  depends_on:
    - db
    - redis
    - cron
```
Is there a scenario when we expect a client would need to start tinkering with the JVM settings? When is it? How would it be identified?
Some of this is answered by #954 where both of these will need revisions.
JVM monitoring links back to overall system monitoring.
```
| 3 | 8 GB  | 5 GB  | 1.5 GB | `-XX:ActiveProcessorCount=3 -Xms2000M -Xmx5000M -XX:MaxDirectMemorySize=1500m` |
| 3 | 16 GB | 11 GB | 2.5 GB | `-XX:ActiveProcessorCount=3 -Xms4000M -Xmx11000M -XX:MaxDirectMemorySize=2500m` |

## When to adjust memory settings
```
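In the sizing table rows quoted above, the JVM options map directly onto the numeric columns: assuming the first column is the CPU count (it matches `-XX:ActiveProcessorCount=3`), the heap column becomes `-Xmx` and the direct memory column becomes `-XX:MaxDirectMemorySize`, with 5 GB written as `5000M`. The sketch below only formats options from values copied out of the table, including `-Xms`; it does not derive them by any formula, and `JvmOptions.build` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical helper: formats the JVM options string used in each sizing-table row.
final class JvmOptions {

    // cpus -> -XX:ActiveProcessorCount, xmsMb/xmxMb -> initial/max heap,
    // directMb -> cap on direct (off-heap) buffer memory.
    static String build(int cpus, int xmsMb, int xmxMb, int directMb) {
        return String.format(Locale.ROOT,
                "-XX:ActiveProcessorCount=%d -Xms%dM -Xmx%dM -XX:MaxDirectMemorySize=%dm",
                cpus, xmsMb, xmxMb, directMb);
    }

    public static void main(String[] args) {
        // Values copied from the 8 GB and 16 GB rows of the table.
        System.out.println(build(3, 2000, 5000, 1500));
        System.out.println(build(3, 4000, 11000, 2500));
    }
}
```

Keeping the options generated from the same numbers as the table columns avoids the heap or direct memory flag drifting out of step with the documented sizes.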
Ok, it's at the bottom. Based on my questions, I think this would be more useful nearer to the top.
```
**Increase heap memory (`-Xmx`)** if you see:

- `OutOfMemoryError: Java heap space` errors in logs
- Garbage collection pauses affecting performance
```
How are we expecting these metrics to be visible? IIRC you don't get memory metrics in the standard EC2 monitoring package. Do we expect the client to upgrade their monitoring system or use an aggregating agent like Datadog?
As above, they would need to monitor JVM stats via Prometheus or another agent.
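Prometheus exporters and agents such as Datadog typically collect JVM heap and GC statistics from the standard JMX platform MXBeans. As a minimal sketch of what those metrics are, the Java code below reads heap usage from `MemoryMXBean` and cumulative GC time from `GarbageCollectorMXBean`. It is illustrative only and is not how Seqera Platform exposes metrics; `HeapStats` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper: reads the heap and GC statistics that monitoring agents typically export.
final class HeapStats {

    // Fraction of the maximum heap in use, or -1 if the JVM reports no defined maximum.
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when undefined
        return max > 0 ? (double) heap.getUsed() / max : -1;
    }

    // Cumulative GC time in milliseconds since JVM start, summed across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double fraction = heapUsedFraction();
        System.out.println(fraction < 0
                ? "Heap used: no defined maximum"
                : String.format(Locale.ROOT, "Heap used: %.1f%%", fraction * 100));
        System.out.println("Total GC time (ms): " + totalGcMillis());
    }
}
```

Heap usage that stays near 100% even after collections, or GC time that climbs steadily, corresponds to the `OutOfMemoryError: Java heap space` and GC-pause signals quoted above.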
```
**Increase direct memory (`MaxDirectMemorySize`)** if you see:

- `OutOfMemoryError: Direct buffer memory` errors in logs
```
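Direct buffers live outside the heap, so heap metrics do not show them. HotSpot reports them through a `BufferPoolMXBean` named `direct`, and `-XX:MaxDirectMemorySize` caps that total; exceeding the cap is what produces `OutOfMemoryError: Direct buffer memory`. A minimal sketch, assuming a HotSpot JVM (the pool name `direct` is a HotSpot convention). The pool mainly tracks buffers created with `ByteBuffer.allocateDirect`, so libraries that manage off-heap memory by other means may not be fully reflected. `DirectMemoryStats` is a hypothetical name.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper: reads direct (off-heap) buffer usage from the platform MXBeans.
final class DirectMemoryStats {

    // Bytes currently used by the "direct" buffer pool, or -1 if the pool is not found.
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Allocate 1 MiB off-heap to show the pool tracking it.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
        long after = directBytesUsed();
        System.out.println("Direct memory used before: " + before + " bytes, after: " + after
                + " bytes (buffer capacity " + buf.capacity() + ")");
    }
}
```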
Increase relative to what?
- Grant more memory at the expense of heap?
- Grant more memory at the expense of overhead?
- Something else?
It should not be at the expense of the other.
There is a typical expected ratio of heap vs direct memory.
If heap usage is hitting 100%, the heap can be scaled on its own. You can then review your direct memory usage and either reduce it if you have spare capacity, or increase the memory allocated to the pod.
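The "typical expected ratio" of heap to direct memory is not stated in this thread, so the sketch below does not assume one. It applies the reasoning above to the two sizing-table rows: given a pod limit, heap size, and direct memory size, it computes the headroom left for everything else (metaspace, thread stacks, and other native JVM memory) and the direct-to-heap ratio, so either can be compared with a chosen target. `MemoryBudget` is a hypothetical name, and it treats 1 GB as 1000 MB to match the table's flag values.

```java
import java.util.Locale;

// Hypothetical sketch: splits a container memory limit into heap, direct memory, and the
// remainder left for metaspace, thread stacks, and other native JVM memory.
final class MemoryBudget {

    // Memory (MB) left after heap and direct memory; a negative value means heap + direct
    // already exceed the limit.
    static int headroomMb(int limitMb, int heapMb, int directMb) {
        return limitMb - heapMb - directMb;
    }

    // Direct memory as a fraction of heap, for comparison against a chosen target ratio.
    static double directToHeapRatio(int heapMb, int directMb) {
        return (double) directMb / heapMb;
    }

    public static void main(String[] args) {
        // {limit, heap, direct} in MB for the 8 GB and 16 GB rows of the sizing table.
        int[][] rows = {{8000, 5000, 1500}, {16000, 11000, 2500}};
        for (int[] r : rows) {
            System.out.println(String.format(Locale.ROOT,
                    "limit=%d MB heap=%d MB direct=%d MB headroom=%d MB direct/heap=%.2f",
                    r[0], r[1], r[2], headroomMb(r[0], r[1], r[2]), directToHeapRatio(r[1], r[2])));
        }
    }
}
```

For the 8 GB row this gives 1500 MB of headroom and a direct-to-heap ratio of 0.30; for the 16 GB row, 2500 MB and about 0.23. Raising either heap or direct memory without raising the pod limit comes straight out of that headroom.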
```
- `OutOfMemoryError: Direct buffer memory` errors in logs
- High concurrent workflow launch rates (more than 100 simultaneous workflows)
- Large configuration payloads or extensive API usage
```
"Large" ==?
"Extensive" == ?
Happy to drop these
Is 100 a known pain point when using the default options or was this just chosen because it's a nice number?
Yes, "workflows" is the wrong term here: this isn't related to Nextflow, it's related to Java task allocation.
Follow-up to #891, which moves the JVM memory tuning configuration to a dedicated page under Advanced topics.
Manually configuring JVM settings can have adverse consequences and should only be done in response to observed performance issues. Specifying JVM parameters on deployments by default can negatively impact customers; we should rely on the framework's default memory management.
I am following up with detailed guidance on observing metrics, to help give customers greater insight into their application performance.