AWS ParallelCluster v2.4.1
We're excited to announce the release of AWS ParallelCluster Node 2.4.1.
This is associated with AWS ParallelCluster v2.4.1.
Enhancements
- Torque:
  - Process nodes added to or removed from the cluster in batches to speed up cluster scaling.
  - Scale up only if the required slots/nodes can be satisfied.
  - Scale down if pending jobs have unsatisfiable CPU/node requirements.
  - Add support for jobs in hold/suspended state (this includes job dependencies).
  - Automatically terminate and replace faulty or unresponsive compute nodes.
  - Retry on failure when adding or removing nodes.
  - Add support for ncpus reservation and multi-node resource allocation (e.g. -l nodes=2:ppn=3+3:ppn=6).
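The multi-node syntax in the last Torque item packs several node groups into one request. As a rough illustration only, the hypothetical helpers below (parse_nodes_spec, total_slots, is_satisfiable; not part of the ParallelCluster node daemons) parse such a spec and check whether it could fit on the cluster. They assume every requested node maps to one compute instance with a fixed vCPU count, and they deliberately ignore host names and other node properties.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeGroup:
    """One '+'-separated chunk of a Torque nodes request, e.g. '2:ppn=3'."""

    count: int  # number of nodes requested in this chunk
    ppn: int  # processors (slots) per node


def parse_nodes_spec(spec: str) -> List[NodeGroup]:
    """Parse the value of '-l nodes=...', e.g. '2:ppn=3+3:ppn=6'.

    Hypothetical helper: only numeric node counts and the ppn property are
    understood; host names and other node properties are not supported.
    """
    groups = []
    for chunk in spec.split("+"):
        count_str, *props = chunk.split(":")
        ppn = 1  # assume one processor per node when ppn is omitted
        for prop in props:
            if prop.startswith("ppn="):
                ppn = int(prop[len("ppn="):])
        groups.append(NodeGroup(count=int(count_str), ppn=ppn))
    return groups


def total_slots(groups: List[NodeGroup]) -> int:
    """Total processors requested across all groups."""
    return sum(g.count * g.ppn for g in groups)


def is_satisfiable(groups: List[NodeGroup], vcpus_per_instance: int, max_nodes: int) -> bool:
    """Assumes each requested node maps to one instance with `vcpus_per_instance` vCPUs."""
    total_nodes = sum(g.count for g in groups)
    fits_per_node = all(g.ppn <= vcpus_per_instance for g in groups)
    return total_nodes <= max_nodes and fits_per_node


if __name__ == "__main__":
    groups = parse_nodes_spec("2:ppn=3+3:ppn=6")
    print(sum(g.count for g in groups), total_slots(groups))  # 5 24
    print(is_satisfiable(groups, vcpus_per_instance=4, max_nodes=10))  # False
    print(is_satisfiable(groups, vcpus_per_instance=8, max_nodes=10))  # True
```

In this sketch, 2:ppn=3+3:ppn=6 asks for 5 nodes and 24 slots in total. With 4-vCPU instances the ppn=6 group can never be placed, so a daemon following the "scale up only if satisfiable" rule would have no reason to add nodes for it.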
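Two of the Torque items, processing nodes in batches and retrying on failure, combine naturally. The sketch below is a hypothetical illustration, not the daemon's actual code: process_in_batches, the apply_batch callback, and the default batch size, attempt count, and linear backoff are all assumptions chosen for readability.

```python
import time
from typing import Callable, List, Sequence


def process_in_batches(
    hosts: Sequence[str],
    apply_batch: Callable[[List[str]], None],
    batch_size: int = 20,  # hypothetical default
    max_attempts: int = 3,  # hypothetical default
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Apply `apply_batch` to `hosts` in fixed-size batches, retrying failed batches.

    Returns the hosts whose batch still failed after `max_attempts` tries,
    so the caller can decide what to do with them (e.g. try again next cycle).
    """
    failed: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = list(hosts[start:start + batch_size])
        for attempt in range(1, max_attempts + 1):
            try:
                apply_batch(batch)  # e.g. add or remove these hosts in one call
                break
            except Exception:
                if attempt == max_attempts:
                    failed.extend(batch)
                else:
                    sleep(backoff_seconds * attempt)  # linear backoff between attempts
    return failed


if __name__ == "__main__":
    def add_hosts(batch: List[str]) -> None:
        print("adding", batch)

    hosts = ["ip-10-0-0-%d" % i for i in range(5)]
    leftover = process_in_batches(hosts, add_hosts, batch_size=2)
    print("failed:", leftover)
```

Handling a whole batch per call means one scheduler update per batch instead of one per node, and returning the leftover hosts rather than raising lets a scaling loop keep going while a few nodes are retried later.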
Changes
- Drop support for Python 2. Node daemons now support Python >= 3.5.
- Torque: trigger a scheduling cycle every minute while there are pending jobs in the queue, to speed up job scheduling as the cluster size changes dynamically.
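A minimal sketch of the once-per-minute rule, assuming the daemon polls the queue in a loop: the hypothetical SchedulingTrigger class fires a caller-supplied callback at most once every 60 seconds, and only while jobs are pending. The command that actually starts a Torque scheduling cycle is left to that callback and is not shown here.

```python
from typing import Callable, Optional

SCHEDULING_INTERVAL_SECONDS = 60  # "every 1 minute", per the release note


class SchedulingTrigger:
    """Decides when to request a scheduling cycle (hypothetical sketch)."""

    def __init__(self, trigger: Callable[[], None], interval: float = SCHEDULING_INTERVAL_SECONDS):
        self._trigger = trigger
        self._interval = interval
        self._last: Optional[float] = None  # time of the last triggered cycle

    def tick(self, pending_jobs: int, now: float) -> bool:
        """Call on every polling iteration; returns True if a cycle was triggered."""
        if pending_jobs == 0:
            return False  # nothing queued, so no reason to wake the scheduler
        if self._last is not None and now - self._last < self._interval:
            return False  # triggered too recently
        self._trigger()
        self._last = now
        return True


if __name__ == "__main__":
    trigger = SchedulingTrigger(lambda: print("scheduling cycle requested"))
    for now, pending in [(0, 2), (30, 2), (60, 1), (90, 0)]:
        trigger.tick(pending_jobs=pending, now=now)
```

Passing the current time in explicitly, instead of reading the clock inside tick, keeps the rate-limiting logic easy to test without waiting a real minute.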
Bug Fixes
- Restore the logic that automatically adds compute node identities to the known_hosts file.
- Slurm: fix an issue that caused the daemons to fail when the cluster was stopped and an empty compute nodes file was imported into the Slurm config.
- Torque: fix the command used to disable hosts in the scheduler before termination.
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192