Commit 9db8e2e

Update changelog for v2.4.0
Signed-off-by: Francesco De Martino <[email protected]>
1 parent 291f81a commit 9db8e2e

1 file changed, +33 -0 lines changed

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -3,6 +3,39 @@ aws-parallelcluster-node CHANGELOG
 
 This file is used to list changes made in each version of the aws-parallelcluster-node package.
 
+2.4.0
+-----
+
+**ENHANCEMENTS**
+- Dynamically fetch compute instance type and cluster size in order to support updates
+- SGE:
+  - process nodes added to or removed from the cluster in batches in order to speed up cluster scaling.
+  - scale up only if required slots/nodes can be satisfied
+  - scale down if pending jobs have unsatisfiable CPU/nodes requirements
+  - add support for jobs in hold/suspended state (this includes job dependencies)
+  - automatically terminate and replace faulty or unresponsive compute nodes
+  - add retries in case of failures when adding or removing nodes
+- Slurm:
+  - scale up only if required slots/nodes can be satisfied
+  - scale down if pending jobs have unsatisfiable CPU/nodes requirements
+  - automatically terminate and replace faulty or unresponsive compute nodes
+- Dump logs of replaced failing compute nodes to shared home directory
+
+**CHANGES**
+- SQS messages that fail to be processed are re-queued only 3 times and not forever
+- Reset idletime to 0 when the host becomes essential for the cluster (because of min size of ASG or because there are
+  pending jobs in the scheduler queue)
+- SGE: a node is considered as busy when in one of the following states "u", "C", "s", "d", "D", "E", "P", "o".
+  This allows a quick replacement of the node without waiting for the `nodewatcher` to terminate it.
+
+**BUG FIXES**
+- Slurm: add "BeginTime", "NodeDown", "Priority" and "ReqNodeNotAvail" to the pending reasons that trigger
+  a cluster scaling
+- Add a timeout on remote commands execution so that the daemons are not stuck if the compute node is unresponsive
+- Fix an edge case that was causing the `nodewatcher` to hang forever in case the node had become essential to the
+  cluster during a call to `self_terminate`.
+
+
 2.3.1
 -----
 
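
Two of the 2.4.0 entries above describe simple rules: failed SQS messages are re-queued at most 3 times, and an SGE node counts as busy when it is in one of the listed states. The sketch below illustrates both rules in Python. It is not the package's code: `drain`, `is_sge_node_busy`, the in-memory `deque` standing in for the SQS queue, and the choice to test each character of the state string are all assumptions made for illustration. Only the limit of 3 and the list of state letters come from the changelog.

```python
from collections import deque

# Values copied from the 2.4.0 changelog entries; everything else is hypothetical.
MAX_REQUEUES = 3
SGE_BUSY_STATES = frozenset(["u", "C", "s", "d", "D", "E", "P", "o"])


def is_sge_node_busy(state):
    """Return True if any character of the state string is a listed busy state.

    Assumption: a node state may combine several one-letter flags (e.g. "au").
    """
    return any(flag in SGE_BUSY_STATES for flag in state)


def drain(queue, handler, max_requeues=MAX_REQUEUES):
    """Process (message, requeue_count) pairs from an in-memory queue.

    A message whose handler raises is put back on the queue until it has
    been re-queued max_requeues times; after that it is dropped.
    Returns the list of dropped messages.
    """
    dropped = []
    while queue:
        message, requeues = queue.popleft()
        try:
            handler(message)
        except Exception:
            if requeues < max_requeues:
                # Bounded re-queue: remember how many times we already retried.
                queue.append((message, requeues + 1))
            else:
                dropped.append(message)
    return dropped
```

Under this reading, a message that always fails is handled four times in total (the first attempt plus three re-queues) before it is dropped. Whether the package counts attempts the same way is not stated in the changelog.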
