@@ -3,6 +3,39 @@ aws-parallelcluster-node CHANGELOG
This file is used to list changes made in each version of the aws-parallelcluster-node package.
+ 2.4.0
+ -----
+
+ **ENHANCEMENTS**
+ - Dynamically fetch compute instance type and cluster size in order to support updates
+ - SGE:
+   - process nodes added to or removed from the cluster in batches in order to speed up cluster scaling
+   - scale up only if required slots/nodes can be satisfied
+   - scale down if pending jobs have unsatisfiable CPU/nodes requirements
+   - add support for jobs in hold/suspended state (this includes job dependencies)
+   - automatically terminate and replace faulty or unresponsive compute nodes
+   - add retries in case of failures when adding or removing nodes
+ - Slurm:
+   - scale up only if required slots/nodes can be satisfied
+   - scale down if pending jobs have unsatisfiable CPU/nodes requirements
+   - automatically terminate and replace faulty or unresponsive compute nodes
+ - Dump logs of replaced failing compute nodes to shared home directory
+
+ **CHANGES**
+ - SQS messages that fail to be processed are re-queued at most 3 times instead of indefinitely
+ - Reset idletime to 0 when the host becomes essential for the cluster (because of min size of ASG or because there are
+   pending jobs in the scheduler queue)
+ - SGE: a node is considered busy when in one of the following states: "u", "C", "s", "d", "D", "E", "P", "o".
+   This allows a quick replacement of the node without waiting for the `nodewatcher` to terminate it.
+
+ **BUG FIXES**
+ - Slurm: add "BeginTime", "NodeDown", "Priority" and "ReqNodeNotAvail" to the pending reasons that trigger
+   cluster scaling
+ - Add a timeout on remote command execution so that the daemons do not get stuck if the compute node is unresponsive
+ - Fix an edge case that caused the `nodewatcher` to hang forever if the node became essential to the
+   cluster during a call to `self_terminate`.
+
+
2.3.1
-----
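
Review note on the 2.4.0 CHANGES entry about SQS re-queueing: the bounded behaviour it describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the package's code. The in-memory `deque` stands in for the SQS queue, and `drain`, `handler` and `MAX_REQUEUES` are hypothetical names chosen for this sketch.

```python
from collections import deque

# Limit taken from the changelog entry; the constant name is hypothetical.
MAX_REQUEUES = 3


def drain(queue, handler, max_requeues=MAX_REQUEUES):
    """Process (body, requeue_count) pairs until the queue is empty.

    A message whose handler raises is put back on the queue, but only
    max_requeues times; after that it is dropped and reported.
    """
    dropped = []
    while queue:
        body, requeues = queue.popleft()
        try:
            handler(body)
        except Exception:
            if requeues < max_requeues:
                # Failed, but still within the re-queue budget.
                queue.append((body, requeues + 1))
            else:
                # Budget exhausted: give up instead of looping forever.
                dropped.append(body)
    return dropped
```

With a handler that always fails, a message is attempted once plus three re-queued attempts and then ends up in the returned `dropped` list, so the loop always terminates.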
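
Review note on the SGE busy-state entry in 2.4.0 CHANGES: a minimal sketch of such a check is shown below. The set of states is copied from the changelog. The helper name `is_node_busy`, and the assumption that a node's state arrives as a string of single-character flags (for example `"au"`), are illustrative and not confirmed by the diff.

```python
# States listed in the 2.4.0 changelog entry; the constant name is hypothetical.
SGE_BUSY_STATES = frozenset(["u", "C", "s", "d", "D", "E", "P", "o"])


def is_node_busy(state_flags):
    """Return True if any flag in state_flags is one of the busy states.

    Assumes state_flags is a string of single-character SGE state codes;
    an empty string means no state flags are set.
    """
    return any(flag in SGE_BUSY_STATES for flag in state_flags)
```

Under that assumption, `is_node_busy("au")` is true because of the `u` flag, while `is_node_busy("")` and `is_node_busy("a")` are false.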