@@ -3,6 +3,29 @@ aws-parallelcluster-node CHANGELOG
This file is used to list changes made in each version of the aws-parallelcluster-node package.
+ 2.6.1
+ -----
+
+ **ENHANCEMENTS**
+ - Improved the management of SQS messages and retries to speed up recovery times when failures occur.
+
+ **CHANGES**
+ - Do not launch a replacement for an unhealthy or unresponsive node until that node has been terminated. This makes
+   the cluster slower at provisioning new nodes when failures occur, but prevents temporary over-scaling with respect
+   to the expected capacity.
+ - Increase the parallelism used when starting `slurmd` on compute nodes that join the cluster, from 10 to 30.
+ - Reduce the verbosity of messages logged by the node daemons.
+ - Do not dump logs to `/home/logs` when nodewatcher encounters a failure and terminates the node. CloudWatch can be
+   used to debug such failures.
+ - Reduce the number of retries for failed REMOVE events in sqswatcher.
+
+ **BUG FIXES**
+ - Fixed a bug in the ordering and retrying of SQS messages that, under certain heavy-load conditions, caused
+   the scheduler configuration to be left in an inconsistent state.
+ - Delete from the queue the REMOVE events that are discarded due to a hostname collision with another event fetched
+   as part of the same `sqswatcher` iteration.
+
+
2.6.0
-----