Commit a35f872

demartinofra authored and tilne committed
Changelog v2.6.1
Signed-off-by: Francesco De Martino <[email protected]>
1 parent dc9f008 commit a35f872

1 file changed: +23 -0 lines changed

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -3,6 +3,29 @@ aws-parallelcluster-node CHANGELOG
 
 This file is used to list changes made in each version of the aws-parallelcluster-node package.
 
+2.6.1
+-----
+
+**ENHANCEMENTS**
+- Improved the management of SQS messages and retries to speed up recovery times when failures occur.
+
+**CHANGES**
+- Do not launch a replacement for an unhealthy or unresponsive node until it is terminated. This makes the cluster slower
+  at provisioning new nodes when failures occur but prevents any temporary over-scaling with respect to the expected
+  capacity.
+- Increase parallelism when starting `slurmd` on compute nodes that join the cluster from 10 to 30.
+- Reduce the verbosity of messages logged by the node daemons.
+- Do not dump logs to `/home/logs` when nodewatcher encounters a failure and terminates the node. CloudWatch can be
+  used to debug such failures.
+- Reduce the number of retries for failed REMOVE events in sqswatcher.
+
+**BUG FIXES**
+- Fixed a bug in the ordering and retrying of SQS messages that was causing, under certain circumstances of heavy load,
+  the scheduler configuration to be left in an inconsistent state.
+- Delete from the queue the REMOVE events that are discarded due to hostname collision with another event fetched as part
+  of the same `sqswatcher` iteration.
+
+
 2.6.0
 -----
 
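
The "Increase parallelism when starting `slurmd` ... from 10 to 30" entry describes a cap on how many start operations run at once. As a rough illustration only, and not the package's actual implementation, such a cap could be sketched with a thread pool. The names `start_on_all`, `start_fn`, and `MAX_PARALLEL_STARTS` are hypothetical, and the sketch assumes the caller supplies the function that starts the daemon on a host.

```python
from concurrent.futures import ThreadPoolExecutor

# Value taken from the changelog entry (previously 10). The constant name is hypothetical.
MAX_PARALLEL_STARTS = 30


def start_on_all(hostnames, start_fn, max_workers=MAX_PARALLEL_STARTS):
    """Call start_fn(hostname) for every host, at most max_workers at a time.

    Returns a dict mapping each hostname to True if start_fn returned
    normally, or False if it raised an exception.
    """
    def _attempt(hostname):
        try:
            start_fn(hostname)
            return hostname, True
        except Exception:
            # A failure on one host must not prevent the others from starting.
            return hostname, False

    hostnames = list(hostnames)
    if not hostnames:
        return {}
    # The pool size bounds how many start_fn calls are in flight at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(_attempt, hostnames))
```

A caller would pass in whatever function actually starts `slurmd` on a given host. Hosts that map to `False` in the result could then be retried or reported.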
