Don't deplete all the startup nodes after ConnectionError/TimeoutError #3697

eoghanmurray · 2025-07-04T12:57:12Z

Don't deplete all the startup nodes after ConnectionError/TimeoutError against all nodes; instead, keep one around so that the retry algorithm has at least one node to work with.

Description of change

See bug report #3693

Copilot

Pull Request Overview

This PR updates the cluster command execution logic to avoid removing the last remaining startup node on connection or timeout failures, ensuring retries can still proceed.

Preserve one startup node when all others fail and wrap the original exception in a RedisClusterException
Only remove failed nodes if more than one startup node remains
Re-raise the appropriate exception after forcing a cluster layout reinitialization
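The three points above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in redis/asyncio/cluster.py: `NodesManager` and `handle_connection_failure` are hypothetical stand-ins, `RedisClusterException` is redefined locally, and Python's built-in `ConnectionError` and `TimeoutError` stand in for the redis-py exceptions. The reinitialization step is omitted.

```python
class RedisClusterException(Exception):
    """Local stand-in for the library's RedisClusterException."""


class NodesManager:
    """Hypothetical stand-in that only tracks startup nodes by name."""

    def __init__(self, names):
        self.startup_nodes = {name: object() for name in names}


def handle_connection_failure(nodes_manager, failed_name, error):
    """Return the exception the caller should re-raise.

    If only one startup node is left, keep it and wrap the original
    error; otherwise drop the failed node and pass the error through.
    """
    if len(nodes_manager.startup_nodes) == 1:
        wrapped = RedisClusterException(
            "Redis Cluster cannot be connected. "
            "Connection or Timeout Errors across all startup nodes"
        )
        wrapped.__cause__ = error  # keep the original error as the cause
        return wrapped
    # More than one startup node remains, so the failed one can go.
    nodes_manager.startup_nodes.pop(failed_name, None)
    return error


manager = NodesManager(["a:7000", "b:7001"])
first = handle_connection_failure(manager, "a:7000", ConnectionError("down"))
second = handle_connection_failure(manager, "b:7001", TimeoutError("slow"))
```

After both calls the manager still holds one node, so a later retry has something to connect to, and the second failure surfaces as a `RedisClusterException` whose cause is the original timeout.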

Comments suppressed due to low confidence (2)

redis/asyncio/cluster.py:824

[nitpick] The error message could be more descriptive and grammatically clear, e.g., 'Unable to connect to Redis Cluster: connection or timeout errors on all startup nodes'.

                        'Connection or Timeout Errors across all startup nodes'

redis/asyncio/cluster.py:820

Add a unit test covering the scenario where only one startup node remains to ensure it isn't removed and the correct RedisClusterException is raised with the original cause.

                if len(self.nodes_manager.startup_nodes) == 1:

Copilot · 2025-07-14T07:33:03Z

redis/asyncio/cluster.py

+                    ce = RedisClusterException(
+                        'Redis Cluster cannot be connected. '
+                        'Connection or Timeout Errors across all startup nodes'
+                    )
+                    ce.__cause__ = e
+                    e = ce


[nitpick] Reassigning the caught exception variable e to a new exception can be confusing; consider raising the new RedisClusterException directly or using a separate variable name for clarity.

Suggested change

-                    ce = RedisClusterException(
-                        'Redis Cluster cannot be connected. '
-                        'Connection or Timeout Errors across all startup nodes'
-                    )
-                    ce.__cause__ = e
-                    e = ce
+                    raise RedisClusterException(
+                        'Redis Cluster cannot be connected. '
+                        'Connection or Timeout Errors across all startup nodes'
+                    ) from e
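For comparison, here is a small standalone sketch of the two chaining styles discussed in this comment. It assumes nothing about the surrounding method: `wrap_by_assignment` and `wrap_by_raise_from` are hypothetical helper names, and `RedisClusterException` is a local stand-in. Both styles record the original error in `__cause__`; the difference is that `raise ... from e` leaves the block immediately, so any code that would have run between the wrapping and the eventual raise (the overview mentions forcing a cluster layout reinitialization before re-raising) would be skipped unless it is moved.

```python
class RedisClusterException(Exception):
    """Local stand-in for the library's RedisClusterException."""


MESSAGE = (
    "Redis Cluster cannot be connected. "
    "Connection or Timeout Errors across all startup nodes"
)


def wrap_by_assignment(error):
    """Build the wrapper and chain it by hand; nothing is raised yet."""
    wrapped = RedisClusterException(MESSAGE)
    wrapped.__cause__ = error
    return wrapped


def wrap_by_raise_from(error):
    """Chain with `raise ... from`, which raises right away."""
    raise RedisClusterException(MESSAGE) from error


original = TimeoutError("node timed out")

# Deferred style: the caller decides when to raise the wrapper.
deferred = wrap_by_assignment(original)

# Immediate style: the wrapper propagates from the call itself.
try:
    wrap_by_raise_from(original)
except RedisClusterException as exc:
    immediate = exc
```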

eoghanmurray and others added 2 commits July 4, 2025 13:55

Don't deplete all the startup nodes after a series of ConnectionError…

699f8f6

… or TimeoutError against all nodes, rather keep one around so that retry algorithm has at least one node to work with

Merge branch 'master' into keep-last-cluster-node-for-retry

9a6ab4a

petyaslavova requested a review from Copilot July 14, 2025 07:32

Copilot AI reviewed Jul 14, 2025
