Don't deplete all the startup nodes after ConnectionError/TimeoutError #3697

Open: wants to merge 2 commits into base: master
26 changes: 19 additions & 7 deletions redis/asyncio/cluster.py
@@ -816,16 +816,28 @@ async def _execute_command(
                 return await target_node.execute_command(*args, **kwargs)
             except (BusyLoadingError, MaxConnectionsError):
                 raise
-            except (ConnectionError, TimeoutError):
-                # Connection retries are being handled in the node's
-                # Retry object.
-                # Remove the failed node from the startup nodes before we try
-                # to reinitialize the cluster
-                self.nodes_manager.startup_nodes.pop(target_node.name, None)
+            except (ConnectionError, TimeoutError) as e:
+                if len(self.nodes_manager.startup_nodes) == 1:
+                    # keep at least one node for retrying
+                    ce = RedisClusterException(
+                        'Redis Cluster cannot be connected. '
+                        'Connection or Timeout Errors across all startup nodes'
+                    )
+                    ce.__cause__ = e
+                    e = ce
Comment on lines +822 to +827
Copilot AI Jul 14, 2025

[nitpick] Reassigning the caught exception variable e to a new exception can be confusing; consider raising the new RedisClusterException directly or using a separate variable name for clarity.

Suggested change
-                    ce = RedisClusterException(
-                        'Redis Cluster cannot be connected. '
-                        'Connection or Timeout Errors across all startup nodes'
-                    )
-                    ce.__cause__ = e
-                    e = ce
+                    raise RedisClusterException(
+                        'Redis Cluster cannot be connected. '
+                        'Connection or Timeout Errors across all startup nodes'
+                    ) from e
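
For comparison, here is a minimal standalone sketch of the two chaining styles discussed in this comment. It does not use redis-py: `RedisClusterException` below is a local stand-in class, and the function names and messages are made up for illustration. It only shows that assigning `__cause__` by hand and using `raise ... from e` record the same cause on the new exception.

```python
class RedisClusterException(Exception):
    """Local stand-in for the redis-py exception (assumption, not the real class)."""


def chain_by_assignment() -> Exception:
    """Build the wrapper and set __cause__ manually, as the PR does."""
    try:
        raise ConnectionError("node unreachable")
    except ConnectionError as e:
        ce = RedisClusterException("all startup nodes failed")
        ce.__cause__ = e  # also sets __suppress_context__ to True
        return ce


def chain_with_from() -> Exception:
    """Use `raise ... from e`, as the suggested change does."""
    try:
        try:
            raise ConnectionError("node unreachable")
        except ConnectionError as e:
            raise RedisClusterException("all startup nodes failed") from e
    except RedisClusterException as ce:
        return ce


by_assignment = chain_by_assignment()
with_from = chain_with_from()
```

One difference worth checking before applying the suggestion: in the diff, the wrapped exception is raised only by the final `raise e`, after `await self.aclose()` has run. A `raise ... from e` placed directly inside the `if` branch would leave the handler before that call unless the call is moved or duplicated.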


+                else:
+                    # Connection retries are being handled in the node's
+                    # Retry object.
+                    # Remove the failed node from the startup nodes before we
+                    # try to reinitialize the cluster
+                    self.nodes_manager.startup_nodes.pop(
+                        target_node.name,
+                        None
+                    )
                 # Hard force of reinitialize of the node/slots setup
                 # and try again with the new setup
                 await self.aclose()
-                raise
+                raise e
             except (ClusterDownError, SlotNotCoveredError):
                 # ClusterDownError can occur during a failover and to get
                 # self-healed, we will try to reinitialize the cluster layout
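
The behaviour this hunk aims for can be sketched outside redis-py as follows. This is a toy model under stated assumptions, not the library's implementation: `ToyClient`, `ClusterUnreachableError`, `handle_connection_failure`, and the node names are all hypothetical, and the startup nodes are a plain dict keyed by name. It illustrates the rule "drop the failed node unless it is the last one, then close and re-raise".

```python
import asyncio


class ClusterUnreachableError(Exception):
    """Hypothetical stand-in for RedisClusterException."""


class ToyClient:
    """Hypothetical client that tracks startup nodes by name."""

    def __init__(self, startup_nodes):
        self.startup_nodes = dict(startup_nodes)
        self.closed = False

    async def aclose(self):
        # Stand-in for forcing a reinitialization on the next command.
        self.closed = True

    async def handle_connection_failure(self, node_name, error):
        if len(self.startup_nodes) == 1:
            # Keep the last node so a later retry still has somewhere to connect.
            to_raise = ClusterUnreachableError(
                "Connection or Timeout Errors across all startup nodes"
            )
            to_raise.__cause__ = error
        else:
            # Other candidates remain, so the failed node can be dropped.
            self.startup_nodes.pop(node_name, None)
            to_raise = error
        await self.aclose()
        raise to_raise


async def demo():
    client = ToyClient({"a:6379": object(), "b:6379": object()})
    outcomes = []
    for name in ("a:6379", "b:6379"):
        try:
            await client.handle_connection_failure(name, ConnectionError(name))
        except Exception as exc:
            outcomes.append(type(exc).__name__)
    return outcomes, list(client.startup_nodes)


outcomes, remaining = asyncio.run(demo())
```

In this sketch the first failure removes `a:6379` and re-raises the original `ConnectionError`; the second failure finds only one node left, keeps it, and raises the wrapper instead. Using a separate name (`to_raise`) rather than rebinding the caught exception is one way to address the readability concern raised in the review comment while still running `aclose()` on both paths.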