Don't deplete all the startup nodes after ConnectionError/TimeoutError #3697

Open
wants to merge 2 commits into
base: master
from

Conversation

eoghanmurray

Don't deplete all the startup nodes after a ConnectionError/TimeoutError against all nodes; instead, keep one around so that the retry algorithm has at least one node to work with.

Description of change

See bug report #3693

eoghanmurray and others added 2 commits July 4, 2025 13:55
… or TimeoutError against all nodes, rather keep one around so that retry algorithm has at least one node to work with
@petyaslavova petyaslavova requested a review from Copilot July 14, 2025 07:32
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR updates the cluster command execution logic to avoid removing the last remaining startup node on connection or timeout failures, ensuring retries can still proceed.

  • Preserve one startup node when all others fail and wrap the original exception in a RedisClusterException
  • Only remove failed nodes if more than one startup node remains
  • Re-raise the appropriate exception after forcing a cluster layout reinitialization
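The behaviour summarised in the bullets above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `ClusterConnectError` is a hypothetical stand-in for redis-py's `RedisClusterException`, `handle_node_failure` is an invented helper name, and the startup nodes are modelled as a plain dict keyed by node name. The cluster layout reinitialization step is omitted.

```python
class ClusterConnectError(Exception):
    """Hypothetical stand-in for redis-py's RedisClusterException."""


def handle_node_failure(startup_nodes: dict, failed_name: str, error: Exception) -> None:
    """Drop a failed startup node unless it is the last one left.

    Assumption: `startup_nodes` maps node names to node objects. The real
    client keeps this state on its nodes manager; a dict is used here only
    to keep the sketch self-contained.
    """
    if len(startup_nodes) == 1:
        # Last remaining node: keep it so a later retry still has
        # somewhere to connect, and report the failure with the
        # original error attached as the cause.
        raise ClusterConnectError(
            "Redis Cluster cannot be connected. "
            "Connection or Timeout Errors across all startup nodes"
        ) from error

    # More than one node remains: safe to forget the failed one,
    # then surface the original error to the caller.
    startup_nodes.pop(failed_name, None)
    raise error
```

With two startup nodes, the first failure removes the failed node and re-raises the original error; a second failure against the remaining node leaves the dict untouched and raises the wrapping exception instead.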
Comments suppressed due to low confidence (2)

redis/asyncio/cluster.py:824

  • [nitpick] The error message could be more descriptive and grammatically clear, e.g., 'Unable to connect to Redis Cluster: connection or timeout errors on all startup nodes'.
                        'Connection or Timeout Errors across all startup nodes'

redis/asyncio/cluster.py:820

  • Add a unit test covering the scenario where only one startup node remains to ensure it isn't removed and the correct RedisClusterException is raised with the original cause.
                if len(self.nodes_manager.startup_nodes) == 1:

Comment on lines +822 to +827
ce = RedisClusterException(
'Redis Cluster cannot be connected. '
'Connection or Timeout Errors across all startup nodes'
)
ce.__cause__ = e
e = ce

Copilot AI Jul 14, 2025

[nitpick] Reassigning the caught exception variable e to a new exception can be confusing; consider raising the new RedisClusterException directly or using a separate variable name for clarity.

Suggested change
- ce = RedisClusterException(
- 'Redis Cluster cannot be connected. '
- 'Connection or Timeout Errors across all startup nodes'
- )
- ce.__cause__ = e
- e = ce
+ raise RedisClusterException(
+ 'Redis Cluster cannot be connected. '
+ 'Connection or Timeout Errors across all startup nodes'
+ ) from e
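For readers comparing the two forms, the short sketch below shows that assigning `__cause__` by hand and using `raise ... from e` produce the same exception chaining in Python. The `RedisClusterException` class defined here is a local stand-in so the example runs without redis-py installed, and the helper names `wrap_manually` and `wrap_with_from` are invented for illustration.

```python
class RedisClusterException(Exception):
    """Local stand-in so the example runs without redis-py installed."""


MESSAGE = (
    "Redis Cluster cannot be connected. "
    "Connection or Timeout Errors across all startup nodes"
)


def wrap_manually(e: Exception) -> Exception:
    # Form used in the PR: build the exception, then attach the cause.
    ce = RedisClusterException(MESSAGE)
    ce.__cause__ = e
    return ce


def wrap_with_from(e: Exception) -> Exception:
    # Form suggested in the review: let `raise ... from` attach the cause.
    try:
        raise RedisClusterException(MESSAGE) from e
    except RedisClusterException as ce:
        return ce


original = TimeoutError("node timed out")
manual = wrap_manually(original)
chained = wrap_with_from(original)

# Both forms record the original error as the direct cause.
print(manual.__cause__ is original, chained.__cause__ is original)
```

The practical difference is one of readability: `raise ... from e` raises immediately and leaves the caught variable `e` untouched, whereas the manual form depends on a later statement to raise the reassigned `e`.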


2 participants