Skip to content

Conversation

MasterPtato
Copy link
Contributor

@MasterPtato MasterPtato commented Jul 18, 2025

Changes

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability of database operations by adding automatic retry with exponential backoff for certain query errors.
    • Standardized how data is passed to database queries to prevent potential issues and ensure consistent behavior across services.
    • Minor adjustments to how parameters are referenced in various operations for improved stability.
  • Chores

    • Internal consistency improvements in parameter handling for database queries across multiple services.

@MasterPtato MasterPtato requested a review from NathanFlurry July 18, 2025 18:49
Copy link

vercel bot commented Jul 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivet-site Ready Ready Preview Comment Aug 21, 2025 8:15pm

Copy link

coderabbitai bot commented Jul 18, 2025

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

claude bot commented Jul 18, 2025

Claude finished @MasterPtato's task —— View job


PR Review: Make query_as macros retry

Reviewing this PR for code quality, potential bugs, performance, security, and test coverage.

Todo List:

  • Analyze main macro implementation in sql_query_macros.rs
  • Review usage patterns in affected service files
  • Assess code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide comprehensive feedback

Review Summary

This PR adds retry functionality to SQL query macros, improving reliability when dealing with transient database errors. The implementation adds retry logic to both __sql_query and __sql_query_as macros with exponential backoff.

Code Quality & Best Practices ✅

Positive aspects:

  • Consistent implementation: Both connection pool and transaction variants (@tx) have identical retry logic
  • Proper error categorization: Only retries on transient errors (Database, Io, Tls, Protocol, PoolTimedOut, PoolClosed, WorkerCrashed)
  • Comprehensive logging: Includes retry attempt logging with tracing::warn!
  • Backwards compatibility: All existing SQL macro calls automatically get retry functionality

Suggestions for improvement:

  1. Magic numbers: Consider making 50 (jitter parameter) a constant like QUERY_RETRY_MS
  2. Error wrapping: The custom error wrapping in std::io::Error feels indirect - consider a more direct approach

Potential Bugs & Issues ⚠️

1. Connection reuse on retry (packages/common/pools/src/utils/sql_query_macros.rs:162-196)

// Retry loop
let res = loop {
    let query = sqlx::query($crate::__opt_indoc!($sql))
    $(
        .bind($bind)
    )*;

    match query.execute(&mut *conn).await {
        // ... retry logic uses same connection
    }
};

Issue: The macro reuses the same connection for all retry attempts. If the connection is in a bad state (e.g., broken, timed out), retries will likely fail again.

Recommendation: Consider acquiring a fresh connection on retry, especially for connection-related errors.

2. Transaction retry limitations (packages/common/pools/src/utils/sql_query_macros.rs:225-253)

match query.$action(&mut **$tx).await {
    Err(err) => {
        // Retry logic for transactions
    }
}

Issue: Retrying within a transaction context can be problematic. If the transaction is in an error state, retries may not be effective and could mask real issues.

Recommendation: Consider different retry strategies for transactions vs. individual queries.

Performance Considerations 🚀

Positive aspects:

  • Exponential backoff: Prevents thundering herd problems
  • Reasonable retry limits: 6 attempts with 100ms base delay
  • Selective retrying: Only retries on specific error types

Potential concerns:

  • Increased latency: Failed queries now have up to 6 retry attempts, potentially adding significant latency
  • Resource usage: Multiple retries increase database load during issues

Security Concerns 🔒

No major security issues identified, but consider:

  • Error information leakage: Retry logs include error details - ensure sensitive information isn't exposed
  • Resource exhaustion: While retry limits exist, many concurrent failing queries could still cause resource issues

Test Coverage 📊

Missing areas:

  • Unit tests: No visible tests for the retry logic itself
  • Integration tests: No tests verifying retry behavior with actual database errors
  • Edge cases: No tests for retry limits, backoff timing, or error type handling

Recommendations:

  1. Add unit tests for retry logic with mocked database errors
  2. Add integration tests that inject database failures
  3. Test retry behavior with different error types
  4. Verify retry limits and backoff timing

Specific Code Improvements

1. Connection management for retries:

// Consider acquiring fresh connection on connection errors
match &err {
    Database(_) | Io(_) | Tls(_) | Protocol(_) | PoolTimedOut | PoolClosed | WorkerCrashed => {
        // For connection-related errors, consider getting a fresh connection
        if matches!(&err, PoolTimedOut | PoolClosed | WorkerCrashed) {
            conn = $crate::__sql_acquire!($ctx, driver);
        }
        tracing::warn!(?err, "query retry ({i}/{})", $crate::utils::sql_query_macros::MAX_QUERY_RETRIES);
        backoff.tick().await;
    }
}

2. More descriptive error messages:

// In packages/common/pools/src/utils/sql_query_macros.rs:8
#[error("max sql retries exceeded ({} attempts), last error: {0}", MAX_QUERY_RETRIES)]
MaxSqlRetries(sqlx::Error),

Overall Assessment

This is a solid reliability improvement that addresses transient database errors. The implementation is consistent and follows good patterns, but has some areas for improvement around connection management and testing.

Recommendation: ✅ Approve with suggested improvements

The retry logic will significantly improve system reliability. Consider addressing the connection reuse issue and adding comprehensive tests in a follow-up PR.

Copy link
Contributor Author

MasterPtato commented Jul 18, 2025


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link

cloudflare-workers-and-pages bot commented Jul 18, 2025

Deploying rivet with  Cloudflare Pages  Cloudflare Pages

Latest commit: f64697c
Status:🚫  Build failed.

View logs

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/common/pools/src/utils/sql_query_macros.rs (1)

182-192: Consider additional retryable error types and reducing code duplication.

The retry error handling looks good, but consider:

  1. Should Configuration errors be retryable in some cases?
  2. Should AnyDriverError be examined to determine if the underlying error is retryable?
  3. The retry logic is duplicated across 4 different macro variants. Consider extracting this into a helper macro to reduce duplication and ease maintenance.

Example of extracting retry logic:

+#[macro_export]
+macro_rules! __sql_retry_loop {
+    ($query_builder:expr, $conn:expr, $max_retries:expr, $backoff:expr) => {{
+        let mut i = 0;
+        loop {
+            match $query_builder.await {
+                Err(err) => {
+                    i += 1;
+                    if i > $max_retries {
+                        break Err(
+                            sqlx::Error::Io(
+                                std::io::Error::new(
+                                    std::io::ErrorKind::Other,
+                                    $crate::utils::sql_query_macros::Error::MaxSqlRetries(err),
+                                )
+                            )
+                        );
+                    }
+
+                    use sqlx::Error::*;
+                    match &err {
+                        Database(_) | Io(_) | Tls(_) | Protocol(_) | PoolTimedOut | PoolClosed
+                        | WorkerCrashed => {
+                            tracing::warn!(?err, "query retry ({i}/{})", $max_retries);
+                            $backoff.tick().await;
+                        }
+                        _ => break Err(err),
+                    }
+                }
+                x => break x,
+            }
+        }
+    }};
+}

Also applies to: 239-249, 309-319, 366-376

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3493a1d and 6bc330c.

📒 Files selected for processing (32)
  • packages/common/chirp-workflow/core/src/db/crdb_nats/debug.rs (2 hunks)
  • packages/common/pools/src/utils/sql_query_macros.rs (2 hunks)
  • packages/core/services/build/ops/get/src/lib.rs (1 hunks)
  • packages/core/services/build/src/ops/patch_tags.rs (2 hunks)
  • packages/core/services/cdn/ops/site-get/src/lib.rs (1 hunks)
  • packages/core/services/cf-custom-hostname/ops/get/src/lib.rs (1 hunks)
  • packages/core/services/cloud/ops/game-config-get/src/lib.rs (1 hunks)
  • packages/core/services/cloud/ops/namespace-get/src/lib.rs (1 hunks)
  • packages/core/services/cloud/ops/version-get/src/lib.rs (1 hunks)
  • packages/core/services/cloud/worker/src/workers/version_name_reserve.rs (1 hunks)
  • packages/core/services/cluster/src/ops/datacenter/get.rs (1 hunks)
  • packages/core/services/cluster/src/ops/server/lost_list.rs (1 hunks)
  • packages/core/services/cluster/src/ops/server/prune_with_filter.rs (1 hunks)
  • packages/core/services/cluster/src/workflows/datacenter/mod.rs (1 hunks)
  • packages/core/services/cluster/src/workflows/server/mod.rs (1 hunks)
  • packages/core/services/game/ops/get/src/lib.rs (1 hunks)
  • packages/core/services/game/ops/namespace-resolve-name-id/src/lib.rs (1 hunks)
  • packages/core/services/game/ops/version-get/src/lib.rs (1 hunks)
  • packages/core/services/linode/standalone/gc/src/lib.rs (1 hunks)
  • packages/core/services/load-test/standalone/sqlx/src/lib.rs (1 hunks)
  • packages/core/services/mm-config/ops/lobby-group-get/src/lib.rs (1 hunks)
  • packages/core/services/mm-config/ops/lobby-group-resolve-version/src/lib.rs (1 hunks)
  • packages/core/services/mm-config/ops/namespace-get/src/lib.rs (1 hunks)
  • packages/core/services/mm/ops/lobby-for-run-id/src/lib.rs (1 hunks)
  • packages/core/services/mm/ops/lobby-get/src/lib.rs (1 hunks)
  • packages/core/services/mm/ops/player-get/src/lib.rs (1 hunks)
  • packages/core/services/route/src/ops/upsert.rs (2 hunks)
  • packages/core/services/team/ops/get/src/lib.rs (1 hunks)
  • packages/core/services/team/ops/member-get/src/lib.rs (1 hunks)
  • packages/core/services/upload/ops/get/src/lib.rs (1 hunks)
  • packages/core/services/user/ops/get/src/lib.rs (1 hunks)
  • packages/core/services/user/ops/profile-validate/src/lib.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/core/services/game/ops/version-get/src/lib.rs (1)
packages/core/services/load-test/standalone/sqlx/src/lib.rs (1)
  • version_ids (97-101)
packages/core/services/mm-config/ops/namespace-get/src/lib.rs (1)
packages/core/services/load-test/standalone/sqlx/src/lib.rs (1)
  • namespace_ids (77-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: cargo-check-main
  • GitHub Check: claude-review
🔇 Additional comments (32)
packages/core/services/cloud/worker/src/workers/version_name_reserve.rs (1)

35-35: LGTM: Parameter passing updated to support retry functionality.

The change from date_prefix to &date_prefix is correct and aligns with the systematic refactoring to enable retry functionality for SQL query macros. Passing parameters by reference ensures they remain available for multiple query attempts.

packages/core/services/build/ops/get/src/lib.rs (1)

46-46: LGTM: Consistent parameter passing update.

The change from build_ids to &build_ids follows the consistent pattern across the codebase to support retry functionality. This ensures the vector remains available for multiple query attempts.

packages/core/services/mm/ops/lobby-for-run-id/src/lib.rs (1)

36-36: LGTM: Parameter reference update aligns with retry functionality.

The modification from run_ids to &run_ids maintains consistency with the codebase-wide refactoring to enable retry functionality for SQL query macros.

packages/core/services/team/ops/member-get/src/lib.rs (1)

40-40: LGTM: Serialization parameter updated for retry support.

The change from serde_json::to_value(members) to serde_json::to_value(&members) is correct and consistent with the broader refactoring. Passing the vector by reference ensures it remains available for potential retry attempts.

packages/core/services/mm-config/ops/namespace-get/src/lib.rs (1)

39-39: LGTM: Consistent parameter passing for retry functionality.

The change from namespace_ids to &namespace_ids follows the established pattern across the codebase to support retry functionality in SQL query macros.

packages/core/services/mm-config/ops/lobby-group-resolve-version/src/lib.rs (1)

45-45: Good optimization: pass vector by reference

This change improves performance by avoiding unnecessary cloning of the lobby_group_ids vector when passing it to the SQL query macro. The use of &lobby_group_ids follows Rust best practices and is consistent with the broader pattern applied across the codebase.

packages/core/services/cloud/ops/game-config-get/src/lib.rs (1)

35-35: Consistent optimization: pass vector by reference

This change mirrors the same optimization pattern seen across the codebase, passing &game_ids by reference to avoid unnecessary vector cloning. This is a performance improvement that follows Rust best practices.

packages/core/services/user/ops/profile-validate/src/lib.rs (1)

79-79: Consistent optimization: pass string by reference

This change extends the optimization pattern to string parameters, passing &display_name by reference to avoid unnecessary string cloning. This is efficient and consistent with the broader refactoring to standardize parameter passing to SQL macros.

packages/core/services/cf-custom-hostname/ops/get/src/lib.rs (1)

39-39: Consistent vector optimization

This change continues the systematic optimization pattern, passing &identifiers by reference to avoid unnecessary vector cloning. The change is efficient and consistent with the broader refactoring effort.

packages/core/services/game/ops/version-get/src/lib.rs (1)

46-46: Final vector optimization in the pattern

This change completes the systematic optimization pattern seen across all reviewed files, passing &version_ids by reference to avoid unnecessary vector cloning. The change is efficient and maintains consistency with the broader refactoring effort.

packages/core/services/cdn/ops/site-get/src/lib.rs (1)

30-30: LGTM: Efficient parameter passing.

Good change to pass the vector by reference instead of by value, avoiding unnecessary moves while maintaining the same functionality.

packages/core/services/mm-config/ops/lobby-group-get/src/lib.rs (1)

36-36: LGTM: Consistent parameter passing improvement.

The change to pass lobby_group_ids by reference aligns with the broader refactoring pattern and avoids unnecessary vector moves.

packages/core/services/cluster/src/ops/server/lost_list.rs (1)

171-171: LGTM: Consistent with refactoring pattern.

The change to pass server_ids by reference is consistent with the broader codebase refactoring and avoids unnecessary vector moves.

packages/core/services/mm/ops/player-get/src/lib.rs (1)

46-46: LGTM: Consistent parameter passing optimization.

The change to pass player_ids by reference follows the established pattern and improves efficiency by avoiding unnecessary vector moves.

packages/core/services/cloud/ops/namespace-get/src/lib.rs (1)

27-27: LGTM: Completes the consistent refactoring pattern.

The change to pass namespace_ids by reference is consistent with the broader codebase refactoring and provides the same efficiency benefits as the other files.

packages/core/services/mm/ops/lobby-get/src/lib.rs (1)

86-86: Good performance optimization by passing vector by reference

This change avoids unnecessary cloning of the lobby_ids vector when passing it to the SQL query. The ANY($1) clause works correctly with both owned values and references, making this a safe and beneficial optimization.

packages/core/services/team/ops/get/src/lib.rs (1)

41-41: Consistent optimization pattern - passing vector by reference

This change aligns with the broader pattern in this PR to pass vectors by reference to SQL queries, avoiding unnecessary cloning and improving performance.

packages/core/services/user/ops/get/src/lib.rs (1)

69-69: Performance optimization in cache context

This change is particularly beneficial since it's within a cache closure where avoiding unnecessary vector cloning can have a more significant performance impact.

packages/core/services/upload/ops/get/src/lib.rs (1)

63-63: Consistent optimization pattern applied

This change follows the same beneficial pattern of passing vectors by reference to SQL queries, avoiding unnecessary cloning while maintaining the same functionality.

packages/core/services/cluster/src/ops/datacenter/get.rs (1)

105-105: Consistent optimization pattern with function parameter

This change maintains the same beneficial pattern of passing vectors by reference to SQL queries. The optimization works equally well whether the vector is locally created or passed as a function parameter.

packages/core/services/game/ops/get/src/lib.rs (1)

71-71: LGTM: Efficient parameter passing.

Passing the vector by reference to the SQL query is more efficient and avoids unnecessary moves, especially since game_ids is used later in the function.

packages/core/services/cloud/ops/version-get/src/lib.rs (1)

27-27: LGTM: Consistent parameter passing improvement.

The change to pass the vector by reference aligns with the codebase-wide refactoring and improves efficiency by avoiding unnecessary moves.

packages/core/services/cluster/src/workflows/server/mod.rs (1)

600-600: LGTM: Consistent with codebase refactoring.

Passing the vector by reference is the correct approach and aligns with the systematic parameter passing improvements across the codebase.

packages/core/services/cluster/src/ops/server/prune_with_filter.rs (1)

130-130: LGTM: Efficient parameter passing.

The change to pass the datacenter IDs vector by reference is consistent with the codebase-wide refactoring and improves efficiency.

packages/core/services/game/ops/namespace-resolve-name-id/src/lib.rs (1)

33-33: LGTM: Consistent parameter handling.

The modification to extract and collect name_ids while maintaining the reference passing pattern is consistent with the codebase-wide refactoring for SQL query parameters.

packages/core/services/route/src/ops/upsert.rs (1)

114-114: LGTM! Parameter passing optimization for SQL macros.

The changes correctly pass the JSON parameter by reference instead of by value, which improves efficiency by avoiding unnecessary clones. This aligns with the broader pattern across the codebase to prepare for the enhanced SQL macros with retry functionality.

Also applies to: 143-143

packages/core/services/build/src/ops/patch_tags.rs (1)

136-136: LGTM! Consistent parameter passing optimization.

Both changes correctly pass JSON parameters by reference to the SQL macros, following the established pattern across the codebase. This improves efficiency and prepares the code for the enhanced SQL macros with retry functionality.

Also applies to: 150-150

packages/core/services/load-test/standalone/sqlx/src/lib.rs (1)

107-107: LGTM! Vector parameter optimization.

The change correctly passes the user_ids vector by reference instead of by value, which is more efficient and follows the consistent pattern across the codebase for SQL macro parameter passing.

packages/core/services/linode/standalone/gc/src/lib.rs (1)

109-109: LGTM! Efficient vector parameter passing.

The change correctly passes the image_ids vector by reference, avoiding unnecessary cloning and following the consistent pattern established across the codebase for SQL macro parameter passing.

packages/common/chirp-workflow/core/src/db/crdb_nats/debug.rs (1)

52-52: LGTM! Consistent vector parameter optimization.

Both changes correctly pass UUID vectors by reference to the SQL macros, following the established pattern across the codebase. This improves efficiency by avoiding unnecessary vector cloning, which is particularly beneficial for debugging operations that may handle large numbers of workflow and signal IDs.

Also applies to: 496-496

packages/core/services/cluster/src/workflows/datacenter/mod.rs (1)

231-237: LGTM! Correct parameter passing for retry mechanism.

The change to pass pools_buf, gph_dns_parent, and gph_static by reference is appropriate and necessary for the retry mechanism. Since the SQL macros now recreate queries and rebind parameters on each retry attempt, passing by reference prevents moving the values and allows multiple retries.

packages/common/pools/src/utils/sql_query_macros.rs (1)

420-424: No Action Needed for sql_fetch Macro

The sql_fetch macro (wrapping __sql_query_as_raw) currently has no usages in the repository, so its lack of retry logic does not introduce any inconsistency with other query macros. If in future you expose streaming or cursor-based fetches via sql_fetch, you can revisit adding retry support at that time.

Comment on lines +210 to +215
let mut backoff = $crate::__rivet_util::Backoff::new(
4,
None,
$crate::utils::sql_query_macros::QUERY_RETRY_MS,
50
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify the Backoff initialization parameters.

The Backoff is initialized with parameters (4, None, QUERY_RETRY_MS, 50) in multiple places. The first parameter 4 is unclear - please verify what this represents in the __rivet_util::Backoff::new() constructor and ensure it's appropriate for SQL query retries.

Also applies to: 280-285, 337-342


🏁 Script executed:

#!/bin/bash
# Search for the Backoff struct definition to understand the parameters
ast-grep --pattern 'impl Backoff {
  $$$
  fn new($$$) {
    $$$
  }
  $$$
}'

# Also search for any documentation on Backoff
rg -A 10 "struct Backoff" --type rust

Length of output: 738


🏁 Script executed:

#!/bin/bash
# Show the Backoff::new signature and docs
rg -n -A5 -B2 "pub fn new" packages/common/util/core/src/lib.rs

# Show the implementation of next() to understand retry behavior
rg -n -A5 -B3 "fn next" packages/common/util/core/src/lib.rs

Length of output: 435


Clarify Backoff::new parameters

The call

Backoff::new(4, None, QUERY_RETRY_MS, 50)

maps to

pub fn new(
    max_exponent: usize,    // first arg
    max_retries: Option<usize>,
    wait: usize,            // base wait in ms
    randomness: usize,      // max jitter in ms
)
  • max_exponent = 4 ⇒ delays = wait * 2^n up to 2⁴ → 100 ms → 1 600 ms (± jitter)
  • max_retries = Noneunbounded retry attempts
  • wait = QUERY_RETRY_MS (100 ms) and randomness = 50 ms jitter

Please confirm that:

  • A maximum exponent of 4 (1.6 s cap) is sufficient.
  • Unlimited retries are desired. Otherwise, pass Some(MAX_QUERY_RETRIES) instead of None to bound attempts.

Applies at:

  • packages/common/pools/src/utils/sql_query_macros.rs:210–215
  • packages/common/pools/src/utils/sql_query_macros.rs:280–285
  • packages/common/pools/src/utils/sql_query_macros.rs:337–342
🤖 Prompt for AI Agents
In packages/common/pools/src/utils/sql_query_macros.rs around lines 210 to 215,
review the Backoff::new call parameters to ensure they match desired retry
behavior. Confirm if a max_exponent of 4 (capping delay at about 1.6 seconds) is
sufficient for your use case and whether unlimited retries (max_retries set to
None) are intended. If you want to limit retry attempts, replace None with
Some(MAX_QUERY_RETRIES) to bound the number of retries. Apply the same
confirmation and changes as needed at lines 280–285 and 337–342.

@MasterPtato MasterPtato force-pushed the 07-18-fix_make_query_as_macros_retry branch from 6bc330c to 2d84554 Compare July 21, 2025 17:46
Copy link

cloudflare-workers-and-pages bot commented Jul 21, 2025

Deploying rivet-studio with  Cloudflare Pages  Cloudflare Pages

Latest commit: f64697c
Status:🚫  Build failed.

View logs

Copy link

claude bot commented Jul 21, 2025

Claude finished @MasterPtato's task —— View job


PR Review: Make query_as macros retry

Reviewing this PR for code quality, potential bugs, performance, security, and test coverage.

Todo List:

  • Analyze main macro implementation in sql_query_macros.rs
  • Review usage patterns in affected service files
  • Assess code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide comprehensive feedback

Review Summary

This PR adds retry functionality to SQL query macros, improving reliability when dealing with transient database errors. The implementation adds retry logic to both __sql_query and __sql_query_as macros with exponential backoff.

Code Quality & Best Practices ✅

Positive aspects:

  • Consistent implementation: Both connection pool and transaction variants (@tx) have identical retry logic
  • Proper error categorization: Only retries on transient errors (Database, Io, Tls, Protocol, PoolTimedOut, PoolClosed, WorkerCrashed)
  • Comprehensive logging: Includes retry attempt logging with tracing::warn!
  • Backwards compatibility: All existing SQL macro calls automatically get retry functionality

Suggestions for improvement:

  1. Magic numbers: Consider making 50 (jitter parameter) a constant like QUERY_RETRY_MS
  2. Error wrapping: The custom error wrapping in std::io::Error feels indirect - consider a more direct approach

Potential Bugs & Issues ⚠️

1. Connection reuse on retry (packages/common/pools/src/utils/sql_query_macros.rs:162-196)

// Retry loop
let res = loop {
    let query = sqlx::query($crate::__opt_indoc!($sql))
    $(
        .bind($bind)
    )*;

    match query.execute(&mut *conn).await {
        // ... retry logic uses same connection
    }
};

Issue: The macro reuses the same connection for all retry attempts. If the connection is in a bad state (e.g., broken, timed out), retries will likely fail again.

Recommendation: Consider acquiring a fresh connection on retry, especially for connection-related errors.

2. Transaction retry limitations (packages/common/pools/src/utils/sql_query_macros.rs:225-253)

match query.$action(&mut **$tx).await {
    Err(err) => {
        // Retry logic for transactions
    }
}

Issue: Retrying within a transaction context can be problematic. If the transaction is in an error state, retries may not be effective and could mask real issues.

Recommendation: Consider different retry strategies for transactions vs. individual queries.

3. Backoff configuration concerns (packages/common/pools/src/utils/sql_query_macros.rs:153-158)

let mut backoff = $crate::__rivet_util::Backoff::new(
    4,    // max_exponent: caps at 1600ms delay
    None, // max_retries: unlimited retries!
    $crate::utils::sql_query_macros::QUERY_RETRY_MS, // 100ms base
    50    // 0-50ms jitter
);

Issue: The max_retries: None means unlimited retries, which conflicts with the MAX_QUERY_RETRIES: 6 constant used elsewhere in the code. The retry logic manually checks against MAX_QUERY_RETRIES but the backoff itself has no limit.

Recommendation: Pass Some(MAX_QUERY_RETRIES) to the backoff for consistency.

Performance Considerations 🚀

Positive aspects:

  • Exponential backoff: Prevents thundering herd problems
  • Reasonable retry limits: 6 attempts with 100ms base delay
  • Selective retrying: Only retries on specific error types

Potential concerns:

  • Increased latency: Failed queries now have up to 6 retry attempts, potentially adding significant latency
  • Resource usage: Multiple retries increase database load during issues

Security Concerns 🔒

No major security issues identified, but consider:

  • Error information leakage: Retry logs include error details - ensure sensitive information isn't exposed
  • Resource exhaustion: While retry limits exist, many concurrent failing queries could still cause resource issues

Test Coverage 📊

Missing areas:

  • Unit tests: No visible tests for the retry logic itself
  • Integration tests: No tests verifying retry behavior with actual database errors
  • Edge cases: No tests for retry limits, backoff timing, or error type handling

Recommendations:

  1. Add unit tests for retry logic with mocked database errors
  2. Add integration tests that inject database failures
  3. Test retry behavior with different error types
  4. Verify retry limits and backoff timing

Specific Code Improvements

1. Connection management for retries:

// Consider acquiring fresh connection on connection errors
match &err {
    Database(_) | Io(_) | Tls(_) | Protocol(_) | PoolTimedOut | PoolClosed | WorkerCrashed => {
        // For connection-related errors, consider getting a fresh connection
        if matches!(&err, PoolTimedOut | PoolClosed | WorkerCrashed) {
            conn = $crate::__sql_acquire!($ctx, driver);
        }
        tracing::warn!(?err, "query retry ({i}/{})", $crate::utils::sql_query_macros::MAX_QUERY_RETRIES);
        backoff.tick().await;
    }
}

2. More descriptive error messages:

// In packages/common/pools/src/utils/sql_query_macros.rs:8
#[error("max sql retries exceeded ({} attempts), last error: {0}", MAX_QUERY_RETRIES)]
MaxSqlRetries(sqlx::Error),

Overall Assessment

This is a solid reliability improvement that addresses transient database errors. The implementation is consistent and follows good patterns, but has some areas for improvement around connection management and testing.

Recommendation: ✅ Approve with suggested improvements

The retry logic will significantly improve system reliability. Consider addressing the connection reuse issue and adding comprehensive tests in a follow-up PR.


Copy link

cloudflare-workers-and-pages bot commented Jul 21, 2025

Deploying rivet-hub with  Cloudflare Pages  Cloudflare Pages

Latest commit: f64697c
Status:🚫  Deploy failed.

View logs

Copy link

claude bot commented Aug 21, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants