Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Dec 3, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Error Error Dec 11, 2025 7:17am
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Dec 11, 2025 7:17am
rivet-inspector Ignored Ignored Preview Dec 11, 2025 7:17am
rivet-site Ignored Ignored Preview Dec 11, 2025 7:17am

@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless December 3, 2025 01:58 Inactive
Copy link
Contributor Author

MasterPtato commented Dec 3, 2025


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Dec 3, 2025

Code Review - PR #3580: Fix runner config creation

Overview

This PR addresses a race condition where runner pool workflows might not exist when signaling them during runner config operations. The fix adds graceful handling and backfill logic to ensure workflows are created when needed.

✅ Strengths

  1. Proper Error Handling: Using .graceful_not_found() prevents failures when the workflow doesn't exist yet, which is the right pattern for this scenario.

  2. Backfill Logic: The upsert operation includes smart backfill logic (lines 182-192 in upsert.rs) to create the workflow if it doesn't exist when trying to signal it. This handles race conditions elegantly.

  3. Observability: Added debug logging when workflows aren't found, which helps with troubleshooting (line 57 in delete.rs).

  4. Logging Improvement: Changing chunk buffer GC logging from debug to trace is appropriate since this is routine maintenance logging that would be noisy at debug level.

🔍 Observations & Suggestions

1. Asymmetric Handling Between Delete and Upsert

In delete.rs, when the workflow isn't found, you only log a debug message. However, in upsert.rs, you create the workflow as a backfill. Consider if delete.rs needs similar handling:

Current behavior in delete.rs:56-58:

if res.is_none() {
    tracing::debug!(namespace_id=?input.namespace_id, name=%input.name, "no runner pool workflow to bump");
}

Question: Should delete also attempt to clean up or create a workflow? Or is the current behavior correct because deleting a config when no pool exists is a no-op? If the latter, consider adding a comment explaining why delete differs from upsert.

2. Race Condition in Backfill Logic

In upsert.rs:182-192, there's a potential race condition between checking res.is_none() and dispatching the workflow:

// Backfill
if res.is_none() {
    ctx.workflow(crate::workflows::runner_pool::Input {
        namespace_id: input.namespace_id,
        runner_name: input.name.clone(),
    })
    .tag("namespace_id", input.namespace_id)
    .tag("runner_name", input.name.clone())
    .unique()  // ✅ Good: prevents duplicates
    .dispatch()
    .await?;
}

The .unique() call is excellent and handles this, but it's worth confirming: does .unique() guarantee idempotency if multiple concurrent upserts try to create the same workflow? If so, this is perfect. If not, you might need additional synchronization.

3. Logging Consistency

The debug message in delete.rs:57 uses name=%input.name while upsert.rs would benefit from similar observability. Per CLAUDE.md conventions:

  • ✅ Lowercase message: "no runner pool workflow to bump"
  • ✅ Structured logging with field names
  • ✅ Uses ?namespace_id for debug formatting

Consider if upsert should also log when it performs a backfill for consistency:

if res.is_none() {
    tracing::debug!(namespace_id=?input.namespace_id, name=%input.name, "backfilling runner pool workflow");
    ctx.workflow(/* ... */).dispatch().await?;
}

4. Potential Enhancement: Metrics

Consider adding metrics for these scenarios:

  • Counter for graceful_not_found hits (signals sent to non-existent workflows)
  • Counter for backfill workflow creations

This would help monitor how often these edge cases occur in production.

🔒 Security & Performance

  • Security: ✅ No security concerns. The changes don't introduce new attack vectors.
  • Performance: ✅ The additional workflow dispatch in backfill is acceptable since it only happens when the workflow doesn't exist (rare case).
  • Error Propagation: ✅ Proper use of ? operator for error propagation.

📋 Testing Considerations

While the code looks correct, consider testing these scenarios:

  1. Race condition test: Multiple concurrent upserts for the same runner config
  2. Deletion test: Delete a config when the workflow doesn't exist
  3. Backfill test: Upsert a config that affects pool when workflow is missing
  4. Signal success test: Normal flow when workflow exists

Minor Style Notes

Per CLAUDE.md guidelines, everything looks good:

  • ✅ Structured logging follows conventions
  • ✅ Proper error handling with Result<>
  • ✅ Debug formatting (?) used appropriately for IDs
  • ✅ Display formatting (%) used for name strings

Verdict

✅ Approve with minor suggestions

The core fix is solid and addresses the race condition properly. The main question is whether the asymmetric handling between delete and upsert is intentional. If so, adding a comment explaining the design decision would be helpful for future maintainers.

The changes are:

  • Functionally correct: Handles the workflow-not-found case gracefully
  • Well-structured: Uses appropriate patterns (.unique(), .graceful_not_found())
  • Observable: Includes logging for troubleshooting

Recommended before merge:

  1. Clarify whether delete's behavior is intentionally different from upsert
  2. Consider adding a comment explaining the backfill logic
  3. Optional: Add observability logging when backfill occurs

@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 3, 2025

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3580

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3580

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3580

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3580

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3580

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3580

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3580

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3580

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3580

commit: fa8ced8

@graphite-app
Copy link
Contributor

graphite-app bot commented Dec 12, 2025

Merge activity

  • Dec 12, 12:48 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Dec 12, 12:48 AM UTC: CI is running for this pull request on a draft pull request (#3634) due to your merge queue CI optimization settings.
  • Dec 12, 12:49 AM UTC: Merged by the Graphite merge queue via draft PR: #3634.

graphite-app bot pushed a commit that referenced this pull request Dec 12, 2025
@graphite-app graphite-app bot closed this Dec 12, 2025
@graphite-app graphite-app bot deleted the 12-02-chore_fix_runner_config_creation branch December 12, 2025 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants