Skip to content

Conversation

MasterPtato
Copy link
Contributor

Changes

@MasterPtato MasterPtato requested a review from NathanFlurry July 22, 2025 01:18
Copy link

vercel bot commented Jul 22, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivet-site Ready Ready Preview Comment Aug 21, 2025 8:38pm

Copy link

claude bot commented Jul 22, 2025

Claude finished @MasterPtato's task —— View job


PR Review Complete ✅

Reviewing the pull request changes and providing feedback on:

  • Read and analyze changed files
  • Review pegboard protocol changes
  • Review image download randomization changes
  • Check for code quality issues
  • Look for potential bugs
  • Assess performance considerations
  • Check for security concerns
  • Evaluate test coverage needs
  • Provide comprehensive review feedback

Review Feedback

Overview

This PR removes the image prewarming functionality from the pegboard protocol and changes image download URL selection from deterministic (hash-based) to random shuffling. The changes are generally well-implemented but have some considerations.

Code Quality & Best Practices ✅

Positive aspects:

  • Clean removal of the prewarm_image function in pegboard.rs:13-93
  • Proper error handling and logging throughout
  • Good use of async/await patterns
  • Sensible metric additions in metrics/mod.rs:125-147

Improvement needed:

  • Remove commented code: In image_download_handler.rs:538-543, the old hash-based randomization code should be deleted rather than commented out to avoid confusion and maintain clean code

Potential Bugs & Issues ⚠️

  1. Commented code cleanup: The commented hash-based randomization logic in image_download_handler.rs:538-543 should be removed entirely
  2. Dependency verification: Need to ensure no other parts of the system depend on the removed prewarm_image endpoint
  3. Fallback handling: The randomization change looks correct, but verify that fallback URLs are still properly handled

Performance Considerations ⚠️

Potential negative impact:

  • Removed prewarming may increase cold start times for images since ATS nodes won't have them cached in advance
  • Random vs deterministic shuffling could impact cache locality - deterministic shuffling based on image ID meant the same image would try the same servers first, potentially improving cache hit rates

Potential positive impact:

  • Better load distribution across servers since requests are now truly random rather than predictable
  • Simplified protocol with less complexity

Security Concerns ✅

  • Improved unpredictability: Random shuffling is actually better from a security perspective than deterministic hash-based ordering
  • No sensitive data exposure: The changes don't introduce any security vulnerabilities

Test Coverage 🔍

  • No visible tests in the changed files - recommend verifying that existing integration tests still pass
  • Consider adding tests for the new randomization behavior to ensure it works as expected
  • Prewarm removal testing: Ensure systems that might have depended on prewarming still function correctly

Minor Issues

  1. Node exporter version: The version bump from some previous version to "1.6.1" in node_exporter.sh:2 appears unrelated to the main changes - consider separating this into its own commit
  2. TODO comment: The TODO on line 545 "Replaced hash based randomizer with complete randomness for now" suggests this might be temporary - clarify if this is the final approach

Recommendations

  1. Clean up commented code in image_download_handler.rs:538-543
  2. Verify backwards compatibility - ensure removing the prewarm endpoint doesn't break any clients
  3. Consider performance monitoring to measure the impact of removing prewarming
  4. Update documentation if any exists that references the prewarming functionality

Summary

The changes are well-implemented but consider the performance trade-offs of removing prewarming. The randomization change is good but needs code cleanup. Overall, this appears to be a reasonable simplification of the pegboard protocol.


Copy link
Contributor Author

MasterPtato commented Jul 22, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link

cloudflare-workers-and-pages bot commented Jul 22, 2025

Deploying rivet with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2c3eaf4
Status:🚫  Build failed.

View logs

Copy link

claude bot commented Aug 21, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copy link

claude bot commented Aug 21, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

@@ -280,6 +280,10 @@ impl ImageDownloadHandler {
let image_size = self.download_inner(ctx, image_config).await?;
let download_duration = download_start_instant.elapsed().as_secs_f64();

crate::metrics::DOWNLOAD_IMAGE_RATE
.with_label_values(&[&(start_instant.elapsed().as_nanos() % 100).to_string()])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a variable name inconsistency in this code. The metric is using image_download_size which isn't defined in the current scope, while the function call on line 280 returns a value stored in image_size. To fix the compilation error, please update the metric to use the correct variable:

crate::metrics::DOWNLOAD_IMAGE_RATE
    .with_label_values(&[&(start_instant.elapsed().as_nanos() % 100).to_string()])
    .set(image_size as f64 / download_duration);

Spotted by Diamond

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +73 to +90
if let Err(err) = client
.get(format!("http://{}:8080{}", lan_ip, &artifact_url_stub))
.send()
.await
.and_then(|res| res.error_for_status())
{
tracing::error!(
?err,
server_id=?server.server_id,
build_id=?build.build_id,
"failed prewarming",
);
}
}
})
.buffer_unordered(16)
.collect::<()>()
.await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation silently continues execution even if all prewarming HTTP requests fail. While errors are logged, there's no mechanism to report the overall success/failure rate or propagate errors to the caller. This could lead to a situation where the system believes prewarming occurred successfully when it actually failed completely.

Consider enhancing the error handling by:

  1. Tracking successful vs. failed requests
  2. Returning a meaningful response that indicates the prewarming status
  3. Potentially implementing retry logic for failed requests

This would provide better visibility into the prewarming process and allow callers to make informed decisions based on the actual outcome.

Suggested change
if let Err(err) = client
.get(format!("http://{}:8080{}", lan_ip, &artifact_url_stub))
.send()
.await
.and_then(|res| res.error_for_status())
{
tracing::error!(
?err,
server_id=?server.server_id,
build_id=?build.build_id,
"failed prewarming",
);
}
}
})
.buffer_unordered(16)
.collect::<()>()
.await;
let result = client
.get(format!("http://{}:8080{}", lan_ip, &artifact_url_stub))
.send()
.await
.and_then(|res| res.error_for_status());
match result {
Ok(_) => Ok(()),
Err(err) => {
tracing::error!(
?err,
server_id=?server.server_id,
build_id=?build.build_id,
"failed prewarming",
);
Err(())
}
}
}
})
.buffer_unordered(16)
.collect::<Vec<Result<(), ()>>>()
.await;
let total_requests = results.len();
let successful_requests = results.iter().filter(|r| r.is_ok()).count();
let failed_requests = total_requests - successful_requests;
tracing::info!(
total_requests,
successful_requests,
failed_requests,
"prewarming completed"
);
PrewarmingResult {
total_requests,
successful_requests,
failed_requests,
}

Spotted by Diamond

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link

cloudflare-workers-and-pages bot commented Aug 21, 2025

Deploying rivet-studio with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2c3eaf4
Status:🚫  Build failed.

View logs

Copy link

cloudflare-workers-and-pages bot commented Aug 21, 2025

Deploying rivet-hub with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2c3eaf4
Status: ✅  Deploy successful!
Preview URL: https://201d7371.rivet-hub-7jb.pages.dev
Branch Preview URL: https://07-22-fix-make-pb-image-down.rivet-hub-7jb.pages.dev

View logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant