Skip to content

Conversation

MasterPtato
Copy link
Contributor

Changes

@MasterPtato MasterPtato requested a review from NathanFlurry May 27, 2025 20:45
Copy link
Contributor Author

MasterPtato commented May 27, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR adds comprehensive metrics tracking to the Rivet Guard service, focusing on monitoring system performance and request handling.

  • Added Prometheus metrics in metrics.rs for tracking TCP connections, active requests, route cache size, rate limiters, and in-flight counters
  • Modified server.rs to increment/decrement connection and request metrics at key lifecycle points
  • Changed rate limiting in proxy_service.rs to use IP addresses instead of actor IDs for better tracking
  • Added metrics server configuration in main.rs with a dedicated task for metrics collection
  • Configured Prometheus target for guard service on port 8091 with 15-second scrape interval

7 file(s) reviewed, 6 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines 211 to 219
metrics::ACTIVE_REQUESTS_TOTAL.inc();

let result = service_clone.process(req).await.map_err(|err| GlobalErrorWrapper{err});

metrics::ACTIVE_REQUESTS_TOTAL.dec();

result
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using a try-finally pattern to ensure metrics are always decremented, even if process() panics.

@@ -179,6 +190,8 @@ pub async fn run_server(

// Accept TLS connection in a separate task to avoid ownership issues
tokio::spawn(async move {
metrics::TCP_CONNECTIONS_TOTAL.inc();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: TCP connection metric incremented before TLS handshake. Consider moving after successful handshake to avoid counting failed connections.

Comment on lines +234 to +237
endpoint: "http://127.0.0.1:8091".into(),
scrape_interval: 15,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Port 8091 is already used by the worker service (see line 215). Consider using a different port for guard metrics to avoid conflicts.

Comment on lines +233 to +238
components::vector::PrometheusTarget {
endpoint: "http://127.0.0.1:8091".into(),
scrape_interval: 15,
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Missing /metrics path in endpoint URL which is required for Prometheus scraping

Suggested change
components::vector::PrometheusTarget {
endpoint: "http://127.0.0.1:8091".into(),
scrape_interval: 15,
},
components::vector::PrometheusTarget {
endpoint: "http://127.0.0.1:8091/metrics".into(),
scrape_interval: 15,
},

// Start metrics server
let metrics_task = tokio::spawn(rivet_metrics::run_standalone(config.clone()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: No cleanup/shutdown handling for metrics task if main server exits or Ctrl+C is received. Should implement graceful shutdown.

Comment on lines +164 to 168
Err(e) => bail!(
"Failed to build dynamic hostname actor routing regex: {}",
e
),
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Error handling style is inconsistent with the static regex case below (lines 181-184). Consider using the same style for both cases.

Copy link

cloudflare-workers-and-pages bot commented May 27, 2025

Deploying rivet with  Cloudflare Pages  Cloudflare Pages

Latest commit: bd93030
Status: ✅  Deploy successful!
Preview URL: https://73a1cb0c.rivet.pages.dev
Branch Preview URL: https://05-27-fix-guard-add-metrics.rivet.pages.dev

View logs

@MasterPtato MasterPtato force-pushed the 05-27-fix_guard_add_metrics branch 2 times, most recently from 6db518e to 240796d Compare May 27, 2025 23:17
Copy link
Member

@NathanFlurry NathanFlurry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MasterPtato MasterPtato force-pushed the 05-27-fix_guard_add_metrics branch 2 times, most recently from 6c38ea5 to 37074b9 Compare May 28, 2025 01:02
@NathanFlurry NathanFlurry changed the base branch from main to graphite-base/2480 May 31, 2025 00:39
@NathanFlurry NathanFlurry force-pushed the 05-27-fix_guard_add_metrics branch from 37074b9 to 4d3b056 Compare May 31, 2025 00:39
@NathanFlurry NathanFlurry changed the base branch from graphite-base/2480 to 05-30-chore_expose_namespace_in_otel_attributes May 31, 2025 00:39
Copy link

cloudflare-workers-and-pages bot commented May 31, 2025

Deploying rivet-hub with  Cloudflare Pages  Cloudflare Pages

Latest commit: bd93030
Status: ✅  Deploy successful!
Preview URL: https://a72cdbb1.rivet-hub-7jb.pages.dev
Branch Preview URL: https://05-27-fix-guard-add-metrics.rivet-hub-7jb.pages.dev

View logs

Copy link

cloudflare-workers-and-pages bot commented May 31, 2025

Deploying rivet-studio with  Cloudflare Pages  Cloudflare Pages

Latest commit: bd93030
Status: ✅  Deploy successful!
Preview URL: https://2c56ae00.rivet-studio.pages.dev
Branch Preview URL: https://05-27-fix-guard-add-metrics.rivet-studio.pages.dev

View logs

@MasterPtato MasterPtato force-pushed the 05-27-fix_guard_add_metrics branch from 4d3b056 to bd93030 Compare June 2, 2025 18:13
@MasterPtato MasterPtato force-pushed the 05-30-chore_expose_namespace_in_otel_attributes branch from 8f8d1e6 to 7b38fd6 Compare June 2, 2025 18:13
@NathanFlurry NathanFlurry force-pushed the 05-30-chore_expose_namespace_in_otel_attributes branch from 7b38fd6 to 3115384 Compare June 2, 2025 20:12
@NathanFlurry NathanFlurry force-pushed the 05-27-fix_guard_add_metrics branch from bd93030 to 617a367 Compare June 2, 2025 20:12
@graphite-app graphite-app bot closed this Jun 3, 2025
@graphite-app graphite-app bot deleted the 05-27-fix_guard_add_metrics branch June 3, 2025 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants