Skip to content

Conversation

@santigimeno
Copy link
Member

@santigimeno santigimeno commented Nov 5, 2025

Enable them in opentelemetry-cpp by defining
ENABLE_OTLP_RETRY_PREVIEW and also them to our GrpcClient. The values we're using with a default deadline of 10 seconds are:

  • max_attemps: 5
  • initial_backoff: 500ms
  • max_backoff: 5s
  • backoff_multiplier: 2

Add NSOLID_GRPC_DEADLINE env var for configurable gRPC timeouts. Specially useful for testing.

Refactor and dry code when creating grpc exporters inside the new GrpcAgent::configure_grpc_exporters() helper function.

Create parse_env_var_int() utility in nsolid_util.h for safe env parsing.

Summary by CodeRabbit

  • New Features

    • Configurable gRPC deadlines via environment variable to control RPC timeouts
    • gRPC retry policy applied to exporters/channels for more resilient requests
    • Enabled OTLP HTTP exporter retry preview flag
  • Bug Fixes

    • Faster agent shutdown with reduced exit timeout
  • Tests

    • Updated tests and timings to align with new gRPC deadline behavior

@santigimeno santigimeno self-assigned this Nov 5, 2025
@santigimeno santigimeno force-pushed the santi/grpc_retry_policy branch from 689c432 to 60e7889 Compare November 5, 2025 10:54
@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds configurable gRPC deadlines via NSOLID_GRPC_DEADLINE and cached parsing helper; new GrpcClient MakeClientContext overload that accepts deadline seconds; channels receive a JSON service-config for retry policy; AssetStream and send_exit set per-call deadlines; GrpcAgent centralizes OTLP exporter/client setup; tests' timeouts adjusted.

Changes

Cohort / File(s) Summary
gRPC client API & deadline helper
agents/grpc/src/grpc_client.h, agents/grpc/src/grpc_client.cc
Added MakeClientContext(..., int deadline_seconds) overload and get_grpc_deadline_seconds() env-backed cached helper; existing two-arg MakeClientContext now forwards using the env value; MakeChannel constructs and applies a JSON service-config with retry policy via channel args.
Asset stream deadline
agents/grpc/src/asset_stream.cc
AssetStream constructor sets context_.set_deadline(system_clock::now() + seconds(get_grpc_deadline_seconds())), making asset stream RPCs use the configured deadline.
GrpcAgent exporter configuration
agents/grpc/src/grpc_agent.h, agents/grpc/src/grpc_agent.cc
Added configure_grpc_exporters(const std::string&, bool) to centralize OTLP exporter and client setup; changed config() signature to accept const nlohmann::json&; send_exit() uses a client context with a 1s timeout; parsing of insecure flag uses utils::parse_env_var_int.
Env parsing utility
src/nsolid/nsolid_util.h
Added int parse_env_var_int(const char* env_name, int default_value, int min_value = INT_MIN, int max_value = INT_MAX) which reads, parses, bounds-checks and caches integer env vars; added necessary includes (<cerrno>, <cstdlib>, <climits>, and env.h).
Build flags
deps/opentelemetry-cpp/otlp-http-exporter.gyp
Added ENABLE_OTLP_RETRY_PREVIEW compile-time define to the otlp-http-exporter build settings.
Tests: deadline and timing adjustments
test/parallel/test-nsolid-grpc-heap-profile.js, test/parallel/test-nsolid-grpc-heap-sampling.js, test/parallel/test-nsolid-grpc-profile.js
Tests set NSOLID_GRPC_DEADLINE='2' and increased specific timeouts (e.g., 100ms → 2000ms) to align with the shorter gRPC deadline and sequencing requirements.

Sequence Diagram(s)

sequenceDiagram
    participant Env as Environment
    participant Util as parse_env_var_int()
    participant GrpcClient as GrpcClient
    participant Channel as gRPC Channel
    participant Context as ClientContext
    participant Asset as AssetStream
    participant Server as gRPC Server

    Env->>Util: read NSOLID_GRPC_DEADLINE
    Util->>Util: parse, validate & cache seconds

    Note over GrpcClient,Context: MakeClientContext with deadline
    GrpcClient->>GrpcClient: get_grpc_deadline_seconds()
    GrpcClient->>Context: MakeClientContext(agent_id, saas, deadline)
    Context->>Context: set_deadline(now + seconds(deadline))
    Context->>Context: add agent_id / saas metadata

    Note over Channel: Channel created with service-config (retry JSON)
    GrpcClient->>Channel: SetServiceConfigJSON(retry policy)

    Asset->>Context: AssetStream uses context with deadline
    Context->>Server: Invoke RPC (deadline enforced)
    Server-->>Context: respond or deadline error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Review points:
    • Correct parsing, caching and bounds handling in parse_env_var_int() / get_grpc_deadline_seconds().
    • Ensure all RPC call sites use the new MakeClientContext overload or appropriate timeouts (AssetStream, send_exit, exporters).
    • Verify service-config JSON structure and that SetServiceConfigJSON applies as intended.
    • Check configure_grpc_exporters() extraction and the impact of config() signature change.
    • Validate test timing adjustments for flakiness.

Poem

🐰 I sniff the env for a number true,
Deadlines set and cached anew,
Context ticks and retries learn,
Channels hum while timers turn,
A tiny hop — RPCs return.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'agents: implement gRPC retry policies' accurately reflects the main objective: enabling gRPC retry policies with ENABLE_OTLP_RETRY_PREVIEW, configurable deadlines, and retry parameters.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch santi/grpc_retry_policy

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
agents/grpc/src/grpc_client.cc (1)

53-91: Review service config JSON formatting for maintainability.

The current approach uses fixed-width placeholders (%0000000000u, %0000000000.1f) and snprintf with clamping to format the retry policy configuration. While this works correctly, it has some maintainability concerns:

  • The placeholder format strings must exactly match the allocation size
  • Changes to the template require careful coordination with buffer sizing
  • The const_cast is necessary but somewhat obscure (though commented)

Consider using a more maintainable approach in the future, such as:

  • JSON library serialization (nlohmann::json is already in use elsewhere)
  • std::ostringstream for type-safe formatting
  • C++20 std::format when available

Example alternative:

nlohmann::json service_config = {
  {"methodConfig", {{
    {"name", {{}}},
    {"retryPolicy", {
      {"maxAttempts", options.retry_policy_max_attempts},
      {"initialBackoff", std::to_string(initial_backoff) + "s"},
      {"maxBackoff", std::to_string(max_backoff) + "s"},
      {"backoffMultiplier", backoff_multiplier},
      {"retryableStatusCodes", {"CANCELLED", "DEADLINE_EXCEEDED", ...}}
    }}
  }}}
};
grpc_arguments.SetServiceConfigJSON(service_config.dump());

However, the current implementation is correct and handles edge cases properly with the min/max clamping.

agents/grpc/src/grpc_agent.cc (1)

996-1068: Excellent refactoring! Centralized exporter configuration.

The new configure_grpc_exporters() method effectively consolidates OTLP exporter setup, making the code more maintainable. The retry policy configuration (max_attempts=5, initial_backoff=500ms, max_backoff=5s, multiplier=2.0) is correctly applied to both the client and exporters.

Optional: Extract retry policy constants to reduce duplication.

The retry policy parameters are duplicated at lines 1018-1021 and 1045-1048. Consider extracting these to named constants if you want to improve maintainability:

// At class or file scope
static constexpr size_t kRetryMaxAttempts = 5;
static constexpr auto kRetryInitialBackoff = std::chrono::milliseconds(500);
static constexpr auto kRetryMaxBackoff = std::chrono::seconds(5);
static constexpr float kRetryBackoffMultiplier = 2.0f;

Then reference these constants in both locations. This is a nice-to-have improvement that can be deferred.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ed7616 and 60e7889.

📒 Files selected for processing (10)
  • agents/grpc/src/asset_stream.cc (3 hunks)
  • agents/grpc/src/grpc_agent.cc (4 hunks)
  • agents/grpc/src/grpc_agent.h (1 hunks)
  • agents/grpc/src/grpc_client.cc (4 hunks)
  • agents/grpc/src/grpc_client.h (2 hunks)
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp (2 hunks)
  • src/nsolid/nsolid_util.h (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-profile.js (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-sampling.js (3 hunks)
  • test/parallel/test-nsolid-grpc-profile.js (3 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-07-15T16:09:04.521Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/nsolid-code-event-hook.js:47-73
Timestamp: 2025-07-15T16:09:04.521Z
Learning: In the nsolid test suite, hardcoded timeouts are acceptable for timing-sensitive tests. The preferred approach when tests become flaky is to increase timeout values rather than implementing more complex synchronization mechanisms.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-07-08T14:46:47.806Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:13-14
Timestamp: 2025-07-08T14:46:47.806Z
Learning: In the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-07-08T14:47:34.724Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:16-18
Timestamp: 2025-07-08T14:47:34.724Z
Learning: In the nsolid test suite, external tool invocations (e.g., execSync to run `readelf`) are intentionally left uncaught so that any failure causes the test to fail rather than being skipped.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-07-08T14:48:04.827Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/binding.cc:13-28
Timestamp: 2025-07-08T14:48:04.827Z
Learning: In nsolid test native addons (e.g., `test/addons/*/binding.cc`), additional JavaScript-facing argument validation is typically omitted because the tests supply well-formed inputs.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-06-10T09:52:49.401Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/binding.cc:64-78
Timestamp: 2025-06-10T09:52:49.401Z
Learning: In the nsolid codebase, test files in test/addons/ consistently use assert() for input validation and runtime checks, not CHECK() macros. This is the established testing convention.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-10-14T10:18:23.123Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 377
File: test/parallel/test-nsolid-heap-profile-stream.js:81-88
Timestamp: 2025-10-14T10:18:23.123Z
Learning: In nsolid.heapProfileStream() and nsolid.heapSamplingStream(), synchronous throws occur for argument validation errors and immediate state validation checks (like assets being disabled), while async error events are emitted for runtime errors that manifest during execution (like invalid threadId).

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
🧬 Code graph analysis (6)
agents/grpc/src/grpc_agent.h (1)
agents/grpc/src/grpc_agent.cc (2)
  • configure_grpc_exporters (996-1068)
  • configure_grpc_exporters (996-997)
agents/grpc/src/asset_stream.cc (1)
agents/grpc/src/grpc_client.cc (2)
  • get_grpc_deadline_seconds (27-33)
  • get_grpc_deadline_seconds (27-27)
agents/grpc/src/grpc_client.cc (1)
src/nsolid/nsolid_util.h (3)
  • parse_env_var_int (72-88)
  • string (91-99)
  • string (146-150)
agents/grpc/src/grpc_agent.cc (2)
agents/grpc/src/grpc_client.cc (6)
  • MakeNSolidServiceStub (140-143)
  • MakeNSolidServiceStub (141-141)
  • MakeClientContext (113-118)
  • MakeClientContext (114-115)
  • MakeClientContext (123-135)
  • MakeClientContext (124-126)
src/nsolid/nsolid_util.h (1)
  • parse_env_var_int (72-88)
test/parallel/test-nsolid-grpc-heap-profile.js (2)
agents/grpc/src/grpc_client.h (1)
  • nsolid (26-162)
src/nsolid/nsolid_util.h (1)
  • nsolid (40-277)
agents/grpc/src/grpc_client.h (1)
agents/grpc/src/grpc_client.cc (6)
  • MakeClientContext (113-118)
  • MakeClientContext (114-115)
  • MakeClientContext (123-135)
  • MakeClientContext (124-126)
  • get_grpc_deadline_seconds (27-33)
  • get_grpc_deadline_seconds (27-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: test-macOS
  • GitHub Check: test-linux (ubuntu-24.04)
  • GitHub Check: test-linux (ubuntu-24.04-arm)
  • GitHub Check: coverage-linux-without-intl
  • GitHub Check: lint-js-and-md
  • GitHub Check: lint-cpp
  • GitHub Check: coverage-windows
  • GitHub Check: coverage-linux
  • GitHub Check: build-tarball
🔇 Additional comments (16)
deps/opentelemetry-cpp/otlp-http-exporter.gyp (1)

66-66: LGTM! Retry preview feature flag enabled correctly.

The ENABLE_OTLP_RETRY_PREVIEW define is properly added to both the target compilation settings and direct dependent settings, which will enable the OTLP retry functionality as intended.

Also applies to: 79-79

test/parallel/test-nsolid-grpc-heap-profile.js (2)

3-5: Good practice: Environment variable set before module requires.

Setting NSOLID_GRPC_DEADLINE before requiring modules ensures the cached deadline value is initialized correctly, since get_grpc_deadline_seconds() caches the value on first access.


113-132: LGTM! Timing adjustment aligns with new gRPC deadline behavior.

The 2000ms delay ensures the previous heap profile operation completes before starting new profile tests. This is consistent with the shortened 2-second gRPC deadline and follows the established test suite pattern of using hardcoded timeouts for timing-sensitive tests.

Based on learnings.

test/parallel/test-nsolid-grpc-heap-sampling.js (1)

3-5: LGTM! Consistent deadline configuration and timing adjustments.

The test follows the same pattern as other gRPC tests in this PR, setting a shortened deadline and adjusting wait times accordingly.

Also applies to: 156-175

src/nsolid/nsolid_util.h (2)

16-18: Required includes added for environment variable parsing.

The new includes support the parse_env_var_int utility function implementation.


72-88: Robust environment variable parsing utility.

The implementation properly validates integer environment variables with:

  • Empty string handling
  • strtol parsing with errno checking
  • Range validation
  • Debug logging for invalid values
agents/grpc/src/grpc_client.cc (2)

25-33: Verify: Cached deadline cannot be changed at runtime.

The static cache in get_grpc_deadline_seconds() means the deadline value is read once and cached forever. If runtime reconfiguration of the deadline is needed in the future, this implementation will need to be updated.

Is the immutable caching behavior intentional? If so, consider documenting this limitation.


114-135: LGTM! Deadline support added with clean overload pattern.

The two-overload design cleanly separates the default (environment-driven) deadline behavior from explicit deadline specification, and the forwarding implementation avoids code duplication.

agents/grpc/src/asset_stream.cc (1)

2-2: LGTM! Asset stream deadline configured consistently.

The deadline is applied in the constructor, ensuring all asset stream RPCs respect the configured gRPC timeout. This is consistent with the deadline handling in other gRPC client contexts.

Also applies to: 12-13, 33-35

test/parallel/test-nsolid-grpc-profile.js (1)

3-5: LGTM! Timing adjustments consistent across gRPC test suite.

This test follows the same deadline configuration and timing adjustment pattern as the other gRPC tests modified in this PR (test-nsolid-grpc-heap-profile.js and test-nsolid-grpc-heap-sampling.js).

Based on learnings.

Also applies to: 101-120

agents/grpc/src/grpc_agent.h (1)

205-205: Good refactoring: Centralized exporter configuration.

The new configure_grpc_exporters method consolidates OTLP exporter setup logic, improving maintainability by reducing code duplication in the configuration flow.

agents/grpc/src/grpc_client.h (2)

71-77: LGTM! Clean deadline support added.

The new MakeClientContext overload with configurable deadline is well-documented and integrates cleanly with the existing API. The implementation correctly forwards the 2-parameter version to this 3-parameter version using the environment-driven deadline.


159-160: LGTM! Helper function properly declared.

The helper function declaration is clear and will enable consistent deadline configuration across the codebase via the NSOLID_GRPC_DEADLINE environment variable.

agents/grpc/src/grpc_agent.cc (3)

2-2: LGTM! Include added for utility function.

The nsolid_util.h include is correctly added to access the parse_env_var_int utility used on line 1098.


1070-1104: LGTM! Config method improved with helper call and safer parsing.

The changes correctly refactor exporter setup to use the new configure_grpc_exporters() helper, and the switch to utils::parse_env_var_int() provides safer environment variable parsing with proper bounds checking.


1802-1802: LGTM! Appropriate short deadline for exit event.

Using a 1-second deadline for the exit event is a smart choice to ensure fast shutdown without hanging. This is a good use of the new deadline parameter.

@santigimeno santigimeno force-pushed the santi/grpc_retry_policy branch 2 times, most recently from fc50f8f to a41deaf Compare November 5, 2025 11:20
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/nsolid/nsolid_util.h (1)

72-91: Inconsistency with AI summary regarding logging.

The AI summary states "Logging: logs an invalid-value message via Debug when parsing fails or value is out of range," but the function does not contain any logging code. Consider adding logging to help with debugging invalid environment variable values, or update the summary.

Consider using long instead of int64_t for the parsed value.

The function uses std::strtol which returns long, but assigns the result to int64_t. While this works correctly (overflow is caught by the errno check), it's clearer to use long as the intermediate type to match strtol's return type.

Apply this diff for better type consistency:

       errno = 0;
       char* endptr = nullptr;
-      int64_t parsed = std::strtol(raw.c_str(), &endptr, 10);
+      long parsed = std::strtol(raw.c_str(), &endptr, 10);
       if (errno == 0 && endptr != nullptr && *endptr == '\0' &&
           parsed >= min_value && parsed <= max_value) {
         return static_cast<int>(parsed);
       }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60e7889 and a41deaf.

📒 Files selected for processing (10)
  • agents/grpc/src/asset_stream.cc (3 hunks)
  • agents/grpc/src/grpc_agent.cc (5 hunks)
  • agents/grpc/src/grpc_agent.h (1 hunks)
  • agents/grpc/src/grpc_client.cc (4 hunks)
  • agents/grpc/src/grpc_client.h (2 hunks)
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp (2 hunks)
  • src/nsolid/nsolid_util.h (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-profile.js (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-sampling.js (3 hunks)
  • test/parallel/test-nsolid-grpc-profile.js (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/parallel/test-nsolid-grpc-profile.js
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • agents/grpc/src/asset_stream.cc
  • test/parallel/test-nsolid-grpc-heap-sampling.js
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-07-15T16:09:04.521Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/nsolid-code-event-hook.js:47-73
Timestamp: 2025-07-15T16:09:04.521Z
Learning: In the nsolid test suite, hardcoded timeouts are acceptable for timing-sensitive tests. The preferred approach when tests become flaky is to increase timeout values rather than implementing more complex synchronization mechanisms.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-07-08T14:46:47.806Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:13-14
Timestamp: 2025-07-08T14:46:47.806Z
Learning: In the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-07-08T14:48:04.827Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/binding.cc:13-28
Timestamp: 2025-07-08T14:48:04.827Z
Learning: In nsolid test native addons (e.g., `test/addons/*/binding.cc`), additional JavaScript-facing argument validation is typically omitted because the tests supply well-formed inputs.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-07-08T14:47:34.724Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:16-18
Timestamp: 2025-07-08T14:47:34.724Z
Learning: In the nsolid test suite, external tool invocations (e.g., execSync to run `readelf`) are intentionally left uncaught so that any failure causes the test to fail rather than being skipped.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-06-10T09:52:49.401Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/binding.cc:64-78
Timestamp: 2025-06-10T09:52:49.401Z
Learning: In the nsolid codebase, test files in test/addons/ consistently use assert() for input validation and runtime checks, not CHECK() macros. This is the established testing convention.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
📚 Learning: 2025-10-14T10:18:23.123Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 377
File: test/parallel/test-nsolid-heap-profile-stream.js:81-88
Timestamp: 2025-10-14T10:18:23.123Z
Learning: In nsolid.heapProfileStream() and nsolid.heapSamplingStream(), synchronous throws occur for argument validation errors and immediate state validation checks (like assets being disabled), while async error events are emitted for runtime errors that manifest during execution (like invalid threadId).

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
🧬 Code graph analysis (5)
test/parallel/test-nsolid-grpc-heap-profile.js (2)
agents/grpc/src/grpc_client.h (1)
  • nsolid (26-162)
src/nsolid/nsolid_util.h (1)
  • nsolid (40-280)
agents/grpc/src/grpc_client.cc (1)
src/nsolid/nsolid_util.h (3)
  • parse_env_var_int (72-91)
  • string (94-102)
  • string (149-153)
agents/grpc/src/grpc_agent.h (1)
agents/grpc/src/grpc_agent.cc (2)
  • configure_grpc_exporters (1001-1073)
  • configure_grpc_exporters (1001-1002)
agents/grpc/src/grpc_client.h (1)
agents/grpc/src/grpc_client.cc (6)
  • MakeClientContext (113-118)
  • MakeClientContext (114-115)
  • MakeClientContext (123-135)
  • MakeClientContext (124-126)
  • get_grpc_deadline_seconds (27-33)
  • get_grpc_deadline_seconds (27-27)
agents/grpc/src/grpc_agent.cc (2)
agents/grpc/src/grpc_client.cc (6)
  • MakeNSolidServiceStub (140-143)
  • MakeNSolidServiceStub (141-141)
  • MakeClientContext (113-118)
  • MakeClientContext (114-115)
  • MakeClientContext (123-135)
  • MakeClientContext (124-126)
src/nsolid/nsolid_util.h (1)
  • parse_env_var_int (72-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test-macOS
  • GitHub Check: test-linux (ubuntu-24.04-arm)
  • GitHub Check: test-linux (ubuntu-24.04)
  • GitHub Check: build-tarball
  • GitHub Check: coverage-windows
  • GitHub Check: coverage-linux-without-intl
  • GitHub Check: coverage-linux
  • GitHub Check: lint-js-and-md
🔇 Additional comments (17)
src/nsolid/nsolid_util.h (1)

16-18: LGTM!

The new includes are necessary for the parse_env_var_int function: <cerrno> for errno handling, <cstdlib> for strtol, and <climits> for INT_MIN/INT_MAX constants.

agents/grpc/src/grpc_agent.h (1)

205-205: LGTM! Clean refactoring to centralize exporter configuration.

The new public method appropriately extracts OTLP exporter and client setup logic from the config() method, improving code organization and maintainability.

test/parallel/test-nsolid-grpc-heap-profile.js (2)

3-5: LGTM! Appropriate test optimization.

Setting a shorter gRPC deadline (2 seconds) for tests avoids unnecessary delays while still providing sufficient time for operations to complete.


113-132: LGTM! Necessary timing adjustment for new deadline behavior.

The 2-second delay ensures the previous heap profile operation completes before starting new profile tests. This aligns with the shortened gRPC deadline and prevents EInProgressError from concurrent profile attempts.

agents/grpc/src/grpc_client.h (2)

71-77: LGTM! Well-designed API extension.

The new overload provides flexibility for custom per-call deadlines while preserving backward compatibility through the existing two-argument version that forwards to this implementation.


159-160: LGTM! Appropriate helper function.

The helper function encapsulates deadline retrieval from the environment with caching, promoting code reuse across the gRPC subsystem.

agents/grpc/src/grpc_client.cc (5)

5-7: LGTM! Necessary headers for deadline support.

The added headers support the environment variable parsing and provide necessary constants for the deadline implementation.


18-19: LGTM! Solid deadline caching implementation.

The environment-based deadline support with static caching is thread-safe and provides sensible defaults (10 seconds) with appropriate bounds (1 second minimum).

Also applies to: 25-33


113-118: LGTM! Appropriate delegation pattern.

The two-argument overload correctly forwards to the new three-argument version using the environment-configured deadline, preserving backward compatibility while enabling the new functionality.


120-135: LGTM! Clean implementation.

The three-argument overload correctly sets the gRPC context deadline based on the provided seconds value and maintains the existing metadata handling.


53-91: Code is safe; original review concern was unfounded.

The verification confirms the buffer allocation is sufficient. The format specifiers %0000000000u and %0000000000.1f effectively reserve space within the template string (490 bytes). Even with worst-case values (max uint32 and floats clamped to 999999999.0), the formatted JSON output (479 bytes) fits safely within the allocated buffer with an 11-byte safety margin. No buffer overflow or truncation occurs. The code requires no changes.

agents/grpc/src/grpc_agent.cc (6)

2-2: LGTM! Necessary include.

The include provides the parse_env_var_int utility function used later in the file.


66-70: LGTM! Appropriate retry configuration.

The retry policy constants are well-defined and match the PR description (max 5 attempts, 500ms to 5s exponential backoff with 2x multiplier). These values align with gRPC retry best practices.


1001-1073: LGTM! Solid refactoring.

The new method effectively centralizes OTLP exporter and client setup, reducing code duplication through the set_common_options lambda. The retry policy is consistently applied across all exporters, and the comment about CommandStream creation order helps prevent potential race conditions.


1075-1075: LGTM! Improved type safety and error handling.

Line 1075: Making the JSON type explicit removes ambiguity and improves readability.

Line 1103: Using parse_env_var_int instead of stoull provides safer environment variable parsing with bounds checking (0-1 for the boolean insecure flag) and graceful error handling.

Also applies to: 1103-1103


1109-1109: LGTM! Clean refactoring.

Replacing inline exporter setup with the call to configure_grpc_exporters() improves code organization while maintaining the same functionality.


1807-1808: LGTM! Appropriate short deadline for exit.

Using a 1-second deadline for the exit event ensures the process terminates quickly rather than waiting for the default 10-second timeout. This is reasonable since the process is shutting down regardless of success or failure.

@santigimeno santigimeno force-pushed the santi/grpc_retry_policy branch from a41deaf to 3c83261 Compare November 5, 2025 12:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a41deaf and 3c83261.

📒 Files selected for processing (10)
  • agents/grpc/src/asset_stream.cc (3 hunks)
  • agents/grpc/src/grpc_agent.cc (5 hunks)
  • agents/grpc/src/grpc_agent.h (1 hunks)
  • agents/grpc/src/grpc_client.cc (4 hunks)
  • agents/grpc/src/grpc_client.h (2 hunks)
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp (2 hunks)
  • src/nsolid/nsolid_util.h (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-profile.js (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-sampling.js (3 hunks)
  • test/parallel/test-nsolid-grpc-profile.js (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • agents/grpc/src/grpc_client.h
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • agents/grpc/src/grpc_agent.h
  • agents/grpc/src/asset_stream.cc
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-07-15T16:09:04.521Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/nsolid-code-event-hook.js:47-73
Timestamp: 2025-07-15T16:09:04.521Z
Learning: In the nsolid test suite, hardcoded timeouts are acceptable for timing-sensitive tests. The preferred approach when tests become flaky is to increase timeout values rather than implementing more complex synchronization mechanisms.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T14:46:47.806Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:13-14
Timestamp: 2025-07-08T14:46:47.806Z
Learning: In the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T14:48:04.827Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/binding.cc:13-28
Timestamp: 2025-07-08T14:48:04.827Z
Learning: In nsolid test native addons (e.g., `test/addons/*/binding.cc`), additional JavaScript-facing argument validation is typically omitted because the tests supply well-formed inputs.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T14:47:34.724Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:16-18
Timestamp: 2025-07-08T14:47:34.724Z
Learning: In the nsolid test suite, external tool invocations (e.g., execSync to run `readelf`) are intentionally left uncaught so that any failure causes the test to fail rather than being skipped.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-06-10T09:52:49.401Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/binding.cc:64-78
Timestamp: 2025-06-10T09:52:49.401Z
Learning: In the nsolid codebase, test files in test/addons/ consistently use assert() for input validation and runtime checks, not CHECK() macros. This is the established testing convention.

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-10-14T10:18:23.123Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 377
File: test/parallel/test-nsolid-heap-profile-stream.js:81-88
Timestamp: 2025-10-14T10:18:23.123Z
Learning: In nsolid.heapProfileStream() and nsolid.heapSamplingStream(), synchronous throws occur for argument validation errors and immediate state validation checks (like assets being disabled), while async error events are emitted for runtime errors that manifest during execution (like invalid threadId).

Applied to files:

  • test/parallel/test-nsolid-grpc-heap-profile.js
  • test/parallel/test-nsolid-grpc-profile.js
🧬 Code graph analysis (3)
test/parallel/test-nsolid-grpc-heap-profile.js (2)
agents/grpc/src/grpc_client.h (1)
  • nsolid (26-162)
src/nsolid/nsolid_util.h (1)
  • nsolid (41-281)
agents/grpc/src/grpc_agent.cc (2)
agents/grpc/src/grpc_client.cc (6)
  • MakeNSolidServiceStub (140-143)
  • MakeNSolidServiceStub (141-141)
  • MakeClientContext (113-118)
  • MakeClientContext (114-115)
  • MakeClientContext (123-135)
  • MakeClientContext (124-126)
src/nsolid/nsolid_util.h (1)
  • parse_env_var_int (73-92)
agents/grpc/src/grpc_client.cc (1)
src/nsolid/nsolid_util.h (3)
  • parse_env_var_int (73-92)
  • string (95-103)
  • string (150-154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: coverage-windows
  • GitHub Check: lint-js-and-md
  • GitHub Check: test-macOS
  • GitHub Check: test-linux (ubuntu-24.04-arm)
  • GitHub Check: test-linux (ubuntu-24.04)
  • GitHub Check: coverage-linux-without-intl
  • GitHub Check: coverage-linux
  • GitHub Check: build-tarball
🔇 Additional comments (2)
deps/opentelemetry-cpp/otlp-http-exporter.gyp (2)

62-67: LGTM!

The ENABLE_OTLP_RETRY_PREVIEW flag is correctly added to the top-level defines, enabling the retry preview feature at compile time. The syntax and placement are consistent with existing flags.


75-80: LGTM!

Correctly propagates ENABLE_OTLP_RETRY_PREVIEW to direct dependent targets, ensuring the retry feature is enabled consistently across the build. This follows the same pattern as other feature flags in the configuration.

@santigimeno santigimeno force-pushed the santi/grpc_retry_policy branch 3 times, most recently from 9c2e3f4 to 3c83261 Compare November 5, 2025 13:33
Enable them in opentelemetry-cpp by defining
`ENABLE_OTLP_RETRY_PREVIEW` and also them to our GrpcClient.
The values we're using with a default deadline of 10 seconds are:
  - max_attemps: 5
  - initial_backoff: 500ms
  - max_backoff: 5s
  - backoff_multiplier: 2

Add NSOLID_GRPC_DEADLINE env var for configurable gRPC timeouts.
Specially useful for testing.

Refactor and dry code when creating grpc exporters inside the new
GrpcAgent::configure_grpc_exporters() helper function.

Create parse_env_var_int() utility in nsolid_util.h for safe env
parsing.
@santigimeno santigimeno force-pushed the santi/grpc_retry_policy branch from 3c83261 to e74e183 Compare November 5, 2025 13:40
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c83261 and e74e183.

📒 Files selected for processing (10)
  • agents/grpc/src/asset_stream.cc (3 hunks)
  • agents/grpc/src/grpc_agent.cc (5 hunks)
  • agents/grpc/src/grpc_agent.h (1 hunks)
  • agents/grpc/src/grpc_client.cc (4 hunks)
  • agents/grpc/src/grpc_client.h (2 hunks)
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp (2 hunks)
  • src/nsolid/nsolid_util.h (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-profile.js (2 hunks)
  • test/parallel/test-nsolid-grpc-heap-sampling.js (3 hunks)
  • test/parallel/test-nsolid-grpc-profile.js (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • test/parallel/test-nsolid-grpc-heap-sampling.js
  • test/parallel/test-nsolid-grpc-heap-profile.js
  • agents/grpc/src/grpc_agent.h
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • agents/grpc/src/grpc_client.h
  • src/nsolid/nsolid_util.h
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-07-15T16:09:04.521Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/nsolid-code-event-hook.js:47-73
Timestamp: 2025-07-15T16:09:04.521Z
Learning: In the nsolid test suite, hardcoded timeouts are acceptable for timing-sensitive tests. The preferred approach when tests become flaky is to increase timeout values rather than implementing more complex synchronization mechanisms.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T14:46:47.806Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:13-14
Timestamp: 2025-07-08T14:46:47.806Z
Learning: In the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T14:47:34.724Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/nsolid-elf-utils.js:16-18
Timestamp: 2025-07-08T14:47:34.724Z
Learning: In the nsolid test suite, external tool invocations (e.g., execSync to run `readelf`) are intentionally left uncaught so that any failure causes the test to fail rather than being skipped.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T14:48:04.827Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: test/addons/nsolid-elf-utils/binding.cc:13-28
Timestamp: 2025-07-08T14:48:04.827Z
Learning: In nsolid test native addons (e.g., `test/addons/*/binding.cc`), additional JavaScript-facing argument validation is typically omitted because the tests supply well-formed inputs.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-06-10T09:52:49.401Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 327
File: test/addons/nsolid-code-event-hook/binding.cc:64-78
Timestamp: 2025-06-10T09:52:49.401Z
Learning: In the nsolid codebase, test files in test/addons/ consistently use assert() for input validation and runtime checks, not CHECK() macros. This is the established testing convention.

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-10-14T10:18:23.123Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 377
File: test/parallel/test-nsolid-heap-profile-stream.js:81-88
Timestamp: 2025-10-14T10:18:23.123Z
Learning: In nsolid.heapProfileStream() and nsolid.heapSamplingStream(), synchronous throws occur for argument validation errors and immediate state validation checks (like assets being disabled), while async error events are emitted for runtime errors that manifest during execution (like invalid threadId).

Applied to files:

  • test/parallel/test-nsolid-grpc-profile.js
📚 Learning: 2025-07-08T16:05:45.341Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 339
File: src/nsolid/nsolid_elf_utils.cc:18-18
Timestamp: 2025-07-08T16:05:45.341Z
Learning: In the NSolid project, the `GetBuildId` function in `src/nsolid/nsolid_elf_utils.cc` is designed to be called only from the NSolid thread, so the static cache doesn't require thread safety mechanisms like mutex protection.

Applied to files:

  • agents/grpc/src/grpc_client.cc
📚 Learning: 2025-11-05T13:30:16.195Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 387
File: agents/grpc/src/grpc_client.cc:81-97
Timestamp: 2025-11-05T13:30:16.195Z
Learning: In gRPC's ChannelArguments::SetServiceConfigJSON(), trailing NUL bytes in the std::string argument don't cause issues because downstream gRPC code uses c_str() to treat it as a C NUL-terminated string, reading only up to the first NUL terminator.

Applied to files:

  • agents/grpc/src/grpc_client.cc
📚 Learning: 2025-08-20T11:17:56.528Z
Learnt from: santigimeno
Repo: nodesource/nsolid PR: 354
File: test/agents/test-grpc-blocked-loop.mjs:224-224
Timestamp: 2025-08-20T11:17:56.528Z
Learning: In the gRPC agent token parsing (agents/grpc/src/grpc_agent.cc), CONSOLE_ID_SIZE = 36 is used for generating console IDs, not for validating console ID length during SAAS token parsing. Token parsing does not enforce console ID length constraints, so test tokens with console ID portions of different lengths (like 32 characters) work correctly.

Applied to files:

  • agents/grpc/src/grpc_client.cc
🔇 Additional comments (2)
agents/grpc/src/asset_stream.cc (2)

2-2: LGTM: Include and using declarations are appropriate.

The include of grpc_client.h and the using declarations for chrono types are necessary for the deadline functionality and follow standard C++ practices.

Also applies to: 12-13


33-35: Verify deadline behavior for long-running streaming RPCs.

The deadline is set once in the constructor and applies to the entire RPC lifetime. For streaming RPCs like ExportContinuousProfile (line 39), this means the stream will terminate when the deadline is reached (default 10 seconds per PR summary).

Please verify:

  1. Are continuous profiling streams designed to be short-lived and periodically recreated?
  2. If streams need to stay open longer than the deadline, consider either:
    • Using different deadline strategies per RPC type
    • Relying on gRPC keepalive settings for long-running streams
    • Documenting that NSOLID_GRPC_DEADLINE must be set appropriately for continuous profiling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants