Multi-gateway partially connected network test fails in CI with channel closed errors #2029

@sanity

Problem

The multi-gateway variant of the partially connected network test fails in CI with "channel closed" errors during gateway startup, despite passing consistently in local testing.

Background

This issue was discovered while working on #2022 to fix partially connected network tests. The primary blocker (missing contract feature flag) has been resolved, which enabled the single-gateway test variant to pass reliably. However, the multi-gateway variant exhibits CI-specific failures.

Test Details

  • Single-gateway test (run_app_partially_connected_network.rs): 1 gateway, 7 nodes - ✅ Works reliably in both local and CI
  • Multi-gateway test (run_app.rs): 3 gateways, 7 nodes - ❌ Fails in CI, passes locally

Error Pattern

CI failure logs show:

ERROR freenet::operations::connect: Failed while attempting connection to gateway, 
  gateway: v6MWKgqJ66B21v7a, error: failed notifying, channel closed

Gateway node failed: channel closed

The error occurs during gateway initialization (within ~90 seconds), before any of the test logic runs.
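For readers unfamiliar with this class of error, here is a minimal sketch of how a "channel closed" failure typically arises: a sender tries to notify a receiver that has already been dropped. It uses `std::sync::mpsc` purely for illustration; the channel type Freenet actually uses, and the `notify` and `demo_channel_closed` helpers, are assumptions and not taken from the codebase.

```rust
use std::sync::mpsc;

/// Hypothetical helper: tries to notify a listener and maps a send failure
/// to a message resembling the one in the CI logs.
fn notify(tx: &mpsc::Sender<&'static str>, msg: &'static str) -> Result<(), String> {
    tx.send(msg)
        .map_err(|_| "failed notifying, channel closed".to_string())
}

/// Sends once while the receiver is alive and once after it has been dropped.
fn demo_channel_closed() -> (Result<(), String>, Result<(), String>) {
    let (tx, rx) = mpsc::channel();
    let while_open = notify(&tx, "connect");
    drop(rx); // simulates the receiving side exiting early
    let after_close = notify(&tx, "connect");
    (while_open, after_close)
}
```

The point of the sketch is that the error is reported by the sender, while the cause lies with whichever task owned the receiver and stopped first.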

Observations

  1. Works locally: Test passes consistently on local development machines
  2. Fails in CI: GitHub Actions environment shows gateway coordination failures
  3. Timing: Increasing initialization delays (2s → 10s) did not resolve the issue, so a longer fixed startup delay alone is not a fix
  4. Gateway-specific: Only the multi-gateway (3 gateways) test is affected; the single-gateway variant is not
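Since fixed delays did not help, one option worth trying is to wait on an explicit readiness condition with a deadline instead of sleeping for a fixed time. The following is a sketch only: `wait_until` is a hypothetical helper, and what counts as "ready" for a gateway (for example, a port accepting connections) is an assumption the test author would need to define.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `is_ready` every `interval` until it returns true or `timeout`
/// elapses. Returns true if the condition was met before the deadline.
fn wait_until<F: FnMut() -> bool>(
    mut is_ready: F,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

A test could call this once per gateway before starting the regular nodes and fail fast with a clear message when a gateway never becomes ready, which would also narrow down which gateway stalls in CI.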

Root Cause Hypothesis

The multi-gateway test appears to have a race condition or resource coordination issue when multiple gateways attempt to initialize simultaneously in resource-constrained CI environments. The failure pattern suggests:

  • Gateways successfully start but internal channels close unexpectedly
  • Possible issue with gateway peer discovery/handshake coordination
  • May be related to gateway-to-gateway communication timing

Investigation Needed

  1. Gateway initialization logic: Review how multiple gateways coordinate during startup
  2. Channel lifecycle: Investigate why gateway internal channels close prematurely in CI
  3. Resource constraints: Determine if CI environment limitations trigger the issue
  4. Test architecture: Consider if test setup needs modification for multi-gateway scenarios
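For the channel lifecycle item, one possible debugging aid is to wrap the receiving end of a channel so that dropping it leaves a labelled record, making it visible which side closed first. This is a sketch using `std::sync::mpsc`; `TracedReceiver`, `traced_channel`, and the label string are hypothetical names, and in the real codebase the record would more likely go to the existing logging setup than to a shared vector.

```rust
use std::sync::{mpsc, Arc, Mutex};

/// Wraps a receiver and records a labelled event when it is dropped.
struct TracedReceiver<T> {
    inner: mpsc::Receiver<T>,
    label: &'static str,
    log: Arc<Mutex<Vec<String>>>,
}

impl<T> TracedReceiver<T> {
    /// Non-blocking receive; returns None if nothing is queued.
    fn try_recv(&self) -> Option<T> {
        self.inner.try_recv().ok()
    }
}

impl<T> Drop for TracedReceiver<T> {
    fn drop(&mut self) {
        // Record which receiver went away so later send errors can be traced.
        self.log
            .lock()
            .unwrap()
            .push(format!("receiver dropped: {}", self.label));
    }
}

/// Creates a channel whose receiver logs its own drop under `label`.
fn traced_channel<T>(
    label: &'static str,
    log: Arc<Mutex<Vec<String>>>,
) -> (mpsc::Sender<T>, TracedReceiver<T>) {
    let (tx, rx) = mpsc::channel();
    (tx, TracedReceiver { inner: rx, label, log })
}
```

Comparing the order of these drop records with the first "channel closed" error in a CI run would show whether a gateway's receiving task exits before its peers try to notify it.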

Workaround

The test is currently marked as #[ignore] with reference to this issue. The single-gateway variant provides partial test coverage for partially connected network functionality.

Files

  • apps/freenet-ping/app/tests/run_app.rs - Multi-gateway test (currently ignored)
  • apps/freenet-ping/app/tests/run_app_partially_connected_network.rs - Single-gateway test (working)

Acceptance Criteria

  • Multi-gateway test passes reliably in CI
  • Understand why the CI environment behaves differently from local environments
  • Remove #[ignore] annotation from test
  • Document any necessary test setup changes for multi-gateway scenarios

[AI-assisted issue creation]

Metadata

Assignees

No one assigned

    Labels

    • A-developer-xp (Area: developer experience)
    • A-networking (Area: Networking, ring protocol, peer discovery)
    • E-medium (Experience needed to fix/implement: Medium / intermediate)
    • P-medium (Medium priority)
    • T-bug (Type: Something is broken)
