-
Notifications
You must be signed in to change notification settings - Fork 30
Tunnel state handling on quic connection related failures. #3655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pronebird reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed (commit messages unreviewed), all discussions resolved (waiting on @jmwample)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pronebird reviewed 1 of 3 files at r2.
Reviewable status: 2 of 4 files reviewed, 1 unresolved discussion (waiting on @doums, @mrkiwi, @neacsu, and @rokas-ambrazevicius)
nym-vpn-core/crates/nym-vpn-lib/src/tunnel_state_machine/tunnel_monitor.rs
line 891 at r2 (raw file):
let bridge_conn = transports::BridgeConn::try_connect( entry_bridge_params, self.shutdown_token.clone(),
Prefer child token to avoid uncontroller propagation of cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 2 unresolved discussions (waiting on @doums, @jmwample, @mrkiwi, @neacsu, and @rokas-ambrazevicius)
nym-vpn-core/crates/nym-vpn-lib/src/tunnel_state_machine/tunnel_monitor.rs
line 899 at r2 (raw file):
transports::UdpForwarder::launch(bridge_conn, None, self.shutdown_token.clone()) .await .inspect_err(|_| self.shutdown_token.cancel())?;
Why is it needed to call cancellation, is it not enough to return an error from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pronebird reviewed 1 of 3 files at r2.
Reviewable status: 3 of 4 files reviewed, 2 unresolved discussions (waiting on @doums, @jmwample, @mrkiwi, @neacsu, and @rokas-ambrazevicius)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pronebird reviewed 1 of 3 files at r2.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @doums, @jmwample, @mrkiwi, @neacsu, and @rokas-ambrazevicius)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @doums, @mrkiwi, @neacsu, @pronebird, and @rokas-ambrazevicius)
nym-vpn-core/crates/nym-vpn-lib/src/tunnel_state_machine/tunnel_monitor.rs
line 891 at r2 (raw file):
Previously, pronebird (Andrej Mihajlov) wrote…
Prefer child token to avoid uncontroller propagation of cancellation.
This is the same issue as returning an error, these steps are on the critical connection path, so if they fail then the connection needs a clean retry. Using a child token here means I have to rely on the returned error triggering the retry properly, which it is not doing.
wrt. uncontrolled propagation of cancel - that seems like an issue for calling funtions to scope their token / child tokens properly.
nym-vpn-core/crates/nym-vpn-lib/src/tunnel_state_machine/tunnel_monitor.rs
line 899 at r2 (raw file):
Previously, pronebird (Andrej Mihajlov) wrote…
Why is it needed to call cancellation, is it not enough to return an error from here?
Returning an error didn't work in tests. When I just return an error then something doesn't get shutdown properly as it proceeds to retries. The first error will be reported properly and it moves to retries, it then continuously fails with the error:
2025-10-09T18:16:16.333164Z INFO perform_initial_authentication: nym_gateway_client::client: close time.busy=11.6ms time.idle=1.32s gateway=<GATEWAY_ID> gateway_address=wss://mainnet-gateway-de.dev.nymte.ch:9001/
2025-10-09T18:16:16.333356Z ERROR nym_client_core::client::base_client: Could not authenticate and start up the gateway connection - gateway returned an error response: There is already an open connection to this client
2025-10-09T18:16:16.334095Z ERROR nym_vpn_lib::tunnel_state_machine::tunnel_monitor: Error: Tunnel monitor exited with error
Caused by: tunnel error
Caused by: registration client error
Caused by: failed to connect to mixnet
Caused by: gateway client error (<GATEWAY_ID>): gateway returned an error response: There is already an open connection to this client
Caused by: gateway returned an error response: There is already an open connection to this client
2025-10-09T18:16:16.335200Z INFO nym_vpn_lib::tunnel_state_machine::states::connecting_state: Tunnel closed
2025-10-09T18:16:16.335295Z INFO nym_vpn_lib::tunnel_state_machine::states::connecting_state: Reconnecting, attempt 2
Cancelling the token ensures that on heading towards retry things are stopped and then started again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @doums, @jmwample, @mrkiwi, @neacsu, and @rokas-ambrazevicius)
nym-vpn-core/crates/nym-vpn-lib/src/tunnel_state_machine/tunnel_monitor.rs
line 891 at r2 (raw file):
Previously, jmwample (Jack Wampler) wrote…
This is the same issue as returning an error, these steps are on the critical connection path, so if they fail then the connection needs a clean retry. Using a child token here means I have to rely on the returned error triggering the retry properly, which it is not doing.
wrt. uncontrolled propagation of cancel - that seems like an issue for calling funtions to scope their token / child tokens properly.
Alright I think I'll add a drop guard in TunnelMonitor::run_inner()
instead of doing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed (commit messages unreviewed), all discussions resolved (waiting on @doums, @mrkiwi, @neacsu, and @rokas-ambrazevicius)
Ticket
Net-596 / NET-512 / NET-343
Description
While connecting using the Quic transport, there was an issue (specifics don’t really matter, the gateway was incompatible with quic). While it was waiting in attempting to connect to a gateway I knew wouldn’t work I hit ctrl-c (or sent disconnect) to stop the connect attempt. The cancel blocked on the quic connection failng, i.e. timing out. This should definitely not block this way. The quic connection should be aborted when the connect attempt is cancelled.
If an error occurs in the UdpForwarder it will stop and cancel it’s (local) CancellationToken. However, there is no mechanism at the moment by which that error is propagated back to the tunnel state to inform it that the tunnel has entered an error state and should shut down and potentially attempt to reconnect.
Checklist:
This change is