Aurashk (Contributor) commented Dec 5, 2025

Description

Fixes #658

Modifies both flavours of SSH process manager (shell and paramiko) to query and kill remote processes directly, rather than through the local ssh client. This is done by launching each remote process headless and storing its remote PID in metadata. That PID is then used to send signals over ssh directly to the remote process.
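A minimal sketch of that launch/query/kill cycle, assuming a POSIX sh on the remote host. The helper names (launch_command, parse_pid, signal_command, status_command, ssh_argv) are hypothetical and not taken from drunc; they only build the command strings that would be run over ssh, either through the ssh CLI (shell flavour) or through paramiko's SSHClient.exec_command (paramiko flavour).

```python
from __future__ import annotations

import shlex


def launch_command(cmd: list[str], log_path: str) -> str:
    """Remote sh snippet that starts cmd headless and prints its PID."""
    # nohup + '&' detaches the process from the ssh session. Redirecting all
    # three standard streams lets ssh return as soon as the PID is echoed.
    # nohup execs cmd, so $! is the PID of the remote process itself.
    return (
        f"nohup {shlex.join(cmd)} > {shlex.quote(log_path)} 2>&1 < /dev/null"
        " & echo $!"
    )


def parse_pid(stdout: str) -> int:
    """The PID is the last line printed by the launch snippet."""
    return int(stdout.strip().splitlines()[-1])


def signal_command(pid: int, sig: str = "TERM") -> str:
    """Remote command that sends a signal straight to the stored PID."""
    if not sig.isalnum():
        raise ValueError(f"unexpected signal name: {sig!r}")
    return f"kill -{sig} {int(pid)}"


def status_command(pid: int) -> str:
    """kill -0 delivers no signal; it only checks that the PID still exists."""
    return f"kill -0 {int(pid)} 2>/dev/null && echo running || echo gone"


def ssh_argv(host: str, remote_cmd: str) -> list[str]:
    """argv for the shell flavour; remote_cmd is run by the remote user's shell."""
    return ["ssh", host, remote_cmd]
```

In this sketch, ssh_argv(host, launch_command(...)) would run once at boot, parse_pid would turn its output into the PID stored in the process metadata, and signal_command(pid) or status_command(pid) would be sent later over a fresh ssh call.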

This has some desired effects:

  • We get informative exit codes from the remote processes rather than from the ssh client processes
  • More control over cleanup, since signals are sent directly to the remote process; otherwise all we can do is kill the local client, which results in a SIGHUP being sent to the remote process
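A hypothetical helper illustrating why the source of an exit code matters: a negative code from subprocess means the local process (here, the ssh client) was killed by a signal, 255 is what OpenSSH's ssh returns for its own errors, and codes above 128 conventionally mean a shell-run command died from signal code - 128. This is a sketch of one way to report codes, not drunc's actual reporting.

```python
import signal


def _signal_name(num: int):
    """Return the signal's name (e.g. 'SIGTERM'), or None if num is not a known signal."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return None


def describe_exit(code: int) -> str:
    """Hypothetical helper: readable label for an ssh/subprocess return code."""
    if code == 0:
        return "exited normally"
    if code < 0:
        # subprocess convention: the local process itself died from signal -code
        return f"local process killed by {_signal_name(-code) or -code}"
    if code == 255:
        # OpenSSH uses 255 for its own errors (ambiguous with a remote exit 255)
        return "ssh error or remote exit 255"
    if code > 128 and _signal_name(code - 128):
        # shell convention: 128 + N means the command died from signal N
        return f"remote command killed by {_signal_name(code - 128)}"
    return f"remote command exited with {code}"
```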

Less desirable effects:

  • Some commands in drunc-unified-shell are slower: on my laptop, ps now takes a couple of seconds rather than being instant, and killing each process takes around a second. Some slowdown is to be expected, since we have gone from purely local process communication to talking to each process remotely over ssh. The implementation can probably be optimised if this is a concern.
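If the per-process ssh round trips become a concern, one possible optimisation (a sketch, not something this PR implements) is to batch work per host, so that ps or a kill costs one ssh call per host rather than one per process. The helpers are hypothetical, and the ps invocation assumes procps-style ps, whose -p option accepts a comma-separated PID list.

```python
from collections import defaultdict


def group_by_host(procs):
    """procs: iterable of (host, pid) pairs, e.g. taken from stored process metadata."""
    by_host = defaultdict(list)
    for host, pid in procs:
        by_host[host].append(int(pid))
    return dict(by_host)


def batch_status_command(pids):
    """One remote ps call covering every PID on a host; exited PIDs are simply absent."""
    return "ps -o pid=,stat= -p " + ",".join(str(p) for p in sorted(pids))


def batch_signal_command(pids, sig="TERM"):
    """kill accepts several PIDs, so one ssh call can signal a whole host."""
    return f"kill -{sig} " + " ".join(str(p) for p in sorted(pids))


def parse_ps(stdout):
    """Map PID -> process state from `ps -o pid=,stat=` output."""
    states = {}
    for line in stdout.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[int(fields[0])] = fields[1]
    return states
```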

Note: There is an adjacent comment in the issue about fixing the terminate order to match K8s. I think that would be straightforward to add to this PR, but I will wait for feedback on the approach first.

Type of change

  • Documentation (non-breaking change that adds or improves the documentation)
  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass (eg. python -m pytest)
  • Pre-commit hooks run successfully (eg. pre-commit run --all-files)

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added or an issue has been opened to tackle that in the future.
    (Indicate issue here: # (issue))

Aurashk marked this pull request as ready for review December 8, 2025 10:10
jamesturner246 (Contributor) left a comment

Not sure if you can still see our meeting chat any more, but I think I have signal forwarding working without any monitor threads or other hacks.

The trick is to run the ssh client with the -t option, which forces the remote side to allocate a PTY and run the command inside it, rather than running the command directly without a terminal. The advantage is that signals such as SIGTERM are then actually forwarded to the remote command, thanks to the PTY's built-in signal handling.

So all it means in practice is using ssh -t ... instead of ssh ..., and it should work without the hacks.
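A sketch of what that change amounts to when ssh is launched from Python with subprocess. One detail worth checking, from the ssh man page: a single -t only requests a PTY, and ssh declines to allocate one when its own stdin is not a terminal, as is the case under subprocess; repeating the option (-tt) forces allocation. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def ssh_pty_argv(host: str, remote_cmd: str, force: bool = True) -> list[str]:
    """Build an ssh argv that runs remote_cmd inside a remote PTY."""
    # "-t" requests a PTY; "-tt" forces one even without a local terminal.
    return ["ssh", "-tt" if force else "-t", host, remote_cmd]


def start_remote(host: str, remote_cmd: str) -> subprocess.Popen:
    """Start the remote command; stdin is a pipe so bytes can be written to the PTY."""
    return subprocess.Popen(ssh_pty_argv(host, remote_cmd), stdin=subprocess.PIPE)
```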

jamesturner246 (Contributor) commented

One caveat, though: SIGTERMing the local ssh client means that a SIGHUP is what actually appears on the remote command side. But for this use case I don't think it matters whether SIGTERM or SIGHUP reaches the remote command, just that a TERM-ish signal reaches it reliably.

jamesturner246 (Contributor) commented Dec 8, 2025

One can in fact be EXTRA safe (which I would probably recommend) by explicitly sending the ^C byte through and letting the remote program shut itself down. The ssh return code would then be the return code of the remote command. If that fails, the remote command would still fall back to getting SIGHUP.
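The suggested escalation could look like this, as a sketch assuming the ssh client was started with -tt and stdin=subprocess.PIPE, so that the remote PTY turns the 0x03 byte into SIGINT. stop_gracefully and its timeout value are hypothetical choices, not part of drunc.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the remote command to exit via ^C; fall back to terminating local ssh."""
    if proc.stdin is not None:
        try:
            # 0x03 is ^C; the remote PTY's line discipline delivers it as SIGINT
            proc.stdin.write(b"\x03")
            proc.stdin.flush()
        except (OSError, ValueError):
            pass  # process already gone or stdin already closed
    try:
        # with ssh, this is the remote command's own return code
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # fallback: SIGTERM the local ssh client; the remote side then sees SIGHUP
        proc.terminate()
        return proc.wait()


if __name__ == "__main__":
    # Local stand-in for a remote command (no PTY involved): it exits with 7
    # as soon as it reads one byte from stdin.
    demo = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.buffer.read(1); sys.exit(7)"],
        stdin=subprocess.PIPE,
    )
    print(stop_gracefully(demo))
```

Without a PTY the ^C byte is just ordinary input, which is why the demo's stand-in exits on reading it; over ssh -tt the same byte would reach the remote program as SIGINT.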

Development

Successfully merging this pull request may close these issues.

[Bug]: Terminate incorret implementation in SSH PM

3 participants