Conversation

Nino-K
Contributor

@Nino-K Nino-K commented Sep 10, 2025

This PR aims to capture the published ports within the VM for various container engines. It directly subscribes to the corresponding container engine APIs (Docker, containerd and Kubernetes) to detect published ports immediately as a container is created.

I'm also planning to move the iptables and procnet settings under the portMonitor property; however, that will be addressed in a follow-up PR.

@AkihiroSuda
Member

Seems overengineering.
What is the problem with the existing event watcher?

go.mod Outdated
Member

Too many new dependencies

Contributor Author

Unfortunately, there are many indirect dependencies being used for just a few direct ones. However, I went ahead and compared the guest agent sizes from the master build and my PR, and I’m happy to share the results below:

total 231384
-rw-r--r--  1 ninok  wheel    55M Sep 11 10:27 guestagent_master
-rw-r--r--  1 ninok  wheel    58M Sep 11 10:26 guestagent_pr

Member

My concern is mostly about the maintenance burdens rather than the binary footprint; it is hard to handle dependabot PRs and detect potential supply chain attacks.

Also, the inflation of the Go deps is challenging for getting Lima into Debian, as Debian makes a dpkg for each of those deps.

Member

@jandubois jandubois Sep 17, 2025

> My concern is mostly about the maintenance burdens rather than the binary footprint; it is hard to handle dependabot PRs and detect potential supply chain attacks.

True, but these are all dependencies of the containerd and dockerd client libraries, so should be subject to some scrutiny from upstream already.

Also, independent of this issue, we should maybe consider setting a dependabot cooldown period, e.g. 7 days. It will tell dependabot not to make a PR until there have been no new releases of the dependency for those 7 days. This also helps with accidental breakage in dependencies, which is normally fixed quickly.

I've found the cooldown more important for node dependencies, but it can be set up for each ecosystem.

> Also, the inflation of the Go deps is challenging for getting Lima into Debian, as Debian makes a dpkg for each of those deps.

I just looked at https://go-team.pages.debian.net/packaging.html but did not find any rationale for why they would go to all that work. What is the benefit?

Anyways, I assume that Debian has packaged all the prerequisites for docker and containerd clients already, so this should not block anything.

return ipPorts, nil
}
// If the label is not present, we check the network config in the following path:
// <DATAROOT>/<ADDRHASH>/containers/<NAMESPACE>/<CID>/network-config.json
Member

This file isn't expected to be parsed externally

Member

I think the issue is that since containerd/nerdctl#4290 the information is no longer available via labels. How are you expected to get the port mapping information?

Member

The expectation is not to depend on container engine implementations; audit events or eBPF should work (see my comments below)

Member

> The expectation is not to depend on container engine implementations; audit events or eBPF should work (see my comments below)

I think this is a poor excuse for nerdctl breaking backwards compatibility.

Member

What backwards compatibility is broken?

Member

@jandubois jandubois Sep 18, 2025

The backwards compatibility issue is that you used to be able to query container labels for the list of exposed ports. That functionality has been removed in nerdctl 2.1.3 because it broke in an edge case when a large number of ports were exposed.

There was no accommodation for exposing ranges as ranges; they were unrolled to create a label for each port, which hit the containerd limitation when large ranges were used. So the labels were removed completely, making it impossible to determine exposed ports via inspection.

Member

The labels are private implementation details too.

The correct way to inspect the port is to:

$ nerdctl container inspect foo | jq .[0].HostConfig.PortBindings
{
  "80/tcp": [
    {
      "HostIp": "0.0.0.0",
      "HostPort": "8080"
    }
  ]
}

This works for Docker too.
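For a caller that consumes this output programmatically, the parsing could look roughly like the sketch below. It is not code from this PR: the type and function names (`hostBinding`, `inspectEntry`, `publishedPort`, `parsePortBindings`) are hypothetical, and it only assumes the JSON shape shown in the example above. How the JSON is obtained (for instance by running the CLI as a subprocess) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// hostBinding mirrors one element of a HostConfig.PortBindings list.
type hostBinding struct {
	HostIp   string `json:"HostIp"`
	HostPort string `json:"HostPort"`
}

// inspectEntry keeps only the fields this sketch needs from the inspect output.
type inspectEntry struct {
	HostConfig struct {
		PortBindings map[string][]hostBinding `json:"PortBindings"`
	} `json:"HostConfig"`
}

// publishedPort is a hypothetical flattened form of a single binding.
type publishedPort struct {
	Proto         string
	ContainerPort int
	HostIP        string
	HostPort      int
}

// parsePortBindings decodes the JSON array printed by `container inspect`
// and flattens HostConfig.PortBindings into a list of published ports.
func parsePortBindings(inspectJSON []byte) ([]publishedPort, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(inspectJSON, &entries); err != nil {
		return nil, fmt.Errorf("decoding inspect output: %w", err)
	}
	var ports []publishedPort
	for _, e := range entries {
		for key, bindings := range e.HostConfig.PortBindings {
			// Keys have the form "<container port>/<protocol>", e.g. "80/tcp".
			portStr, proto, ok := strings.Cut(key, "/")
			if !ok {
				return nil, fmt.Errorf("unexpected port key %q", key)
			}
			containerPort, err := strconv.Atoi(portStr)
			if err != nil {
				return nil, fmt.Errorf("parsing container port %q: %w", portStr, err)
			}
			for _, b := range bindings {
				if b.HostPort == "" {
					continue // no explicit host port; skipped in this sketch
				}
				hostPort, err := strconv.Atoi(b.HostPort)
				if err != nil {
					return nil, fmt.Errorf("parsing host port %q: %w", b.HostPort, err)
				}
				ports = append(ports, publishedPort{
					Proto:         proto,
					ContainerPort: containerPort,
					HostIP:        b.HostIp,
					HostPort:      hostPort,
				})
			}
		}
	}
	return ports, nil
}

func main() {
	sample := []byte(`[{"HostConfig":{"PortBindings":{"80/tcp":[{"HostIp":"0.0.0.0","HostPort":"8080"}]}}}]`)
	ports, err := parsePortBindings(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range ports {
		fmt.Printf("%s:%d -> %d/%s\n", p.HostIP, p.HostPort, p.ContainerPort, p.Proto)
	}
}
```

Keeping the parser separate from whatever produces the JSON means the error handling for the subprocess and for the decoding can be tested independently.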

Member

How do you do it over the API? You can do this via the API with docker (and you used to be able to do it with nerdctl containers too).

Member

There is no containerd API for this, as the port forwarding is not handled by the daemon.

Member

I don't really want to argue this anymore, and maybe it is Hyrum's law, but in the end the fact remains that information about the container was available via an API and now is only available by running a subprocess, with the inherent brittleness of that approach (error handling).

@jandubois
Member

> Seems overengineering. What is the problem with the existing event watcher?

There are no event watchers for docker or containerd; we rely on polling /proc/net/tcp every couple of seconds (3?), so there is always a delay between the container opening a port and it being forwarded to the host.

See e.g. abiosoft/colima#71

This PR is basically a port of the event watchers from Rancher Desktop on Windows, which does not suffer from this issue.
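To make the polling approach concrete, here is a minimal sketch of what one polling pass over `/proc/net/tcp` involves. It is not the guest agent's actual parser: the names `listener`, `parseProcNetTCP`, and `sampleProcNetTCP` are hypothetical, the sample content is made up, only IPv4 is handled (`/proc/net/tcp6` is ignored), and a little-endian machine is assumed for the address byte order.

```go
package main

import (
	"bufio"
	"encoding/hex"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// tcpListen is the socket state value that marks a listening socket.
const tcpListen = 0x0A

// sampleProcNetTCP is made-up content in the /proc/net/tcp layout.
const sampleProcNetTCP = `  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345
   1: 0100007F:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12346
   2: 0100007F:C350 0100007F:1F90 01 00000000:00000000 00:00000000 00000000  1000        0 12347
`

// listener is one listening IPv4 socket.
type listener struct {
	IP   net.IP
	Port uint16
}

// parseProcNetTCP returns the listening sockets found in the given content.
func parseProcNetTCP(content string) ([]listener, error) {
	var out []listener
	sc := bufio.NewScanner(strings.NewReader(content))
	sc.Scan() // skip the header line
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 {
			continue
		}
		state, err := strconv.ParseUint(fields[3], 16, 8)
		if err != nil {
			return nil, fmt.Errorf("parsing state %q: %w", fields[3], err)
		}
		if state != tcpListen {
			continue // established and other non-listening sockets
		}
		ipHex, portHex, ok := strings.Cut(fields[1], ":")
		if !ok {
			return nil, fmt.Errorf("malformed local_address %q", fields[1])
		}
		raw, err := hex.DecodeString(ipHex)
		if err != nil {
			return nil, fmt.Errorf("parsing address %q: %w", ipHex, err)
		}
		if len(raw) != 4 {
			return nil, fmt.Errorf("expected an IPv4 address, got %q", ipHex)
		}
		port, err := strconv.ParseUint(portHex, 16, 16)
		if err != nil {
			return nil, fmt.Errorf("parsing port %q: %w", portHex, err)
		}
		// Assumption: little-endian host, so the address bytes are reversed.
		ip := net.IPv4(raw[3], raw[2], raw[1], raw[0])
		out = append(out, listener{IP: ip, Port: uint16(port)})
	}
	return out, sc.Err()
}

func main() {
	listeners, err := parseProcNetTCP(sampleProcNetTCP)
	if err != nil {
		panic(err)
	}
	for _, l := range listeners {
		fmt.Printf("listening on %s:%d\n", l.IP, l.Port)
	}
}
```

Because a pass like this only runs when the ticker fires, a port opened just after one pass is not noticed until the next one, which is the delay described above.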

@AkihiroSuda
Member

> we rely on polling /proc/net/tcp every couple of seconds (3?), so there is always a delay between the container opening a port and it being forwarded to the host.

We also monitor audit events:

auditClient, err := libaudit.NewMulticastAuditClient(nil)

Maybe the audit monitor has a bug?

@AkihiroSuda
Member

There is also a proposal to use eBPF to monitor ports

cc @balajiv113

@Nino-K
Contributor Author

Nino-K commented Sep 11, 2025

> Seems overengineering. What is the problem with the existing event watcher?

In addition to what @jandubois pointed out, the current implementation of the port collection mechanism uses a polling ticker with a 3-second interval, although this is configurable. This PR instead subscribes directly to the container engine APIs, so port exposure events are handled as soon as they are reported.

@jandubois
Member

This PR is in response to #2536

I thought I read a note from @balajiv113 that the audit approach didn't work out because most cloud images did not have the required kernel modules for it installed, but can't find the reference right now.

And #3067 also sounds like it doesn't work anymore for Kubernetes.

@jandubois
Member

#1855 may explain why audit monitoring doesn't seem to work.

Monitor container creation and deletion events by subscribing to the container engine's API.
Upon receiving a container creation or deletion event, the system immediately forwards the
port mappings through the aggregated channel. This ensures that the ports are opened on the
host without any latency.

Signed-off-by: Nino Kodabande <[email protected]>
@AkihiroSuda
Member

AkihiroSuda commented Sep 17, 2025

Probably we can revisit this TODO to improve the ticking lag

newTicker := func() (<-chan time.Time, func()) {
	// TODO: use an equivalent of `bpftrace -e 'tracepoint:syscalls:sys_*_bind { printf("tick\n"); }')`,
	// without depending on `bpftrace` binary.
	// The agent binary will need CAP_BPF file cap.
	ticker := time.NewTicker(tick)
	return ticker.C, ticker.Stop
}

(AUDIT_SYSCALL may work too?)

@balajiv113
Member

@jandubois & @AkihiroSuda

On the eBPF PR, this is the current state:

  • Guest direct ports are working perfectly, without issues.
  • For docker/containers with nftables, I was able to get it working (need to see if it works for all cases).
  • With Kubernetes, however, I had no luck. As I previously mentioned, Kubernetes removes all entries in iptables and reconstructs them; because of this, we couldn't identify the missing ones properly.
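One plausible reading of the Kubernetes problem is that a watcher comparing consecutive rule snapshots sees a transient empty state during the flush-and-rebuild. The sketch below illustrates that effect with a plain set difference; it is not taken from the eBPF PR, and `portSet`, `newPortSet`, `diff`, and the port numbers are all made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// portSet is a hypothetical snapshot of the ports that currently have rules.
type portSet map[int]struct{}

// newPortSet builds a snapshot from a list of ports.
func newPortSet(ports ...int) portSet {
	s := make(portSet, len(ports))
	for _, p := range ports {
		s[p] = struct{}{}
	}
	return s
}

// diff reports which ports appeared and which disappeared between two snapshots.
func diff(prev, curr portSet) (added, removed []int) {
	for p := range curr {
		if _, ok := prev[p]; !ok {
			added = append(added, p)
		}
	}
	for p := range prev {
		if _, ok := curr[p]; !ok {
			removed = append(removed, p)
		}
	}
	sort.Ints(added)
	sort.Ints(removed)
	return added, removed
}

func main() {
	before := newPortSet(30080, 30443)
	flushed := newPortSet()                    // all rules removed
	rebuilt := newPortSet(30080, 30443, 30900) // rules reconstructed, one new port

	added, removed := diff(before, flushed)
	fmt.Println("flush:   added", added, "removed", removed)

	added, removed = diff(flushed, rebuilt)
	fmt.Println("rebuild: added", added, "removed", removed)

	added, removed = diff(before, rebuilt)
	fmt.Println("net:     added", added, "removed", removed)
}
```

Step by step, the watcher would report every port as removed and then re-added, while the net change is a single new port; telling those two cases apart requires knowing when the rebuild has finished.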

@jandubois
Member

> • For docker/containers with nftables, I was able to get it working (need to see if it works for all cases).

Thanks, that sounds promising!

So what I remember about some distros not having the prerequisite kernel modules, or missing permissions, was not, or is no longer correct?

@@ -17,10 +17,13 @@ require (
github.com/cpuguy83/go-md2man/v2 v2.0.7
github.com/digitalocean/go-qemu v0.0.0-20221209210016-f035778c97f7
github.com/diskfs/go-diskfs v1.7.0 // gomodjail:unconfined
github.com/docker/docker v28.3.3+incompatible
Member

Would it help with code maintenance if we created a separate Go module to handle all of these?

I created trackport to handle the various implementations of port identification:

https://github.com/balajiv113/trackport/blob/main/pkg/trackapi/api.go

https://github.com/balajiv113/trackport/tree/main/pkg/internal

@balajiv113
Member

> So what I remember about some distros not having the prerequisite kernel modules, or missing permissions, was not, or is no longer correct?

I was able to solve it using kprobe. This is available in almost all distros I checked.

Even our FT were passing, except for docker; that is when I hit the Kubernetes issue.

@jandubois
Member

> I was able to solve it using kprobe. This is available in almost all distros I checked.

So you are saying we should not merge this PR, but instead wait for you to finish yours?

> Even our FT were passing, except for docker; that is when I hit the Kubernetes issue.

We already have a Kubernetes port monitor in the current guest agent; we will just need to keep it.

@AkihiroSuda
Member

The combination of @balajiv113 's eBPF PR (#3067) w/ the existing kubernetesservice watcher seems the most robust and compatible solution for now?

I wish we could remove the dependency on Kubernetes client libraries, but it can be discussed separately in future.

@jandubois
Member

> The combination of @balajiv113 's eBPF PR (#3067) w/ the existing kubernetesservice watcher seems the most robust and compatible solution for now?

Yes, if it works and is indeed robust, then a generic solution is obviously preferable. I was under the impression that this experiment had failed. I'll be happy to learn if I was wrong.

@AkihiroSuda AkihiroSuda marked this pull request as draft September 18, 2025 05:38