Description
Jenkins and plugins versions report
Noticed on the latest version of Jenkins (2.528.3) and this plugin (1308.vff6e33248305).
While connecting via the public IP of the Docker host VM, if for some reason the connection on the controller becomes a zombie (i.e. the socket connection is still present on the controller, but no longer on the Docker host), build triggers get stuck contacting the Docker host at the point shown in the following thread dump:
"jenkins.util.Timer [#5]" #68 [103] daemon prio=5 os_prio=0 cpu=1486.31ms elapsed=7407.21s tid=0x000078fc40004630 nid=103 runnable [0x000078fce68fb000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.poll([email protected]/Native Method)
at sun.nio.ch.NioSocketImpl.park([email protected]/NioSocketImpl.java:191)
at sun.nio.ch.NioSocketImpl.timedFinishConnect([email protected]/NioSocketImpl.java:548)
at sun.nio.ch.NioSocketImpl.connect([email protected]/NioSocketImpl.java:592)
at java.net.SocksSocketImpl.connect([email protected]/SocksSocketImpl.java:327)
at java.net.Socket.connect([email protected]/Socket.java:751)
at org.apache.hc.client5.http.impl.io.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:205)
at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:490)
at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:164)
at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:174)
at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:144)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:195)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:150)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:110)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
at com.github.dockerjava.httpclient5.ApacheDockerHttpClientImpl.execute(ApacheDockerHttpClientImpl.java:189)
at com.github.dockerjava.httpclient5.ApacheDockerHttpClient.execute(ApacheDockerHttpClient.java:9)
at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228)
at com.github.dockerjava.core.DefaultInvocationBuilder.get(DefaultInvocationBuilder.java:202)
at com.github.dockerjava.core.DefaultInvocationBuilder.get(DefaultInvocationBuilder.java:74)
at com.github.dockerjava.core.exec.ListContainersCmdExec.execute(ListContainersCmdExec.java:44)
at com.github.dockerjava.core.exec.ListContainersCmdExec.execute(ListContainersCmdExec.java:15)
at com.github.dockerjava.core.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:33)
at com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker(DockerCloud.java:638)
at com.nirima.jenkins.plugins.docker.DockerCloud.canAddProvisionedAgent(DockerCloud.java:656)
at com.nirima.jenkins.plugins.docker.DockerCloud.provision(DockerCloud.java:394)
- locked <0x000000069217bb88> (a com.nirima.jenkins.plugins.docker.DockerCloud)
at io.jenkins.docker.FastNodeProvisionerStrategy.applyToCloud(FastNodeProvisionerStrategy.java:71)
at io.jenkins.docker.FastNodeProvisionerStrategy.apply(FastNodeProvisionerStrategy.java:41)
at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:327)
at hudson.slaves.NodeProvisioner.lambda$suggestReviewNow$4(NodeProvisioner.java:199)
at hudson.slaves.NodeProvisioner$$Lambda/0x000078fcedd2ea28.run(Unknown Source)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:572)
at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:317)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith([email protected]/Thread.java:1596)
at java.lang.Thread.run([email protected]/Thread.java:1583)
Locked ownable synchronizers:
- <0x000000068c284050> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x000000068ea724c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x00000006ac631450> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
No corresponding connections were seen on the Docker host VM when checked with netstat -natup.
In this case the connectionTimeout setting seemed to be ineffective.
What Operating System are you using (both controller, and any agents involved in the problem)?
The controller was CloudBees CI running in Kubernetes on RHEL 9; the Docker host was Debian 12.
Reproduction steps
It is hard to reproduce, as the JVM has to still be waiting on the other side (i.e. the Docker host) while the Docker host has already dropped the connection.
Tried on the Docker host:
sudo apt install iptables iptables-persistent
sudo iptables -A OUTPUT -p tcp -d <ip of the controller> --sport 2375 -j DROP
However, this does not reproduce the issue consistently; a minimal standalone reproducer sketch against docker-java is included below.
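For reference, a minimal standalone reproducer sketch against docker-java (outside Jenkins) that exercises the same API call as the thread dump above; the Docker host address and timeout value are placeholders, not taken from the actual setup:

import java.time.Duration;
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DefaultDockerClientConfig;
import com.github.dockerjava.core.DockerClientImpl;
import com.github.dockerjava.httpclient5.ApacheDockerHttpClient;
import com.github.dockerjava.transport.DockerHttpClient;

public class ZombieConnectionRepro {
    public static void main(String[] args) {
        // Placeholder: replace with the public IP of the Docker host VM.
        DefaultDockerClientConfig config = DefaultDockerClientConfig.createDefaultConfigBuilder()
                .withDockerHost("tcp://<docker-host>:2375")
                .build();

        DockerHttpClient httpClient = new ApacheDockerHttpClient.Builder()
                .dockerHost(config.getDockerHost())
                .sslConfig(config.getSSLConfig())
                .connectionTimeout(Duration.ofSeconds(10)) // covers TCP connect only
                .build();

        DockerClient client = DockerClientImpl.getInstance(config, httpClient);

        // Same call path as DockerCloud.countContainersInDocker -> ListContainersCmdExec
        // in the thread dump; once the iptables rule silently drops traffic mid-session,
        // this is where the thread is expected to hang.
        System.out.println(client.listContainersCmd().exec().size());
    }
}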
Expected Results
Some kind of timeout should unblock the stuck provision method.
Actual Results
The thread stays stuck waiting for the other side of a zombie connection.
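Purely as an illustration (hypothetical code, not something in the plugin), a generic caller-side guard such as CompletableFuture.orTimeout would only release the waiting thread; the worker underneath would still be blocked on the dead socket, which is why a transport-level timeout (see the next section) looks like the better place to fix this:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Container;

// Hypothetical helper, for illustration only.
class GuardedDockerCount {
    static int countContainersWithDeadline(DockerClient client, long seconds)
            throws InterruptedException, ExecutionException {
        CompletableFuture<List<Container>> future =
                CompletableFuture.supplyAsync(() -> client.listContainersCmd().exec());
        // orTimeout fails the future after the deadline, so the caller is released,
        // but the pool thread underneath can stay blocked on the zombie connection.
        return future.orTimeout(seconds, TimeUnit.SECONDS).get().size();
    }
}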
Anything else?
Currently there seems to be no way to set an SO_TIMEOUT to detect a dead connection.
Perhaps at https://github.com/docker-java/docker-java/blob/faa88e16460a8cb321c9695cdbc34cb7a662458e/docker-java-transport-httpclient5/src/main/java/com/github/dockerjava/httpclient5/ApacheDockerHttpClientImpl.java#L117-L122 ?
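A rough sketch of what that could look like with Apache HttpClient 5, applying a default socket read timeout (SO_TIMEOUT) on the connection manager; the helper name and the timeout value are made up for illustration and this is not the current docker-java code:

import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;
import org.apache.hc.core5.http.io.SocketConfig;
import org.apache.hc.core5.util.Timeout;

// Sketch only; not the code at the linked lines.
class SoTimeoutSketch {
    static void applySoTimeout(PoolingHttpClientConnectionManager connectionManager,
                               int readTimeoutSeconds) {
        // SO_TIMEOUT bounds how long a read on an established socket may block,
        // so a dead (zombie) connection eventually fails instead of hanging forever.
        connectionManager.setDefaultSocketConfig(SocketConfig.custom()
                .setSoTimeout(Timeout.ofSeconds(readTimeoutSeconds))
                .build());
    }
}

Alternatively, if the connectionTimeout/responseTimeout options already exposed on ApacheDockerHttpClient.Builder cover this code path, wiring them through from the plugin's Docker host connection settings might be enough.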
Are you interested in contributing a fix?
No response