Temporary loss of socket is not recoverable

We're running the Datadog agent as a DaemonSet (although I suspect the same issue will happen when running as a sidecar) in Kubernetes and the client is a singleton in a JVM process running in a pod. 

The communication between the client and agent is done via UDS.

When instantiating `NonBlockingStatsDClient` it `open()`s the DatagramChannel: 
https://github.com/DataDog/java-dogstatsd-client/blob/master/src/main/java/com/timgroup/statsd/NonBlockingStatsDClient.java#L642

Everything works as expected.

Now let's say we updated our Datadog **agent**'s template. We apply the template and that causes the DaemonSet to gradually restart the agent processes running on nodes.

When the agent restarts, the socket path is temporarily unavailable and we see a
```
 java.io.IOException: No such file or directory
    at jnr.unixsocket.UnixDatagramChannel.send(UnixDatagramChannel.java:164)
    at com.timgroup.statsd.StatsDSender.blockingSend(StatsDSender.java:88)
    at com.timgroup.statsd.StatsDSender.run(StatsDSender.java:71)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
```
The exception handler we supply to `NonBlockingStatsDClient` cannot do anything to tell the client to re-open the `DatagramChannel`, or retry in any way.

When we've seen this happen before, we had to restart **all of our pods**.

From a code perspective, our only option in this case is to mark the client as "dead" and have other code observe that state and re-instantiate it. But, if we do that, we lose all of the metrics that are currently in the queue.

Instead, the client should be able to recover from such an error.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Temporary loss of socket is not recoverable #86

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Temporary loss of socket is not recoverable #86

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions