Issue description
It seems that, when there is no internet connectivity but the system has a DNS server configured (e.g. 8.8.8.8), a single ZMQ context with 2 sockets misbehaves:
- (1) Direct IP: tcp://192.168.0.1:4321
- (2) Domain name: tcp://backup.acme.com:4321
Connection (1) experiences huge delays (~20-30 s) on TX and RX of packets. The root cause seems to be that worker_routine ends up calling gethostbyname2_r() (via getaddrinfo()) synchronously on the I/O thread, so the thread hangs until the resolver times out.
#1 0x00007ffff1adbcc0 in send_dg (ansp2_malloced=0x0, resplen2=0x0, anssizp2=0x0, ansp2=0x0, anscp=0x7fffe3ffd3a0, gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, ns=0, terrno=0x7fffe3ffc278, anssizp=0x7fffe3ffc3b0, ansp=0x7fffe3ffc268, buflen2=0, buf2=0x0, buflen=35, buf=0x7fffe3ffc3e0 "G\327\001", statp=0x7fffe3fffdb8) at res_send.c:1200
#2 __libc_res_nsend (statp=statp@entry=0x7fffe3fffdb8, buf=buf@entry=0x7fffe3ffc3e0 "G\327\001", buflen=35, buf2=buf2@entry=0x0, buflen2=buflen2@entry=0, ans=ans@entry=0x7fffe3ffcf70 "n", anssiz=anssiz@entry=1024, ansp=ansp@entry=0x7fffe3ffd3a0, ansp2=ansp2@entry=0x0, nansp2=nansp2@entry=0x0, resplen2=resplen2@entry=0x0, ansp2_malloced=ansp2_malloced@entry=0x0) at res_send.c:545
#3 0x00007ffff1ad9c0c in __GI___libc_res_nquery (statp=statp@entry=0x7fffe3fffdb8, name=0x7fffdc000b98 "backup.acme.com", class=class@entry=1, type=type@entry=1, answer=answer@entry=0x7fffe3ffcf70 "n", anslen=anslen@entry=1024, answerp=answerp@entry=0x7fffe3ffd3a0, answerp2=answerp2@entry=0x0, nanswerp2=nanswerp2@entry=0x0, resplen2=resplen2@entry=0x0, answerp2_malloced=answerp2_malloced@entry=0x0) at res_query.c:227
#4 0x00007ffff1ada210 in __libc_res_nquerydomain (statp=statp@entry=0x7fffe3fffdb8, name=name@entry=0x7fffdc000b98 "backup.acme.com", domain=domain@entry=0x0, class=class@entry=1, type=type@entry=1, answer=answer@entry=0x7fffe3ffcf70 "n", anslen=anslen@entry=1024, answerp=answerp@entry=0x7fffe3ffd3a0, answerp2=answerp2@entry=0x0, nanswerp2=nanswerp2@entry=0x0, resplen2=resplen2@entry=0x0, answerp2_malloced=answerp2_malloced@entry=0x0) at res_query.c:594
#5 0x00007ffff1ada7a9 in __GI___libc_res_nsearch (statp=0x7fffe3fffdb8, name=name@entry=0x7fffdc000b98 "backup.acme.com", class=class@entry=1, type=type@entry=1, answer=answer@entry=0x7fffe3ffcf70 "n", anslen=anslen@entry=1024, answerp=0x7fffe3ffd3a0, answerp2=answerp2@entry=0x0, nanswerp2=nanswerp2@entry=0x0, resplen2=resplen2@entry=0x0, answerp2_malloced=answerp2_malloced@entry=0x0) at res_query.c:381
#6 0x00007ffff1e0b67d in __GI__nss_dns_gethostbyname3_r (name=0x7fffdc000b98 "backup.acme.com", af=2, result=0x7fffe3ffddc0, buffer=0x7fffe3ffd8f0 "\177", buflen=912, errnop=0x7fffe3fff668, h_errnop=h_errnop@entry=0x7fffe3ffdd9c, ttlp=ttlp@entry=0x0, canonp=canonp@entry=0x0) at nss_dns/dns-host.c:192
#7 0x00007ffff1e0b924 in _nss_dns_gethostbyname2_r (name=<optimized out>, af=<optimized out>, result=<optimized out>, buffer=<optimized out>, buflen=<optimized out>, errnop=<optimized out>, h_errnop=0x7fffe3ffdd9c) at nss_dns/dns-host.c:257
#8 0x00007ffff6845be9 in __gethostbyname2_r (name=0x7fffdc000b98 "backup.acme.com", af=af@entry=2, resbuf=resbuf@entry=0x7fffe3ffddc0, buffer=buffer@entry=0x7fffe3ffd8f0 "\177", buflen=buflen@entry=912, result=result@entry=0x7fffe3ffddb8, h_errnop=0x7fffe3ffdd9c) at ../nss/getXXbyYY_r.c:266
#9 0x00007ffff6820d1c in gaih_inet (name=<optimized out>, name@entry=0x7fffdc000b98 "backup.acme.com", service=<optimized out>, req=req@entry=0x7fffe3ffdff0, pai=pai@entry=0x7fffe3ffde98, naddrs=naddrs@entry=0x7fffe3ffde94) at ../sysdeps/posix/getaddrinfo.c:622
#10 0x00007ffff682185d in __GI_getaddrinfo (name=0x7fffdc000b98 "backup.acme.com", service=0x0, hints=0x7fffe3ffdff0, pai=0x7fffe3ffe020) at ../sysdeps/posix/getaddrinfo.c:2426
#11 0x0000000001e4a3c0 in zmq::tcp_address_t::resolve_hostname (this=0x7fffdc007720, hostname_=0x7fffdc000b98 "backup.acme.com", ipv6_=false, is_src_=false) at /home/marc/volta/builder/code/libzmq/src/tcp_address.cpp:378
#12 0x0000000001e4a9e3 in zmq::tcp_address_t::resolve (this=0x7fffdc007720, name_=0x7fffc8001268 "backup.acme.com:7007", local_=false, ipv6_=false, is_src_=false) at /home/marc/volta/builder/code/libzmq/src/tcp_address.cpp:478
#13 0x0000000001e67f13 in zmq::tcp_connecter_t::open (this=0x7fffdc0056c0) at /home/marc/volta/builder/code/libzmq/src/tcp_connecter.cpp:227
#14 0x0000000001e67b52 in zmq::tcp_connecter_t::start_connecting (this=0x7fffdc0056c0) at /home/marc/volta/builder/code/libzmq/src/tcp_connecter.cpp:164
#15 0x0000000001e67b2c in zmq::tcp_connecter_t::timer_event (this=0x7fffdc0056c0, id_=1) at /home/marc/volta/builder/code/libzmq/src/tcp_connecter.cpp:158
#16 0x0000000001e3d543 in zmq::poller_base_t::execute_timers (this=0x7fffe8003010) at /home/marc/volta/builder/code/libzmq/src/poller_base.cpp:99
#17 0x0000000001e30e30 in zmq::epoll_t::loop (this=0x7fffe8003010) at /home/marc/volta/builder/code/libzmq/src/epoll.cpp:152
#18 0x0000000001e310b0 in zmq::epoll_t::worker_routine (arg_=0x7fffe8003010) at /home/marc/volta/builder/code/libzmq/src/epoll.cpp:189
#19 0x0000000001e4c5ea in thread_routine (arg_=0x7fffe8003090) at /home/marc/volta/builder/code/libzmq/src/thread.cpp:96
#20 0x00007ffff7321064 in start_thread (arg=0x7fffe3fff700) at pthread_create.c:309
#21 0x00007ffff683462d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
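The stall suggested by the backtrace above can be modeled without libzmq. This is only a sketch under stated assumptions: `blocking_resolve_stub`, `run_loop_once` and `delay_seen_by_direct_ip_socket` are hypothetical names, the resolver is replaced by a sleep, and the 20-30 s timeout is shortened to whatever value the caller passes. It shows why work for the direct-IP socket waits when the hostname socket's reconnect handler runs first on the same thread.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a resolver call that blocks until it times out.
// The report mentions ~20-30 s; callers of this sketch pass a short stall.
inline void blocking_resolve_stub(std::chrono::milliseconds stall)
{
    std::this_thread::sleep_for(stall);
}

// Runs all tasks back to back on the calling thread, the way a single I/O
// thread runs its handlers, and returns how long the last task had to wait
// before it started.
inline std::chrono::milliseconds run_loop_once(
    const std::vector<std::function<void()>> &tasks)
{
    const Clock::time_point start = Clock::now();
    Clock::time_point last_started = start;
    for (const auto &task : tasks) {
        last_started = Clock::now();
        task();
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        last_started - start);
}

// Delay seen by the direct-IP socket's work when the hostname socket's
// reconnect handler is queued ahead of it on the same thread.
inline std::chrono::milliseconds delay_seen_by_direct_ip_socket(
    std::chrono::milliseconds stall)
{
    std::vector<std::function<void()>> tasks;
    // Socket (2): reconnect attempt that resolves the hostname synchronously.
    tasks.push_back([stall] { blocking_resolve_stub(stall); });
    // Socket (1): ordinary send/receive work, modeled as a no-op.
    tasks.push_back([] {});
    return run_loop_once(tasks);
}
```

Calling `delay_seen_by_direct_ip_socket(std::chrono::milliseconds(50))` returns a delay of roughly the stall that was passed in, which mirrors the observation that connection (1) is held up for as long as the resolver takes to give up.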
Environment
- libzmq version (commit hash if unreleased): v4.1.6
- OS: Debian 8
Minimal test code / Steps to reproduce the issue
- Configure an unreachable DNS server
- Create a ZMQ context
- Open 2 DEALER sockets against a ROUTER socket: (1) one connecting to a reachable IP and (2) another connecting to a domain name
- Attempt to use socket (1)
What's the actual result? (include assertion message & call stack if applicable)
All other sockets in the context are blocked for ~30 seconds, work for some time, and are then blocked again. The cycle continues forever.
What's the expected result?
The rest of the sockets are not blocked if one or more socket endpoints cannot be resolved.
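One way to picture the expected behavior is to move the blocking `getaddrinfo()` call off the thread that serves the other sockets. The sketch below is an assumption-laden illustration, not libzmq code and not a proposed patch: `resolve_ipv4_async` is a hypothetical helper that runs the lookup on a detached thread and hands back a `std::future`, so the caller can keep working and check for the result later.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <thread>

// Hypothetical helper (not part of libzmq): resolves `host` to an IPv4
// address string on a detached helper thread. The future yields an empty
// string if resolution fails.
inline std::future<std::string> resolve_ipv4_async(const std::string &host)
{
    auto promise = std::make_shared<std::promise<std::string>>();
    std::future<std::string> result = promise->get_future();

    std::thread([promise, host]() {
        addrinfo hints{};
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;

        addrinfo *res = nullptr;
        std::string ip;
        // This call may block for a long time when the DNS server is
        // unreachable, but only this helper thread is affected.
        if (getaddrinfo(host.c_str(), nullptr, &hints, &res) == 0
            && res != nullptr) {
            char buf[INET_ADDRSTRLEN] = {0};
            const sockaddr_in *sa =
                reinterpret_cast<const sockaddr_in *>(res->ai_addr);
            if (inet_ntop(AF_INET, &sa->sin_addr, buf, sizeof buf) != nullptr)
                ip = buf;
        }
        if (res != nullptr)
            freeaddrinfo(res);
        promise->set_value(ip);
    }).detach();

    return result;
}
```

A loop that owns the returned future could poll it with `wait_for(std::chrono::seconds(0))` between other tasks and only start connecting once the status is `std::future_status::ready`. A future obtained from a `std::promise` does not block in its destructor, which is why this sketch avoids `std::async`.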