Created attachment 513500 [details]
wireshark sniffed DFNS traffic

$ i=0; while :; do ((i=i+1)); echo $i; date; strace -o strace.$i.log host 625292.bugs.gentoo.org && break; sleep 30; done
1
Sat Jan 6 10:46:44 CET 2018
Host 625292.bugs.gentoo.org not found: 2(SERVFAIL)
2
Sat Jan 6 10:47:16 CET 2018
Host 625292.bugs.gentoo.org not found: 2(SERVFAIL)
3
Sat Jan 6 10:47:46 CET 2018
Host 625292.bugs.gentoo.org not found: 2(SERVFAIL)
4
Sat Jan 6 10:48:16 CET 2018
Host 625292.bugs.gentoo.org not found: 2(SERVFAIL)
5
Sat Jan 6 10:48:46 CET 2018
Host 625292.bugs.gentoo.org not found: 2(SERVFAIL)
6
Sat Jan 6 10:49:17 CET 2018
Host 625292.bugs.gentoo.org not found: 2(SERVFAIL)
7
Sat Jan 6 10:49:47 CET 2018
625292.bugs.gentoo.org is an alias for bugs-gossamer.gentoo.org.
bugs-gossamer.gentoo.org is an alias for gannet.gentoo.org.
gannet.gentoo.org has address 204.187.15.4
gannet.gentoo.org has IPv6 address 2607:fcc0:4:ffff::4

The root cause seems to be:

$ grep exit_group strace.*
strace.1.log:exit_group(1) = ?
strace.2.log:exit_group(1) = ?
strace.3.log:exit_group(1) = ?
strace.4.log:exit_group(1) = ?
strace.5.log:exit_group(1) = ?
strace.6.log:exit_group(1) = ? <---- bad
strace.7.log:exit_group(0) = ? <---- good

The pcapng file (from wireshark) and the straces will be attached.

$ grep -v -e '^$' -e '^#' /etc/dnsmasq.conf
domain-needed
conf-file=/usr/share/dnsmasq/trust-anchors.conf
dnssec
dnssec-check-unsigned
no-resolv
server=9.9.9.10
server=2620:fe::10
cache-size=10000

$ qlist -ICv bind-tools gcc dnsmasq
net-dns/bind-tools-9.11.1_p3
net-dns/dnsmasq-2.78
sys-devel/gcc-6.4.0
sys-devel/gcc-config-1.8-r1
Created attachment 513502 [details] strace 6
Created attachment 513504 [details] strace 7
Ah, forgot to say: commenting out "dnssec" avoids the issue, but that isn't an option.
From upstream: Looking further, the upstream server is definitely broken. In the correct case we have an RRSET consisting of one NSEC record for *.bugs.gentoo.org and signature for that which verifies. In the broken case we have an RRset consisting of two NSEC records for *.bugs.gentoo.org and the signature is the SAME as in the correct case. It may be OK to repeat the same NSEC record twice in the RRset, but the signature has to reflect that. It doesn't, so it will fail validation. Cheers, Simon.
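The duplicate NSEC record described above could be spotted without wireshark by filtering dig output. The following is only a rough sketch: the helper name dup_nsec is made up for illustration, and it assumes dig's usual "owner TTL class type rdata" answer layout. The TTL column is dropped before comparing so that two copies of the same record compare equal.

```shell
# Hypothetical helper: print every NSEC record that occurs more than once
# in dig-style output read from stdin.
dup_nsec() {
    # Skip dig's ';' comment lines, keep only NSEC records, blank the TTL
    # field, then let uniq -d report lines that occur at least twice.
    awk '!/^;/ && $4 == "NSEC" { $2 = ""; print }' | LC_ALL=C sort | uniq -d
}
```

It would be fed from a query sent straight to the upstream resolver, for example: dig +dnssec @9.9.9.10 625292.bugs.gentoo.org | dup_nsec. Empty output means no duplicate NSEC record was seen in that particular reply.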
From upstream (via email):

OK. The experts have spoken, and the conclusion is that this is an error on the part of quad9, but there is also a method that dnsmasq can use to handle it.

The part of RFC 4034 which matters says:

6.3. Canonical RR Ordering within an RRset

   For the purposes of DNS security, RRs with the same owner name, class, and type are sorted by treating the RDATA portion of the canonical form of each RR as a left-justified unsigned octet sequence in which the absence of an octet sorts before a zero octet.

   [RFC2181] specifies that an RRset is not allowed to contain duplicate records (multiple RRs with the same owner name, class, type, and RDATA). Therefore, if an implementation detects duplicate RRs when putting the RRset in canonical form, it MUST treat this as a protocol error. If the implementation chooses to handle this protocol error in the spirit of the robustness principle (being liberal in what it accepts), it MUST remove all but one of the duplicate RR(s) for the purposes of calculating the canonical form of the RRset.

I chose to be robust in the face of this error, and I've just committed a patch at
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=e5412459871a0d4095aa6058791d2b9005474645
which I think does that. Since I still can't get quad9 to fail for me, I'd really appreciate it if you could test that code.

Cheers,
Simon.
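To make the quoted RFC 4034 rule concrete, here is a toy illustration of the ordering plus the "remove all but one duplicate" step. It is not the dnsmasq patch, which works on wire-format packets in C. The helper name canonical_rdata is invented, and the input is assumed to be one RR per line with its RDATA already in canonical form and written as hex. Bytewise comparison of equal-case hex strings matches unsigned octet comparison, and a shorter string that is a prefix of a longer one sorts first, which mirrors "the absence of an octet sorts before a zero octet".

```shell
# Hypothetical helper: read the RDATA of each RR in an RRset as hex, one RR
# per line, and print the RRset in RFC 4034 section 6.3 order with
# duplicate RRs collapsed to a single copy.
canonical_rdata() {
    # Normalise hex digits to lowercase so the same octets always compare
    # equal, then sort bytewise (C locale) and drop duplicates (-u).
    tr 'A-F' 'a-f' | LC_ALL=C sort -u
}
```

With such a step, a reply carrying the same NSEC record twice reduces to the single record that the signature was actually computed over, so validation can succeed.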
And for the record, from upstream via email: It would be good to file a bug with Quad9 about this too. As you're the only one seeing the problem, it might be better for you to do it. The pcap in the Gentoo bug report contains the evidence required. Point out RFC 4034 para 6.3 and note the contents of the AUTHORITY sections of packets 38, 40, 42 and 44. Note also the correct reply in packet 49. https://www.quad9.net/#/contact seems the best they have as a bug reporting mechanism. Cheers, Simon.
I can confirm that the proposed patch works fine and solves the problem here.
This has been reported via our support alias @quad9.net. We do NOT have it resolved but we are looking into it. It will probably take us some time to resolve it but we will address it on our side as soon as we can. Thanks, -Daniele
From upstream: More info for quad9. The conclusion is that quad9 is running powerDNS, and the bug in powerDNS is fixed by this changeset. https://twitter.com/Quad9DNS/status/931531341606346752 Cheers, Simon.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=36950876fc0da1b5ced6a6c58508f5bc2c8be572

commit 36950876fc0da1b5ced6a6c58508f5bc2c8be572
Author:     Patrick McLean <chutzpah@gentoo.org>
AuthorDate: 2018-03-19 18:10:03 +0000
Commit:     Patrick McLean <chutzpah@gentoo.org>
CommitDate: 2018-03-19 18:11:38 +0000

    net-dns/dnsmasq: Version bump to 2.79

    Closes: https://bugs.gentoo.org/586454
    Closes: https://bugs.gentoo.org/633496
    Closes: https://bugs.gentoo.org/643670
    Gentoo-Bug: https://bugs.gentoo.org/645704
    Package-Manager: Portage-2.3.24, Repoman-2.3.6

 net-dns/dnsmasq/Manifest                   |   1 +
 net-dns/dnsmasq/dnsmasq-2.79.ebuild        | 198 +++++++++++++++++++++++++++++
 net-dns/dnsmasq/files/dnsmasq-init-dhcp-r3 |  35 +++++
 net-dns/dnsmasq/files/dnsmasq-init-r4      |  29 +++++
 net-dns/dnsmasq/files/dnsmasq.logrotate    |   7 +
 5 files changed, 270 insertions(+)