Bug 67385 - name resolution failures with sys-libs/glibc-2.3.4.20041006
Summary: name resolution failures with sys-libs/glibc-2.3.4.20041006
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Duplicates: 67445 69305 69525
Depends on:
Blocks:
 
Reported: 2004-10-13 07:06 UTC by Chris Smith
Modified: 2004-10-31 09:52 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Description Chris Smith 2004-10-13 07:06:37 UTC
After installing sys-libs/glibc-2.3.4.20041006 there were DNS name resolution failures in many components. KDE components such as Konqueror and Kmail could not resolve names at all. Firefox could resolve Internet domains but not local domains. Ping had the same problems as Firefox: any DNS resolution for my local domain would fail, but Internet domains would resolve. Dig, on the other hand, worked perfectly for both local and Internet domains.
I recompiled sys-libs/glibc-2.3.4.20041006 with the nptlonly flag with the same results.

Reproducible: Always
Steps to Reproduce:
1. emerge =sys-libs/glibc-2.3.4.20041006
2. Watch KDE go blind.

Actual Results:  
See details section above.
Dropping back to sys-libs/glibc-2.3.4.20040808-r1 (where I am currently)
resolved these issues.

Expected Results:  
Normal system operation.

# emerge info
Portage 2.0.51_rc9 (gcc34-x86-2004.2, gcc-3.4.2, glibc-2.3.4.20040808-r1,
2.6.8-gentoo-r6 i686)
=================================================================
System uname: 2.6.8-gentoo-r6 i686 Intel(R) Pentium(R) 4 CPU 2.53GHz
Gentoo Base System version 1.5.3
distcc 2.17 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
Autoconf: sys-devel/autoconf-2.59-r5
Automake: sys-devel/automake-1.8.5-r1
Binutils: sys-devel/binutils-2.15.92.0.2-r1
Headers:  sys-kernel/linux26-headers-2.6.8.1-r1
Libtools: sys-devel/libtool-1.5.2-r5
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -pipe -s"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config
/usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown
/usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config
/usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/
/usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/
/usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -pipe -s"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks sandbox"
GENTOO_MIRRORS="http://mirrors.tds.net/gentoo http://gentoo.seren.com/gentoo
http://open-systems.ufl.edu/mirrors/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X aalib acpi alsa apache2 arts audiofile avi berkdb bitmap-fonts bonobo cdr
crypt cups curl dv dvd dvdr encode esd f77 flac gdbm gif gphoto2 gpm gstreamer
gtk gtk2 gtkhtml guile imap imlib jack java jpeg kde lcms ldap libg++ libwww mad
mikmod mmx motif mozilla mpeg ncurses nls nptl oggvorbis opengl oss pam pda
pdflib perl png ppds python qt quicktime readline samba sasl scanner sdl slang
spell sse ssl svg svga tcltk tcpd tetex theora threads tiff truetype unicode usb
wmf x86 xml xml2 xmms xprint xv zlib"
Comment 1 Chris Smith 2004-10-13 10:29:05 UTC
Additionally, after I recompiled with +nptlonly, my NFS mounts failed during the next boot. The server's short name (hostname) is in fstab, and DNS resolution of it apparently failed. Again, dropping back to sys-libs/glibc-2.3.4.20040808-r1 solved the problem.
Comment 2 Seemant Kulleen (RETIRED) gentoo-dev 2004-10-13 14:11:58 UTC
*** Bug 67445 has been marked as a duplicate of this bug. ***
Comment 3 Travis Tilley (RETIRED) gentoo-dev 2004-10-13 17:25:46 UTC
this release is -very- close to the upstream (fedora) release. could you also file a bug with redhat and let them know that this problem is occurring with a very lightly patched 20041006 snapshot of fedora-branch? it might also help to let them know it's the same version that appears in the fedora 2.3.3-66 rpm just in case they don't have snapshot dates memorised. ;)

I remember another dev filing a name resolving bug with redhat and redhat replying with "this isn't a bug". apparently this is the first version to parse the config as intended. perhaps this is another case of the same. :/

http://sources.redhat.com/bugzilla/
Comment 4 Chris Smith 2004-10-14 07:02:01 UTC
Sorry, I don't totally follow. To parse what "config" as intended?
This release totally hoses my network that uses standard name resolution tools. 
A forum post mentions the SuSE patch listed in bug 66295 as the possible culprit.
Also, I don't quite understand the relationship between Redhat/Fedora and this Gentoo release. Exactly which Redhat version of glibc that I don't use am I supposed to bug?
Comment 5 Chris Smith 2004-10-14 09:11:31 UTC
I can pretty much confirm that the source of the problem is indeed the glibc-2.3.3-mdns-resolver patch introduced with this ebuild. I commented out the line that applies this patch in the ebuild and all works as expected.
KDE did go blind, but Firefox and ping worked normally, and KDE was OK after I restarted it.
Comment 6 Ulrich Dobramysl 2004-10-14 09:24:31 UTC
Same problem here. Glibc tried to resolve all local addresses with mDNS.
I also commented out the patch in the ebuild and all works perfectly now.
Comment 7 Chris Smith 2004-10-14 09:35:24 UTC
As an aside, new problems similar to Bug 67166 are now rearing their ugly heads (I can't compile kdeutils because of: checking for Qt... configure: error: Qt (>= Qt 3.3) (library qt-mt) not found). So I'm dropping back to 2.3.4.20040808-r1. Hard to believe 2.3.4.20041006 isn't masked yet.
Comment 8 Stefan Karg 2004-10-14 12:32:47 UTC
I had the same problems: no local name resolution.
After changing my local domain from death-star.local to death-star.home, local name resolution works again.
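A quick way to tell whether a machine is affected is to look for a .local domain in the resolver configuration. The sketch below is a hypothetical Python helper (local_domains_in_resolv_conf is a made-up name) that scans the text of /etc/resolv.conf for domain and search entries ending in .local; it assumes the usual one-directive-per-line layout, with # or ; starting comment lines.

```python
from __future__ import annotations


def local_domains_in_resolv_conf(text: str) -> list[str]:
    """Return the domain/search entries in resolv.conf text that end in .local.

    Hypothetical helper: it only inspects the 'domain' and 'search' directives,
    which supply the suffixes the resolver appends to short host names.
    """
    hits: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # resolv.conf comment lines start with '#' or ';'
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] not in ("domain", "search"):
            continue
        for domain in fields[1:]:
            # Ignore a trailing root dot and letter case when comparing.
            if domain.rstrip(".").lower().endswith(".local"):
                hits.append(domain)
    # Drop duplicates while keeping the first-seen order.
    return list(dict.fromkeys(hits))


if __name__ == "__main__":
    sample = "domain death-star.local\nsearch death-star.local example.com\nnameserver 192.168.0.1\n"
    print(local_domains_in_resolv_conf(sample))  # ['death-star.local']
```

On a live system, pass it the contents of /etc/resolv.conf; a non-empty result means short host names may be expanded into .local names, which is the case the mDNS patch changes.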
Comment 9 Chris Smith 2004-10-16 13:27:55 UTC
I just dumped my top level ".local" domain and compiled glibc-2.3.4.20041006 as written - with the mdns patch. Konqueror went blind (didn't check Kmail) but just restarting KDE/X was enough to bring it back.
I was also able to remerge Qt, which I couldn't do previously.
Comment 10 Cameron Blackwood 2004-10-17 20:18:12 UTC
I don't know if this is related, but I have a script which monitors config files and it reports:

File is in multiple dbs: ('/etc/init.d/nscd', ['/etc/init.d/._cfg0000_nscd'],
   OLD: [('b27ffb8e5b59cbd0375b0f5e6d7c3e1e', 1094190327, 'sys-apps', 'baselayout-1.10.4')], 
   NEW: [('b27ffb8e5b59cbd0375b0f5e6d7c3e1e', 1094190327, 'sys-apps', 'baselayout-1.10.4'), 
         ('bedcd868a9462009158714238594173c', 1098063249, 'sys-libs', 'glibc-2.3.4.20041006')])

It looks like /etc/init.d/nscd has been added to the glibc-2.3.4 package (by mistake?) and changed. Or has it? I'm trying to make sense of this...

root@kali /data/setup/etc-fixer # epm -qf /etc/init.d/nscd               
baselayout-1.10.4
glibc-2.3.4.20041006
root@kali /data/setup/etc-fixer # epm -qf /etc/nscd.conf 
glibc-2.3.4.20041006

So maybe the init.d should be in the glibc (if the config file is).

Before the upgrade, I just got:

root@kali / # epm -qf /etc/init.d/nscd            
baselayout-1.10.4
root@kali / # epm -qf /etc/nscd.conf 
glibc-2.3.4.20040808

diffing the init.d files I get:
2c2
< # Copyright 1999-2004 Gentoo Technologies, Inc.
---
> # Copyright 1999-2004 Gentoo Foundation
4c4
< # $Header: /home/cvsroot/gentoo-src/rc-scripts/init.d/nscd,v 1.9 2004/04/21 17:09:18 vapier Exp $
---
> # $Header: /var/cvsroot/gentoo-x86/sys-libs/glibc/files/nscd,v 1.3 2004/09/29 05:24:47 vapier Exp $
25c25,29
<       start-stop-daemon --start --quiet --exec /usr/sbin/nscd -- $secure
---
>       local pidfile="$(strings /usr/sbin/nscd | grep nscd.pid)"
>       mkdir -p "$(dirname ${pidfile})"
>       start-stop-daemon --start --quiet \
>               --exec /usr/sbin/nscd --pid ${pidfile} \
>               -- $secure
29c33
< stop () {
---
> stop() {
31c35
<       start-stop-daemon --stop --quiet --pid /var/run/nscd.pid
---
>       start-stop-daemon --stop --quiet --exec /usr/sbin/nscd
35d38
< 

On a side note, isn't pidfile="$(strings /usr/sbin/nscd | grep nscd.pid)" a rather questionable way to work out where to write a pid file? :) I must remember not to put in comments such as "can't find nscd.pid file: at least it's not as bad as `rm -rf /`" :)

:)
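The new init script takes whatever printable string inside the nscd binary happens to contain nscd.pid. A rough Python approximation of that strings | grep pipeline, only to illustrate what the lookup depends on (find_pidfile_candidates is a hypothetical name, and the sample bytes are made up): it returns every match, so if a binary held more than one such string, pidfile would end up holding several paths.

```python
from __future__ import annotations

import re

# Runs of at least 4 printable ASCII bytes (space..tilde) or tabs; 4 is the
# default minimum length used by GNU strings.
_PRINTABLE_RUN = re.compile(rb"[\t\x20-\x7e]{4,}")


def find_pidfile_candidates(blob: bytes, needle: str = "nscd.pid") -> list[str]:
    """Approximate `strings FILE | grep NEEDLE` on an in-memory byte string.

    Unlike grep, the needle is matched as a plain substring, so the dot is literal.
    """
    hits: list[str] = []
    for match in _PRINTABLE_RUN.finditer(blob):
        text = match.group().decode("ascii")
        if needle in text:
            hits.append(text)
    return hits


if __name__ == "__main__":
    # Synthetic stand-in for a binary: NUL-separated strings plus a non-ASCII byte.
    blob = b"\x00/var/run/nscd/nscd.pid\x00\xffjunk\x00/etc/nscd.conf\x00"
    print(find_pidfile_candidates(blob))  # ['/var/run/nscd/nscd.pid']
```

A fixed path, or one taken from nscd.conf, would avoid depending on the string layout of the binary.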
Comment 11 Maurice van der Pot (RETIRED) gentoo-dev 2004-10-23 07:13:16 UTC
I had this problem, but in addition to this whoami/id were unable to find groups and users on my system (making portage & friends unusable). Should that be part of this bug or do I need to open another one?
Comment 12 Timo Gurr (RETIRED) gentoo-dev 2004-10-28 09:22:27 UTC
This bug still exists in 2.3.4.20041021 and should _really_ be fixed, downgrading back to 2.3.4.20040808-r1 b0rked my entire system and made it unbootable, but that's another story.
Comment 13 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 14:50:32 UTC
straight from the suse release notes mentioning the mdns change:

Change in Resolver Library

Incompatible change: the resolver library treats the .local top level domain as link-local domain and sends multicast DNS requests to the multicast address 224.0.0.251 port 5353 instead of normal DNS requests. If you already use the .local domain in your nameserver configuration you will have to switch to another domain name. See http://www.multicastdns.org for more information on multicast DNS. 

perhaps we need to make the mdns updates USE-controlled?
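Put as a rule, the release note means the resolver chooses a destination from the name alone. The Python sketch below models only that decision (choose_query_target is a hypothetical name, and port 53 is standard unicast DNS); it is not glibc's code. Because short names are first expanded with the search domain, a host listed by short name (for example an NFS server in fstab) would presumably land on the multicast path too when the search domain ends in .local, which would be consistent with the failures in comment 1 and comment 8.

```python
from __future__ import annotations

# IPv4 multicast DNS group and port quoted in the SuSE release note.
MDNS_TARGET = ("224.0.0.251", 5353)


def choose_query_target(hostname: str, mdns_enabled: bool, nameserver: str) -> tuple[str, int]:
    """Model where a lookup for hostname would be sent.

    Simplified model of the documented behaviour, not glibc's code: names in
    the .local top-level domain go to the mDNS group when mDNS is enabled;
    everything else goes to the configured unicast nameserver on port 53.
    """
    name = hostname.rstrip(".").lower()
    if mdns_enabled and name.endswith(".local"):
        return MDNS_TARGET
    return (nameserver, 53)
```

With mdns_enabled set to False, every name, including .local ones, goes back to the unicast nameserver, which is what a USE flag or runtime switch would need to restore.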
Comment 14 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 18:08:12 UTC
*** Bug 69305 has been marked as a duplicate of this bug. ***
Comment 15 Harris Landgarten 2004-10-28 18:08:55 UTC
Microsoft strongly encourages the use of the .local top level domain on Windows networks. I just installed a Windows 2003 Small Business Server yesterday and it defaulted to a .local domain. Once established, changing a top level domain name in any network with Windows domain controllers and Windows workstations is a major pain. Forcing this change on mixed Linux - Microsoft networks is lunacy. Give us a USE flag to turn off mDNS.

Could this issue also be the cause of 10-20 sec delays in Internet name resolution with the new glibc? That is what I am seeing in 20041021, and this occurred in 20041006 even with the mDNS line in the ebuild commented out.

BTW, the constant segfault problem that happens when you downgrade 20041021 to 200408-r1 can be fixed by replacing the libraries in /lib/tls with equivalent names from /lib while booted from a live CD. You can then chroot and re-emerge glibc-2.3.4.20040408-r1.
Comment 16 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 19:03:30 UTC
thanks for the update harris, i had no idea. i'm almost glad that this ebuild wasn't anywhere near ready for stable...

it seems suse 9.2 has a way to enable/disable the mdns update at runtime, -if- i understand correctly. i'll poke at what they've changed between suse 9.1 and 9.2 and see if there's something useful we can add... if not, i'll make the mdns patch USE-dependent.
Comment 17 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 22:16:30 UTC
alright, i've updated the multicast dns patch from suse and included an example host.conf that disables it by default. sync up in 30 mins to an hour and re-emerge glibc. the md5sum for the new ebuild:

ayanami glibc # md5sum glibc-2.3.4.20041021.ebuild
266cbed202c424608c25316331e19a03  glibc-2.3.4.20041021.ebuild

a copy and paste from the example /etc/host.conf:

# Valid values are on and off. If set to on, the resolv+ library treats
# the .local top level domain as link-local domain and sends multicast
# DNS requests to the multicast address 224.0.0.251 port 5353 instead
# of normal DNS requests. If you already use the .local domain in your
# nameserver configuration you will have to switch this option off.
#
mdns off


let me know if this bug is or isnt fixed for you in the new ebuild.
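For scripts that need to know which mode a machine is in, here is a minimal Python sketch that reads the mdns setting from the text of /etc/host.conf. It assumes, without confirmation from the patch itself, that the last mdns line wins and that a missing line falls back to a caller-supplied default; read_mdns_setting is a hypothetical name.

```python
def read_mdns_setting(host_conf_text: str, default: bool = False) -> bool:
    """Return True if host.conf text turns mdns on, False if it turns it off.

    Assumptions (not taken from the patch): the last 'mdns' line wins,
    and 'default' is used when no valid 'mdns on|off' line is present.
    """
    enabled = default
    for raw in host_conf_text.splitlines():
        # Drop '#' comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0].lower() == "mdns" and len(fields) >= 2:
            value = fields[1].lower()
            if value == "on":
                enabled = True
            elif value == "off":
                enabled = False
    return enabled
```

With the example host.conf above, read_mdns_setting returns False, matching the "disabled by default" intent of the updated ebuild.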
Comment 18 Harris Landgarten 2004-10-29 05:25:41 UTC
First try on the new build is mostly positive. Local is now resolving and DNS slowness seems to be gone. I will check it out further later today.
Comment 19 Maurice van der Pot (RETIRED) gentoo-dev 2004-10-29 10:24:25 UTC
After upgrade I get this:

> ping www.gentoo.org
/etc/host.conf: line 24: bad command `mdns off'
PING www.gentoo.org (198.63.211.235) 56(84) bytes of data.
64 bytes from 198.63.211.235: icmp_seq=1 ttl=56 time=131 ms
etc
Comment 20 Chris Smith 2004-10-29 11:54:20 UTC
>/etc/host.conf: line 24: bad command `mdns off'

Also a bad command if set to 'on'. The error is not limited to ping; it seems to happen with all name resolution requests.
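The message reads as if the host.conf parser in that build rejects any keyword it does not know, which would explain why both on and off fail. The hypothetical Python sketch below shows how a whitelist-based parser produces exactly this kind of diagnostic; the KNOWN_KEYWORDS set is an assumption standing in for the pre-mDNS keywords, not a copy of glibc's table.

```python
from __future__ import annotations

# Assumed keyword set for a resolver that predates the mDNS option.
KNOWN_KEYWORDS = frozenset({"order", "multi", "nospoof", "spoofalert", "reorder", "trim"})


def check_host_conf(text: str, filename: str = "/etc/host.conf",
                    known: frozenset[str] = KNOWN_KEYWORDS) -> list[str]:
    """Return one 'bad command' diagnostic per line whose keyword is unknown."""
    errors: list[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Strip comments and surrounding whitespace; skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword = line.split()[0].lower()
        if keyword not in known:
            errors.append(f"{filename}: line {lineno}: bad command `{line}'")
    return errors
```

With mdns added to the known set, the same input produces no diagnostics.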
Comment 21 Travis Tilley (RETIRED) gentoo-dev 2004-10-29 13:26:47 UTC
ok... fixed again. this time the md5sum you want is:

ayanami glibc # md5sum files/2.3.4/glibc-2.3.3-mdns-resolver2.diff
9db90105eb74d75834d25a599cba97ea  files/2.3.4/glibc-2.3.3-mdns-resolver2.diff
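To confirm that a synced tree has the updated patch before re-emerging, the md5sum check above can also be done from Python with hashlib. A minimal sketch; the function names are hypothetical, and the file is read in chunks so it never has to fit in memory at once.

```python
from __future__ import annotations

import hashlib
from typing import Iterable


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """MD5 of the concatenation of chunks, as lowercase hex (like md5sum)."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """MD5 of a file, read in fixed-size binary chunks until EOF."""
    with open(path, "rb") as f:
        return md5_hexdigest(iter(lambda: f.read(chunk_size), b""))


def matches_expected(path: str, expected: str) -> bool:
    """Compare a file's MD5 against an expected hex string."""
    return file_md5(path) == expected.strip().lower()
```

For example, from the sys-libs/glibc directory of the Portage tree (/usr/portage/sys-libs/glibc with the PORTDIR in the emerge info above), matches_expected("files/2.3.4/glibc-2.3.3-mdns-resolver2.diff", "9db90105eb74d75834d25a599cba97ea") should return True once the fixed patch has synced.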
Comment 22 Harris Landgarten 2004-10-29 17:14:48 UTC
I'm in the process of compiling the latest, but I noticed one oddity from this morning's build. ssh connections are hanging for 5-15 seconds while reading the /etc/ssh/ssh_config file (at least that is the last line that prints before the wait) when running with the -v flag. After the wait the command completes successfully.
Comment 23 Harris Landgarten 2004-10-29 19:02:57 UTC
After the new glibc compiled, I ran prelink again and all of my problems seem to be gone.
Comment 24 Haldir 2004-10-31 07:27:16 UTC
*** Bug 69525 has been marked as a duplicate of this bug. ***
Comment 25 Travis Tilley (RETIRED) gentoo-dev 2004-10-31 09:52:26 UTC
so the solution is to re-emerge glibc. fixed.