I upgraded my box to the shiny new 2006.1 profile. It seems that the new Glibc 2.4 / GCC 4.1.1 combination breaks nss_ldap. Before the upgrade (done by following the GCC Upgrading Guide to the letter, with a switch of profile beforehand) my LDAP authentication worked like a charm. Now, as soon as the box tries to look something up in the LDAP server, the process involved segfaults.

nsswitch.conf:
-- BEGIN --
passwd: files ldap
shadow: files ldap
group: files ldap
hosts: files dns
networks: files dns
services: db files
protocols: db files
rpc: db files
ethers: db files
netmasks: files
netgroup: files
bootparams: files
automount: files
aliases: files
-- END --

On the server side, I see this (for an outside connection to the ssh server, which is closed immediately: the forked sshd process is dead before the password prompt):
-- BEGIN --
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 fd=12 ACCEPT from IP=127.0.0.1:39203 (IP=0.0.0.0:389)
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=0 BIND dn="" method=128
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=0 RESULT tag=97 err=0 text=
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=1 SRCH base="dc=home,dc=bouton,dc=name" scope=2 deref=0 filter="(uid=root)"
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=2 BIND dn="uid=root,ou=Users,dc=home,dc=bouton,dc=name" method=128
Sep 7 12:23:39 quiet slapd[15484]: slap_global_control: unrecognized control: 1.3.6.1.4.1.42.2.27.8.5.1
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=2 RESULT tag=97 err=49 text=
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=3 BIND dn="" method=128
Sep 7 12:23:39 quiet slapd[15484]: conn=3325 op=3 RESULT tag=97 err=0 text=
Sep 7 12:23:42 quiet slapd[15484]: conn=3325 fd=12 closed (connection lost)
Sep 7 12:23:44 quiet slapd[15484]: conn=3326 fd=12 ACCEPT from IP=127.0.0.1:39226 (IP=0.0.0.0:389)
Sep 7 12:23:44 quiet slapd[15484]: conn=3326 op=0 BIND dn="" method=128
Sep 7 12:23:44 quiet slapd[15484]: conn=3326 op=0 RESULT tag=97 err=0 text=
Sep 7 12:23:44 quiet slapd[15484]: conn=3326 op=1 SRCH base="dc=home,dc=bouton,dc=name" scope=2 deref=0 filter="(uid=root)"
Sep 7 12:23:44 quiet slapd[15484]: conn=3326 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
Sep 7 12:23:44 quiet slapd[15484]: conn=3326 fd=12 closed (connection lost)
-- END --

I have another box using LDAP authentication against the same server that works correctly (no GCC/Glibc upgrade for it), so the problem is clearly on the client side. As soon as I remove every ldap reference from the nsswitch.conf file, everything becomes stable.

I've tried with and without nscd running. When nscd is running there's a twist: with ldap in nsswitch.conf, processes still segfault, but the nscd process itself seems able to do lookups, because when I remove the ldap references from the nsswitch.conf file, getent calls return users and groups that are defined only in the LDAP server.

I tried both the x86 nss_ldap-249 and the ~x86 nss_ldap-252. Even a fresh reboot with 252 doesn't solve the problem (I had suspected that an old library might still be loaded in memory).
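For anyone wanting to confirm the same symptom, here is a minimal sketch (not part of the original report) that classifies how a lookup command ended. It assumes a shell such as bash or dash, where a process killed by signal N is reported as exit status 128+N, and Linux signal numbering, where SIGSEGV is 11. The function names are made up for illustration.

```shell
#!/bin/sh
# Classify a shell exit status.
# Assumption: statuses above 128 mean "killed by signal (status - 128)".
classify_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "ok"
    elif [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        if [ "$sig" -eq 11 ]; then
            echo "segfault"      # 11 is SIGSEGV on Linux
        else
            echo "signal $sig"
        fi
    else
        echo "exit $code"
    fi
}

# Run the given command silently and report how it ended.
probe_lookup() {
    "$@" >/dev/null 2>&1
    classify_status $?
}
```

With ldap enabled in nsswitch.conf, something like `probe_lookup getent passwd root` should print `segfault` on an affected box, assuming getent itself is the process that crashes.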
emerge --info:
Portage 2.1-r2 (default-linux/x86/2006.1, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r7 i686)
=================================================================
System uname: 2.6.17-gentoo-r7 i686 Unknown CPU Typ
Gentoo Base System version 1.12.4
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-lang/python: 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.3
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r5
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=athlon-xp -fomit-frame-pointer -ffast-math -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/NX/etc /usr/NX/home /usr/share/X11/xkb /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O3 -march=athlon-xp -fomit-frame-pointer -ffast-math -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict userpriv usersandbox"
GENTOO_MIRRORS="http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ http://ftp.roedu.net/pub/mirrors/gentoo.org/"
LINGUAS="en fr"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="x86 3dnow 3dnowext X aac aalib aim alsa bash-completion bitmap-fonts bzip2 cairo cli crypt cups djbfft dlloader dri dvd dvdread emacs fam fastcgi firefox flac glut gtkhtml icq imagemagick iproute2 ipv6 isdnlog jabber java jikes jpeg lcms lzo makecheck matroska memcache mmx mmx2 mmxext mozilla mp4 msn ncurses nls nptl nptlonly nsplugin pam pcre png ppds pppd rdesktop readline real reflection rrdtool ruby scanner session speex spl sse ssl theora threads tiff truetype truetype-fonts type1-fonts udev unicode usb vnc xorg xvid yahoo zlib elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux linguas_en linguas_fr userland_GNU video_cards_mga"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS
I just did the following:
- emerge gcc-3.4.6
- gcc-config i686-pc-linux-gnu-3.4.6
- source /etc/profile
- emerge nss_ldap (x86 249)
- activated ldap in nsswitch.conf
- opened a new ssh session successfully

That rules out a glibc-2.4 problem. This is definitely an nss_ldap/gcc-4.1.1 problem.
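The compiler switch and rebuild described above can be sketched as a small dry-run script. This is only an illustration: the `run` and `rebuild_with` helpers are hypothetical names, and by default the script prints the commands rather than executing them, since gcc-config and emerge only exist on a Gentoo box.

```shell
#!/bin/sh
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Switch to a compiler profile, reload the environment, rebuild one package.
rebuild_with() {
    profile=$1   # e.g. i686-pc-linux-gnu-3.4.6
    package=$2   # e.g. nss_ldap
    run gcc-config "$profile"
    run . /etc/profile
    run emerge "$package"
}
```

Calling `rebuild_with i686-pc-linux-gnu-3.4.6 nss_ldap` prints the three commands in order; setting `DRY_RUN=0` as root on the affected machine would actually run them.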
Build BOTH openldap and nss_ldap with the same version of GCC4, and test again.
Both were originally compiled with gcc-4.1.1, as I followed the GCC Upgrade Guide. I believe that after:
- gcc-config i686-pc-linux-gnu-4.1.1
- source /etc/profile
- emerge -e system && emerge -e world
- a reboot

there's no way for openldap and nss_ldap to be compiled with anything other than gcc-4.1.1, either on disk or in memory, is there?

Anyway, the machine has time to spare. I'll make a tbz2 of the currently working, 3.4.6-compiled nss_ldap and do another emerge openldap nss_ldap with gcc-4.1.1 just to be sure. I'll post the result shortly (if I don't lose my ssh access in the process...).
After rebuilding both openldap and nss_ldap with 4.1.1, the segfaults are back as soon as ldap is in nsswitch.conf. Reverting to an nss_ldap compiled with 3.4.6 solves the problem (again).
Hmm, I'll dig more. The reason I asked for the rebuild is that I saw some weirdness when, during the GCC4 upgrade, nss_ldap was rebuilt before openldap.
1. Please attach your /etc/ldap.conf file.
2. What version of openldap were you using?
OK, I understand the concern about a mismatch between the openldap used at build time and the one used at run time. I use openldap-2.3.24-r1 (and nss_ldap-249). Here's the ldap.conf content (filtered with grep -v '^ *\(#\|$\)' /etc/ldap.conf to spare you the comments):
host 127.0.0.1
base dc=home,dc=bouton,dc=name
nss_reconnect_tries 1
nss_reconnect_sleeptime 1
nss_reconnect_maxsleeptime 1
nss_reconnect_maxconntries 3
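The grep filter used above drops comment lines and blank lines so that only the active settings remain. A minimal sketch of the same idea follows, written with `grep -E` as an equivalent of the `\(#\|$\)` form in the comment; `active_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Print only the active lines of a config file: skip lines that are empty
# or whose first non-space character is '#'.
# Reads the files given as arguments, or standard input if none are given.
active_lines() {
    grep -Ev '^ *(#|$)' "$@"
}
```

For example, `active_lines /etc/ldap.conf` would list just the directives. Note that, like the original pattern, this only treats spaces (not tabs) as leading whitespace.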
Could you please provide the output of:
ldd /lib/libnss_ldap-2.4.so
Also, try the latest ~arch nss_ldap while you are at it. I still haven't reproduced this problem at all, and my entire machine is built with GCC 4.1.1.
x86 nss_ldap (249) compiled with gcc-3.4.6 (ok):
ldd /lib/libnss_ldap-2.4.so
linux-gate.so.1 => (0xffffe000)
libldap-2.3.so.0 => /usr/lib/libldap-2.3.so.0 (0xb7f9c000)
liblber-2.3.so.0 => /usr/lib/liblber-2.3.so.0 (0xb7f90000)
libdl.so.2 => /lib/libdl.so.2 (0xb7f8c000)
libnsl.so.1 => /lib/libnsl.so.1 (0xb7f77000)
libresolv.so.2 => /lib/libresolv.so.2 (0xb7f65000)
libc.so.6 => /lib/libc.so.6 (0xb7e49000)
libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0xb7e16000)
libcrypto.so.0.9.7 => /usr/lib/libcrypto.so.0.9.7 (0xb7d07000)
/lib/ld-linux.so.2 (0x80000000)

with 4.1.1 (segfaults):
linux-gate.so.1 => (0xffffe000)
libldap-2.3.so.0 => /usr/lib/libldap-2.3.so.0 (0xb7ef1000)
liblber-2.3.so.0 => /usr/lib/liblber-2.3.so.0 (0xb7ee5000)
libdl.so.2 => /lib/libdl.so.2 (0xb7ee1000)
libnsl.so.1 => /lib/libnsl.so.1 (0xb7ecc000)
libresolv.so.2 => /lib/libresolv.so.2 (0xb7eba000)
libc.so.6 => /lib/libc.so.6 (0xb7d9e000)
libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0xb7d6b000)
libcrypto.so.0.9.7 => /usr/lib/libcrypto.so.0.9.7 (0xb7c5c000)
/lib/ld-linux.so.2 (0x80000000)

~x86 nss_ldap (253) with 4.1.1 (segfaults):
linux-gate.so.1 => (0xffffe000)
libldap-2.3.so.0 => /usr/lib/libldap-2.3.so.0 (0xb7ea2000)
liblber-2.3.so.0 => /usr/lib/liblber-2.3.so.0 (0xb7e96000)
libdl.so.2 => /lib/libdl.so.2 (0xb7e92000)
libnsl.so.1 => /lib/libnsl.so.1 (0xb7e7d000)
libresolv.so.2 => /lib/libresolv.so.2 (0xb7e6b000)
libc.so.6 => /lib/libc.so.6 (0xb7d4f000)
libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0xb7d1c000)
libcrypto.so.0.9.7 => /usr/lib/libcrypto.so.0.9.7 (0xb7c0d000)
/lib/ld-linux.so.2 (0x80000000)

Another of my boxes (also an Athlon-XP) has broken since my last report, and for this one I have no solution: the first two were fixed by emerging nss_ldap with gcc 3.4.6, but on this third box even that doesn't fix the problem. All boxes have had the emerge -e system && emerge -e world treatment, some of them even twice. It looks like a corner case linked to CFLAGS or USE flags.
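The three ldd listings above resolve to the same libraries and paths; only the load addresses differ. A small sketch like the following makes that comparison mechanical by stripping the trailing `(0x...)` address before comparing. The helper names are hypothetical, and the inputs are assumed to be ldd outputs saved to files beforehand.

```shell
#!/bin/sh
# Remove the trailing load address, e.g. " (0xb7f9c000)", from each line
# of ldd output read on standard input.
strip_addresses() {
    sed 's/ *(0x[0-9a-fA-F]*)$//'
}

# Succeed when two saved ldd outputs list the same libraries and paths,
# ignoring load addresses and line order.
same_libraries() {
    left=$(strip_addresses < "$1" | sort)
    right=$(strip_addresses < "$2" | sort)
    [ "$left" = "$right" ]
}
```

A matching result only shows that the dynamic linking is the same in the working and failing builds; it says nothing about how the code inside each library was compiled.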
Here's the emerge --info of the third box:
Portage 2.1.1 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.4-r3, 2.6.16-gentoo-r6 i686)
=================================================================
System uname: 2.6.16-gentoo-r6 i686 AMD Athlon(tm) XP 1600+
Gentoo Base System version 1.12.5
Last Sync: Thu, 21 Sep 2006 17:50:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: 1.3.6-r1, 2.0.28-r1
dev-lang/python: 2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.3
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=athlon-xp -fomit-frame-pointer -ffast-math -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/NX/etc /usr/NX/home /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=athlon-xp -fomit-frame-pointer -ffast-math -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distcc distlocks metadata-transfer parallel-fetch sandbox sfperms strict userpriv usersandbox"
GENTOO_MIRRORS="http://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ http://ftp.club-internet.fr/pub/mirrors/gentoo http://pandemonium.tiscali.de/pub/gentoo/"
LINGUAS="fr"
MAKEOPTS="-j13"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync-gentoo.inet6-interne.fr/gentoo-portage"
USE="x86 3dnow 3dnowext X aac aalib alsa arts bash-completion berkdb bindist bitmap-fonts browserplugin bzip2 cairo cdr cjk cli crypt cups dbus dga dlloader dri dvd dvdr eds elibc_glibc emacs emboss encode esd fam firefox flac gdbm gif gimpprint gpm gstreamer hal input_devices_evdev input_devices_keyboard input_devices_mouse isdnlog java jikes jpeg kernel_linux lcms ldap libcaca libg++ linguas_fr mad matroska mikmod mmx mmxext mng mp3 mpeg ncurses nls nptl nptlonly offensive ogg opengl oss pam pcre perl png ppds pppd qt3 qt4 quicktime readline real reflection ruby samba sdl session spell spl sse ssl threads tiff truetype truetype-fonts type1-fonts udev unicode userland_GNU video_cards_mga video_cards_s3 video_cards_s3virge vorbis win32codecs xinerama xml xorg xprint xv xvid xvmc zlib"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Why are you using -ffast-math? It is a known cause of breakage in systems. Please rebuild without fast-math; we don't want stuff from the ccache that was built with fast-math to break things now. Rebuilding glibc, openssl, openldap, and then nss_ldap should be sufficient to test this.
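As a rough aid for checking a CFLAGS string before such a rebuild, here is a minimal sketch that detects and removes -ffast-math. The function names are hypothetical, and the flag string is assumed to be copied from the CFLAGS line of emerge --info or from make.conf.

```shell
#!/bin/sh
# Succeed if the flag string contains -ffast-math as a whole word.
has_fast_math() {
    case " $1 " in
        *" -ffast-math "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the flag string with every -ffast-math removed,
# keeping the other flags in their original order.
strip_fast_math() {
    out=""
    for flag in $1; do
        [ "$flag" = "-ffast-math" ] && continue
        out="${out:+$out }$flag"
    done
    echo "$out"
}
```

Applied to the first box's flags, `strip_fast_math "-O3 -march=athlon-xp -fomit-frame-pointer -ffast-math -pipe"` prints `-O3 -march=athlon-xp -fomit-frame-pointer -pipe`, which could then be set as CFLAGS and CXXFLAGS before re-emerging the packages listed above.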
Emerging without fast-math solves the problem!
Closing as invalid since it was fast-math causing the brokenness. Please heed the warnings in the handbook about fast-math in future.