Summary: | nss_ldap-234 doesn't query the ldap server | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Michael Hanselmann (hansmi) (RETIRED) <hansmi> |
Component: | Current packages | Assignee: | Robin Johnson <robbat2> |
Status: | RESOLVED TEST-REQUEST | ||
Severity: | major | CC: | liverbugg, nielchiano, toolchain |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | emerge-info |
Description
Michael Hanselmann (hansmi) (RETIRED)
2005-03-02 13:40:26 UTC
Huh? "I've updated nss_ldap from 226 to 234 today and I wasn't able to log in anymore. Remerging nss_ldap-234 fixed it." Did you mean 226 in the second case? nss_ldap-234 does work fine for me.

Uh, sorry, yes. nss_ldap-226 works great, while 234 doesn't.

I'm using 234 on production boxes, with no problems at all. I did see the behavior you are noting on 233. Just on a hunch, could you reboot your 234 machine, to make sure any copy of 226 in the cache gets flushed?

I've rebooted after remerging nss_ldap-234 and it still gives me "illegal user" in the log. But I've noticed another problem with the ldap credentials, so I'll have to investigate more. Maybe the configuration has an error. Don't do anything on this bug until further notice. :-)

I just hit this bug. I was setting up a new box using 2005.0 and it wouldn't log in.

    htpc ~ # emerge -p nss_ldap pam_ldap
    These are the packages that I would merge, in order:
    Calculating dependencies ...done!
    [ebuild R ] net-libs/nss_ldap-234
    [ebuild R ] net-libs/pam_ldap-176
    htpc ~ # su htpc
    Unknown id: htpc

I went back and checked my other boxes and this is what I get:

    poweredge ~ # emerge -p nss_ldap pam_ldap
    These are the packages that I would merge, in order:
    Calculating dependencies ...done!
    [ebuild R ] net-libs/nss_ldap-234
    [ebuild R ] net-libs/pam_ldap-176
    poweredge ~ # su htpc
    Unknown id: htpc

    delltop ~ # emerge -p nss_ldap pam_ldap
    These are the packages that I would merge, in order:
    Calculating dependencies ...done!
    [ebuild U ] net-libs/nss_ldap-234 [226]
    [ebuild R ] net-libs/pam_ldap-176
    delltop ~ # su htpc
    Creating directory '/home/htpc'.
    htpc@delltop root $

The configs on all the boxes are identical.

liverbugg: first of all, on the affected machine, narrow it down to nss or pam, and make sure nscd is running. On all of your boxes, I want your 'emerge -v info' output, as well as your version of linux-headers.

getent passwd also works/doesn't work the same as su on all the boxes. As that's provided by glibc it should have nothing to do with PAM, so it's definitely nss_ldap that's not working. nscd isn't running on any boxes and never was running on any boxes. But starting it changed nothing. On the ldap server, nss_ldap-234 works, using the same ldap.conf as the other boxes. All clients have linux-headers-2.6.8.1-r4, the server has linux-headers-2.4.22-r1. I'll attach the emerge -v info output from all the boxes in one file.

Created attachment 54737 [details]
emerge-info
emerge -v info from 4 boxes
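
For anyone retracing the nss-or-pam question above: since getent ships with glibc and never touches PAM, a lookup that fails there points at NSS. A minimal sketch, assuming a POSIX shell; the function name `classify_lookup` and its output strings are made up for illustration and do not come from this bug:

```shell
#!/bin/sh
# Sketch only: classify a failed login as an NSS problem or "look at PAM next".
# getent only exercises the NSS modules listed in /etc/nsswitch.conf.

# classify_lookup USER
# Prints "nss-ok" if NSS can resolve USER, "nss-broken" otherwise.
classify_lookup() {
    if getent passwd "$1" >/dev/null 2>&1; then
        # The account resolves, so a failing login would be a PAM issue.
        echo "nss-ok"
    else
        # The account is unknown to NSS; PAM is not the first suspect.
        echo "nss-broken"
    fi
}

# Example call (htpc is the LDAP-only account used in this thread):
# classify_lookup htpc
```

On the boxes described here, `classify_lookup htpc` would presumably print `nss-broken` on htpc and poweredge, matching the "Unknown id" result from su.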
Do either of you use /etc/ldap.secret? I see that upstream has been busy with versions, and is up to 238 now, which fixes a glitch in the handling of /etc/ldap.secret (the last character was getting removed sometimes). It also makes some other undocumented changes, so I'll put it into the tree tomorrow, in case it fixes things for you.

I agree that there is something weird. I shelled into my work box (that was using nss_ldap-234), and it worked fine. I then assembled the ebuild for 238 and tried it, and found that the machine did NOT connect to the ldap server at all (checking via tcpdump). I downgraded to 234 again, and found it also now didn't work. Then I told the box to reboot to check if that helped, but my machine didn't come back on its own. I probably left a CD in my drive, so it'll have to wait for tomorrow to get checked further.

I don't use /etc/ldap.secret. I added nss_ldap-238 into my overlay and it behaves exactly as 234 does for me.

liverbugg: if you have some time on your hands, could you please give the following nss_ldap versions a quick try, by copying the other ebuild into your overlay and just seeing whether they compile, and whether 'getent -s ldap passwd (some-ldap-only-user)' works?

    227 228 229 230 232

I'd suggest rebooting between each version, just to be 100% certain. This should help us narrow it down to a specific change in nss_ldap (or at least cut down the search field significantly).

Here's my test process:

    rm /etc/ldap.conf
    emerge "=nss_ldap-2xx"
    reboot
    cp /etc/ldap.conf2 /etc/ldap.conf
    getent -s ldap passwd htpc

The reason for rm'ing the conf is that many of the broken versions hang during emerge waiting for the timeout, and emerge causes lots of queries, so it takes forever for them all to time out.

    226-227 - works fine
    228-233 - hangs with "nss_ldap: reconnecting to LDAP server (sleeping xx seconds)..." in syslog, then times out.
    234+ - doesn't hang but doesn't work. No messages in syslog.

Rebooting seemed to be unnecessary, although I still did it just to be sure. If I didn't rm ldap.conf, emerging a broken version would immediately hang during the unmerging of the old version. If I had nscd running and I upgraded to a broken version, it still works, I assume because of the caching. If I then reboot, it's broken.

toolchain/glibc folk: did something in glibc's nss code change? Read this comment please.

After compiling with debugging, the problem gets even more confusing... It works under SOME combinations of getent/nsswitch, but not others! I think something in glibc may have changed, and be responsible for some of this...

If I have 'ldap' on the nsswitch line for passwd, then this works 100% under nss_ldap-234 and nss_ldap-238: 'getent passwd $LDAPUSER' returns the correct output. I haven't tested with SSL at all, as my LDAP setup is non-SSL.

But run 'getent -s ldap passwd $LDAPUSER' and it hangs when nscd is not running. (227 is the last version that I actively tested this command against, and it worked there; it definitely doesn't work for me in 234/238.) If nscd is running, it just doesn't give any results. From the point of view of the nss_ldap code, it should make NO difference whether I run 'getent -s ldap ...' or have the ldap entry in nsswitch instead, UNLESS there was a change in glibc.
Output as follows:

    x29 tests # getent -s ldap passwd pat
    nss_ldap: ==> _nss_ldap_enter
    nss_ldap: <== _nss_ldap_enter
    nss_ldap: ==> _nss_ldap_getbyname
    nss_ldap: ==> _nss_ldap_search_s
    nss_ldap: ==> do_init
    nss_ldap: ==> do_close_no_unbind
    nss_ldap: <== do_close_no_unbind (connection was not open)
    nss_ldap: ==> ldap_init
    nss_ldap: ==> _nss_ldap_enter
    nss_ldap: <== _nss_ldap_enter
    nss_ldap: ==> _nss_ldap_getbyname
    nss_ldap: ==> _nss_ldap_search_s
    nss_ldap: ==> do_init
    nss_ldap: ==> ldap_init
    (hang here)

My emerge info output from my testing box:

    Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.4.3, glibc-2.3.4.20050125-r1, 2.6.10-gentoo-r4 i686)
    =================================================================
    System uname: 2.6.10-gentoo-r4 i686 AMD Athlon(tm) XP 3000+
    Gentoo Base System version 1.6.10
    Python: dev-lang/python-2.3.5 [2.3.5 (#1, Feb 20 2005, 02:21:17)]
    distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
    ccache version 2.3 [enabled]
    dev-lang/python: 2.3.5
    sys-devel/autoconf: 2.59-r6, 2.13
    sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.5
    sys-devel/binutils: 2.15.92.0.2-r7
    sys-devel/libtool: 1.5.14
    virtual/os-headers: 2.6.8.1-r4
    ACCEPT_KEYWORDS="x86 ~x86"
    AUTOCLEAN="yes"
    CFLAGS="-O3 -march=athlon-xp -ggdb3 -pipe"
    CHOST="i686-pc-linux-gnu"
    CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
    CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
    CXXFLAGS="-O3 -march=athlon-xp -ggdb3 -pipe"
    DISTDIR="/usr/portage-distfiles"
    FEATURES="autoaddcvs autoconfig buildpkg ccache collision-protect confcache cvs digest distlocks sandbox sfperms userpriv nostrip"
    GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
    MAKEOPTS="-j16"
    PKGDIR="/usr/portage-packages"
    PORTAGE_TMPDIR="/dev/shm"
    PORTDIR="/usr/portage"
    PORTDIR_OVERLAY="/usr/local/portage"
    SYNC="rsync://yamato/gentoo-portage"
    USE="x86 3dnow X Xaw3d aalib acl acpi alsa amd apache2 apm arts avi berkdb bitmap-fonts caps cdr cgi clearpasswd crypt cscope cups curl divx4linux dri dts dvd dvdr emboss encode erandom escreen esd ethereal expat f77 faac faad fam flac flash foomaticdb fortran gcj gd gdbm gif glx gnome gpm gstreamer ieee1394 imagemagick imap imlib innodb ipalias ipv6 jabber jack java javascript jikes jpeg junit kde ldap libwww lm_sensors mad maildir mcal md5sum mikmod mmx motif mozcalendar mozdevelop mozsvg mozxmlterm mp3 mpeg multitarget nas ncurses nls nptl oav objc offensive oggvorbis opengl pam pcap pda pdflib perl pic plotutils png pnp ppds python quicktime rdesktop readline rpc samba scanner sdl slang slp snmp socks5 speex spell sqlite sse ssl tcltk tcpd tetex theora tidy tiff truetype truetype-fonts type1 type1-fonts ungif usb userlocales v4l v4l2 wifi wmf wxwindows xinerama xml xml2 xmms xosd xrandr xscreensaver xv xvid zlib linguas_en"
    Unset: ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS

i don't think any of us toolchain peeps pay attention to the nss code ... i'm pretty sure we've never patched it

I've put nss_ldap-238 into the tree now, please try it.

toolchain: there is definitely a change in the getent/nss stuff. Previously, 'getent -s SOURCE TYPE [FOO]' would use SOURCE only for TYPE data, and use the other sources in nsswitch.conf for other data during the same call (e.g. nss_ldap might need to do a host lookup to find the server). The newer versions of glibc apply SOURCE to ALL types of data during the getent call, so 'getent -s ldap passwd' fails when nss_ldap needs to use the file-based or DNS-based data to find the LDAP server. This brokenness is definitely a result of glibc changes, not nss_ldap changes. I've tried a few glibc versions now, and it seems to work in some of them but not others, and sometimes it works, sometimes it doesn't.

Please test as requested 2 months ago.

I just installed a nss_ldap-239 machine and came across this problem. Downgrading to 226 fixed the problem. Does this help? Or is your fix not in 239 but only in 238?

nss_ldap-239-r1 works for me on ~ppc.
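
To check whether a given box shows the `getent -s` difference described in this thread, the two lookups can be run side by side with a time limit, so that the hanging case reports a failure instead of blocking the shell. This is only a sketch: the names `TIMEOUT_CMD`, `try_lookup` and `compare_sources` are invented for this example, and the 10-second limit is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: compare a lookup through nsswitch.conf with one forced
# through a single source via 'getent -s'.
# TIMEOUT_CMD is a hypothetical knob; it defaults to coreutils' timeout.
TIMEOUT_CMD="${TIMEOUT_CMD-timeout 10}"

# try_lookup GETENT_ARGS...
# Prints "ok" if getent succeeds, "fail" if it errors out or times out.
try_lookup() {
    if $TIMEOUT_CMD getent "$@" >/dev/null 2>&1; then
        echo ok
    else
        echo fail
    fi
}

# compare_sources SOURCE USER
# First line: lookup using the sources configured in nsswitch.conf.
# Second line: the same lookup restricted to SOURCE with -s.
compare_sources() {
    echo "nsswitch: $(try_lookup passwd "$2")"
    echo "forced -s $1: $(try_lookup -s "$1" passwd "$2")"
}

# Example call (pat is the LDAP user from the debug output in this thread):
# compare_sources ldap pat
```

If the first line reports `ok` and the second reports `fail`, that matches the symptom reported here, where the plain lookup works and the `-s ldap` form hangs or returns nothing.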