99564 – sys-fs/udev hangs when running udevstart at boot, due to non-existent groups and NSS search timeouts


Status:	RESOLVED FIXED
Alias:	None
Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All Linux
Importance:	High critical
Assignee:	Gentoo's Team for Core System packages
Duplicates (9):	111495 128907 129680 132940 137535 138561 138901 142877 153456
Blocks:	udev-meta
Reported:	2005-07-19 12:14 UTC by Robin Johnson
Modified:	2008-05-25 06:43 UTC
CC List:	41 users

Attachments
Stop udev resolving names when loading rules (udev-noresolve.patch, 3.76 KB, patch) 2006-07-07 06:51 UTC, Roy Marples (RETIRED)
Stop udev resolving names when loading rules (udev-lookup.patch, 4.72 KB, patch) 2006-08-04 16:53 UTC, Roy Marples (RETIRED)

Description Robin Johnson archtester

2005-07-19 12:14:02 UTC

The 50-udev.rules file contains this rule:
KERNEL=="tpm*",   NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"

This causes an NSS lookup of 'tss' in the passwd database and of 'tss' in the
group database.
If nsswitch.conf contains:
group: files ldap
passwd: files ldap
(or 'compat' instead of 'files')
an LDAP lookup is attempted for each name that is not found in the local files.

nss_ldap defaults to an indefinite search timeout, and each lookup is performed
multiple times. Even if you shorten the search timeout, you must still wait for
the lookup to be repeated several times (at least 4, but I lost count).

This should be resolved by making sure that EVERY user and group mentioned in
the 50-udev.rules file is in the base system's /etc/passwd and /etc/group.
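
A minimal sketch of how one might check this, assuming the rules use the OWNER="..." and GROUP="..." syntax shown above. The function names names_for and missing_in are hypothetical helpers, not part of udev or Portage. They read the files directly and never go through NSS, so they should not trigger the LDAP lookups described here.

```shell
# names_for KEY FILE: print each distinct value of KEY="..." found on
# non-comment lines of a udev rules file (KEY is OWNER or GROUP).
names_for() {
    grep -v '^[[:space:]]*#' "$2" |
        grep -oE "$1=\"[^\"]+\"" |
        cut -d'"' -f2 |
        sort -u
}

# missing_in KEY RULES DB: print the names used with KEY in RULES that have
# no entry in DB, a passwd- or group-format file (name in the first field).
missing_in() {
    names_for "$1" "$2" | while read -r name; do
        case $name in
            *[!0-9]*) ;;          # contains a non-digit: a name to check
            *) continue ;;        # purely numeric ID: nothing to resolve
        esac
        awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$3" ||
            printf '%s\n' "$name"
    done
}

# Example (paths are assumptions, adjust to your system):
#   missing_in OWNER /etc/udev/rules.d/50-udev.rules /etc/passwd
#   missing_in GROUP /etc/udev/rules.d/50-udev.rules /etc/group
```

Any name printed would be one that falls through to the next NSS source, which is LDAP in the configuration above.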

Reproducible: Always
Steps to Reproduce:
1. emerge udev nss_ldap
2. configure nss_ldap in nsswitch.conf and ldap.conf
3. reboot

Actual Results:  
The machine hangs when running udevstart, waiting indefinitely for nss_ldap to
return the data for the 'tss' user and group.

Expected Results:  
The machine should boot properly.

Portage 2.0.51.22-r1 (default-linux/x86/2005.0, gcc-3.4.3, glibc-2.3.5-r0,
2.6.12-gentoo-r5 i686)
=================================================================
System uname: 2.6.12-gentoo-r5 i686 AMD Athlon(tm) Processor
Gentoo Base System version 1.6.13
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
dev-lang/python:     2.3.5, 2.4.1-r1
sys-apps/sandbox:    1.2.11
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.18-r1
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env
/usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config
/var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=i686 -fomit-frame-pointer"
DISTDIR="/mnt/distfiles"
FEATURES="autoconfig ccache distlocks fixpackages sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org
http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://mirror.iat.sfu.ca/gentoo-portage"
USE="x86 # 3dnow X a a52 aac aalib alsa apm audio avi bitmap-fonts cdr codecs
crypt cups dbus divx4linux eds emboss encode faad fam flac foomaticdb gdbm gif
gnome gpm gstreamer gtk gtk2 hal i'm imagemagick imlib ipv6 java jpeg junit ldap
libg++ libwww matroska mikmod mmx mono mozilla moznocompose moznoirc moznomail
mp3 mpeg ncurses nls not nptl nptlonly nvidia ogg oggvorbis opengl oss pam
pdflib perl png ppds python qt quicktime readline sdl server slang snmp spell
sqlite ssl svga tcpd truetype truetype-fonts type1-fonts usb v4l v4l2 vorbis
win32codecs wxwindows xine xinerama xml xml2 xmms xv xvid zeroconf zlib
userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS

Comment 1 SpanKY gentoo-dev

2005-07-19 13:07:30 UTC

how exactly is that a fix?  if your system doesn't use /etc/{passwd,group} for
authentication, then why should udev?  and really, why should udev care beyond
the API that the libc presents to it?

Comment 2 Robin Johnson archtester

2005-07-19 13:13:49 UTC

/etc/passwd and /etc/group ARE still used. LDAP contains only user accounts, no
system accounts (because system accounts are NOT easily portable across multiple
UNIX systems, e.g. IRIX, Solaris and Linux).

tss is a system account that should be shipped with baselayout instead of only
being created by app-crypt/trousers.

I'm saying the entire bug is fixed by simply shipping with a 'tss' entry in both
/etc/passwd and /etc/group.

The nsswitch.conf I noted makes NSS/libc look in the files FIRST, then check
LDAP if the entry is not found in the files.

Comment 3 SpanKY gentoo-dev

2005-07-19 17:47:18 UTC

sorry, got your logic reversed

regardless, i don't think tss belongs in baselayout (especially since we're
trying to cut out a lot of the crap users/groups)

Comment 4 Robin Johnson archtester

2005-07-19 18:00:45 UTC

ok, if we won't add tss to baselayout, could we please remove its use from the
default udev rules, so that udev doesn't look it up by default (it would make
sense to only probe it if the related packages are installed).

Comment 5 Martin Schlemmer (RETIRED) gentoo-dev

2005-07-21 00:23:50 UTC

I would have thought that if the user patches his kernel with tpm or enables it
in 2.6.12 (which includes it), he would merge trousers as well?

Comment 6 Robin Johnson archtester

2005-07-21 01:14:52 UTC

I don't use TPM at all. It's the nss-lookups themselves that are causing 
trouble. Either they should have an entry in the baselayout /etc/passwd (so the 
lookup doesn't hit ldap), or the lookup shouldn't take place.

Comment 7 SpanKY gentoo-dev

2005-07-21 09:17:09 UTC

seems reasonable that udev shouldn't look up group info unless the rule is used
... any thoughts gregkh ? :)

Comment 8 Christophe PEREZ 2005-09-08 18:25:31 UTC

Hi,

I have this problem too. I can't boot with udev-068 and its default config.
I had to comment out the line:
KERNEL=="tpm*",   NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"
in 50-udev.rules
to boot.

But what is the proper solution, please?

English is not my native language and I can't understand everything that is being said here...

Comment 9 Dane 2005-10-29 21:34:18 UTC

To fix this problem:
- press ctrl+c while your system is hung waiting for udev; this will give you
a prompt to enter your root password or press ctrl+d. If you don't have an initrd
you'll need to use your livecd.
- enter your root password.
- type 'mount -o rw,remount /'
- then comment out the entry for the tss user/group in
/etc/udev/rules.d/50-udev.rules (line 244 on my revision).
- type 'reboot'
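
The editing step can be sketched as a small shell function, assuming the rule still starts with KERNEL=="tpm as in the description. comment_out_tpm_rule is a hypothetical helper name. It rewrites the file in place, so it only makes sense once the root filesystem has been remounted read-write.

```shell
# comment_out_tpm_rule FILE: prefix the tpm rule with "# " so udev skips it.
# Lines that are already commented out no longer match, so running it
# twice leaves the file unchanged.
comment_out_tpm_rule() {
    rules=$1
    tmp=$(mktemp) || return 1
    # "&" re-inserts the matched text after the added "# ".
    if sed 's/^KERNEL=="tpm/# &/' "$rules" > "$tmp"; then
        # Write back through the existing file to keep its owner and mode.
        cat "$tmp" > "$rules"
    fi
    rm -f "$tmp"
}

# Example (path taken from the steps above):
#   comment_out_tpm_rule /etc/udev/rules.d/50-udev.rules
```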

Comment 10 Dane 2005-10-29 21:43:27 UTC

This is still a problem on udev-70-r1. Anyone who follows the openldap
authentication guide will be affected by this problem. It's critical because
their system does not boot. An inline sed command should be added to the next
udev ebuild revision:
sed -i 's/^KERNEL=="tpm/# &/' /etc/udev/rules.d/50-udev.rules

Comment 11 Greg Kroah-Hartman (RETIRED) gentoo-dev

2005-10-31 23:16:23 UTC

No, the "correct" fix is to not look up the group information unless we are going
to apply a rule.

But, we just added the ability to "compile" the rules so we don't parse them
all every time we get a device, and that will be in the next release, which
should cause things like this to only time out once (not perfect, but better).

If you wish to fix this in the "real" way, patches are always accepted
upstream, otherwise it will have to wait until I get some spare time.

Comment 12 Greg Kroah-Hartman (RETIRED) gentoo-dev

2005-11-10 23:04:55 UTC

*** Bug 111495 has been marked as a duplicate of this bug. ***

Comment 13 Bachelier Vincent 2005-12-09 01:11:22 UTC

udevstart doesn't run at all at boot, but only with versions 077 and 077-r1.
I masked these, so 073 was installed, and everything works with it (I use the
~x86 udev because KDE 3.5 needs it).
What's up with this version?

Comment 14 Bachelier Vincent 2005-12-09 05:18:37 UTC

ok
udev-077 and later are missing a dependency on baselayout-0.12 or later ...
baselayout 0.11 fails to launch udevstart correctly with the new udev

adding the dependency should correct it

Comment 15 Greg Kroah-Hartman (RETIRED) gentoo-dev

2005-12-09 19:03:05 UTC

Yes, there's nothing I can do in the udev package for the ldap issues, unless
someone else has some ideas.

If you want to do this, I recommend changing all of the "GROUP=" portions in your 
udev config file to be numbers.  That way we don't look up the id.
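
To illustrate, here is a sketch that reads the numeric ID from a local group file and prints the rules with GROUP="name" replaced by that number. local_id and numeric_group are hypothetical helper names, and the sketch assumes the group already exists in the local file. Nothing here goes through NSS.

```shell
# local_id NAME DB: print the numeric ID (third colon-separated field) of
# NAME in a passwd- or group-format file; fail if NAME is not listed.
local_id() {
    awk -F: -v n="$1" '$1 == n { print $3; found = 1; exit } END { exit !found }' "$2"
}

# numeric_group RULES GROUPFILE NAME: print RULES with GROUP="NAME" replaced
# by GROUP="<gid>". The rules file itself is not modified.
numeric_group() {
    gid=$(local_id "$3" "$2") || return 1
    sed "s/GROUP=\"$3\"/GROUP=\"$gid\"/" "$1"
}

# Example (paths and group name are assumptions):
#   numeric_group /etc/udev/rules.d/50-udev.rules /etc/group disk
```

The output can be reviewed and then copied over the rules file by hand.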

And if you have other udev issues unrelated to the bug subject, please
open new bugs for it.  077-r3 is out which should fix the udevd and 
baselayout problem.

Comment 16 SpanKY gentoo-dev

2005-12-09 19:11:34 UTC

i think the idea was that udev should only look up GROUPs if they are actually
needed ... that would "resolve" this bug for most people

Comment 17 Greg Kroah-Hartman (RETIRED) gentoo-dev

2005-12-09 19:56:31 UTC

Ah, got it, that makes more sense, I'll reopen this...

Comment 18 Stefaan De Roeck (RETIRED) gentoo-dev

2005-12-21 02:41:45 UTC

My problem is almost identical, but on my configuration it is triggered by enabling LDAP lookups for hosts in nsswitch.conf (ldap is also enabled for passwd/group etc., but that doesn't seem to cause an error):
hosts:       files dns ldap

I have no self-written udev rules whatsoever; the standard ones I have in /etc/udev/rules.d are: 05-udev-early.rules  30-svgalib.rules  50-udev.rules  60-vmware.rules

What could trigger this behaviour?

Comment 19 Dane Watson 2006-01-09 02:32:40 UTC

Status update? 

Can we get an item added to the openldap authentication guide? I attempted to contact the author but he seems out of gentoo now

Comment 20 Kevin Parent 2006-01-22 14:47:31 UTC

I too use ldap authentication for users.  I was first bitten by this bug upgrading from 058 to 068 (I think it was 068).  Found the solution here and commented out the "KERNEL==tpm..." line in /etc/udev/rules.d/50-udev.rules.

I just updated from 070 to 079.  I don't have any custom rules nor do I ever change the udev rules, so I blindly replaced my 50-udev.rules with the update using etc-update.  MY BAD!  So I spent 2 hours scouring the forums.  Should have checked here first!

At the very least, there should be a BIG FAT RED einfo warning at the end of the emerge process regarding the problem and the simple solution.

IMHO, a better solution would be to remove the "KERNEL==tpm..." rule from udev completely and have the trousers ebuild add it to 50-udev.rules or maybe its own  rule file like 99-trousers.rules.

The reason?  How many Gentoo users use trousers?  I did a search for trousers in the forums - only 7 results, none having to do with the app.  Did a search in bugs.gentoo.org - 2 bugs filed, both resolved.  The results may be misleading I admit....

So how about it?  Can we at least get the einfo warning in the ebuild?

Comment 21 Kevin Parent 2006-01-22 14:59:13 UTC

OK, did a little more digging... Seems like I was talking out of my rear end.  tpm is kernel related.

How about a kernel sanity check for the udev ebuild?  If the kernel is configured for tpm, then insert the needed udev rule.  If not, don't add it.

Comment 22 SpanKY gentoo-dev

2006-01-22 15:07:07 UTC

not acceptable ... kernel can be changed later

Comment 23 Kevin Parent 2006-01-22 15:10:10 UTC

(In reply to comment #22)
> not acceptable ... kernel can be changed later
> 

??? Care to elaborate?

I don't have tpm in my kernel, so why should I get a udev rule that causes my system to hang?

Comment 24 Kevin Parent 2006-01-22 15:14:16 UTC

(In reply to comment #23)
> (In reply to comment #22)
> > not acceptable ... kernel can be changed later
> > 
> 
> ??? Care to elaborate?
> 
> I dont have tpm in my kernel, so why should I get a udev rule that causes my
> system to hang?
> 

Never mind, I see your point.

How about the einfo warning?

Comment 25 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-01-22 16:13:51 UTC

No, we are working on the proper solution in udev, sorry for the delay...

Comment 26 Dane Watson 2006-01-22 17:26:35 UTC

I may be a goof here but isn't it bad to have a blocker severity bug open since July 19th 2005? Sure, it should be downgraded a bit, but nobody has done that, and it just looks bad to have this still open. I think there are some very obvious, easy-to-implement solutions listed here and you're ignoring them in what I assume is an attempt to get upstream to fix the problem... I normally don't rant to devs but frankly *I think you're making bad choices regarding this bug* - it's fine to pursue it upstream but let's get a working solution in like 'yesteryear'...

Comment 27 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-01-22 19:25:37 UTC

There, it's not a "blocker" bug anymore.

And if you don't like to "rant" at developers, do not do so, it only makes us
want to go work on other things instead...

Comment 28 Vieri 2006-02-28 05:23:01 UTC

Commenting out the line
KERNEL=="tpm*",   NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"
is a quick way to avoid udev's endless lookups.
However there's another step that also blocks the init process: "Cleaning /tmp"
/etc/init.d/bootmisc
on the line
chown 0:0 /tmp/.{ICE,X11}-unix

If I comment that line out then the system boots ok (not a definite solution though).

System is EM64T (amd64 iso), latest udev and latest baselayout.
nsswitch.conf needs ldap in my case.

Comment 29 Daniel Schindler 2006-02-28 06:00:39 UTC

i have the same problem. i have changed chown 0:0 ... back to root:root and now it works again (i am using ldap for authentication but root is in the passwd file).

Comment 30 Sumit Khanna 2006-03-04 00:36:26 UTC

I ran into this problem as well on two machines, a Pentium 3 and a PentiumD-64, which use ldap for nss and pam. On top of that, I had trouble remounting / as rw even with -o remount,rw (it said / wasn't already mounted; leave out remount and it said it was already mounted).

I had to boot from a liveCD to fix this. ldap obviously can't be queried before ldap or the network have a chance to come up. This is a pretty serious error.

Comment 31 archiebald 2006-03-18 02:50:32 UTC

Same problem here on 2 of 3 machines, but no real solution. Only an init script with a mount --bind lets the system boot and use ldap. But on one machine I still cannot log in, even though getent returns the right information.

Comment 32 Serge 2006-03-19 14:18:16 UTC

(In reply to comment #31)
Same problem here on 1 machine with ldap.
It took a long time to understand the problem, mainly because this machine runs unattended, without display or keyboard. It was only after searching the forums that I found this bug.
It's difficult to understand why this "old" bug has no solution and no red flag during installation. For me it's the first time that my Gentoo didn't boot.
I have learned to always have a LiveCD with me.....

Comment 33 Kevin Parent 2006-03-20 08:55:09 UTC

As I stated earlier in comment #20, commenting out the "tpm" line in /etc/udev/rules.d/50-udev.rules works for me. However, while upgrading an old dormant machine I came across info on the web regarding an option in /etc/nsswitch.conf.

According to the manpage for nsswitch.conf ( # man nsswitch.conf ), each lookup status can be paired with an action.

....Snippet from manpage nsswitch.conf begin.....

       hosts:          dns [!UNAVAIL=return] files
       networks:       nis [NOTFOUND=return] files
       ethers:         nis [NOTFOUND=return] files
       protocols:      nis [NOTFOUND=return] files
       rpc:            nis [NOTFOUND=return] files
       services:       nis [NOTFOUND=return] files

       The configuration specification for each database can contain two
       different items:
       * The service specification like `files', `db', or `nis'.
       * The reaction on lookup result like `[NOTFOUND=return]'.

.....later in manpage....

       The second item in the specification gives the user much finer  control
       on  the  lookup  process.   Action items are placed between two service
       names and are written within brackets.  The general form is

       `[' ( `!'? STATUS `=' ACTION )+ `]'

       where

       STATUS => success | notfound | unavail | tryagain
       ACTION => return | continue

       The case of the keywords is insignificant. The STATUS  values  are  the
       results  of  a  call  to a lookup function of a specific service.  They
       mean:

       success
              No error occurred and the wanted entry is returned. The  default
              action for this is `return'.

       notfound
              The  lookup process works ok but the needed value was not found.
              The default action is `continue'.

       unavail
              The service is permanently unavailable.  This  can  either  mean
              the needed file is not available, or, for DNS, the server is not
              available or does not allow  queries.   The  default  action  is
              `continue'.

       tryagain
              The  service is temporarily unavailable.  This could mean a file
              is locked or a server currently cannot accept more  connections.
              The default action is `continue'.

....End of Snippet....

That being the case, perhaps a setting such as

passwd:    ldap [NOTFOUND=return] db files

or

passwd:    ldap [!UNAVAIL=return] db files

in /etc/nsswitch.conf may be the correct workaround for this problem.  I haven't tried it myself yet since commenting out the "tpm" line in the udev rules did the trick for me.

Has anyone tried this or have experience with this parameter?
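
Before changing the order, it may help to see which services a database consults and in what order. The following sketch, with the hypothetical helper name nss_order, prints the services for one database and drops action items such as [NOTFOUND=return]. It assumes one "database:" entry per line, as in the manpage excerpt. Note that in both lines proposed above ldap is listed first, so it would be asked before the local files.

```shell
# nss_order DB FILE: print the services listed for DB in an
# nsswitch.conf-style FILE, one per line, in lookup order.
nss_order() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" "$2" |
        sed 's/#.*//; s/\[[^]]*]//g' |   # drop comments and [STATUS=ACTION] items
        tr -s ' \t' '\n\n' |             # one service per line
        sed '/^$/d'
}

# Example: the first service asked for passwd lookups
#   nss_order passwd /etc/nsswitch.conf | head -n 1
```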

Comment 34 Alastair Tse (RETIRED) gentoo-dev

2006-03-31 18:19:44 UTC

there's an alternate solution to this:

http://www.nabble.com/Re%3A-nss_ldap-and-udevd-p3202151.html

works for me with nss_ldap-249 and udev-079

Comment 35 Jakub Moc (RETIRED) gentoo-dev

2006-04-05 06:41:14 UTC

*** Bug 128907 has been marked as a duplicate of this bug. ***

Comment 36 Jakub Moc (RETIRED) gentoo-dev

2006-04-12 05:26:39 UTC

*** Bug 129680 has been marked as a duplicate of this bug. ***

Comment 37 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-05-10 14:05:57 UTC

*** Bug 132940 has been marked as a duplicate of this bug. ***

Comment 38 Dane 2006-05-10 19:38:31 UTC

Switched to Ubuntu. No longer interested in gentoo issues.

Comment 39 Sam Walliser 2006-05-23 05:05:27 UTC

Make the UDEV ebuild react to the ldap USE flag. (and comment that line out if ldap is being used)

It's so simple you should praise and worship me.

Hail to the king.

Comment 40 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-05-23 11:56:15 UTC

No praise is worthy without a patch :(

Comment 41 Evert 2006-05-27 14:14:01 UTC

Since yesterday, I have had this problem too. After commenting out the tss line in /etc/udev/rules.d/50-udev.rules, there was no more problem with udev, but wiping /tmp took far too long. named also seemed to hang, at least long enough for me to press ctrl-alt-del.
After digging through emerge.log I found out sys-auth/nss_ldap had been upgraded from 239-r1 to 249 since the last reboot. Downgrading solved the problem, but of course I want to stay up to date, so I tried the alternate solution from comment #34, and after upgrading sys-auth/nss_ldap to 249 again, the problem didn't come back.
So, I think the solution from comment #34 rocks: just add the following line to /etc/ldap.conf:

bind_policy soft

Comment 42 Andrej Filipcic 2006-05-27 14:27:02 UTC

"bind_policy soft" is not really a good solution. If something happens with the network connection on the client, user processes would break. On a large computing cluster, the jobs are starting to crash instead of waiting some time for connection to come up. I had to downgrade to 239-r1 due to that problem.

Comment 43 Evert 2006-05-28 07:14:02 UTC

So if I understand right, the whole problem of this bug is caused by sys-auth/nss_ldap-249 or am I wrong now?

Comment 44 Robin Johnson archtester

2006-05-28 13:00:24 UTC

evert: no, it's not a bug with nss_ldap.
this bug will occur with any network-based NSS lookup whenever that service is not yet available at the time udev is doing lookups of entries not in the basic system.

As it stands, there are three options better than messing with the NSS service.

1. (this is the proper solution) fix udev to not look up the tss entry (that it really doesn't need to). see comment #11 from gregkh.
2. add tss entries back to /etc/{passwd,group}. this is against gentoo policy, so they were removed from the files provided by baselayout, and are with the tss/tpm package instead.
3. comment out the tss/tpm entries in the udev rules.

Using 'bind_policy soft' is dangerous as it can break applications later.
Messing with nsswitch.conf is not desirable because you still need to wait for the 60 second timeout for each lookup.

Comment 45 Christopher Hogan 2006-05-31 02:00:17 UTC

I was hit by this bug after updating over the weekend. I didn't find this bug report until after I fixed it on my system. My thought was that services should not be trying to access the network until the network is up. I created a solution that prevents this problem on LDAP systems. It could be modified for other network-based authentication schemes.

I added the following to /etc/conf.d/net:

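postup() {
	# Network is up: put the LDAP-enabled nsswitch.conf back, if one was saved.
	if [ -e /etc/nsswitch.conf.net ]; then
		mv /etc/nsswitch.conf.net /etc/nsswitch.conf
	fi
}

postdown() {
	# Network is down: save the LDAP-enabled file, then strip ldap from the live one.
	if grep -q ldap /etc/nsswitch.conf; then
		cp -a /etc/nsswitch.conf /etc/nsswitch.conf.net
		sed 's/ldap//' /etc/nsswitch.conf.net > /etc/nsswitch.conf
	fi
}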

This solution prevents hangs while waiting for the ldap server to fail due to the interface not being up.

Comment 46 Evert 2006-05-31 08:23:46 UTC

I had a similar workaround before I noticed this bug, but using /etc/conf.d/local.{start,stop}. The only catch is that this workaround doesn't work after a system crash, since in that case local.stop isn't called and nsswitch.conf still refers to ldap at boot.

Comment 47 Sébastien Fabbro (RETIRED) gentoo-dev

2006-06-02 08:41:27 UTC

I also experienced hangs with udev-087, baselayout-1.11.14-r8 and nss_ldap-249 when booting our systems. I fixed them temporarily with the solutions in this bug:
- hang on starting udev, fixed with commenting out the tpm line in /etc/udev/rules.d/50-udev.rules
- hang on cleaning out /tmp, fixed with changing the line chown 0:0 to chown root.root in /etc/init.d/bootmisc
Sorry, I can't help more on this for a better solution.

Comment 48 Bel Zébute 2006-06-03 03:17:22 UTC

Thanks all.  Just got hit by this bug.  For myself, since there is obviously no definitive fix, I'll simply downgrade nss_ldap until someone puts their pants on.

Comment 49 Kevin Bryan 2006-06-06 21:47:28 UTC

I took the other approach and simply added a tss user/group.  If it's supposed to be a system level thing, it might as well be there.  Also this method means I don't have to worry about missing upgrades to nss_ldap or udev or whatever.

Comment 50 Lindsay Haisley 2006-06-08 20:36:03 UTC

I had the same problem on a desktop system here which, after working just fine for a year or so, became nearly unbootable and unusable after an emerge -uD world today.  Booting took nearly 45 minutes and many operations involving authentication took many minutes to complete.  nsswitch.conf contained:

passwd: files ldap
shadow: files ldap
group:  files ldap

I took ldap out of the auth mix here, commented out the tpm entry in /etc/udev/rules.d/50-udev.rules, backversioned nss_ldap and things seem to be back to normal.

Part of the problem here is that I need ldap client behavior in several user apps such as evolution, and need basic ldap tools for working with a remote ldap server, however I don't need ldap authentication or to have ldap client functionality built into system components.  Setting USE=ldap in make.conf, as I have it, seems like a one-size-fits-all solution that doesn't always fit.  Perhaps applications which can optionally be built with ldap client capabilities could USE an alternate flag such as "ldap-client".  

USE flag or not, I'm not sure why udev and nss_ldap are trying to bind, or even allowed to try to bind to _any_ ldap server before either the network or the local ldap server are started during boot.  Like Sebastien, I'm waiting for Greg KH and others to get their pants on and figure out just where this needs to be addressed.  It's really nasty!  If it had happened on my commercial server rather than my desktop it would have cost me customers.

Comment 51 Bryan Jacobs 2006-06-11 13:25:17 UTC

Why is this bug still not resolved?

I remember having this problem when I switched to LDAP users around a year ago.  It was an issue then - and now upgrading to the "stable" nss_ldap-249 breaks my entire system boot.

Why not just remove the trousers line from udev rules, already?  Would that be such a terrible consequence?  The trousers ebuild could add it back in, or put it in its own udev rulefile - then at least you wouldn't be breaking us normal users who don't even have the component that's indirectly causing the issue.

Or remask nss_ldap-249, since it makes this issue resurface.

Comment 52 Robin Johnson archtester

2006-06-14 18:23:36 UTC

for everybody following here: nss_ldap-250-r1 is in the tree, and documents the timeout functionality, as well as using much shorter defaults (timeout 15 seconds per lookup instead of 124 seconds). Please do read my blog post linked from the ChangeLog about this.

Comment 53 Jakub Moc (RETIRED) gentoo-dev

2006-06-22 00:17:02 UTC

*** Bug 137535 has been marked as a duplicate of this bug. ***

Comment 54 Neagul Marian 2006-06-30 00:35:47 UTC

Hi,

   I have the same problem: udev-087-r1 and nss_ldap-249.
   Is there a solution to this problem? I have 2 production servers that were hit by this problem!!!

Comment 55 Jakub Moc (RETIRED) gentoo-dev

2006-06-30 02:17:14 UTC

*** Bug 138561 has been marked as a duplicate of this bug. ***

Comment 56 Robin Johnson archtester

2006-06-30 02:37:36 UTC

Neagul Marian: upgrade to nss_ldap-250-r1

Comment 57 Christopher Lee Thomas 2006-07-02 02:35:28 UTC

There is no real fix yet! - but there is a workaround. The workaround has two steps:

1. You have to do one of these two:
  a. Create user "tss" and group "tss"
  b. Comment out the following line in '/etc/udev/rules.d/50-udev.rules':
     KERNEL=="tpm*",   NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"

2. Emerge "nss_ldap-250-r1" - else it hangs forever on "Cleaning /tmp directory ..."

Comment 58 Jakub Moc (RETIRED) gentoo-dev

2006-07-02 12:46:59 UTC

*** Bug 138901 has been marked as a duplicate of this bug. ***

Comment 59 Christian Fernandez 2006-07-06 14:38:21 UTC

(In reply to comment #57)
> There is no real fix yet! - 

and what are they waiting for?? come on, LDAP today is a major component of a corporate network environment. I, for one, have 340 desktops and 67 servers running gentoo/ldap and other services... everyone I know has moved from NIS to ldap
why is this being treated as a second-level issue?

if I had the time and an understanding of the udev code/ldap I would give a hand, but on this one I have to sit back and wait.. PLEASE DO SOMETHING

Comment 60 Roy Marples (RETIRED) gentoo-dev

2006-07-07 06:51:31 UTC

Created attachment 91130 [details, diff]
Stop udev resolving names when loading rules

This should stop udev from looking up uids/gids unless it is adding a device, but I don't have a spare LDAP server to test against, and cannot set one up easily.

Comment 61 Roy Marples (RETIRED) gentoo-dev

2006-07-27 02:54:19 UTC

What, no-one's even tested this patch to fix this "critical" bug?

Well, I was able to test against our production LDAP today as our server needed to be rebooted for a hardware upgrade so I took the opportunity to test the patch and it seems to work.

Greg, is this OK to add to portage?

Comment 62 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-07-28 10:37:14 UTC

Sorry, been busy with different conferences right now.

That patch doesn't look right, it takes out all group and owner lookups, which
would break non-ldap systems, right?

Comment 63 Roy Marples (RETIRED) gentoo-dev

2006-07-28 10:56:53 UTC

(In reply to comment #62)
> Sorry, been busy with different conferences right now.
> 
> That patch doesn't look right, it takes out all group and owner lookups, which
> would break non-ldap systems, right?

The patch basically only looks up uid/gid when a rule is being applied.
However, this lookup is made each time the rule is applied.

I've tested this on non-LDAP and LDAP systems and it works 100% for me.

Comment 64 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-07-28 16:56:38 UTC

Ok, care to post it upstream at the linux-hotplug-devel mailing list so that
the upstream developers can comment on it?  If they accept it, I'll add
it to the build.

Comment 65 Roy Marples (RETIRED) gentoo-dev

2006-08-04 16:53:43 UTC

Created attachment 93451 [details, diff]
Stop udev resolving names when loading rules

New patch remembers looked up uid/gid in the rule, making us more efficient than before :)

C'mon guys, there's plenty of you here! Do these patches fix it for you?

Comment 66 Bel Zébute 2006-08-04 16:57:23 UTC

(In reply to comment #65)

> C'mon guys, there's plently of you here! Do these patches fix it for you?

I disabled upgrading of nss-ldap until this is fixed. :P  I simply don't have the time to break things.

I want to salute your effort though.  It's truly appreciated.

Comment 67 Roy Marples (RETIRED) gentoo-dev

2006-08-04 19:39:09 UTC

Comment on attachment 93451 [details, diff]
Stop udev resolving names when loading rules

Addition to patch does not work.

Comment 68 Jakub Moc (RETIRED) gentoo-dev

2006-08-05 07:58:20 UTC

*** Bug 142877 has been marked as a duplicate of this bug. ***

Comment 69 Chan Min Wai 2006-08-05 10:03:46 UTC

After using the new nss_ldap

I've bumped into another bug....
This helped:
http://forums.gentoo.org/viewtopic-t-477895-highlight-chroot+named.html

My dear friends, it has been a year and it HAS TO BE SOLVED, please.

Comment 70 SpanKY gentoo-dev

2006-08-05 14:00:24 UTC

that issue is totally unrelated

if you have nothing to contribute, then dont bother speaking, it just pisses off people

Comment 71 Roy Marples (RETIRED) gentoo-dev

2006-08-08 02:23:29 UTC

(In reply to comment #64)
> Ok, care to post it upstream at the linux-hotplug-devel mailing list so that
> the upstream developers can comment on it?  If they accept it, I'll add
> it to the build.
> 

Upstream (well, Kay) seems to think that this patch is bad as it would slow down users who have thousands of devices and that the issue should be fixed by adding any user/group combos to passwd/group files.

http://sourceforge.net/mailarchive/forum.php?thread_id=28927892&forum_id=3157
http://sourceforge.net/mailarchive/forum.php?thread_id=29547555&forum_id=3157

As previously discussed here, that is a bad idea.

So upstream doesn't like the patch and based on that discussion will probably never fix it. An impasse has been reached. Interesting to note that a debian guy piped up saying that this issue also applies to Debian and could not see anything wrong with my patch.

If anyone has any more bright ideas then I'm all ears. However, my patch does solve the immediate issue so if no-one speaks up (ie Greg) then I'll add it to portage in a few days.

Comment 72 Doug Goldstein (RETIRED) gentoo-dev

2006-08-08 06:32:33 UTC

Just noticed this patch. I missed it when you first posted it, Uberlord. It works great for me. Now the only issue is the pause after "Cleaning /tmp...", but I believe nss_ldap-250 made this pause more manageable, since it was only about 15 seconds.

Good job with the patch. :)

Comment 73 Roy Marples (RETIRED) gentoo-dev

2006-08-08 06:41:32 UTC

(In reply to comment #72)
> Now the only issue is the pause after Cleaning /tmp...

That's fixed with baselayout-1.12.4-r2
We don't do a chown anymore: BSD doesn't have root:root, and 0:0 caused network lookups for the names "0" and "0" due to chown's design. See bug #139411.
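
To illustrate the behaviour described above, here is a minimal sketch of name-first resolution. It is not chown's actual code; `resolve_id`, `fake_lookup` and `local_users` are hypothetical names made up for the example, and the fake lookup stands in for a real name service so that the query it receives can be seen.

```python
def resolve_id(spec, lookup_name):
    """Resolve an owner or group spec by trying it as a name first and
    only falling back to a number when the name lookup misses."""
    try:
        return lookup_name(spec)  # on a real system this may reach every NSS backend
    except KeyError:
        return int(spec)          # numeric fallback happens only after the miss


# Hypothetical stand-in for the name service; it records every query.
queries = []
local_users = {"root": 0}


def fake_lookup(name):
    queries.append(name)
    return local_users[name]      # raises KeyError when the name is unknown


uid = resolve_id("0", fake_lookup)
```

Even though "0" is plainly numeric, it is still sent to the name service as a name before the fallback is used. With an unreachable LDAP backend configured, that query is where the pause would come from.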

Comment 74 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-08-08 09:03:35 UTC

I'll just remove the tpm group/user stuff from our rules file, don't add the
patch to the package.

Then the tpm userspace package can add it as a rules file, if the user installs
it, as then there will be such a user.

Does that sound OK to everyone?

And yes, I agree with Kay, we shouldn't be shipping a rules file with groups/users
specified in it that we do not include in our baselayout.
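
A rough way to check a rules file against that policy is sketched below. This is not part of udev or baselayout; the function names are made up for the example, the parsing is deliberately simplistic, and the second sample rule and the passwd/group contents are invented sample data.

```python
import re

# Matches assignments such as OWNER="tss" but not comparisons such as KERNEL=="tpm*".
ASSIGN = re.compile(r'\b(OWNER|GROUP)\s*=\s*"([^"]+)"')


def local_names(db_text):
    """Return the names found in passwd- or group-format text (first field)."""
    names = set()
    for line in db_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and compat-style +/- entries.
        if line and not line.startswith(("#", "+", "-")):
            names.add(line.split(":", 1)[0])
    return names


def missing_names(rules_text, passwd_text, group_text):
    """List (line number, key, name) for every OWNER/GROUP not defined locally."""
    known = {"OWNER": local_names(passwd_text), "GROUP": local_names(group_text)}
    missing = []
    for lineno, line in enumerate(rules_text.splitlines(), 1):
        if line.lstrip().startswith("#"):
            continue
        for key, name in ASSIGN.findall(line):
            # Purely numeric ids need no name lookup, so they are not reported.
            if not name.isdigit() and name not in known[key]:
                missing.append((lineno, key, name))
    return missing


# Sample data: the first rule is the one from this bug, the rest is invented.
rules = (
    'KERNEL=="tpm*",   NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"\n'
    'KERNEL=="tty[0-9]*", NAME="%k", GROUP="tty"\n'
)
passwd = "root:x:0:0:root:/root:/bin/bash\n"
group = "root:x:0:\ntty:x:5:\n"

report = missing_names(rules, passwd, group)
```

On a real system the three strings would be read from the rules file, /etc/passwd and /etc/group. Reading the files directly, instead of calling getpwnam(), is the point: the check itself must not trigger the NSS lookups it is trying to prevent.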

Comment 75 Doug Goldstein (RETIRED) gentoo-dev

2006-08-08 15:36:54 UTC

Uberlord... baselayout-1.12.4-r2 might fix it, but it's not stable, and I haven't heard of 1.12.x becoming stable anytime soon. There have been a lot of requests for it to become stable...

Comment 76 Jakub Moc (RETIRED) gentoo-dev

2006-08-13 11:23:01 UTC

*** Bug 143795 has been marked as a duplicate of this bug. ***

Comment 77 Jakub Moc (RETIRED) gentoo-dev

2006-08-13 11:33:21 UTC

*** Bug 143795 has been marked as a duplicate of this bug. ***

Comment 78 Greg Kroah-Hartman (RETIRED) gentoo-dev

2006-08-30 21:37:02 UTC

Will be fixed in the udev 098 release; please try it.  The offending entry is
now gone from the file.

The TPM package will have to add their own udev rule file, if they need it, but
if they do that, the user/group will have been created.

Comment 79 Jakub Moc (RETIRED) gentoo-dev

2006-10-30 11:28:12 UTC

*** Bug 153456 has been marked as a duplicate of this bug. ***

Comment 80 Pedro Rebello de Andrade 2007-06-08 02:32:43 UTC

Shouldn't the solution for this bug be:

"Add the following line to /etc/ldap.conf:

 nss_initgroups_ignoreusers tss 
"
Also, shouldn't this become standard in the stock distribution of nss_ldap for EVERY local system group? LDAP queries shouldn't be used to find out whether user 'tape' or user 'djbdns' exists. This information is supposed to be local and remain local. Otherwise every configuration that queries LDAP will have a bloated set of users and groups.

P.S. - I personally think that /etc/ldap.conf should be renamed to /etc/nssldap.conf to reduce naming conflicts and confusion.

Comment 81 Robin Johnson archtester

2007-06-08 04:56:30 UTC

pedrorandrade: it's expected that you add yourself to the CC list if you post on a bug and expect a response.

You are attacking the problem at the wrong level.
If your system is correctly configured, every lookup for a system user will hit the files backend. The 'tss' lookup problem happened because the user was removed from /etc/passwd. On LDAP systems the lookup then went to LDAP next, which wasn't available because the system was still booting, and upstream nss_ldap ships with very long default timeouts (which we patched back down in 250-r1).

Using the nss_initgroups_ignoreusers configuration option would not help the original situation at all, because the lookup was for the numeric uid of the user, not what groups they were in.

If you really want to be helpful on the separate issue of initgroups_ignoreusers, write a patch to nss_ldap that permits only LDAP users to be in LDAP groups, and considers local users to never be in LDAP groups. (And make sure the patch works when the LDAP server is not available.)
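
For anyone following along, the lookup order described above can be modelled roughly like this. It is a toy simulation, not how glibc or nss_ldap are implemented; the backend functions, the sample ids and the 30-second timeout are all assumptions chosen for illustration.

```python
def lookup(name, backends):
    """Ask each backend in order; return (result, seconds spent waiting)."""
    waited = 0.0
    for backend in backends:
        found, cost = backend(name)
        waited += cost
        if found is not None:
            return found, waited  # a hit stops the chain, like 'files ldap'
    return None, waited


def files_backend(name):
    """Local files: answers immediately, hit or miss (sample ids only)."""
    table = {"root": 0, "tape": 26}
    return table.get(name), 0.0


def make_unreachable_ldap(timeout):
    """An LDAP backend whose server is down: every query costs a full timeout."""
    def ldap_backend(name):
        return None, timeout
    return ldap_backend


chain = [files_backend, make_unreachable_ldap(30.0)]

hit = lookup("root", chain)   # defined locally, never reaches LDAP
miss = lookup("tss", chain)   # missing locally, waits for the LDAP timeout
```

A name present in the local files never reaches the LDAP backend, while a name missing from them pays the full timeout on every single lookup. That is why the missing 'tss' entry mattered at boot.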

Comment 82 Pedro Rebello de Andrade 2007-06-09 14:09:21 UTC

(In reply to comment #81)
> pedrorandrade: it's expected that you add yourself to the CC list if you post
> on a bug and expect a response.
Sorry. I very seldom report or comment on bugs. I'll keep that in mind.

(snip)
> Using the nss_initgroups_ignoreusers configuration option would not help the
> original situation at all, because the lookup was for the numeric uid of the
> user, not what groups they were in.
OK. Got it. Numeric ID, NOT uid.

> If you really want to be helpful on the separate issue of
> initgroups_ignoreusers, write a patch to nss_ldap that permits only ldap users
> to be in ldap groups, and considers local users to never be in ldap groups.
I don't really know how to write and submit a patch, but I'll get into it.

> (And make sure the patch works when the LDAP server is not available).
I guess that nss_initgroups_ignoreusers will never try to contact the LDAP server for the listed users, so I don't see a problem there...
