Summary: | net-nds/openldap-2.3.41: slapd takes a long time to start when "group:" line in nsswitch.conf file includes "ldap" | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Andrei Iordache <andrei.iordache> |
Component: | [OLD] Server | Assignee: | Gentoo LDAP project <ldap-bugs> |
Status: | RESOLVED FIXED | | |
Severity: | normal | CC: | barzog, dschridde+gentoobugs, jkt, webmaster |
Priority: | High | | |
Version: | unspecified | | |
Hardware: | x86 | | |
OS: | Linux | | |
Whiteboard: | | | |
Package list: | | Runtime testing required: | --- |
Attachments: | Test case for bug 222693 | | |
Description
Andrei Iordache
2008-05-18 17:34:32 UTC
What does your /etc/ldap.conf file look like? Do you have any line that starts with "nss_initgroups_ignoreusers" in there? If you do, does it include the user that your slapd is supposed to run as?

(In reply to comment #1)
> What does your /etc/ldap.conf file look like? Do you have any line that starts
> with "nss_initgroups_ignoreusers" in there? If you do, does it include the user
> that your slapd is supposed to run as?

Thank you for looking into this. My /etc/ldap.conf file is pretty much the one that nss_ldap installs by default, with only a few options modified, like base, uri, ldap_version 3, rootbinddn, and pam_password exop. There is a line "nss_initgroups_ignoreusers" (line 168) but it's commented out. I should also mention that I don't know what that option is for and I have never used it.

    # Use backlinks for answering initgroups()
    #nss_initgroups backlink

In light of this, I'm starting to think that maybe there's a problem with nss_ldap or pam_ldap, because even if I put

    group: files [SUCCESS=return] ldap [UNAVAIL=return]

in /etc/nsswitch.conf, it's as if anything but 'files' and 'ldap' is ignored. The above line is supposed to mean that if what was looked for is found in the files, then the search should stop. But it doesn't, and because slapd is not yet started, slapd or the slapd init script takes a long time to start, looking up data that slapd itself is supposed to serve.

On the other hand, if I change /etc/nsswitch.conf as follows:

    passwd: files ldap
    shadow files ldap
    group: files

then slapd starts instantly without problems.

Is nscd running? It's an absolute requirement for the fast startup. Somewhere there is an NSS lookup that is going to LDAP because it's not found in the base passwd layer. If your timeout is exactly 30 seconds, and you are using the EXACT timeout settings that the Gentoo /etc/ldap.conf ships with, that means there are two lookups failing.
Here's the block to look for:

    # For Gentoo's distribution of nss_ldap, as of 250-r1, we use these values
    nss_reconnect_tries 4           # number of times to double the sleep time
    nss_reconnect_sleeptime 1       # initial sleep value
    nss_reconnect_maxsleeptime 16   # max sleep value to cap at
    nss_reconnect_maxconntries 2    # how many tries before sleeping
    # This leads to a delay of 15 seconds (1+2+4+8=15)

This specifically handles the case where the LDAP server is not reachable. timelimit and sizelimit only apply once the server is fully started.

I suggest running nscd in debug mode to find which lookup isn't in /etc/passwd, and posting those details here.

Sorry, I missed this in your second comment:
> On the other hand if I change in /etc/nsswitch.conf as following:
> passwd: files ldap
> shadow files ldap
> group: files
> then slapd starts instantly without problems.
This means that there are two groups that your system is trying to look up, and /etc/group doesn't contain them, so it goes to LDAP.
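The fall-through described above can be sketched as a small simulation. This is only a model of ordinary per-key lookups using the default reactions documented in nsswitch.conf(5); it does not model initgroups(), which may consult sources differently. The backends, the group names, and the gid values are made up for illustration.

```python
import re

# Default reactions documented in nsswitch.conf(5).
DEFAULT_ACTIONS = {
    "SUCCESS": "return",
    "NOTFOUND": "continue",
    "UNAVAIL": "continue",
    "TRYAGAIN": "continue",
}


def parse_sources(spec):
    """Parse the right-hand side of an nsswitch.conf line into
    (service, actions) pairs. Only STATUS=action items are handled;
    negation (!STATUS) and the 'merge' action are left out of this sketch."""
    sources = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            if not sources:
                raise ValueError("action list before any service")
            for item in token[1:-1].split():
                status, action = item.split("=")
                sources[-1][1][status.upper()] = action.lower()
        else:
            sources.append((token, dict(DEFAULT_ACTIONS)))
    return sources


def lookup(spec, key, backends):
    """Walk the sources in order; return (result, services consulted)."""
    consulted = []
    for service, actions in parse_sources(spec):
        consulted.append(service)
        status, value = backends[service](key)
        if actions[status] == "return":
            return (value if status == "SUCCESS" else None), consulted
    return None, consulted


# Hypothetical backends: a stand-in for /etc/group (gids are made up)
# and an LDAP server that is not running yet.
local_groups = {"root": 0, "ldap": 439}


def files_backend(name):
    if name in local_groups:
        return "SUCCESS", local_groups[name]
    return "NOTFOUND", None


def ldap_backend(name):
    return "UNAVAIL", None


backends = {"files": files_backend, "ldap": ldap_backend}
spec = "files [SUCCESS=return] ldap [UNAVAIL=return]"

print(lookup(spec, "ldap", backends))      # (439, ['files'])
print(lookup(spec, "pulse-rt", backends))  # (None, ['files', 'ldap'])
```

In this model a group present in the local file never reaches LDAP, while a group missing from it does, which is the case where the reconnect delay would be paid.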
I have had reports of this problem before, but absolutely nobody reported back on the full tracing, they all just did various dumb hacks around it.
I'd really like to know what two groups are being looked up. Enable 'logfile' and 'debug-level' in /etc/nscd.conf and then sort through the chaff to find what the two lookups that are going to nss_ldap are. Alternatively, emerge nss_ldap with USE=debug, and maybe add more of your own debugging inside the C source. I strongly suggest using -DDEBUG_SYSLOG during the compile as well, and having some syslog rules to catch the debug entries - they can contain confidential data, so do take care.
Also, the 'debug' option in ldap.conf doesn't help at all. It's debugging for the LDAP libraries, not nss_ldap/pam_ldap. This is documented in the nss_ldap manpage.
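The delay arithmetic quoted from the Gentoo ldap.conf (1+2+4+8=15, and 30 seconds for two failing lookups) can be reproduced with a short sketch. It assumes the sleep simply doubles after each failed try and is capped at the maximum; the real nss_ldap retry schedule may differ, and nss_reconnect_maxconntries is not modelled here. The function name is made up for illustration.

```python
def reconnect_delay(tries=4, sleeptime=1, maxsleeptime=16):
    """Total seconds slept when the sleep doubles after each failed try
    and is capped at maxsleeptime (a simplified model, not nss_ldap code)."""
    total = 0
    sleep = sleeptime
    for _ in range(tries):
        total += min(sleep, maxsleeptime)
        sleep *= 2
    return total


per_lookup = reconnect_delay()
print(per_lookup)      # 15
print(2 * per_lookup)  # 30
```

Under these assumptions, two lookups that each exhaust the retries account for the 30-second startup delay mentioned earlier in the thread.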
Created attachment 168474
Test case for bug 222693

(In reply to comment #4)
Thanks for replying, Robin.

I should start by saying that in the meantime I found an acceptable (for me) solution to this problem:

    # Reconnect policy:
    #  hard_open: reconnect to DSA with exponential backoff if
    #             opening connection failed
    #  hard_init: reconnect to DSA with exponential backoff if
    #             initializing connection failed
    #  hard:      alias for hard_open
    #  soft:      return immediately on server failure
    bind_policy soft

in /etc/ldap.conf.
If I set the bind policy parameter to soft, then the problem that I filed the bug for disappears. As far as I understand the parameter, it doesn't actually solve the problem but hides it, in the sense that if the first connection fails then subsequent connections are not attempted. I could be wrong though.

Now to come back to your suggestions. First of all, I absolutely do not use nscd, because it caches the information; I often need changes to propagate as soon as possible, and nscd delays the changes I make to LDAP. I don't have it started, so I cannot debug it as you suggested. Is there really no way to debug this problem without it?

> Sorry, I missed this in your second comment:
> > On the other hand if I change in /etc/nsswitch.conf as following:
> > passwd: files ldap
> > shadow files ldap
> > group: files
> > then slapd starts instantly without problems.
>
> This means that there are two groups that your system is trying to look up,
> and /etc/group doesn't contain them, so it goes to LDAP.

How can it look for the group information in LDAP if the group line in nsswitch.conf contains only 'files'? Shouldn't it ONLY look in the passwd file in that case?

> I have had reports of this problem before, but absolutely nobody reported back
> on the full tracing, they all just did various dumb hacks around it.
>
> I'd really like to know what two groups are being looked up.

You say you need to know which groups are being looked up in LDAP. When I start slapd, the group being looked up in LDAP is 'ldap'. The same user is also looked up. Please look at the test case in the attachment to see how I know.

> Enable 'logfile'
> and 'debug-level' in /etc/nscd.conf and then sort through the chaff to find
> what the two lookups that are going to nss_ldap are. Alternatively, emerge
> nss_ldap with USE=debug, and maybe add more of your own debugging inside the C
> source.
> I strongly suggest using -DDEBUG_SYSLOG during the compile as well, and
> having some syslog rules to catch the debug entries - they can contain
> confidential data, so do take care.
>
> Also, the 'debug' option in ldap.conf doesn't help at all. It's debugging for
> the LDAP libraries, not nss_ldap/pam_ldap. This is documented in the nss_ldap
> manpage.

Please let me know if this proof is not sufficient and I'll try to recompile nss_ldap with USE=debug to see what I can figure out that way, but I'm not that good with programming and debugging. I'll try my best though.

I think I found a nice fix for this: I added the following (very long) line to /etc/ldap.conf:

    nss_initgroups_ignoreusers avahi,avahi-autoipd,backup,bin,daemon,dhcp,games,gdm,gnats,haldaemon,hplip,irc,klog,landscape,libuuid,list,lp,mail,man,messagebus,news,openldap,polkituser,proxy,pulse,root,sync,sys,syslog,uucp,www-data,mysql,ldap

I suggest rechecking this for sanity (I just copied it over from an Ubuntu ldap.conf and added mysql,ldap), and adding a sane line like this to the default ldap.conf in Gentoo.

I confirm that adding "nss_initgroups_ignoreusers ldap" to ldap.conf fixes the issue.

I was searching for the problem and finally found it in my case. I am going to describe the procedure that I followed.

1. Turn off the nscd service and run the following command:
       nscd -d
2. In another console, run:
       bash -x /etc/init.d/slapd start
3. The script stopped when it tried to run "runuser"; look at the first console and you can see that some group could not be found.
4. Go to /etc/security/limits.d, search for that group, and comment out that line. In my case it was pulse-rt.
5. Try to start the slapd service again.

I hope this information is useful. Good luck.

In CVS.
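The manual search in step 4 of the procedure above could be sketched as follows: collect the `@group` domains from limits.conf-style text and report those with no entry in /etc/group-style text. This is a minimal sketch, assuming only the `@group` domain form matters; the function name and the sample file contents are made up for illustration.

```python
def groups_missing_from_group_file(limits_text, group_text):
    """Return the @group domains named in limits.conf-style text that
    have no entry in /etc/group-style text (first occurrence order)."""
    known = set()
    for line in group_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            known.add(line.split(":", 1)[0])

    missing = []
    for line in limits_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        domain = line.split()[0]
        if domain.startswith("@"):
            name = domain[1:]
            if name not in known and name not in missing:
                missing.append(name)
    return missing


# Hypothetical sample contents, not taken from the reporter's system.
sample_limits = """
# realtime limits
@pulse-rt  -  rtprio  9
@pulse-rt  -  nice    -11
@audio     -  rtprio  95
"""
sample_groups = "root:x:0:\naudio:x:18:\nldap:x:439:\n"

print(groups_missing_from_group_file(sample_limits, sample_groups))  # ['pulse-rt']
```

On a real system the two arguments would come from the files under /etc/security/limits.d and from /etc/group; any group this reports is a candidate for a lookup that falls through to LDAP.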