After upgrading openssh from 4.5_p1-r1p to 4.6_p1-r3, LPK LDAP authentication is broken. This update includes changes to pam.d/sshd. Even with that config update applied, nothing works anymore. It is still possible to switch to an LDAP account after logging in as root.

Reproducible: Always

Steps to Reproduce:
1. Log in with an LDAP account
2. See how it fails

Actual Results:
No login is granted. The auth.log shows that the user doesn't belong to the defined group, although the group membership is correct.

Expected Results:
Login should be permitted.

Here's the sshd_config:

UseLPK yes
#LpkLdapConf /etc/ldap.conf
LpkServers ldap://1.2.3.4
LpkUserDN ou=users,ou=dmz-auth,o=dc,c=de,dc=company,dc=com
LpkGroupDN ou=groups,ou=dmz-auth,o=dc,c=de,dc=company,dc=com
LpkBindDN cn=Auth,dc=company,dc=com
LpkBindPw Just_a_poor_password
LpkServerGroup www1
LpkForceTLS no
LpkSearchTimelimit 3
LpkBindTimelimit 3

Here are example messages from the auth.log:

Aug 20 13:26:12 www1 sshd[12622]: [LDAP] 'dkerwin' is not in 'admin'
Aug 20 13:26:12 www1 sshd[12622]: [LDAP] 'dkerwin' is not in 'admin'
Aug 20 13:26:12 www1 sshd[12622]: [LDAP] 'dkerwin' is not in 'admin'
Aug 20 13:26:12 www1 sshd[12622]: [LDAP] 'dkerwin' is not in 'admin'
It's not clear to me how a config with 'LpkServerGroup www1' would result in an error message regarding 'admin'. This is definitely the config which is being used on the failing machine?
You're right. I was experimenting and created a test group named admin. The message should be: "Aug 20 13:26:12 www1 sshd[12622]: [LDAP] 'dkerwin' is not in 'ww1'". Sorry for that mistake.
Sorry to be picky, but you meant www1, right? Your last comment shows the line as:

Aug 20 13:26:12 www1 sshd[12622]: [LDAP] 'dkerwin' is not in 'ww1'

Assuming that was a typo when commenting here and not in the config, please show the output of:

ldapsearch <whatever options you need to auth here> "(&(objectClass=posixGroup)(|(cn=www1)(memberUid=dkerwin))"

and:

ldapsearch <whatever options you need to auth here> "(&(objectClass=posixAccount)(objectClass=ldapPublicKey)(uid=dkerwin))"

censoring as required. LpkServerGroup is functioning OK locally here with the latest version of the patch, so it's not clear where the problem is at the moment.
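A side note on the first filter above: as typed, it opens one more parenthesis than it closes, so ldapsearch would likely reject it unless it is adjusted. A minimal Python sketch for checking a filter before running it; the helper name paren_depth is made up for illustration and is not part of any LDAP tool:

```python
def paren_depth(ldap_filter):
    """Return the number of unclosed '(' in the filter; 0 means balanced."""
    # Assumes parentheses inside attribute values are hex-escaped (RFC 4515),
    # so every literal '(' or ')' is structural.
    depth = 0
    for ch in ldap_filter:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without an opening one")
    return depth


# Filters copied verbatim from the comment above.
group_filter = "(&(objectClass=posixGroup)(|(cn=www1)(memberUid=dkerwin))"
user_filter = "(&(objectClass=posixAccount)(objectClass=ldapPublicKey)(uid=dkerwin))"

print("group filter unclosed parentheses:", paren_depth(group_filter))
print("user filter unclosed parentheses:", paren_depth(user_filter))
```

A result other than 0 only says the filter is malformed. It does not say whether the intended form was the OR variant or a plain AND of all three conditions.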
# ldapsearch -h 1.2.3.4 -D "cn=User,dc=company,dc=com" -W -x "(&(objectClass=posixGroup)(cn=www1)(memberUid=dkerwin))"
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <> with scope subtree
# filter: (&(objectClass=posixGroup)(cn=www1)(memberUid=dkerwin))
# requesting: ALL
#

# www1, groups, dmz-auth, company.com
dn: cn=www1,ou=groups,ou=dmz-auth,dc=company,dc=com
cn: www1
gidNumber: 1100
description: SSH login group for www1
objectClass: posixGroup
objectClass: top
memberUid: dkerwin

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

And here's the user query.

# dkerwin, users, dmz-auth, company.com
dn: uid=dkerwin,ou=users,ou=dmz-auth,dc=company,dc=com
uid: dkerwin
mail: john.doe@noname.de
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: top
objectClass: shadowAccount
objectClass: ldapPublicKey
shadowMax: 99999
shadowWarning: 7
homeDirectory: /home/dkerwin
gecos: Daniel Kerwin
sshPublicKey: ssh-dss xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx .......
uidNumber: 12345
cn: Daniel Kerwin
loginShell: /usr/local/bin/my_shell
shadowLastChange: 1234567
gidNumber: 12345
sn: dkerwin
The LpkUserDN/LpkGroupDN values don't match the results of the queries you showed: you have 'o=dc,c=de,' in there. Assuming that may also be a transcription error, can you please show query logs from the LDAP server while a login is attempted? I cannot see why ldapsearch would return valid answers and LPK would not :/
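To make the mismatch concrete, here is a rough Python sketch that checks whether a returned entry DN sits underneath a configured base DN. The DNs are copied from the sshd_config and the ldapsearch output posted above; the helper names normalize_dn and is_under are invented for this example, and the comma splitting is deliberately naive (it assumes no escaped commas in the values).

```python
def normalize_dn(dn):
    # Naive split on commas; good enough for the simple DNs in this report.
    return [rdn.strip().lower() for rdn in dn.split(",")]


def is_under(entry_dn, base_dn):
    """True if entry_dn ends with all components of base_dn."""
    entry = normalize_dn(entry_dn)
    base = normalize_dn(base_dn)
    return len(entry) >= len(base) and entry[-len(base):] == base


# Values as posted in the sshd_config.
lpk_user_dn = "ou=users,ou=dmz-auth,o=dc,c=de,dc=company,dc=com"
lpk_group_dn = "ou=groups,ou=dmz-auth,o=dc,c=de,dc=company,dc=com"

# DNs as returned by ldapsearch.
user_entry = "uid=dkerwin,ou=users,ou=dmz-auth,dc=company,dc=com"
group_entry = "cn=www1,ou=groups,ou=dmz-auth,dc=company,dc=com"

print("user entry under LpkUserDN:", is_under(user_entry, lpk_user_dn))
print("group entry under LpkGroupDN:", is_under(group_entry, lpk_group_dn))
```

If the posted config is accurate rather than a censoring artifact, a search rooted at those base DNs would not be expected to find either entry.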
Created attachment 128701: LDAP query log
What's really strange is that if I switch to the user, I can see that all group memberships are fine. :-(
Well, lpk and pam/nss_ldap are not using the same code, so it's possible for them to disagree. Having said that, I cannot see a reason for the failure you see I'm afraid. I've no real clue where to go from here.. :/ It's definitely the upgrade that broke it? Can you try reverting and see if that works?
I can revert, but I still have servers running the "old version" with the identical config, and they work. As I mentioned earlier, this only fails with SSH. The only change to the systems was an openssh update :-(
Ok, please install ltrace and try this, replacing paths/filenames as required:

ltrace -l $(ldd /usr/sbin/sshd|grep ldap|cut -d' ' -f1) /usr/sbin/sshd -d -p 8022 > /tmp/debug_log

Try to connect to the sshd on port 8022, then please send me a copy of the debug_log file: either attach it to the bug or mail it to me.
Ok. I'll do this, but I have to wait until tomorrow. I'll attach the file asap. Thanks
Created attachment 129079: SSHD debug log (ltrace)
The logs show the ldap query timing out. Please try increasing LpkSearchTimelimit and LpkBindTimelimit and see if that changes anything.
Changed the time limits to 60s for both lines. It doesn't change anything. The password is requested at most 2 seconds after starting the ssh session. That doesn't look like a timeout to me...
I did some testing today, and maybe it has to do with a change to /etc/pam.d/sshd which is included in the update. This is the pam.d/sshd config before the update (where it worked):

auth include system-auth
auth required pam_shells.so
auth required pam_nologin.so
account include system-auth
password include system-auth
session include system-auth

and here is the config after the update:

auth required pam_shells.so
auth required pam_nologin.so
auth include system-auth
account include system-auth
password include system-auth
session include system-auth

Here's my system-auth:

auth required pam_env.so
auth sufficient pam_unix.so likeauth nullok
auth sufficient /lib/security/$ISA/pam_ldap.so use_first_pass
auth required pam_deny.so
account required pam_unix.so
account sufficient pam_ldap.so
password required pam_cracklib.so difok=2 minlen=8 dcredit=2 ocredit=2 retry=3
password sufficient pam_unix.so nullok md5 shadow use_authtok
password required pam_ldap.so use_authtok
password required pam_deny.so
session required pam_limits.so
session required pam_unix.so
session required pam_mkhomedir.so skel=/etc/skel/ umask=0077
session optional pam_ldap.so
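For anyone comparing the two pam.d/sshd versions, a small Python sketch that isolates what actually changed. It uses only the lines posted above; the helper name stack is made up for this example. It shows the difference in ordering only and makes no claim that the reordering is the cause of the failure.

```python
BEFORE = """\
auth include system-auth
auth required pam_shells.so
auth required pam_nologin.so
account include system-auth
password include system-auth
session include system-auth
"""

AFTER = """\
auth required pam_shells.so
auth required pam_nologin.so
auth include system-auth
account include system-auth
password include system-auth
session include system-auth
"""


def stack(conf, kind):
    """Return the entries of one PAM type (auth, account, ...) in file order."""
    entries = []
    for line in conf.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            entries.append(" ".join(fields[1:]))
    return entries


for kind in ("auth", "account", "password", "session"):
    old, new = stack(BEFORE, kind), stack(AFTER, kind)
    if old == new:
        print(kind, "unchanged")
    elif sorted(old) == sorted(new):
        print(kind, "same entries, different order:", old, "->", new)
    else:
        print(kind, "entries differ:", old, "->", new)
```

Only the auth stack differs, and only in order: the system-auth include moves from first to last, so pam_shells.so and pam_nologin.so are now evaluated before it.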
make sure you're using latest shadow and you've properly updated your pam.d files via etc-update
(In reply to comment #16)
> make sure you're using latest shadow and you've properly updated your pam.d
> files via etc-update

The systems run shadow 4.0.18.1-r and I updated all configs after the ssh update. It works with the old version, but the new ssh version including the pam update doesn't work. I even checked the LDAP queries in the openldap log and they return the right entry. I'm running out of ideas...
There's a solution for the problem. Neil sent me an email containing the following information: removing the lines

LpkSearchTimelimit 3
LpkBindTimelimit 3

makes it work like a charm. Even though this gives me a working setup, there's a bug in the patch or in the Gentoo integration. These parameters should be configurable... Thanks to Neil
the fact that they're in the config file sounds like they're configurable to me
I got my bugzilla-password, finally 8). According to the docs at http://dev.inversepath.com/openssh-lpk/ they should be configurable. But the default values were changed from this, in http://dev.inversepath.com/openssh-lpk/openssh-lpk-4.5p1-0.3.8.patch :

+ options->lpk.b_timeout.tv_sec = 0;
+ options->lpk.s_timeout.tv_sec = 0;

to this, in http://dev.inversepath.com/openssh-lpk/openssh-lpk-4.6p1-0.3.9.patch :

+ options->lpk.b_timeout.tv_sec = -1;
+ options->lpk.s_timeout.tv_sec = -1;

Maybe something broke with this change. I'm not a C professional and don't have spare time today to check this, so I'm not sure whether it's related to this change.

--
Kind regards, neil
vapier: I'm not sure on what basis you've decided this is resolved, but it isn't. That said, I cannot replicate any problems with the setting of either timeout here. The change to set the values to -1 on start was to fix a bug with the values being reset, and is working correctly as far as I can tell. Please run under gdb with the values set in a config file as you had them when it was not working, break on servconf.c:294, step through that code and watch if it is assigning the values from the config file. When you've reached line 298, run: print options->lpk.s_timeout and: print options->lpk.b_timeout And let me know what is shown.
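For readers unfamiliar with the "-1 on start" idea, here is a Python sketch of the general sentinel pattern: options start as "unset" (-1), the config file may assign them, and only values still unset afterwards receive a default. This is NOT the LPK patch's code. The function names, the dictionary layout and the default of 10 seconds are all invented for illustration; only the field names b_timeout/s_timeout and the two keywords come from this report.

```python
UNSET = -1  # sentinel meaning "not given in the config file"

# Keyword -> option field; keywords are matched case-insensitively.
KEYWORDS = {
    "lpkbindtimelimit": "b_timeout",
    "lpksearchtimelimit": "s_timeout",
}


def initialize_options():
    return {"b_timeout": UNSET, "s_timeout": UNSET}


def apply_config(options, config_lines):
    """Assign values for recognised keywords; ignore everything else."""
    for line in config_lines:
        fields = line.split()
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        field = KEYWORDS.get(fields[0].lower())
        if field is not None:
            options[field] = int(fields[1])


def fill_defaults(options, default_seconds=10):  # 10 is a hypothetical default
    """Give a default only to options the config file left unset."""
    for field in list(options):
        if options[field] == UNSET:
            options[field] = default_seconds


options = initialize_options()
apply_config(options, ["LpkSearchTimelimit 3", "LpkForceTLS no"])
fill_defaults(options)
print(options)
```

With a sentinel, a value from the config file survives the default-filling step, which is the behaviour the gdb session requested above is meant to confirm or refute for the real code.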
(In reply to comment #21)
> vapier: I'm not sure on what basis you've decided this is resolved, but it
> isn't.
>
> That said, I cannot replicate any problems with the setting of either timeout
> here. The change to set the values to -1 on start was to fix a bug with the
> values being reset, and is working correctly as far as I can tell.
>
> Please run under gdb with the values set in a config file as you had them when
> it was not working, break on servconf.c:294, step through that code and watch
> if it is assigning the values from the config file. When you've reached line
> 298, run: print options->lpk.s_timeout
> and: print options->lpk.b_timeout
>
> And let me know what is shown.

I will start debugging it when I get some free time. It may not be resolved, but I can use my systems again. This is a huge improvement and makes me happy. I'll paste the results soon.