Bug 147625 - portage-2.1.x error 3328 with >sys-auth/nss_ldap-239-r1 - nss_ldap causes SIGPIPE
Summary: portage-2.1.x error 3328 with >sys-auth/nss_ldap-239-r1 - nss_ldap causes SIG...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: All Linux
Importance: High normal
Assignee: Gentoo LDAP project
URL: http://bugzilla.padl.com/show_bug.cgi...
Whiteboard:
Keywords:
Duplicates: 138570 148428 152237 152539 152775 153438 153852 154076 154309 154373 154585
Depends on: 156511
Blocks:
 
Reported: 2006-09-14 16:01 UTC by Jason Short
Modified: 2007-02-06 05:25 UTC
CC List: 18 users

See Also:
Package list:
Runtime testing required: ---


Attachments
use `id -G portage` instead of grp.getgrall() (groups.patch,862 bytes, patch)
2007-01-04 13:20 UTC, Zac Medico

Description Jason Short 2006-09-14 16:01:17 UTC
!!! No gcc found. You probably need to 'source /etc/profile'
!!! to update the environment of this terminal and possibly
!!! other terminals also.
Portage 2.1.1 (default-linux/x86/2006.1, [unavailable], glibc-2.4-r3, 2.6.17.6 i686)
=================================================================
System uname: 
Gentoo Base System version 1.12.4
Last Sync: Tue, 12 Sep 2006 20:20:01 +0000
app-admin/eselect-compiler: [Not Present]
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r5
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=i686"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig buildpkg distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://gentoo.corp.epsiia.com"
LINGUAS=""
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://gentoo.corp.epsiia.com/gentoo-portage"
USE="x86 apache2 bash-completion berkdb bitmap-fonts cgi cli crypt cups dlloader dri elibc_glibc fortran ftp gdbm gpm iconv imap input_devices_evdev input_devices_keyboard input_devices_mouse ipv6 isdnlog kerberos kernel_linux ldap libg++ mysql ncurses nls nptl nptlonly ntp openntpd pam pcre perl pic ppds pppd python readline reflection session soap spl ssl sysvipc tcpd truetype-fonts type1-fonts udev unicode userland_GNU userlocales video_cards_apm video_cards_ark video_cards_ati video_cards_chips video_cards_cirrus video_cards_cyrix video_cards_dummy video_cards_fbdev video_cards_glint video_cards_i128 video_cards_i740 video_cards_i810 video_cards_imstt video_cards_mga video_cards_neomagic video_cards_nsc video_cards_nv video_cards_rendition video_cards_s3 video_cards_s3virge video_cards_savage video_cards_siliconmotion video_cards_sis video_cards_sisusb video_cards_tdfx video_cards_tga video_cards_trident video_cards_tseng video_cards_v4l video_cards_vesa video_cards_vga video_cards_via video_cards_vmware video_cards_voodoo xml xorg zip zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, MAKEOPTS, PORTAGE_RSYNC_EXTRA_OPTS

# gcc -dumpversion
4.1.1


After updating to GCC 4.1.1 and GLIBC 2.4 as per the GCC Upgrade guide on the official docs site, I'm getting return codes of 3328 from several different emerge operations:

# emerge portage
Calculating dependencies  
aux_get(): (0) Error in sys-apps/portage-2.1.1 ebuild. (3328)
               Check for syntax error or corruption in the ebuild. (--debug)


aux_get(): (0) Error in sys-apps/portage-2.0.54-r2 ebuild. (3328)
               Check for syntax error or corruption in the ebuild. (--debug)


aux_get(): (0) Error in sys-apps/portage-2.0.51.22-r3 ebuild. (3328)
               Check for syntax error or corruption in the ebuild. (--debug)


aux_get(): (0) Error in sys-apps/portage-2.1-r2 ebuild. (3328)
               Check for syntax error or corruption in the ebuild. (--debug)

 
!!! All ebuilds that could satisfy "portage" have been masked.
!!! One of the following masked packages is required to complete your request:

aux_get(): (0) Error in sys-apps/portage-2.1.1 ebuild. (3328)
               Check for syntax error or corruption in the ebuild. (--debug)



!!! Problem in 'sys-apps/portage' dependencies.
!!!  exceptions
Traceback (most recent call last):
  File "/usr/bin/emerge", line 4049, in ?
    emerge_main()
  File "/usr/bin/emerge", line 4044, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/bin/emerge", line 3467, in action_build
    retval, favorites = mydepgraph.select_files(myfiles)
  File "/usr/bin/emerge", line 943, in select_files
    self.mysd = self.select_dep(myroot, mykey, arg=x)
  File "/usr/bin/emerge", line 1146, in select_dep
    settings=pkgsettings, portdb=portdb)
  File "/usr/lib/portage/pym/portage.py", line 3734, in getmaskingstatus
    mygroups, eapi = portdb.aux_get(mycpv, ["KEYWORDS", "EAPI"])
  File "/usr/lib/portage/pym/portage.py", line 4843, in aux_get
    raise KeyError
KeyError
Comment 1 Jason Short 2006-09-14 16:14:14 UTC
I've now experienced this on two different systems.  Downgrading portage to 2.0.51.19 allows me to at least start builds, but inevitably the problem recurs.

I've built, rebuilt, and rebuilt again GCC, Glibc, and system, to no avail.

CFLAGS on the first host are -mcpu=i668 -O2
CFLAGS on the second host are -march=pentium4 -O3 -pipe

I have another system with identical CFLAGS to the second that has exhibited none of these problems.
Comment 2 Jakub Moc (RETIRED) gentoo-dev 2006-09-14 16:41:27 UTC
(In reply to comment #0)
> !!! No gcc found. You probably need to 'source /etc/profile'
> !!! to update the environment of this terminal and possibly
> !!! other terminals also.
> Portage 2.1.1 (default-linux/x86/2006.1, [unavailable], glibc-2.4-r3, 2.6.17.6

Set up your system properly before filing bugs (read the message above).

Also, emerge --sync again (preferably with a different mirror that doesn't provide broken ebuilds).

Comment 3 Jason Short 2006-09-15 12:08:49 UTC
I should have been more clear in my initial report.

After updating gcc to 4.1.1 and glibc to 2.4-r3, portage 2.1.1 reports gcc as 'unavailable' and returns these error 3328 messages (as seen in the output I pasted and in strace).

If I downgrade to 2.0.51.19 from a binpkg, it detects GCC correctly and merges without complaint.

I'm at a total loss as to where the toolchain is broken, or if it actually is.  I've been unable to find any meaningful information about what error 3328 even is, be it python or something else.
Comment 4 Jason Short 2006-09-15 12:15:41 UTC
After dropping portage back to 2.0.51.19, I'm able to run 'emerge -eav --nodeps system' successfully (was forced to add nodeps since 51.19 doesn't understand the new virtual).  I get no build errors other than portage breaking again after its self-upgrade.  Dropping its version then doing --resume --skipfirst results in a successful system rebuild.

Yet still 2.1.1 shows gcc as unavailable and returns this 3328 error message.  gcc-config -l and gcc -dumpversion return 4.1.1 as expected.
Comment 5 Zac Medico gentoo-dev 2006-09-15 15:40:43 UTC
Portage seems to be having trouble spawning child processes. Please try the following testcase to see what happens:

python -c "import commands; print commands.getstatusoutput('gcc -dumpversion')"

The 3328 exit code means that the child process (sandbox) was killed by a SIGPIPE. I'm not sure why that happens, but you might try FEATURES="-sandbox userpriv" to see if there is a difference. Those aux_get calls are triggering the depend phase of the ebuild (metadata generation), which generally indicates that you need to run `emerge --metadata` (that's always necessary when switching between portage-2.0.x and >=portage-2.1).
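Zac's reading of 3328 can be checked with a little arithmetic. On Linux, SIGPIPE is signal 13, and 13 << 8 = 3328. The helper below is a hypothetical sketch. It assumes that portage's spawn() of that era returned a normal exit status unchanged and returned a death by signal N as N << 8. It is not portage's own code.

```python
import signal


def describe_spawn_result(result: int) -> str:
    """Explain a portage-style spawn() return value (hypothetical helper).

    Assumption: a child that exits normally yields its exit status (0-255),
    while a child killed by signal N yields N << 8.
    """
    if result < 256:
        return f"exited with status {result}"
    signum = result >> 8  # undo the assumed N << 8 encoding
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"unknown signal {signum}"
    return f"killed by {name}"


print(describe_spawn_result(3328))  # killed by SIGPIPE (3328 == 13 << 8)
```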
Comment 6 Jason Short 2006-09-17 05:06:19 UTC
# python -c "import commands; print commands.getstatusoutput('gcc -dumpversion')"
(0, '4.1.1')


# FEATURES="-sandbox userpriv" emerge sandbox -v 
Calculating dependencies... done!

>>> Emerging (1 of 1) sys-apps/sandbox-1.2.17 to /
# 

tail of the strace:

stat64("/var/tmp/portage/sandbox-1.2.17/work", 0xbf8ddd68) = -1 ENOENT (No such file or directory)
stat64("/bin/bash", {st_mode=S_IFREG|0755, st_size=632016, ...}) = 0
access("/bin/bash", X_OK)               = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7d796f8) = 16441
waitpid(16441, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0) = 16441
--- SIGCHLD (Child exited) @ 0 (0) ---
futex(0x804a5d8, FUTEX_WAKE, 1)         = 0
stat64("/var/tmp/portage/sandbox-1.2.17.portage_lockfile", {st_mode=S_IFREG|0660, st_size=0, ...}) = 0
fcntl64(3, F_SETLKW64, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}, 0xbf8dde40) = 0
fcntl64(3, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbf8dde40) = 0
fstat64(3, {st_mode=S_IFREG|0660, st_size=0, ...}) = 0
unlink("/var/tmp/portage/sandbox-1.2.17.portage_lockfile") = 0
fcntl64(3, F_SETLKW64, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}, 0xbf8dde40) = 0
close(3)                                = 0
ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33]0; *** terminating.\7", 22) = 22
open("/var/log/emerge.log", O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE, 0666) = 3
fstat64(3, {st_mode=S_IFREG|0660, st_size=377440, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77fa000
fstat64(3, {st_mode=S_IFREG|0660, st_size=377440, ...}) = 0
_llseek(3, 377440, [377440], SEEK_SET)  = 0
fstat64(3, {st_mode=S_IFREG|0660, st_size=377440, ...}) = 0
stat64("/var/log/emerge.log", {st_mode=S_IFREG|0660, st_size=377440, ...}) = 0
futex(0x804a5d8, FUTEX_WAKE, 1)         = 0
futex(0x804a5d8, FUTEX_WAKE, 1)         = 0
fcntl64(3, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbf8ddf00) = 0
fstat64(3, {st_mode=S_IFREG|0660, st_size=377440, ...}) = 0
_llseek(3, 377440, [377440], SEEK_SET)  = 0
gettimeofday({1158494750, 314627}, NULL) = 0
write(3, "1158494750:  *** terminating.\n", 30) = 30
futex(0x804a5d8, FUTEX_WAKE, 1)         = 0
fcntl64(3, F_SETLKW64, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}, 0xbf8ddf80) = 0
close(3)                                = 0
munmap(0xb77fa000, 4096)                = 0
ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33]0;root@:~\7", 12)         = 12
futex(0x8112528, FUTEX_WAKE, 1)         = 0
futex(0x812e7b8, FUTEX_WAKE, 1)         = 0
futex(0x812e7b8, FUTEX_WAKE, 1)         = 0
futex(0x8112528, FUTEX_WAKE, 1)         = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f6f3c1, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL}, {0xb7f6f3c1, [], 0}, 8) = 0
brk(0x8242000)                          = 0x8242000
futex(0x813b6a0, FUTEX_WAKE, 1)         = 0
futex(0x8117f88, FUTEX_WAKE, 1)         = 0
futex(0x80ff288, FUTEX_WAKE, 1)         = 0
futex(0x812e7b8, FUTEX_WAKE, 1)         = 0
futex(0x80633d8, FUTEX_WAKE, 1)         = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
exit_group(3328)                        = ?
Process 16440 detached
Comment 7 Jason Short 2006-09-18 07:52:38 UTC
I've now had the opportunity to review an strace of emerge --regen.  It appears that nss_ldap is to blame for the errors.

child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7d3a6f8) = 932
[pid   932] getsockname(5, {sa_family=AF_INET, sin_port=htons(40040), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid   926] waitpid(932, Process 926 suspended
 <unfinished ...>
[pid   932] getpeername(5, {sa_family=AF_INET, sin_port=htons(636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid   932] fcntl64(5, F_GETFD)         = 0x1 (flags FD_CLOEXEC)
[pid   932] dup(5)                      = 3
[pid   932] fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
[pid   932] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
[pid   932] close(5)                    = 0
[pid   932] fcntl64(4, F_GETFD)         = 0
[pid   932] dup2(4, 5)                  = 5
[pid   932] fcntl64(5, F_SETFD, 0)      = 0
[pid   932] close(4)                    = 0
[pid   932] write(5, "\25\3\1\0 \203\351\233\264\27\247rc\356wI\305.T~\177\36"..., 37) = -1 EPIPE (Broken pipe)
[pid   932] --- SIGPIPE (Broken pipe) @ 0 (0) ---
Process 926 resumed
Process 932 detached
<... waitpid resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0) = 932
--- SIGCHLD (Child exited) @ 0 (0) ---


# eix -I ldap
* dev-java/ldapsdk 
     Available versions:  4.1.7-r1
     Installed:           4.1.7-r1
     Homepage:            http://www.mozilla.org/directory/javasdk.html
     Description:         Netscape Directory SDK for Java

* dev-perl/perl-ldap 
     Available versions:  0.31 0.33 ~0.33-r1
     Installed:           0.33
     Homepage:            http://search.cpan.org/~gbarr/perl-ldap-0.33/
     Description:         A collection of perl modules which provide an object-oriented interface to LDAP servers.

* net-nds/openldap 
     Available versions:  2.1.30-r2 2.1.30-r5 2.1.30-r6 ~2.1.30-r7 ~2.2.23-r1 2.2.28-r3 ~2.2.28-r4 [M]2.3.21 [M]2.3.21-r1 [M]2.3.23 [M]2.3.24-r1 [M]2.3.24-r2
     Installed:           2.2.28-r3
     Homepage:            http://www.OpenLDAP.org/
     Description:         LDAP suite of application and development tools

* sys-auth/nss_ldap 
     Available versions:  174-r2 202 207 207-r1 210 211 215 215-r1 220 226 226-r1 234 238 239 239-r1 249 250 250-r1 252
     Installed:           252
     Homepage:            http://www.padl.com/OSS/nss_ldap.html
     Description:         NSS LDAP Module

* sys-auth/pam_ldap 
     Available versions:  156 161 164 167 171 176 176-r1 178 178-r1 180 182
     Installed:           182
     Homepage:            http://www.padl.com/OSS/pam_ldap.html
     Description:         PAM LDAP Module


Found 5 matches

Will attempt to find a version that does not break and report results.
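The strace in comment 7 shows the forked child (pid 932) replacing fd 5 via dup2() with a fresh, unconnected socket. The child then writes to that fd, gets EPIPE, and is terminated by SIGPIPE, which the parent sees through waitpid(). The sketch below reproduces the same parent/child pattern. It uses an ordinary pipe whose read end is already closed, an assumption standing in for nss_ldap's socket. CPython ignores SIGPIPE by default, so the child restores the default action first.

```python
import os
import signal


def child_killed_by_sigpipe() -> int:
    """Fork a child that writes into a pipe nobody reads; return its wait status."""
    r, w = os.pipe()
    os.close(r)  # no reader remains, so any write to w fails with EPIPE/SIGPIPE
    pid = os.fork()
    if pid == 0:  # child
        try:
            # CPython starts with SIGPIPE ignored; restore the default action
            # (terminate) so the failed write is fatal, as in the strace.
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            os.write(w, b"x")  # the kernel delivers SIGPIPE here
        finally:
            os._exit(1)  # only reached if SIGPIPE did not kill the child
    os.close(w)
    _, status = os.waitpid(pid, 0)
    return status


status = child_killed_by_sigpipe()
if os.WIFSIGNALED(status):
    sig = os.WTERMSIG(status)
    # sig << 8 is the value a portage-style spawn() would report, under the
    # assumption that signal deaths are encoded as N << 8
    print("child killed by", signal.Signals(sig).name, "->", sig << 8)
else:
    print("child exited with status", os.WEXITSTATUS(status))
```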
Comment 8 Jason Short 2006-09-18 12:22:29 UTC
I've narrowed this down to a specific version and configuration:

it only occurs with nss_ldap after 239-r1 (tried with openldap 2.2.28-r3 and 2.3.24-r2) where ldap is first in the lookup order for passwd or group in nsswitch.conf.

this appears to be an upstream nss_ldap bug.
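The next helper gives a quick way to see whether a host matches the configuration narrowed down in this comment. It checks whether ldap is the first source for passwd or group in nsswitch.conf. The helper is hypothetical and only tests that narrow condition. Later comments in this bug report trouble with `files ldap` group lines too, so an empty result does not prove a host is unaffected.

```python
def ldap_first_databases(nsswitch_text, databases=("passwd", "group")):
    """Return the databases in `databases` whose first NSS source is ldap."""
    hits = []
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if ":" not in line:
            continue
        db, sources = line.split(":", 1)
        db = db.strip()
        # skip [STATUS=action] items and keep only the service names
        services = [s for s in sources.split() if not s.startswith("[")]
        if db in databases and services and services[0] == "ldap":
            hits.append(db)
    return hits


sample = """\
passwd:      files ldap
shadow:      files ldap
group:       ldap files
"""
print(ldap_first_databases(sample))  # ['group']
```

On a real host, pass it the contents of /etc/nsswitch.conf instead of the sample string.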
Comment 9 Jason Short 2006-09-18 12:34:04 UTC
http://bugzilla.padl.com/show_bug.cgi?id=176
Comment 10 Jason Short 2006-09-18 12:38:06 UTC
nscd workaround as described in upstream bugzilla restores functionality.
Comment 11 Andrew Stadt 2006-10-23 08:55:02 UTC
Jason, any luck tracking this one down? Enabling, disabling, and playing with the nscd configuration did nothing for me. I can't do anything other than 'emerge -s' without it failing quietly unless I disable ldap in nsswitch.conf.

FWIW: I'm running nss_ldap 253, openldap 2.3.27, and (not that it should make a difference) pam_ldap 182.
Comment 12 Jakub Moc (RETIRED) gentoo-dev 2006-10-24 00:20:48 UTC
*** Bug 152539 has been marked as a duplicate of this bug. ***
Comment 13 Jakub Moc (RETIRED) gentoo-dev 2006-10-24 00:23:41 UTC
Reopen to reassign.
Comment 14 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-10-24 00:45:44 UTC
please attach your /etc/ldap.conf file, and your /etc/nsswitch.conf file.
Comment 15 Andrew Stadt 2006-10-24 03:21:43 UTC
(In reply to comment #14)
> please attach your /etc/ldap.conf file, and your /etc/nsswitch.conf file.
> 
I've dropped down to nss_ldap-239 and it's working for the most part.

Here's my /etc/ldap.conf:

ssl start_tls
ssl on
suffix "dc=stadt,dc=ca"

uri ldaps://alien.stadt.ca ldaps://www.stadt.ca
pam_password exop

ldap_version 3
binddn cn=nss,dc=stadt,dc=ca
bindpw <altered for post>

pam_filter objectclass=posixAccount
pam_login_attribute uid
pam_member_attribute memberuid
nss_base_passwd ou=People,dc=stadt,dc=ca?one
nss_base_shadow ou=People,dc=stadt,dc=ca?one
nss_base_group          ou=Group,dc=stadt,dc=ca?one
nss_base_hosts          ou=Hosts,dc=stadt,dc=ca?one
#nss_base_services      ou=Services,dc=padl,dc=com?one
#nss_base_networks      ou=Networks,dc=padl,dc=com?one
#nss_base_protocols     ou=Protocols,dc=padl,dc=com?one
#nss_base_rpc           ou=Rpc,dc=padl,dc=com?one
#nss_base_ethers        ou=Ethers,dc=padl,dc=com?one
#nss_base_netmasks      ou=Networks,dc=padl,dc=com?ne
#nss_base_bootparams    ou=Ethers,dc=padl,dc=com?one
#nss_base_aliases       ou=Aliases,dc=padl,dc=com?one
#nss_base_netgroup      ou=Netgroup,dc=stadt,dc=ca?one

and my nsswitch.conf:

# /etc/nsswitch.conf:
# $Header: /var/cvsroot/gentoo-x86/sys-libs/glibc/files/nsswitch.conf,v 1.1 2005/05/17 00:52:41 vapier Exp $

#passwd:                ldap files
#shadow:                ldap files
#group:         ldap files
passwd:      files ldap
shadow:      files ldap
group:       files ldap
#passwd:      compat
#shadow:      compat
#group:       compat

# passwd:    db files nis
# shadow:    db files nis
# group:     db files nis

hosts:       files dns
networks:    files dns

services:    db files
protocols:   db files
rpc:         db files
ethers:      db files
netmasks:    files
#netgroup:    files ldap
netgroup:    files
bootparams:  files

automount:   files
aliases:     files

As you can see, I only (currently) use ldap for the passwd, shadow, and group tables.

I'm off to work right now, and I'll play with it some more later today.
Comment 16 Andrew Stadt 2006-10-24 03:30:33 UTC
BTW: Since it seems to be relevant, I also recently upgraded to gcc-4.1.1 (and ran emerge -e system twice). This was about 5 days before the trouble started.

The last successful emerge I did was portage-2.1.2_pre3-r6; I'll try an older version when I get back from work.

Comment 17 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-10-24 03:31:35 UTC
ok, please DISABLE ssl in your ldap.conf, so that it uses plaintext to connect to your LDAP server, and test again.

Assuming that solves it, incrementally add back portions of your SSL configuration until the problem recurs.

Specifically, 'ssl on' is most probably broken; I documented that originally in 250-r1 (the patch I apply documents it in /etc/ldap.conf).

'ssl start_tls' should be used instead, along with a non-SSL URI, since TLS runs over the normal ldap port. Sometimes using 'host' instead of 'uri' made it go away.

The source of the breakage MAY be related to 'tls_checkpeer yes' on some systems, but I was not able to reliably prove that.
Comment 18 Andrew Stadt 2006-10-24 07:41:22 UTC
(In reply to comment #17)
> ok, please DISABLE ssl in your ldap.conf, so that it uses plaintext to connect
> to your LDAP server, and test again.
> 
> Assuming that solves it, incrementally add back portions of your SSL
> configuration  until the problem reoccurs.
> 
> Specifically 'ssl on' is most probably broken, I documented that 250-r1
> originally (the patch I apply documented it in /etc/ldap.conf)
> 
> 'ssl start_tls' should be used instead, along with a non-SSL URI, since TLS
> runs over the normal ldap port. Sometimes using 'host' instead of 'uri' made it
> go away.
> 
> The source of break MAY related to 'tls_checkpeer yes' on some systems, but I
> was not able to reliably prove that.
> 

You're correct of course.  Sorry for not reading all the comments you put in the /etc/ldap.conf file.  I have a bad habit of looking at the non-commented lines to see if anything had changed or not.

So far, I've had to disable both 'ssl on' and 'ssl start_tls'. Using 'host' instead of 'uri' made no difference on my end, at least with 253. I also tried toggling 'tls_checkpeer yes'/'... no', but it didn't make a difference either.

At the moment I'm looking at rebuilding openssl, then openldap, and then retest.    

Looking back over the logs, it looks like I'm going to have to revoke someone's 'sudo emerge' access. From the mess I'm seeing with /usr/lib/libssl.so.*, /usr/lib/liblber.so.*, /usr/lib/libldap.so.*  I'm pretty sure at this point that this is all due to user error on my end.  I'll do some cleanup and post the results.  Sorry to bother you all.
Comment 19 Andrew Stadt 2006-10-25 03:24:48 UTC
Ok, I'm still stumped.

With nss_ldap-239-r1 everything is fine, for the most part.

With >nss_ldap-239-r1 everything else seems to work other than portage, which fails silently on 'emerge <package>', can't 'emerge --sync', and actually returns a permission error (3328) on 'emerge -C <package>'.

After updating nss_ldap and downgrading portage, I find that with nss_ldap-253 & portage-2.1.2_pre2_r9 everything (I've tried) works.  I don't know that much about python, so any suggestions on where to look?
Comment 20 Jakub Moc (RETIRED) gentoo-dev 2006-10-25 08:24:33 UTC
*** Bug 152775 has been marked as a duplicate of this bug. ***
Comment 21 Torsten Kurbad 2006-10-30 06:48:21 UTC
I found a workaround that (at least for me) doesn't seem to have any impact on the nss functionality.

In /etc/nsswitch.conf instead of

passwd: compat ldap
shadow: compat ldap
group: compat ldap

I now have the ldap flag only set for passwd and shadow, but left it out for group, i.e.:

passwd: compat ldap
shadow: compat ldap
group: compat

Unexpectedly, id <username> nevertheless shows all group memberships of the respective user that are only defined in the ldap directory, thus the

group: ... ldap

entry seems to be somehow redundant and causing the portage troubles.

Leaving the ldap flag out for group: re-enables the full functionality of portage, while still keeping the group functionality from nss_ldap.

Although this can't be called a clean solution it seems to be a good workaround for the time being.

Maybe others here could try whether that works for them as well!

Regards,
Torsten
Comment 22 Sumit Khanna 2006-10-30 07:36:33 UTC
As in the previous comment, I removed ldap from the group: line in nsswitch.conf and portage began working again.
Comment 23 Zac Medico gentoo-dev 2006-10-30 08:58:54 UTC
*** Bug 152237 has been marked as a duplicate of this bug. ***
Comment 24 kakou 2006-10-30 09:09:19 UTC
*** Bug 153438 has been marked as a duplicate of this bug. ***
Comment 25 kakou 2006-10-30 09:14:16 UTC
OK, it works for me, but I really need both ldap and files in nsswitch.conf.

Comment 26 Jason Sievert 2006-10-30 10:01:23 UTC
(In reply to comment #21)

Removing the ldap line from nsswitch.conf does in fact make portage work; however, in my case I lost my groups when I logged in again.

My emerge info ---

Portage 2.1.2_rc1-r1 (default-linux/x86/2006.0, gcc-3.4.6, glibc-2.4-r4, 2.6.17-gentoo-r1 i686)
=================================================================
System uname: 2.6.17-gentoo-r1 i686 AMD Athlon(tm) XP 2400+
Gentoo Base System version 1.12.1
Last Sync: Mon, 30 Oct 2006 17:00:01 +0000
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: 1.2.11
dev-lang/python:     2.3.5, 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.18.1
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.17
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r1
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/overlays/mine"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow aac alsa apache2 apm audiofile bash-completion berkdb bzip2 cdparanoia cdr cgi cli cracklib crypt ctype cups dba divx4linux dlloader dri dvd eds elibc_glibc emboss encode ethereal evo exif fastcgi flac flash foomaticdb fortran gd gdbm gif gpg gpm gs gstreamer hal iconv imagemagick imap imlib input_devices_evdev input_devices_keyboard input_devices_mouse ipv6 isdnlog jabber jpeg kde kernel_linux ldap libg++ libwww logrotate mad maildir mikmod mmx mp3 mpeg mysql ncurses nfsv4 nls nptl nptlonly offensive ogg oggvorbis oscar oss pam pcre pdo-external pear perl php png posix postgres ppds pppd python quicktime readline reflection ruby sasl scannwe sdl session simplexml soap spell spl sqlite sse ssl tcpd tiff truetype truetype-fonts udev unicode usb userland_GNU userlocales vhosts video_cards_apm video_cards_ark video_cards_ati video_cards_chips video_cards_cirrus video_cards_cyrix video_cards_dummy video_cards_fbdev video_cards_glint video_cards_i128 video_cards_i740 video_cards_i810 video_cards_imstt video_cards_mga video_cards_neomagic video_cards_nsc video_cards_nv video_cards_rendition video_cards_s3 video_cards_s3virge video_cards_savage video_cards_siliconmotion video_cards_sis video_cards_sisusb video_cards_tdfx video_cards_tga video_cards_trident video_cards_tseng video_cards_v4l video_cards_vesa video_cards_vga video_cards_via video_cards_vmware video_cards_voodoo virus-scan vorbis wmf x86 xml xml2 xorg xsl xv zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 27 Jakub Moc (RETIRED) gentoo-dev 2006-11-02 12:42:26 UTC
*** Bug 153852 has been marked as a duplicate of this bug. ***
Comment 28 Jakub Moc (RETIRED) gentoo-dev 2006-11-04 15:14:32 UTC
*** Bug 154076 has been marked as a duplicate of this bug. ***
Comment 29 Zac Medico gentoo-dev 2006-11-06 22:33:09 UTC
*** Bug 154309 has been marked as a duplicate of this bug. ***
Comment 30 Jakub Moc (RETIRED) gentoo-dev 2006-11-07 09:53:10 UTC
*** Bug 154373 has been marked as a duplicate of this bug. ***
Comment 31 Zac Medico gentoo-dev 2006-11-09 12:48:54 UTC
*** Bug 154585 has been marked as a duplicate of this bug. ***
Comment 32 Gilles Dartiguelongue (RETIRED) gentoo-dev 2006-11-09 13:14:10 UTC
ok, this is a really weird issue.

this is my nsswitch.conf before :

passwd:      files ldap compat
shadow:      files ldap compat
group:       files compat

and after 

passwd:      files ldap
shadow:      files ldap
group:       files

and now portage works again.

Thanks so much for helping out, it was driving me crazy.
I think this is a strange enough issue to be mentioned in GWN.
Comment 33 Torsten Kurbad 2006-11-09 13:31:05 UTC
Gilles,

I once read somewhere in the PAM documentation that one should never ever delete the "compat" entries in nsswitch.conf, because it can lead to weird behaviour.
Anyway, the behaviour of portage in the whole issue is weird enough - and it should definitely be mentioned in the GWN.
Meanwhile I myself downgraded to nss_ldap-239-r1 on all our machines.
The whole "leave out the group entries" approach breaks down as soon as you have (as I do) a Samba PDC that relies on pam_krb5 and pam_ldap to deliver group memberships like domusers to grant access to shares.
My colleagues almost killed me when I switched to the aforementioned "fix" there, since they were unable to access their Samba shares.
And even worse: I figured out that leaving out the

group: ldap ...

entry only worked on my client since I had nscd running.
So far so good. Is anyone fluent enough in C to fix this sh1t? ;o)

Regards
Torsten
Comment 34 Gilles Dartiguelongue (RETIRED) gentoo-dev 2006-11-09 13:44:19 UTC
I didn't know about this doc. So, a big DISCLAIMER for anyone reading this bug report: these settings work for me (at least on 4 boxes under my control) but might just as well fail for you.
Comment 35 Andrew Stadt 2006-11-09 17:06:46 UTC
(In reply to comment #32)
> ok, this is a really weird issue.

I'm certainly not going to argue that point!

> 
> and after 
> 
> passwd:      files ldap
> shadow:      files ldap
> group:       files
> 
More weirdness:  I tried this combination myself, and it worked up until I invalidated nscd's group table.  If it works for you, then by all means go for it.
Comment 36 Alexandre Ghisoli 2006-11-21 02:08:10 UTC
I get the same errors. While checking the logs on the LDAP server, I got exactly the same answers whether TLS was on or not.

So, you can turn debug on in /etc/ldap.conf by adding
debug 1

Debug levels seem to be the same as for the openldap server.

Then run id <username> and you will see debug messages on your console. This also works if you run, for example, emerge --sync.

My systems still get error code 2 while trying to sync with nss_ldap using TLS.
Comment 37 Stefaan De Roeck (RETIRED) gentoo-dev 2006-12-05 23:34:39 UTC
still an issue (incidentally, authentication through ldap also stopped working on my system)
Comment 38 Jimmy.Jazz 2006-12-09 06:34:36 UTC
(In reply to comment #23)
> *** Bug 152237 has been marked as a duplicate of this bug. ***
> 
Hello,

as suggested, when you comment out the following:

if secpass >= 2:
        for g in grp.getgrall():
                if "portage" in g[3]:
                        userpriv_groups.append(g[2])
        userpriv_groups = list(set(userpriv_groups))

then portage works again.

I'm really puzzled. I don't see why that portion of code interferes with ldap.
The list userpriv_groups is still set to the 250 gid before and after the 'if secpass >= 2:' condition, with or without that code. Even if the condition 'if "portage"' is not satisfied (discard "portage" user from the portage group table), portage stops working.

Also, with that code and when you delete ldap from the group: line in /etc/nsswitch.conf file, portage continues working well. userpriv_groups is still set to 250 ! That leaves us with more questions than answers.

Perhaps ldap doesn't handle numeric gids well, but the fact that userpriv_groups is still set to 250, whatever we modify, allows us to discard that possibility.

Finally, do we really need that code ? ;)
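The sketch below restates the quoted membership check as a standalone function with named fields (g[3] is gr_mem and g[2] is gr_gid). The function name is hypothetical. The point it illustrates is that grp.getgrall() enumerates the whole group database through NSS, so every source on the `group:` line of nsswitch.conf is consulted, ldap included. That fits the observations in this comment: the problem goes away either when this code is commented out or when ldap is dropped from the group: line.

```python
import grp


def groups_listing_member(user):
    """Return the sorted GIDs of every group whose member list names `user`.

    Same logic as the quoted excerpt, but grp.getgrall() walks every group
    known to NSS, so an ldap entry on the group: line is queried as well.
    """
    return sorted({g.gr_gid for g in grp.getgrall() if user in g.gr_mem})


print(groups_listing_member("portage"))
```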
Comment 39 Zac Medico gentoo-dev 2006-12-09 10:37:38 UTC
(In reply to comment #38)
> Finally, do we really need that code ? ;)

Maybe you don't need it, but it fixes bug #137610.
Comment 40 Jimmy.Jazz 2006-12-10 06:56:10 UTC
(In reply to comment #39)
> (In reply to comment #38)
> > Finally, do we really need that code ? ;)
> 
> Maybe you don't need it, but it fixes bug #137610.
> 

You are right, I'm not using hardened Gentoo, so the code does nothing in my case. That a portion of code which does nothing can still make portage fail when you are using ldap looks like a memory overflow in python itself, caused by python-ldap or something else.

I'm using python 2.5-r1 and I have compiled it with the -O3 flag. Perhaps that could be a starting point or a clue for further searches.

Comment 41 Stefaan De Roeck (RETIRED) gentoo-dev 2007-01-04 03:41:57 UTC
Any ideas on this one yet?  Still seems to occur on my system...
Comment 42 Zac Medico gentoo-dev 2007-01-04 13:20:36 UTC
Created attachment 105417
use `id -G portage` instead of grp.getgrall()

As a workaround for this bug, I've applied this patch for portage-2.1.2 in svn r5462.
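The attached patch is not reproduced here. The following hypothetical helper only illustrates the idea named in the attachment title: it asks `id -G portage` for the group IDs instead of enumerating every group with grp.getgrall(). The helper uses subprocess, which is an assumption and not necessarily what the patch itself calls. `id -G USER` prints the numeric IDs of all groups USER belongs to. This presumably avoids the crash because any nss_ldap state then lives in a short-lived `id` process rather than in emerge itself.

```python
import subprocess


def portage_group_ids(user="portage"):
    """Return the sorted numeric GIDs reported by `id -G user`, or [] on failure.

    Hypothetical helper illustrating the approach named in the patch title,
    not the patch itself.
    """
    try:
        proc = subprocess.run(["id", "-G", user], capture_output=True, text=True)
    except FileNotFoundError:  # no id(1) on PATH
        return []
    if proc.returncode != 0:  # e.g. the user does not exist
        return []
    return sorted({int(gid) for gid in proc.stdout.split()})


print(portage_group_ids())
```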
Comment 43 Zac Medico gentoo-dev 2007-01-04 18:25:36 UTC
(In reply to comment #42)
> Created an attachment (id=105417)
> use `id -G portage` instead of grp.getgrall()

That patch is in portage-2.1.2_rc4-r6.
Comment 44 Stefaan De Roeck (RETIRED) gentoo-dev 2007-01-05 03:28:56 UTC
Seems to work for me at least, thanks! I'm curious to see if it works for everyone else as well (as there was some fuss about nscd etc.).
Comment 45 Alexandre Ghisoli 2007-01-07 20:44:39 UTC
Works for me too.

but I didn't use nscd, so I can't tell about that.
Comment 46 Marius Mauch (RETIRED) gentoo-dev 2007-01-10 07:54:41 UTC
*** Bug 148428 has been marked as a duplicate of this bug. ***
Comment 47 Marius Mauch (RETIRED) gentoo-dev 2007-01-11 11:11:13 UTC
*** Bug 138570 has been marked as a duplicate of this bug. ***
Comment 48 Ruud Althuizen 2007-01-11 11:15:43 UTC
Thanks for adding me, Marius. But the machine that had this problem has since been reinstalled, so I have removed myself from the Cc list.
Comment 49 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2007-02-06 04:52:02 UTC
zmedico: safe to close this bug now?
Comment 50 Zac Medico gentoo-dev 2007-02-06 05:14:46 UTC
(In reply to comment #49)
> zmedico: safe to close this bug now?

That's fine with me. The `id -G portage` workaround that we've got in place seems to work well enough for portage's purposes.
Comment 51 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2007-02-06 05:25:20 UTC
closing since portage has the workaround for this.
For further tracing of this issue, see bug 156511, from which a real solution will eventually come (so far, the breakage has been traced down to the depths of glibc's nss).