I'm using prefixed Gentoo on a Mac OS X 10.5 system with LDAP enabled for authentication and user search. After upgrading coreutils to 7.1, I get really strange results with ${EPREFIX}/usr/bin/id:

    ian@cumulonimbus ~ $ id -G
    1463 8056 12869 5105 81 102 5124 79 13684 80 1440 1091
    ian@cumulonimbus ~ $ id -G ian
    (fails to exit)
    ian@cumulonimbus ~ $ id -G root
    (fails to exit)
    ian@cumulonimbus ~ $ id -G rene
    1453

The "rene" account has never logged into this machine (it's just in LDAP), and it does belong to more than one group.

coreutils-6.12-r2 works fine:

    ian@cumulonimbus ~ $ id -G root
    0 1 2 8 29 1148 3 9 5055 4
    ian@cumulonimbus ~ $ id -G ian
    1463 8056 12869 5105 81 102 5124 79 13684 80
    ian@cumulonimbus ~ $ id -G rene
    1453 1148 5055 1298 6061 32659 1091

This also has the unfortunate side effect of breaking portage, since it uses id -G to enumerate the portage user's groups.

Reproducible: Always

Steps to Reproduce:
1. Enable LDAP for user authentication
2. Emerge sys-apps/coreutils-7.1
3. id -G root

Actual Results: id hangs.

Expected Results: id should output a list of group IDs.

Mac OS X 10.5, LDAP enabled for user authentication and search.
Hmmm, I recall someone else having a problem with this on some other platform. Apparently the coreutils folks have changed something, and the new approach is a lot more expensive (if it ever finishes at all!).
interix has a similar problem regarding windows domains. the system provides replacements for some functions that don't query all members of all groups on a domain (which takes ages). i use those functions (by redefining them in the coreutils ebuild, FYI), but i guess they're not available anywhere else, and these two problems are unrelated. however - darkside had a similar problem on interix lately which seemed to be a little different from what i fixed by using the replacement functions - maybe he's on LDAP too, instead of windows domains?
just a wishful check, does coreutils-7.2 exhibit the same problem?
(In reply to comment #3)
> just a wishful check, does coreutils-7.2 exhibit the same problem?

Yes, it does. Trying to dig a little deeper ...
It looks like a Darwin bug. getgrouplist (called from mgetgroups.c:mgetgroups) is not changing the value of max_n_groups, which all the docs seem to say it should do. The result is an infinite loop. It worked in 6.12 because 6.12 never actually looked at the result from the getgrouplist call if the number of groups looked okay. So it would return bad results (it didn't include the whole list of groups), but it did return. Darwin kinda gets around this by always making a buffer the size of NGROUPS+1. But they also now support more than NGROUPS (from what I understand). One could patch this for Darwin by just increasing the buffer size (2X?) until getgrouplist didn't return an error. Happy to help ...
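To make the analysis above concrete, here is a minimal sketch of a retry loop that trusts the lookup to report the required size. It is not the coreutils code: `lookup_fn`, `lookup_updating`, `lookup_stale`, `calls_until_success`, and the 25-group user are all hypothetical stand-ins, and `lookup_stale` only mimics the Darwin behaviour as it is described in this thread. The real loop has no call cap; the cap exists here only so the stall can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for getgrouplist(); the real call also takes
   a user name and a primary gid. */
typedef int (*lookup_fn) (int *groups, int *ngroups);

enum { TOTAL_GROUPS = 25, BUF_SIZE = 64 };  /* pretend user has 25 groups */

/* Contract from the newer docs: on overflow, return -1 and store the
   required count in *ngroups; on success, return the count. */
int
lookup_updating (int *groups, int *ngroups)
{
  int cap = *ngroups;
  int n = cap < TOTAL_GROUPS ? cap : TOTAL_GROUPS;
  for (int i = 0; i < n; i++)
    groups[i] = 1000 + i;
  *ngroups = TOTAL_GROUPS;
  return cap < TOTAL_GROUPS ? -1 : TOTAL_GROUPS;
}

/* Behaviour reported here for Darwin (an assumption of this sketch):
   on overflow, return -1 and leave *ngroups at the caller's capacity;
   on success, return 0. */
int
lookup_stale (int *groups, int *ngroups)
{
  int cap = *ngroups;
  int n = cap < TOTAL_GROUPS ? cap : TOTAL_GROUPS;
  for (int i = 0; i < n; i++)
    groups[i] = 1000 + i;
  *ngroups = n;
  return cap < TOTAL_GROUPS ? -1 : 0;
}

/* Retry until the lookup succeeds, relying on *ngroups to say how much
   room the next attempt needs.  Returns the number of calls made, or
   -1 if max_calls is reached without success. */
int
calls_until_success (lookup_fn lookup, int max_calls)
{
  int buf[BUF_SIZE];
  int n = 10;                   /* initial guess, like N_GROUPS_INIT */

  assert (lookup != NULL);
  for (int call = 1; call <= max_calls; call++)
    {
      if (lookup (buf, &n) >= 0)
        return call;
      if (n > BUF_SIZE)         /* guard for the sketch's fixed buffer */
        return -1;
    }
  return -1;
}
```

With `lookup_updating` the loop succeeds on the second call, because the first call reports that 25 slots are needed. With `lookup_stale` the requested size never changes, so every call fails the same way, which is the hang seen with `id -G root`.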
Follow up from gentoo-alt@lists.gentoo.org:

> I don't think it's a bug.

Well, there is some room for debate; I probably stated that too strongly. There is a question as to when the OS should update ngroups in getgrouplist. The 4.4BSD/Darwin docs are a little more ambiguous than the NetBSD 5.0/Linux docs. The code in Darwin is v1.2 of getgrouplist.c, which (apparently) comes from NetBSD. Darwin is still using v1.2, and the upstream code has moved on quite a bit since then, so I'd say it's Darwin's problem.

> I think coreutils recently started to try
> harder to obtain the information it needs, thus for instance going into
> an ldap or NT domain lookup, which just takes an awful amount of time.

This isn't about taking a long time, it's about an infinite loop. I guarantee it.

> we should try to find
> the configure flag that reverts this behaviour and add it so

There's no config flag that will revert the behavior. The change between 6 and 7 in this case is in the same code path; it's just looking at the results slightly differently between the two versions. With a "working" version of getgrouplist, there's no change in behavior between 6 and 7, but given the behavior on Darwin, there is.

There is a flag for completely avoiding getgrouplist and I guess we could set that, but that wasn't used before and I, personally, don't like avoiding the OS call. Plus it doesn't give the same results as Darwin's /usr/bin/id.

It's easy enough to patch to either "do the right thing" or to just patch it to do what Darwin's native id does.
ok, then we should bring this to upstream's attention
Cool with me. Is the idea that the coreutils people care about working around a broken getgrouplist?
I think they do care about their software going into an endless loop.
Good point. Is this something I can help with/do? Haven't exactly done it before, but I should be able to find the right people to submit to ...
I think gnu coreutils people primarily work on the gnu coreutils mailing list. First check whether this issue has already been reported, then submit the problem to them, including your in-depth analysis. In the meanwhile I'm willing to: a) mask b) patch, provided the patch is acceptable ;)
Submitted to <bug-coreutils@gnu.org>.

I don't have a strong feeling on masking vs. patching, though I do figure it makes sense to do something fairly quickly, since emerge calls id and this can really stop things dead in the water. Reverting is easy enough. Or a one line patch (at the end) will revert just this case to 6.12.

Note that in either case, the results are wrong: it's hard-limiting to 10 groups, regardless of how many the user is in, so prefix id and /usr/bin/id are going to return different results.

A fix that would return correct results would be to look at the return value from getgrouplist and manage changing the buffer size manually and speculatively, that is, perhaps doubling it a few times in the hope that you'll generate as much space as is required. I asked on bug-coreutils whether they would want to do this. Would you? A little more freestore load, but really not a big deal, I would think (just kinda annoying, since you just wish Darwin got rid of the bug).
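The speculative doubling described above can be sketched as follows. This is an illustration under stated assumptions, not the patch sent upstream: `lookup_fn`, `lookup_stale`, `fetch_all_groups`, the 25-group user, and the 65536-entry ceiling are hypothetical, and `lookup_stale` stands in for getgrouplist with the Darwin behaviour reported in this thread (return -1 without raising the count on overflow, return 0 on success).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for getgrouplist(); the real call also takes
   a user name and a primary gid. */
typedef int (*lookup_fn) (int *groups, int *ngroups);

enum { TOTAL_GROUPS = 25 };     /* pretend the user is in 25 groups */

/* Assumed Darwin-like behaviour: -1 on overflow with *ngroups left at
   the caller's capacity, 0 on success with *ngroups set to the count. */
int
lookup_stale (int *groups, int *ngroups)
{
  int cap = *ngroups;
  int n = cap < TOTAL_GROUPS ? cap : TOTAL_GROUPS;
  for (int i = 0; i < n; i++)
    groups[i] = 1000 + i;
  *ngroups = n;
  return cap < TOTAL_GROUPS ? -1 : 0;
}

/* Fetch the complete list.  On success, store a malloc'd array in *out
   and return the number of groups.  Return -1 on allocation failure or
   when the arbitrary ceiling is reached. */
int
fetch_all_groups (lookup_fn lookup, int **out)
{
  int cap = 10;                 /* initial guess, like N_GROUPS_INIT */
  int *buf = NULL;

  assert (lookup != NULL && out != NULL);
  while (cap <= (1 << 16))      /* ceiling so the sketch always ends */
    {
      int *tmp = realloc (buf, (size_t) cap * sizeof *tmp);
      if (tmp == NULL)
        break;
      buf = tmp;

      int n = cap;
      if (lookup (buf, &n) >= 0)
        {
          /* Success may be reported as 0 or as the count, so take the
             count from n rather than from the return value. */
          *out = buf;
          return n;
        }
      /* Overflow: trust the reported size if it grew, else double. */
      cap = n > cap ? n : cap * 2;
    }
  free (buf);
  return -1;
}
```

Against the stale lookup the buffer grows 10, 20, 40 and the third call returns all 25 groups. An implementation that does report the required size would be satisfied on the second call through the `n > cap` branch, so the doubling only costs extra calls on systems that need it.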
Here's the pseudo-revert, though I'd still rather do it "right":

diff --git a/lib/mgetgroups.c b/lib/mgetgroups.c
index e697013..77359f0 100644
--- a/lib/mgetgroups.c
+++ b/lib/mgetgroups.c
@@ -1,3 +1,4 @@
+#include <stdio.h>
 /* mgetgroups.c -- return a list of the groups a user is in
 
    Copyright (C) 2007-2009 Free Software Foundation, Inc.
@@ -94,6 +95,11 @@ mgetgroups (char const *username, gid_t gid, GETGROUPS_T **groups)
     }
   g = h;
 
+  if( ng < 0 && max_n_groups <= N_GROUPS_INIT)
+    {
+      ng = max_n_groups;
+    }
+
   if (0 <= ng)
     {
       *groups = g;
Whoops. Sorry about the debugging printf include.

Anyway, here is a "correct" patch. It relies on the OS not forever returning -1. It also includes something I didn't realize, which is that when Darwin getgrouplist does return a full list, it just sets the return value to 0, whereas it looks like the newer libs have it return the number of groups.

diff --git a/lib/mgetgroups.c b/lib/mgetgroups.c
index e697013..7282197 100644
--- a/lib/mgetgroups.c
+++ b/lib/mgetgroups.c
@@ -83,8 +83,18 @@ mgetgroups (char const *username, gid_t gid, GETGROUPS_T **groups)
       GETGROUPS_T *h;
 
       /* getgrouplist updates max_n_groups to num required. */
+      /* Some old getgrouplist return -1 but don't update max_n_groups */
+      int previous_max_n_groups = max_n_groups;
       ng = getgrouplist (username, gid, g, &max_n_groups);
 
+      if (ng < 0 && max_n_groups == previous_max_n_groups)
+        {
+          max_n_groups <<= 2;
+        } else {
+          /* some old getgrouplist don't return max_n_groups on success */
+          ng = max_n_groups;
+        }
+
       if ((h = realloc_groupbuf (g, max_n_groups)) == NULL)
         {
           int saved_errno = errno;
'course, I meant *= 2 or <<= 1; long day ...
Created attachment 187745
Patch posted to bug-coreutils

This is the patch that's been posted to bug-coreutils. Pretty much what I posted before. Tested against 10.5.6 and returns the right result, even when a user has many groups. This is just the code change, and against 7.2; not the supporting stuff.

cf: http://lists.gnu.org/archive/html/bug-coreutils/2009-04/msg00089.html
Hmmmm ... bugzilla ate my comments on the patch. This is what was posted to bug-coreutils, applied to 7.2. Just the code, not the supporting materials. The code is essentially what I posted before and returns the correct results for a user with dozens of groups.
I added the following patch:

http://article.gmane.org/gmane.comp.gnu.core-utils.bugs/16578
http://repo.or.cz/w/coreutils.git?a=commitdiff;h=bf87a2c8ea4487ca4448c9fe42a9c9858400acbd

with modifications to apply on the 7.2 branch.

Thanks all, sorry for the wait!