I just hit the oddest error: * Adding group 'dnsmasq' to your system ... * - Groupid: next available * Adding user 'dnsmasq' to your system ... * - Userid: 118 * - Shell: /sbin/nologin * - Home: /dev/null * - Groups: dnsmasq * - GECOS: added by portage for dnsmasq useradd: UID 118 is not unique * ERROR: net-dns/dnsmasq-2.66 failed (setup phase): * (no error message) When I checked userid 118 was already assigned to radvd. the only scenario I can piece together (since both eclass and ebuild haven't been changed in a while and look sane) is that portage ran pkg_setup on both at the same time and collided. I am currently building with --jobs=16 and FEATURES="-ebuild-locks parallel-install". I've never seen any collisions like this (and I've been running like this for a few weeks now with multiple catalyst runs a day) so it is a fairly difficult thing to trigger. Anyone care to take a stab at resolving this?
enewuser is looping through uids to find the next one available. Why don't we just call useradd without the -u option and let it do the proper locking?
Running useradd in linux automatically finds the next available uid, and -r is passed already so we get 100-999. All that really needs to be done for linux to work properly is to drop the searching part and just run useradd instead. The quickest and easiest fix for the major of our users (linux users) would be to conditional out this block of code so linux users don't run it at all: # handle uid local euid=$1; shift if [[ -n ${euid} && ${euid} != -1 ]] ; then if [[ ${euid} -gt 0 ]] ; then if [[ -n $(egetent passwd ${euid}) ]] ; then euid="next" fi else eerror "Userid given but is not greater than 0 !" die "${euid} is not a valid UID" fi else euid="next" fi if [[ ${euid} == "next" ]] ; then for ((euid = 101; euid <= 999; euid++)); do [[ -z $(egetent passwd ${euid}) ]] && break done fi opts+=( -u ${euid} )
*-darwin*) cannot automatically assign uid *-freebsd*|*-dragonfly*) can automatically assign uid but it's >1000 which would change the behavior of the eclass. *-netbsd*) not tested *-openbsd*) not tested *) in this case linux, when testing the uid is automatically picked between 100 and 999 thanks to the -r flag. It is recommended to either drop -u uid from the default case, or much more safely add a new "-linux*" that skips the uid selection stuff and avoids the issue entirely. I'll work up a patch and we can see if there is a sane way to do this.
Created attachment 353992 [details, diff] possible fix
Because this is a base system package I am waiting more than the normal 2 weeks but please note, I intend to commit this fix next week unless there are technical comments pointing out an issue with that.
(In reply to Rick Farina (Zero_Chaos) from comment #5) > Because this is a base system package I am waiting more than the normal 2 > weeks but please note, I intend to commit this fix next week unless there > are technical comments pointing out an issue with that. If a package supplies a uid, with this change it will never be used as the default. This change will result in the next available uid always being used, which is not always desirable. So NACK this change.
Created attachment 356106 [details, diff] potential fix that preserves an ebuild's default if possible
Comment on attachment 356106 [details, diff] potential fix that preserves an ebuild's default if possible Eh now that I think about it, you still have a race so this isn't good. Realistically you need to attempt to add it, if it fails then add it without the uid set on Linux.
> If a package supplies a uid, with this change it will never be used as the > default. This change will result in the next available uid always being > used, which is not always desirable. > > So NACK this change. I don't see what you are talking about: opts+=( -u ${euid} ) einfo " - Userid: ${euid}" I changed the default to not specify a uid, I didn't remove the ability to specify uid unless there was an unintentional screw up. Can you please point out what I've done wrong?
Comment on attachment 353992 [details, diff] possible fix After confirming that the uide setting functions work fine, I'm unobsoleting my fix.
Comment on attachment 356106 [details, diff] potential fix that preserves an ebuild's default if possible As noted by comment #8 from original author, this doesn't solve the issue.
I would really like this to go into the tree, I can't be the only one doing autobuilds that gets hit by this. I know several other developers with systems equal to or better than mine. Yes it's a tough race to hit, but the patch works fine, tested and confirmed, and seems to fix the race for me. I would still like to have this committed next week as there are currently no valid complaints afaict.
Comment on attachment 353992 [details, diff] possible fix > + opts+=( -u ${euid} ) > + einfo " - Userid: ${euid}" This only applies if CHOST != linux, making the arguments raised by comment #6 still valid. I see an easy way to correct this, and will follow up with a better patch.
Created attachment 357554 [details, diff] Don't search for an unused UID on Linux, but still allow suggested UIDs from ebuilds This is an updated version of attachment #353992 [details, diff]. In addition to the intentions of the original patch, it brings back the functionality of suggesting a UID to use in ebuilds for Linux hosts. This should address all qualms raised by comment #6 in an effective manner. It also greatly condenses and removes code duplication of both the original code and the previous patch. However, it does not solve the specific race condition raised in comment #8. All of the ebuilds in the tree specify a UID for standardization or hysterical reasons, and there are few chances of collisions. The only chances come from packages that use UIDs just above 100, and these are high enough to be unlikely to cause a collision, and are even less likely to trigger the race condition and cause the build to fail. Because this race condition is extremely difficult to hit, and correcting the race condition isn't going to be pretty, at least from my quick look at the enewuser function while working on this fix, I don't believe it's worth the effort to investigate at the present time. We'll cross that bridge if we get there. Please note that I have not had an opportunity to test this code. While it is theoretically sound, it may contain practical bugs. I offer up Zero_Chaos to test it, since he has a suitable system for doing so.
Perhaps a similar issue? * Adding group 'fcron' to your system ... * - Groupid: next available groupadd: cannot lock /etc/group; try again later. * ERROR: sys-process/fcron-3.1.1::gentoo failed (setup phase): The question is, is there a sane way to fix this or is the best we can do just call it until it works? One the bright side, the useradd doesn't seem to have issues after the patch from dwfreed.
(In reply to Rick Farina (Zero_Chaos) from comment #15) Looking through the shadow code, it looks like both useradd and groupadd ultimately call lckpwdf(3) to obtain a lock. If a lock is not obtained in 15 seconds, lckpwdf times out and you would get that "cannot lock" message. I have to wonder how you could be generating enough I/O to delay useradd or groupadd by that amount of time.
(In reply to Mike Gilbert from comment #16) > I have to wonder how you could be generating enough I/O to delay useradd or > groupadd by that amount of time. SATA1 hdd, running two catalyst instances in parallel with MAKEOPTS="-j48 -l32" and EMERGE_DEFAULT_OPTS="--jobs=64 --load-average=32" I am not even close to cpu bound, but my IO is off the charts pegged all the time and everything lives in IOwait
Created attachment 365648 [details, diff] Don't search for an unused UID/GID on Linux, but still allow suggested UIDs/GIDs from ebuilds This extends attachment #357554 [details, diff] to also apply to groups, and moves the ID handling code into a separate function to reduce code duplication. While here, I noticed that in enewgroup, CHOST matching openbsd resulted in groupadd being called with -r, which is an undefined flag in OpenBSD, so I adjusted the case statement to group OpenBSD with NetBSD to fix that. To extend the list of CHOSTs that can automatically pick an unused ID for both users and groups out of the system range, you simply need to modify this one function to match more CHOSTs, and perhaps make small tweaks to how the CHOST-specific useradd- and groupadd-alike commands are called to take advantage of this. Please note that I have not had an opportunity to test this code. While it is theoretically sound, it may contain practical bugs. Somebody using each of the Gentoo Prefix environments supported should probably give it a test to ensure it still Does the Right Thing(TM).
Comment on attachment 353992 [details, diff] possible fix marking my original fix obsolete as we have better ones on the bug
user.eclass has been removed from the gentoo repo.