Bug 623752 - sys-apps/portage-2.3.6: ebuild-ipc: daemon process not detected (race?)
Summary: sys-apps/portage-2.3.6: ebuild-ipc: daemon process not detected (race?)
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-04 07:11 UTC by Martin Väth
Modified: 2017-07-05 08:47 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Martin Väth 2017-07-04 07:11:45 UTC
This is an error which has so far occurred only once with portage-2.3.6 and is not reproducible: it occurred when emerging gentoo-sources while simultaneously emerging the same package in a chroot environment. So I guess that some race might be involved which somehow confused the two daemon processes.

>>> Preparing to unpack ...                         
ebuild-ipc: daemon process not detected             
 * The ebuild phase 'setup' has exited unexpectedly. [...]
ebuild-ipc: daemon process not detected             
 * The ebuild phase 'die_hooks' has exited unexpectedly. [...]

(the emerge failed after these messages).
Comment 1 Zac Medico gentoo-dev 2017-07-04 18:26:43 UTC
What filesystem do you use for PORTAGE_TMPDIR (/var/tmp/portage)?

Possible causes:

1) A concurrent process removed the PORTAGE_BUILDDIR lock file.

2) The PORTAGE_BUILDDIR variable somehow became corrupted, preventing ebuild-ipc from locating the appropriate lock file.

3) This code, which initializes portage.locks._lock_fn, could have temporarily failed during ebuild-ipc startup:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=5ef5fbaab88de47d4dbab333661d3525261d7633

For example, this could happen if fcntl temporarily fails with ENOLCK, as documented in the fcntl(2) man page:

> ENOLCK Too many segment locks open, lock table is full, or a remote locking protocol failed  (e.g.,  locking  over NFS).
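Causes 1) and 3) are easier to see with a simplified model of a lock-based liveness check. The sketch assumes (the thread does not confirm it, and `daemon_is_alive` is a hypothetical name, not Portage's code) that the client decides whether the daemon is alive by trying a non-blocking fcntl lock on the build-directory lock file: if the lock is free, nobody is serving requests.

```python
import errno
import fcntl
import os


def daemon_is_alive(lock_path):
    """Return True if some other process holds an fcntl lock on lock_path.

    Hypothetical helper: a simplified model of a lock-based liveness check,
    not Portage's actual ebuild-ipc implementation.
    """
    # O_CREAT models cause 1): if the lock file was removed, a fresh file is
    # created here, nobody holds a lock on it, and the daemon looks dead.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o660)
    try:
        try:
            # Non-blocking exclusive lock attempt.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EAGAIN, errno.EACCES):
                return True  # held by another process: daemon is alive
            # Anything else (e.g. ENOLCK, cause 3) is a failure of the
            # check itself, so re-raise instead of answering "not alive".
            raise
        # We obtained the lock, so nobody else held it; release it again.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

In this model, removing the lock file while the daemon still holds a lock on the old inode makes the check report "not alive", which matches cause 1). A checker that mapped a transient ENOLCK to "not alive" instead of re-raising it would print the same message, which matches cause 3).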
Comment 2 Martin Väth 2017-07-05 08:47:30 UTC
(In reply to Zac Medico from comment #1)
> What filesystem do you use for PORTAGE_TMPDIR (/var/tmp/portage)?

ext4 with xattr but without acl.
There is certainly still enough free space and enough free inodes.
All filesystems are local; the repositories and /var/db are mounted with sqfs+overlay (the latter different within the chroot).

> 1) A concurrent process removed the PORTAGE_BUILDDIR lock file

I suppose that this file resides in a subdirectory of $PORTAGE_TMPDIR.
That is hard to imagine: the chroot does not share this directory, and it does not share /var/cache either.

However, what _is_ shared with the chroot are /run (hence /var/lock and /var/run), the repositories and $DISTDIR.

The latter is the reason why the package was started simultaneously: One process fetched the file while the other waited for the lock in $DISTDIR.

On the other hand, I have been using this practice for ages, and so far I have never had such an error (that's why I am so confident that I will not be able to reproduce it).
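The serialization described here can be modeled as a blocking lock on a per-file lock in the shared $DISTDIR. This is a minimal sketch, assuming a hypothetical `fetch_once` helper and a hypothetical `.<filename>.lock` naming scheme (neither is Portage's actual layout): the second emerge blocks until the first has fetched the file, then finds it present and continues at once, so both builds proceed almost in lockstep.

```python
import fcntl
import os


def fetch_once(distdir, filename, fetch):
    """Fetch distdir/filename unless another process already did.

    Hypothetical helper; the '.<filename>.lock' name is an assumption,
    not Portage's real lock-file layout.
    """
    lock_path = os.path.join(distdir, "." + filename + ".lock")
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o664)
    try:
        # Blocking lock: a second emerge (e.g. inside the chroot) waits
        # here until the first one has finished fetching.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        target = os.path.join(distdir, filename)
        if not os.path.exists(target):
            fetch(target)
            return True  # this process downloaded the file
        return False  # another process already fetched it
    finally:
        # Closing the descriptor releases the fcntl lock.
        os.close(fd)
```

The waiter is released the moment the fetcher closes its descriptor, which is why the two emerges of the same package started their later phases at nearly the same time.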

> 2) The PORTAGE_BUILDDIR variable somehow became corrupted, preventing
> ebuild-ipc from locating the appropriate lock file.

I don't know of anything that might set this variable. Also, the emerge of the subsequent package (--keep-going=y was active) succeeded.

> ENOLCK Too many segment locks open, lock table is full

The package being emerged was gentoo-sources. Perhaps one of the processes was already running the untar, which might have many file handles open, or be closing many of them, at the same time. Also, I am using rather strict limits:

N512 U512 L512

So a lack of some resource might indeed be the reason. I just would not have guessed this from the error message...
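If the limits line is read in the limits(5) format of shadow's /etc/limits (an assumption, since the comment does not say which file it comes from), N512 caps open files at 512, U512 caps processes at 512, and L512 is a login count. A minimal sketch for checking how close a process is to the open-file limit, assuming Linux /proc; `open_fd_count`, `headroom` and `report` are hypothetical helper names:

```python
import os
import resource


def open_fd_count():
    """Number of open file descriptors of this process (Linux /proc only).

    The count includes the descriptor used to read /proc/self/fd itself.
    """
    return len(os.listdir("/proc/self/fd"))


def headroom(used, soft_limit):
    """Slots left under a soft limit, or None if the limit is unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - used


def report():
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    used = open_fd_count()
    print("open files: %d, soft limit: %s, headroom: %s"
          % (used, nofile_soft, headroom(used, nofile_soft)))
    # Counting this user's processes would need a scan of /proc;
    # only the process limit itself is shown here.
    print("process soft limit:", nproc_soft)


if __name__ == "__main__":
    report()
```

Exhausting the open-file limit makes open() fail with EMFILE rather than ENOLCK, so this checks only one of the resources that might have been short.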