Summary: | >=sys-apps/portage-2.2.0_alpha174: OSError: [Errno 13] Permission denied: '/var/tmp/portage/app-admin/syslog-ng-3.4.1/.ipc_lock' due to race between root EbuildIpcDaemon and non-root EbuildIpc | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Samuli Suominen (RETIRED) <ssuominen> |
Component: | Current packages | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | Keywords: | InVCS |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://issuetracker.google.com/issues/187785642 | ||
See Also: | https://github.com/gentoo/portage/pull/768 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Samuli Suominen (RETIRED)
2013-05-08 05:22:21 UTC
syslog-ng-3.4.1 src_prepare():

>>> Emerging (1 of 2) app-admin/syslog-ng-3.4.1
 * Fetching files in the background. To view fetch progress, run
 * `tail -f /var/log/emerge-fetch.log` in another terminal.
 * syslog-ng_3.4.1.tar.gz SHA256 SHA512 WHIRLPOOL size ;-) ... [ ok ]
>>> Unpacking source...
>>> Unpacking syslog-ng_3.4.1.tar.gz to /var/tmp/portage/app-admin/syslog-ng-3.4.1/work
>>> Source unpacked in /var/tmp/portage/app-admin/syslog-ng-3.4.1/work
>>> Preparing source in /var/tmp/portage/app-admin/syslog-ng-3.4.1/work/syslog-ng-3.4.1 ...
 * Applying syslog-ng-3.4.1-rollup.patch ... [ ok ]
 * Running eautoreconf in '/var/tmp/portage/app-admin/syslog-ng-3.4.1/work/syslog-ng-3.4.1' ...
 * Running eautoreconf in '/var/tmp/portage/app-admin/syslog-ng-3.4.1/work/syslog-ng-3.4.1/modules/afmongodb/libmongo-client' ...
 * Running eautoreconf in '/var/tmp/portage/app-admin/syslog-ng-3.4.1/work/syslog-ng-3.4.1/lib/ivykis' ...
 * Running eautoreconf in '/var/tmp/portage/app-admin/syslog-ng-3.4.1/work/syslog-ng-3.4.1/modules/afamqp/rabbitmq-c' ...
 * Running libtoolize --install --copy --force --automake ...
Traceback (most recent call last):
  File "/usr/lib64/portage/pym/portage/locks.py", line 103, in lockfile
 [ ok ]
 * Running aclocal -I m4 --install ...
    myfd = os.open(lockfilename, os.O_CREAT|os.O_RDWR, 0o660)
  File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__
    rval = self._func(*wrapped_args, **wrapped_kwargs)
OSError: [Errno 13] Permission denied: '/var/tmp/portage/app-admin/syslog-ng-3.4.1/.ipc_lock'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 228, in <module>
    sys.exit(ebuild_ipc_main(sys.argv[1:]))
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 225, in ebuild_ipc_main
    return ebuild_ipc.communicate(args)
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 82, in communicate
    lock_obj = portage.locks.lockfile(self.ipc_lock_file, unlinkfile=True)
  File "/usr/lib64/portage/pym/portage/locks.py", line 231, in lockfile
    waiting_msg=waiting_msg, flags=flags)
  File "/usr/lib64/portage/pym/portage/locks.py", line 109, in lockfile
    raise PermissionDenied(func_call)
portage.exception.PermissionDenied: open('/var/tmp/portage/app-admin/syslog-ng-3.4.1/.ipc_lock')
 * Running libtoolize --install --copy --force --automake ...
 * Running libtoolize --install --copy --force --automake ...
 * Running libtoolize --install --copy --force --automake ... [ ok ]
 * Running aclocal ... [ ok ]
 * Running aclocal -I m4 --install ... [ ok ]
 * Running aclocal -I m4 ... [ ok ]
 * Running autoconf ... [ ok ]
 * Running autoheader ... [ ok ]
 * Running automake --add-missing --copy --foreign ... [ ok ]
 * Running autoconf ...
 * Running autoconf ...
 * Running autoconf ... [ ok ]
 * Running autoheader ... [ ok ]
 * Running autoheader ... [ ok ]
 * Running autoheader ... [ ok ]
 * Running automake --add-missing --copy --foreign ... [ ok ]
 * Running automake --add-missing --copy --foreign ... [ ok ]
 * Running automake --add-missing --copy --foreign ... [ ok ]
 * Running elibtoolize in: syslog-ng-3.4.1/
 * Applying portage/1.2.0 patch ...
 * Applying sed/1.5.6 patch ...
 * Applying as-needed/2.4.2 patch ...
 * Applying target-nm/2.4.2 patch ...
 * Running elibtoolize in: syslog-ng-3.4.1/modules/afamqp/rabbitmq-c/
 * Applying portage/1.2.0 patch ...
 * Applying sed/1.5.6 patch ...
 * Applying as-needed/2.4.2 patch ...
 * Applying target-nm/2.4.2 patch ...
 * Running elibtoolize in: syslog-ng-3.4.1/modules/afmongodb/libmongo-client/
 * Applying portage/1.2.0 patch ...
 * Applying sed/1.5.6 patch ...
 * Applying as-needed/2.4.2 patch ...
 * Applying target-nm/2.4.2 patch ...
 * Running elibtoolize in: syslog-ng-3.4.1/lib/ivykis/
 * Applying portage/1.2.0 patch ...
 * Applying sed/1.5.6 patch ...
 * Applying as-needed/2.4.2 patch ...
 * Applying target-nm/2.4.2 patch ...
>>> Source prepared.

This is strange. What permissions do you have on '/var/tmp/portage/app-admin/syslog-ng-3.4.1/.ipc_lock'?

(In reply to comment #2)
> This is strange. What permissions do you have on
> '/var/tmp/portage/app-admin/syslog-ng-3.4.1/.ipc_lock'?

Now I can't reproduce this anymore, and of course I didn't have anything like 'keepwork' in place, so it's impossible for me to get that information. If this happens again, I'll try to get more data.

> myfd = os.open(lockfilename, os.O_CREAT|os.O_RDWR, 0o660)

hmm. 0o660?

we've been hitting this for a long time in CrOS (at least since 2018). it does seem to be a rare race condition, but when we're constantly building thousands of packages, this doesn't get so rare :).

the issue seems to be that emerge, when running as root with lots of parallel jobs, will grab this ipc_lock that other non-root processes also grab randomly. but since they both grab & release it quickly, you have to get "lucky" for the race to fail.
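The "hmm. 0o660?" remark points at the mechanism: `os.open` applies the process umask to the requested mode, and a new file starts out owned by its creator's user and group. The following is a simplified model, not Portage code. It assumes root runs with the common umask 022, that the lockfile is briefly root:root before it ends up group-writable for the portage group, and it uses hypothetical uid/gid values; ACLs and capabilities are ignored.

```python
# Simplified model of the lockfile permission window (no ACLs/capabilities).
# Hypothetical IDs for illustration only; real portage uid/gid values may differ.
ROOT_UID, ROOT_GID = 0, 0
PORTAGE_UID, PORTAGE_GID = 250, 250


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits open(2) actually applies: requested mode minus umask."""
    return requested & ~umask & 0o777


def can_open_rdwr(uid: int, gids: set[int], file_uid: int, file_gid: int, mode: int) -> bool:
    """Would an O_RDWR open succeed? Checks owner, then group, then other bits."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == file_uid:
        bits = (mode >> 6) & 0o7
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & 0o6) == 0o6  # O_RDWR needs both read (4) and write (2)


# Root (assumed umask 022) has just created the lock; it is still root:root.
racy_mode = effective_mode(0o660, 0o022)
print(oct(racy_mode))  # 0o640
print(can_open_rdwr(PORTAGE_UID, {PORTAGE_GID}, ROOT_UID, ROOT_GID, racy_mode))  # False (EACCES)

# The same open once the file is group portage with the group write bit set.
print(can_open_rdwr(PORTAGE_UID, {PORTAGE_GID}, ROOT_UID, PORTAGE_GID, 0o660))  # True
```

In this model either missing piece is enough to fail: `can_open_rdwr(PORTAGE_UID, {PORTAGE_GID}, ROOT_UID, PORTAGE_GID, 0o640)` is also False, because group ownership alone does not help without the group write bit.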
basically it looks like:
* [non-root] bin/ebuild-ipc.py:EbuildIpc.communicate sees the lock is free
* [root] lib/_emerge/EbuildIpcDaemon.py:EbuildIpcDaemon._input_handler grabs the lock as 0:0
* [non-root] bin/ebuild-ipc.py:EbuildIpc.communicate tries to grab the lock, but fails with EACCES
* everything blows up

we managed to capture a (hand) trace of the filesystem state here:
https://issuetracker.google.com/issues/187785642#comment43

and a not-quite-as-good capture of the failure, but it includes the process information:
* we see that `emerge` (root) owns the lock
* it's the `ebuild` (non-root) process trying to grab it, and it sees that emerge holds it
https://issuetracker.google.com/issues/187785642#comment51

i'm not familiar with what resources these locks are protecting, so i can't suggest a quick fix. it feels like having EbuildIpcDaemon just use a diff filename is probably the wrong answer.

(In reply to SpanKY from comment #4)
> i'm not familiar with what resources these locks are protecting, so i can't
> suggest a quick fix. it feels like having EbuildIpcDaemon just use a diff
> filename is probably the wrong answer.

A likely trigger is the lock related to bug 401919 inside lib/_emerge/EbuildIpcDaemon.py, since this code runs as root:

> else: # EIO/POLLHUP
>     # This can be triggered due to a race condition which happens when
>     # the previous _reopen_input() call occurs before the writer has
>     # closed the pipe (see bug #401919). It's not safe to re-open
>     # without a lock here, since it's possible that another writer will
>     # write something to the pipe just before we close it, and in that
>     # case the write will be lost. Therefore, try for a non-blocking
>     # lock, and only re-open the pipe if the lock is acquired.
>     lock_filename = os.path.join(os.path.dirname(self.input_fifo), ".ipc_lock")
>     try:
>         lock_obj = lockfile(lock_filename, unlinkfile=True, flags=os.O_NONBLOCK)
>     except TryAgain:
>         # We'll try again when another IO_HUP event arrives.
>         pass
>     else:
>         try:
>             self._reopen_input()
>         finally:
>             unlockfile(lock_obj)

I've realized that there's actually a permission race in the lockfile creation, which we can solve by doing a chmod g+s on the parent directory and using umask 0002 during creation of the lockfile.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=704bcd1581e49432f363f0eda648d58411775d7f

commit 704bcd1581e49432f363f0eda648d58411775d7f
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2021-11-07 04:19:58 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2021-11-10 23:10:53 +0000

    EbuildIpcDaemon: fix lock permission race

    Move ipc files to a .ipc subdirectory, with a setgid bit to prevent
    a lockfile group permission race. The lockfile function uses an
    appropriate open call with mode argument so that the lockfile is
    created atomically with both group ownership and group write bit.

    Bug: https://bugs.gentoo.org/468990
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/ebuild-ipc.py                                | 6 +++---
 bin/phase-functions.sh                           | 4 ++--
 lib/_emerge/AbstractEbuildProcess.py             | 4 ++--
 lib/_emerge/EbuildIpcDaemon.py                   | 2 +-
 lib/portage/package/ebuild/prepare_build_dirs.py | 9 +++++++++
 lib/portage/tests/ebuild/test_doebuild_spawn.py  | 1 +
 lib/portage/tests/ebuild/test_ipc_daemon.py      | 6 +++---
 7 files changed, 21 insertions(+), 11 deletions(-)

Released in portage-3.0.29.
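The fix described in the comment and commit message has two ingredients: a setgid `.ipc` subdirectory, so files created inside it inherit the directory's group, and a single `os.open` with mode 0o660 under umask 002, so the lockfile exists with group ownership and the group write bit from the start. The sketch below shows that pattern in isolation. `make_ipc_dir`, `create_lockfile`, and `try_lock` are hypothetical helpers, not Portage's API, and `fcntl.flock` with `LOCK_NB` is this sketch's choice of non-blocking lock, not necessarily what `portage.locks.lockfile` uses internally. Linux semantics are assumed.

```python
import errno
import fcntl
import os
import stat
import tempfile


def make_ipc_dir(build_dir: str, gid: int) -> str:
    """Hypothetical helper: create build_dir/.ipc with group `gid` and setgid.

    On Linux, files created in a setgid directory inherit its group.
    """
    ipc_dir = os.path.join(build_dir, ".ipc")
    os.makedirs(ipc_dir, exist_ok=True)
    os.chown(ipc_dir, -1, gid)  # -1 leaves the owner unchanged
    os.chmod(ipc_dir, 0o2770)   # rwxrws---; mkdir(2) itself ignores S_ISGID
    return ipc_dir


def create_lockfile(path: str) -> int:
    """Hypothetical helper: create/open `path` with g+rw in one open(2) call.

    umask 002 keeps the group write bit of 0o660, and the group comes from
    the setgid parent, so there is no later chown/chmod window. Note that
    os.umask is process-wide, so this is not thread-safe.
    """
    old_umask = os.umask(0o002)
    try:
        return os.open(path, os.O_CREAT | os.O_RDWR, 0o660)
    finally:
        os.umask(old_umask)


def try_lock(fd: int) -> bool:
    """Non-blocking exclusive lock; False means "try again later"."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return False
        raise
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as build_dir:
        ipc_dir = make_ipc_dir(build_dir, os.getegid())
        lock_path = os.path.join(ipc_dir, ".ipc_lock")
        fd1 = create_lockfile(lock_path)
        print(oct(stat.S_IMODE(os.stat(lock_path).st_mode)))  # 0o660
        fd2 = os.open(lock_path, os.O_RDWR)  # a second, independent opener
        print(try_lock(fd1), try_lock(fd2))  # True False
        os.close(fd1)
        os.close(fd2)
```

Because the mode and group are correct at creation time, a non-root process in that group can open the lockfile no matter which process created it; contention then shows up only as a held lock (`try_lock` returning False), which the caller can retry, instead of EACCES.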