Summary: | sys-apps/portage-2.2.28: OSError: [Errno 9] Bad file descriptor when accessing portage.data.userpriv_groups | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Zac Medico <zmedico> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | chutzpah, wizardedit |
Priority: | Normal | Keywords: | InVCS |
Version: | 2.2 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 611328 |
Description
Zac Medico
2016-05-04 17:08:13 UTC
I saw this again today with python-3.4.5. I tried to reproduce it by running this in a shell loop, but could not trigger it:

python3.4 -c "import portage; portage.process.spawn(['/bin/true'], uid=portage.data.portage_uid, gid=portage.data.portage_gid, groups=portage.data.userpriv_groups)"

It looks like this is a python bug, which occurs because the subprocess _execute_child function calls os.close(errpipe_write) in the subprocess that is created with its _posixsubprocess.fork_exec() call. The _execute_child function really should check the pid to make sure it's in the correct process before it calls os.close(errpipe_write), much like portage's ForkProcess class checks the pid here: https://gitweb.gentoo.org/proj/portage.git/tree/pym/portage/util/_async/ForkProcess.py?h=portage-2.3.3#n56

The symptom appears similar to this issue: http://bugs.python.org/issue16140 However, python-3.4.5 already has the associated patch for that issue: https://hg.python.org/cpython/rev/d51df72dadb7

I'm not sure that the fork issue described in comment #2 is possible, given that _exit(255) is called if exec fails here: https://hg.python.org/cpython/file/v3.4.5/Modules/_posixsubprocess.c#l689 Maybe after the fork, it calls back into the python interpreter and triggers the finally block with the os.close(errpipe_read) somehow?

The fact that EBADF first occurs in _eintr_retry_call(os.read, errpipe_read, 50000), which is not in a finally block, rules out the fork idea from comment #2. So something must be closing the errpipe_read file descriptor prematurely. It's probably very close to http://bugs.python.org/issue16140. If something closes the file descriptor that belongs to a file object, then that file descriptor can get reused for errpipe_read.
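The descriptor-reuse hazard can be reproduced without subprocess at all. What follows is a minimal sketch, assuming CPython (refcounting finalizes objects immediately) and POSIX lowest-free-descriptor allocation; `demonstrate_fd_reuse` is a hypothetical helper, not portage or CPython code. A file object created with os.fdopen() owns its descriptor (closefd=True), so if that descriptor is closed behind its back and the number is recycled, the stale object's own close later hits the new owner.

```python
import errno
import os


def demonstrate_fd_reuse():
    """Show how a stale file object can close a descriptor it no longer owns.

    Returns the errno raised when reading the "innocent" pipe, or None if the
    OS did not recycle the descriptor number (so the demo does not apply).
    """
    # A file object that owns its descriptor (closefd=True is the default).
    r1, w1 = os.pipe()
    owner = os.fdopen(r1, "rb", buffering=0)

    # The flawed step: close the descriptor behind the file object's back.
    os.close(r1)

    # pipe() returns the lowest free descriptors, so r1's number is normally
    # recycled here; r2 plays the role of subprocess's errpipe_read.
    r2, w2 = os.pipe()
    os.write(w2, b"x")  # data is waiting, so a read on a live r2 never blocks
    try:
        if r2 != r1:
            try:
                owner.close()  # best-effort cleanup of the stale object
            except OSError:
                pass
            return None
        # Dropping the last reference finalizes the stale object; in CPython
        # this happens immediately and closes fd number r1, which is now r2.
        del owner
        try:
            os.read(r2, 50000)
        except OSError as e:
            return e.errno
        return 0
    finally:
        for fd in (r2, w1, w2):
            try:
                os.close(fd)
            except OSError:
                pass  # r2 may already have been closed by the finalizer


if __name__ == "__main__":
    print(demonstrate_fd_reuse() == errno.EBADF)
```

On CPython under Linux this normally returns errno.EBADF: the read fails just as _eintr_retry_call(os.read, errpipe_read, 50000) did, even though the code that owns r2 never closed it.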
Then the garbage collector can come along and close the file descriptor after it has been reused for errpipe_read: https://hg.python.org/cpython/file/v3.4.5/Modules/_io/iobase.c#l231

Whatever is responsible for closing a file descriptor belonging to a file object could be either in portage or a library that portage uses (including the standard library). Any subprocess.Popen calls that use stdout=subprocess.PIPE and don't explicitly close the associated stdout file can trigger this when that stdout file eventually becomes garbage collected, since the subprocess _execute_child function closes c2pread here: https://hg.python.org/cpython/file/v3.4.5/Lib/subprocess.py#l1407 Meanwhile, c2pread is associated with self.stdout having closefd=True: https://hg.python.org/cpython/file/v3.4.5/Lib/subprocess.py#l840 So, it's crucial for portage to explicitly close all such stdout=subprocess.PIPE files before the garbage collector gets them. The code flaw is present in all of the latest python versions, so I've created this upstream issue: http://bugs.python.org/issue29373

Actually, os.close(p2cread) closes a different file descriptor than c2pread, so my analysis is not correct. I suppose it could be some other code closing a file descriptor that belongs to a file object. I've closed the upstream python issue.

Regardless of the root cause of the bad file descriptor, it's clear that we don't want the lazy portage.data.userpriv_groups evaluation to happen after the fork (in the portage.process._exec function). Therefore, we should fix that first, and it will mitigate the problem by reducing the number of times that the lazy evaluation occurs.

Patch posted for review:
https://archives.gentoo.org/gentoo-portage-dev/message/9fac1da4f92981a022b331a3008bc03a
https://github.com/gentoo/portage/pull/98

This is in the master branch: https://gitweb.gentoo.org/proj/portage.git/commit/?id=ccf975296daec92d376c4989e5ffb2a6cdbe8a2d

Fixed in portage-2.3.5.
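The general shape of the fix, forcing a lazily computed value in the parent before fork() so the child only consumes a finished result, can be sketched as below. `LazyValue` and `spawn_with_groups` are hypothetical names, not portage's API (the real change is in the commit linked above); the sketch assumes a POSIX system with os.fork and does not apply the groups, since os.setgroups() needs root.

```python
import os
import sys


class LazyValue:
    """Hypothetical stand-in for a lazily evaluated attribute such as
    portage.data.userpriv_groups: the lookup runs on first access only."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self.evaluated_in_pid = None  # which process ran the lookup

    def get(self):
        if self._value is None:
            self._value = self._compute()
            self.evaluated_in_pid = os.getpid()
        return self._value


def spawn_with_groups(argv, lazy_groups):
    """Hypothetical spawn helper: resolve lazy inputs, then fork and exec."""
    # Force the lazy evaluation in the parent, before fork(), so the child
    # receives an already computed list and runs no lookup code itself.
    groups = list(lazy_groups.get())
    pid = os.fork()
    if pid == 0:
        try:
            # A real child would apply the groups here, e.g. with
            # os.setgroups(groups); that needs root, so this sketch skips it.
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never return into the caller.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    lazy = LazyValue(lambda: [os.getgid()])
    rc = spawn_with_groups([sys.executable, "-c", "pass"], lazy)
    print(rc, lazy.evaluated_in_pid == os.getpid())
```

Because a forked child works on a copy of the parent's memory, a value evaluated in the child is never cached in the parent, so every spawn would repeat the lookup. Evaluating in the parent caches it once, which matches the mitigation described above: fewer lazy evaluations, and none of them after the fork.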