Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 582098 - sys-apps/portage-2.2.28: OSError: [Errno 9] Bad file descriptor when accessing portage.data.userpriv_groups
Summary: sys-apps/portage-2.2.28: OSError: [Errno 9] Bad file descriptor when accessin...
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 611328
  Show dependency tree
 
Reported: 2016-05-04 17:08 UTC by Zac Medico
Modified: 2017-05-20 18:04 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Zac Medico gentoo-dev 2016-05-04 17:08:13 UTC
This issue is difficult to reproduce, but I've seen it observed in a couple of different contexts now. In one context, I was able to suppress the issue by triggering early evaluation of the lazy portage.data.userpriv_groups attribute.

>>> Installing (2 of 343) dev-lang/go-1.6.1::gentoo
>>> Jobs: 7 of 343 complete, 1 failed               Load avg: 21.8, 17.5, 16.2
 * Package:    dev-libs/libaio-0.3.110
 * Repository: gentoo
 * Maintainer: gokturk@binghamton.edu proxy-maint@gentoo.org
 * USE:        abi_x86_64 amd64 elibc_glibc kernel_linux userland_GNU
 * FEATURES:   preserve-libs sandbox splitdebug userpriv usersandbox
>>> Unpacking source...
>>> Unpacking libaio-0.3.110.tar.gz to /var/tmp/portage/dev-libs/libaio-0.3.110/work
>>> Source unpacked in /var/tmp/portage/dev-libs/libaio-0.3.110/work
>>> Preparing source in /var/tmp/portage/dev-libs/libaio-0.3.110/work/libaio-0.3.110 ...
 * Applying libaio-0.3.109-install.patch ...
 [ ok ]
 * Applying libaio-0.3.109-x32.patch ...
 [ ok ]
 * Applying libaio-0.3.109-testcase-8.patch ...
 [ ok ]
 * Applying libaio-0.3.110-cppflags.patch ...
 [ ok ]
 * Applying libaio-0.3.110-optional-werror.patch ...
 [ ok ]
 * Applying libaio-0.3.110-link-stdlib.patch ...
 [ ok ]
 * Will copy sources from /var/tmp/portage/dev-libs/libaio-0.3.110/work/libaio-0.3.110
 * abi_x86_64.amd64: copying to /var/tmp/portage/dev-libs/libaio-0.3.110/work/libaio-0.3.110-abi_x86_64.amd64
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/dev-libs/libaio-0.3.110/work/libaio-0.3.110 ...
 * abi_x86_64.amd64: running multilib-minimal_abi_src_configure
>>> Source configured.
[Errno 9] Bad file descriptor:
   /usr/bin/sandbox /usr/lib/portage/python3.4/ebuild.sh compile
Traceback (most recent call last):
  File "/usr/lib64/python3.4/subprocess.py", line 1418, in _execute_child
    part = _eintr_retry_call(os.read, errpipe_read, 50000)
  File "/usr/lib64/python3.4/subprocess.py", line 491, in _eintr_retry_call
    return func(*args)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.4/site-packages/portage/process.py", line 317, in spawn
    unshare_net, unshare_ipc, cgroup)
  File "/usr/lib64/python3.4/site-packages/portage/process.py", line 504, in _exec
    if groups:
  File "/usr/lib64/python3.4/site-packages/portage/proxy/objectproxy.py", line 89, in __bool__
    return bool(object.__getattribute__(self, '_get_target')())
  File "/usr/lib64/python3.4/site-packages/portage/data.py", line 281, in _get_target
    return _get_global(object.__getattribute__(self, '_name'))
  File "/usr/lib64/python3.4/site-packages/portage/data.py", line 212, in _get_global
    stderr=subprocess.STDOUT)
  File "/usr/lib64/python3.4/subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.4/subprocess.py", line 1424, in _execute_child
    os.close(errpipe_read)
OSError: [Errno 9] Bad file descriptor
 * The ebuild phase 'compile' has exited unexpectedly. This type of
 * behavior is known to be triggered by things such as failed variable
 * assignments (bug #190128) or bad substitution errors (bug #200313).
 * Normally, before exiting, bash should have displayed an error message
 * above. If bash did not produce an error message above, it's possible
 * that the ebuild has called `exit` when it should have called `die`
 * instead. This behavior may also be triggered by a corrupt bash binary or
 * a hardware problem such as memory or cpu malfunction. If the problem is
 * not reproducible or it appears to occur randomly, then it is likely to
 * be triggered by a hardware problem. If you suspect a hardware problem
 * then you should try some basic hardware diagnostics such as memtest.
 * Please do not report this as a bug unless it is consistently
 * reproducible and you are sure that your bash binary and hardware are
 * functioning properly.
Comment 1 Zac Medico gentoo-dev 2017-01-25 02:04:00 UTC
I saw this again today with python-3.4.5. Tried to reproduce it by running this in a shell loop, but could not trigger it:

python3.4 -c "import portage; portage.process.spawn(['/bin/true'],
uid=portage.data.portage_uid, gid=portage.data.portage_gid,
groups=portage.data.userpriv_groups)"
Comment 2 Zac Medico gentoo-dev 2017-01-25 07:59:49 UTC
It looks like this is a python bug, which occurs because the subprocess _execute_child function calls os.close(errpipe_write) in the subprocess that is created with its _posixsubprocess.fork_exec() call. The _execute_child function really should check the pid to make sure it's in the correct process before it calls os.close(errpipe_write), much like portage's ForkProcess class checks the pid here:

https://gitweb.gentoo.org/proj/portage.git/tree/pym/portage/util/_async/ForkProcess.py?h=portage-2.3.3#n56
Comment 3 Zac Medico gentoo-dev 2017-01-25 08:36:50 UTC
The symptom appears close to this issue:

http://bugs.python.org/issue16140

However, python-3.4.5 already has the associated patch for that issue:

https://hg.python.org/cpython/rev/d51df72dadb7

I not sure that the fork issue described in comment #2 is possible, given that _exit(255) is called if exec fails here:

https://hg.python.org/cpython/file/v3.4.5/Modules/_posixsubprocess.c#l689

Maybe after the for, it calls back into the python interpreter and triggers the finally block with the os.close(errpipe_read) somehow?
Comment 4 Zac Medico gentoo-dev 2017-01-25 08:48:59 UTC
The fact that EBADF first occurs in _eintr_retry_call(os.read, errpipe_read, 50000), which is not in a finally block, rules out the fork idea from comment #2. So something must be closing the errpipe_read file descriptor prematurely.
Comment 5 Zac Medico gentoo-dev 2017-01-25 09:23:55 UTC
It's probably very close to http://bugs.python.org/issue16140. If something closes the file descriptor that belongs to a file object, then that file descriptor can get reused for errpipe_read. Then the garbage collector can come along and close the file descriptor after it has been reused for errpipe_read:

https://hg.python.org/cpython/file/v3.4.5/Modules/_io/iobase.c#l231

Whatever is responsible for closing a file descriptor belonging to a file object could be either in portage or a library that portage uses (including the standard library).
Comment 6 Zac Medico gentoo-dev 2017-01-25 09:55:18 UTC
Any subprocess.Popen calls that use stdout=subprocess.PIPE and don't explicitly close the associated stdout file can trigger this when that stdout file eventually becomes garbage collected, since the subprocess _execute_child function closes c2pread here:

https://hg.python.org/cpython/file/v3.4.5/Lib/subprocess.py#l1407

Meanwhile, c2pread is associated with self.stdout having closefd=True:

https://hg.python.org/cpython/file/v3.4.5/Lib/subprocess.py#l840

So, it's crucial for portage to explicitly close all such stdout=subprocess.PIPE files before the garbage collector gets them.
Comment 7 Zac Medico gentoo-dev 2017-01-25 17:24:02 UTC
The code flaw is present in all of the latest python versions, so I've created this upstream issue:

http://bugs.python.org/issue29373
Comment 8 Zac Medico gentoo-dev 2017-01-25 17:38:55 UTC
Actually, os.close(p2cread) closes a different file descriptor than c2pread, so my analysis is not correct. I suppose it could be some other code closing a file descriptor that belongs to a file object. I've closed the upstream python issue.
Comment 9 Zac Medico gentoo-dev 2017-01-25 19:37:04 UTC
Regardless of the root cause of the bad file descriptor, it's clear that we don't want the lazy portage.data.userpriv_groups evaluation to happen after the fork (in the portage.process._exec function). Therefore, we should fix that first, and it will mitigate the problem by reducing the number of times that the lazy evaluation occurs.
Comment 12 Zac Medico gentoo-dev 2017-05-20 18:04:27 UTC
Fixed in portage-2.3.5.