Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 705910

Summary: sys-apps/portage: Encountered pdb.set_trace() in `_internal_caller_exception_handler' in asyncio_event_loop.py:81
Product: Portage Development Reporter: Joshua Kinard <kumba>
Component: CoreAssignee: Portage team <dev-portage>
Status: RESOLVED FIXED    
Severity: normal CC: kalin
Priority: Normal Keywords: InVCS
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 711148    

Description Joshua Kinard gentoo-dev 2020-01-20 00:38:08 UTC
Backstory: Trying to run an emerge operation over an NFSv4 link to a NAS running FreeBSD.  Still working bugs out on the uid/gid mapping for the NFSv4 exports.  When attempting to update the sys-devel/gnuconfig package, portage tried changing the permissions on a file in the image directory and received "invalid argument" back from the NFS server.  This somehow led to a pdb.set_trace() call in asyncio_event_loop:81, in the _internal_caller_exception_handler function.

I assume this is a leftover from a debugging session that probably needs to be removed.  Additionally, in `apply_permissions' in portage/util/__init__.py, around line 1097, portage really should handle an errno.EINVAL case from the filesystem, such as can be encountered on NFS mounts that are still a bit goofy with permission mappings.

As far as I can determine, portage attempted to chown a directory, "/var/tmp/portage/sys-devel/gnuconfig-20190912/image/usr" with uid=-1/gid=0, which failed.  It looks like '-1' is a default argument passed to uid, so my guess is the `_post_src_install_uid_fix' function in portage/package/ebuild/doebuild.py did not handle a particular scenario.  I am only lightly skimming that function, but nothing immediately stands out as a probable cause right now.

Traceback:

# emerge sys-devel/gnuconfig
Calculating dependencies... done!

>>> Verifying ebuild manifests

>>> Emerging (1 of 1) sys-devel/gnuconfig-20190912::gentoo
 * gnuconfig-20190912.tar.bz2 BLAKE2B SHA512 size ;-) ...                                                                                                                    [ ok ]
>>> Unpacking source...
>>> Unpacking gnuconfig-20190912.tar.bz2 to /var/tmp/portage/sys-devel/gnuconfig-20190912/work
>>> Source unpacked in /var/tmp/portage/sys-devel/gnuconfig-20190912/work
>>> Preparing source in /var/tmp/portage/sys-devel/gnuconfig-20190912/work ...
 * Applying 0001-add-ps2-targets-to-config.sub.patch ...                                                                                                                     [ ok ]
 * Applying 0002-Add-x32-support-to-config.guess.patch ...                                                                                                                   [ ok ]
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/sys-devel/gnuconfig-20190912/work ...
>>> Source configured.
>>> Compiling source in /var/tmp/portage/sys-devel/gnuconfig-20190912/work ...
>>> Source compiled.
>>> Test phase [not enabled]: sys-devel/gnuconfig-20190912

>>> Install sys-devel/gnuconfig-20190912 into /var/tmp/portage/sys-devel/gnuconfig-20190912/image
>>> Completed installing sys-devel/gnuconfig-20190912 into /var/tmp/portage/sys-devel/gnuconfig-20190912/image

 * Final size of build directory: 195 KiB
 * Final size of installed tree:    5 KiB

/usr/lib/python3.7/site-packages/portage/util/__init__.py:1100: BytesWarning: str() on a bytes instance
  func_call = "chown('%s', %i, %i)" % (filename, uid, gid)
Exception in callback AsynchronousTask.wait()
handle: <Handle AsynchronousTask.wait()>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait
    self._wait_hook()
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook
    self._exit_listener_stack.pop()(self)
  File "/usr/lib/python3.7/site-packages/_emerge/EbuildPhase.py", line 205, in _ebuild_exit
    self._ebuild_exit_unlocked(ebuild_process)
  File "/usr/lib/python3.7/site-packages/_emerge/EbuildPhase.py", line 265, in _ebuild_exit_unlocked
    _post_src_install_uid_fix(settings, out)
  File "/usr/lib/python3.7/site-packages/portage/package/ebuild/doebuild.py", line 2195, in _post_src_install_uid_fix
    follow_links=False)
  File "/usr/lib/python3.7/site-packages/portage/util/__init__.py", line 1271, in apply_secpass_permissions
    stat_cached=stat_cached, follow_links=follow_links)
  File "/usr/lib/python3.7/site-packages/portage/util/__init__.py", line 1097, in apply_permissions
    portage.data.lchown(filename, uid, gid)
  File "/usr/lib/python3.7/site-packages/portage/__init__.py", line 246, in __call__
    rval = self._func(*wrapped_args, **wrapped_kwargs)
OSError: [Errno 22] Invalid argument: b'/var/tmp/portage/sys-devel/gnuconfig-20190912/image/usr'
--Return--
> /usr/lib/python3.7/site-packages/portage/util/_eventloop/asyncio_event_loop.py(81)_internal_caller_exception_handler()->None
-> pdb.set_trace()
(Pdb)
Comment 1 Joshua Kinard gentoo-dev 2020-01-20 01:12:12 UTC
I think this bit of code is questionable.  In 'portage/package/ebuild/doebuild.p', lines 2182 to 2195 (indents removed):

    if mystat.st_uid != portage_uid and \
            mystat.st_gid != portage_gid:
            continue
    myuid = -1
    mygid = -1
    if mystat.st_uid == portage_uid:
            myuid = inst_uid
    if mystat.st_gid == portage_gid:
            mygid = inst_gid
    apply_secpass_permissions(
            _unicode_encode(fpath, encoding=_encodings['merge']),
            uid=myuid, gid=mygid,
            mode=mystat.st_mode, stat_cached=mystat,
            follow_links=False)

The first if statement is intended to deal with cases where file_uid != portage_uid AND file_gid != portage_gid.  In my case, the uid doesn't match (uid:0 != portage_uid:250), but gid (gid:250 == portage_gid:250) DOES, so this check silently fails and allows myuid/mygid to be initialized to '-1'.

The next two if statements independently check uid and gid, and set them to the portage_uid/portage_gid value if they match.  Since the first check was skipped over, myuid remains at '-1', while mygid gets set to '0', and thus get passed down to the lower functions (apply_secpass_permissions), and we end up running into the pdb.set_trace() function call.

This is probably the wrong fix, but changing the first if statement from 'and' to 'or:

    if mystat.st_uid != portage_uid or \
            mystat.st_gid != portage_gid:
            continue

Avoids the exception happening in portage itself (and the pdb.set_trace() call).  I still have to fix the permissions problems on my NFS server, but portage will at least fail gracefully with meaningful errors:

>>> Install sys-apps/debianutils-4.9.1 into /var/tmp/portage/sys-apps/debianutils-4.9.1/image
/usr/bin/install: cannot change ownership of '/var/tmp/portage/sys-apps/debianutils-4.9.1/image//bin/tempfile': Invalid argument
/usr/bin/install: cannot change ownership of '/var/tmp/portage/sys-apps/debianutils-4.9.1/image//bin/run-parts': Invalid argument
 * ERROR: sys-apps/debianutils-4.9.1::gentoo failed (install phase):
 *   dobin failed
 *
Comment 2 Zac Medico gentoo-dev 2020-01-20 05:11:38 UTC
(In reply to Joshua Kinard from comment #1)
> This is probably the wrong fix, but changing the first if statement from
> 'and' to 'or:
> 
>     if mystat.st_uid != portage_uid or \
>             mystat.st_gid != portage_gid:
>             continue

The 'and' logic is a requirement of PMS section 13.2.1:

https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-14100013.2.1

> The owner, group and mode (including set*id and sticky bits)
> of the directory must be preserved, except as follows:
> 
> Any directory owned by the user used to perform the build must
> become owned by the root user.
> 
> Any directory whose group is the primary group of the user
> used to perform the build must have its group be that of the
> root user.

The most recent change to the logic was to make it comply with PMS here:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=294f1a18f3271f7d506d346c2a514a2afa6ce8ec
Comment 3 Joshua Kinard gentoo-dev 2020-01-20 08:11:15 UTC
(In reply to Zac Medico from comment #2)
> (In reply to Joshua Kinard from comment #1)
> > This is probably the wrong fix, but changing the first if statement from
> > 'and' to 'or:
> > 
> >     if mystat.st_uid != portage_uid or \
> >             mystat.st_gid != portage_gid:
> >             continue
> 
> The 'and' logic is a requirement of PMS section 13.2.1:
> 
> https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-14100013.2.1
> 
> > The owner, group and mode (including set*id and sticky bits)
> > of the directory must be preserved, except as follows:
> > 
> > Any directory owned by the user used to perform the build must
> > become owned by the root user.
> > 
> > Any directory whose group is the primary group of the user
> > used to perform the build must have its group be that of the
> > root user.
> 
> The most recent change to the logic was to make it comply with PMS here:
> 
> https://gitweb.gentoo.org/proj/portage.git/commit/
> ?id=294f1a18f3271f7d506d346c2a514a2afa6ce8ec

This makes sense, thanks for the link.  I found a setting I missed on my NFS server which fixed the underlying permission issue, and the existing portage logic seems to work fine now.

What about the pdb.set_trace() call?  Should that be there?  I assumed it was a left over debugging thing, hence the bug.

And what about a case for handling errno.EINVAL?  This seems to be a common return code from chown/chgrp/install when failing to change permissions on some misconfigured NFS exports.
Comment 4 Larry the Git Cow gentoo-dev 2020-03-02 05:03:01 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=30150206fb0b3e013ef5b163b8d2f24c70a9d977

commit 30150206fb0b3e013ef5b163b8d2f24c70a9d977
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-02 04:56:49 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-02 05:01:06 +0000

    AsyncioEventLoop: always die with SIGTERM in exception handler (bug 705910)
    
    Remove call to pdb.set_trace() in exception handler, since it's
    not very useful, and always die with a SIGTERM for unexpected
    exceptions here.
    
    Bug: https://bugs.gentoo.org/705910
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/util/_eventloop/asyncio_event_loop.py | 31 +++++++----------------
 1 file changed, 9 insertions(+), 22 deletions(-)
Comment 5 Larry the Git Cow gentoo-dev 2020-03-02 05:22:48 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2f1811d916f1ece1c356cab3da4e95d30fa61a04

commit 2f1811d916f1ece1c356cab3da4e95d30fa61a04
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-02 05:18:56 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-02 05:20:50 +0000

    sys-apps/portage: Bump to version 2.3.91
    
     #705910 remove pdb.set_trace() from exception handler
     #711174 FEATURES=compress-build-logs EOFError regression
     #711178 emerge --getbinpkg event loop recursion regression
    
    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/705910
    Closes: https://bugs.gentoo.org/711174
    Closes: https://bugs.gentoo.org/711178
    Package-Manager: Portage-2.3.91, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest                                         | 2 +-
 sys-apps/portage/{portage-2.3.90.ebuild => portage-2.3.91.ebuild} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
Comment 6 Kalin KOZHUHAROV 2020-05-03 09:21:01 UTC
I was just about to file a new bug and I found this, I feel it is the same, but trigger was different. I had a waiting for confirmation `emerge -a` (hours overnight) and I am guessing `emerge-sync` was ran by cron and removed some files. When I hit `y` this is what happened:

2020-05-02 22:29 $ emerge -Datuvq --keep-going y @world -a -N

[snip]

Would you like to merge these packages? [Yes/No] 
>>> Verifying ebuild manifests
>>> Running pre-merge checks for dev-libs/libfilezilla-0.15.1
>>> Running pre-merge checks for sys-kernel/linux-firmware-20200421
>>> Running pre-merge checks for mail-client/thunderbird-68.7.0-r1
 * Checking for at least 4 GiB disk space at "/tmp/portage/mail-client/thunderbird-68.7.0-r1/temp" ...                                                                                                      [ ok ]
>>> Running pre-merge checks for net-ftp/filezilla-3.39.0
>>> Running pre-merge checks for www-client/firefox-68.7.0-r1
 * Checking for at least 4 GiB disk space at "/tmp/portage/www-client/firefox-68.7.0-r1/temp" ...                                                                                                           [ ok ]
>>> Running pre-merge checks for www-client/google-chrome-81.0.4044.129
>>> Emerging (1 of 40) app-arch/rpm2targz-9.0.0.5g-r2::gentoo
>>> Jobs: 0 of 40 complete, 1 running               Load avg: 0.16, 0.07, 0.03!!! aux_get(): ebuild for 'dev-libs/nettle-3.5.1' does not exist at:
!!!            /var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild
Exception in callback AsynchronousTask.wait()
handle: <Handle AsynchronousTask.wait()>
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/portage/eclass_cache.py", line 39, in __getattr__
    self.mtime = obj = os.stat(self.location)[stat.ST_MTIME]
  File "/usr/lib/python3.7/site-packages/portage/__init__.py", line 246, in __call__
    rval = self._func(*wrapped_args, **wrapped_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: b'/var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 556, in _pull_valid_cache
    ebuild_hash.mtime
  File "/usr/lib/python3.7/site-packages/portage/eclass_cache.py", line 42, in __getattr__
    raise FileNotFound(self.location)
portage.exception.FileNotFound: /var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait
    self._wait_hook()
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook
    self._exit_listener_stack.pop()(self)
  File "/usr/lib/python3.7/site-packages/_emerge/CompositeTask.py", line 95, in _default_final_exit
    return self.wait()
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait
    self._wait_hook()
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook
    self._exit_listener_stack.pop()(self)
  File "/usr/lib/python3.7/site-packages/_emerge/SequentialTaskQueue.py", line 57, in _task_exit
    self.schedule()
  File "/usr/lib/python3.7/site-packages/_emerge/SequentialTaskQueue.py", line 45, in schedule
    task.start()
  File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 30, in start
    self._start()
  File "/usr/lib/python3.7/site-packages/_emerge/EbuildFetcher.py", line 45, in _start
    AsyncTaskFuture(future=self._fetcher_proc._async_uri_map()),
  File "/usr/lib/python3.7/site-packages/_emerge/EbuildFetcher.py", line 292, in _async_uri_map
    useflags=use, mytree=mytree, loop=self.scheduler)
  File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 811, in async_fetch_map
    mypkg, ["EAPI", "SRC_URI"], mytree=mytree, loop=loop)
  File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 681, in async_aux_get
    mydata, ebuild_hash = self._pull_valid_cache(mycpv, myebuild, mylocation)
  File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 561, in _pull_valid_cache
    raise PortageKeyError(cpv)
portage.exception.PortageKeyError: 'dev-libs/nettle-3.5.1'
--Return--
> /usr/lib/python3.7/site-packages/portage/util/_eventloop/asyncio_event_loop.py(81)_internal_caller_exception_handler()->None
-> pdb.set_trace()
(Pdb) 

2020-05-03 11:03 $ emerge --version
Portage 2.3.89 (python 3.7.7-final-0, default/linux/amd64/17.1, gcc-9.3.0, glibc-2.30-r8, 5.6.7-r13 x86_64)

apparently 3.5.1 of dev-libs/nettle was replaced with -r1

So, handling better missing ebuilds should be the answer. Given that I was running with `--keep-going y` I'd expect this ebuild to be skipped with error and the process continue.

I am looking at the linked commits, but not sure they will cover this trigger... hmm, I guess the test will be to run `emerge -a something`, wait till prompt, delete an ebuild in another session and press `y`. I am running stable portage and there is no upgrade for that yet.
Comment 7 Zac Medico gentoo-dev 2020-05-03 21:33:56 UTC
(In reply to Kalin KOZHUHAROV from comment #6)
> I was just about to file a new bug and I found this, I feel it is the same,
> but trigger was different. I had a waiting for confirmation `emerge -a`
> (hours overnight) and I am guessing `emerge-sync` was ran by cron and
> removed some files. When I hit `y` this is what happened:

In order to safely run emerge --sync in a cron job, you need to enable sync-rcu and set sync-rcu-store-dir in repos.conf. It's documented in the emerge man page (this was bug 662070).
Comment 8 Zac Medico gentoo-dev 2020-05-03 21:38:04 UTC
(In reply to Zac Medico from comment #7)
> It's documented in the emerge man page (this was bug 662070).

I mean the portage man page. Search for sync-rcu:

https://dev.gentoo.org/~zmedico/portage/doc/man/portage.5.html