Summary: | sys-apps/portage: Encountered pdb.set_trace() in `_internal_caller_exception_handler' in asyncio_event_loop.py:81 | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Joshua Kinard <kumba> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | kalin |
Priority: | Normal | Keywords: | InVCS |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 711148 |
Description
Joshua Kinard
2020-01-20 00:38:08 UTC
I think this bit of code is questionable. In 'portage/package/ebuild/doebuild.p', lines 2182 to 2195 (indents removed):
if mystat.st_uid != portage_uid and \
mystat.st_gid != portage_gid:
continue
myuid = -1
mygid = -1
if mystat.st_uid == portage_uid:
myuid = inst_uid
if mystat.st_gid == portage_gid:
mygid = inst_gid
apply_secpass_permissions(
_unicode_encode(fpath, encoding=_encodings['merge']),
uid=myuid, gid=mygid,
mode=mystat.st_mode, stat_cached=mystat,
follow_links=False)
The first if statement is intended to deal with cases where file_uid != portage_uid AND file_gid != portage_gid. In my case, the uid doesn't match (uid:0 != portage_uid:250), but gid (gid:250 == portage_gid:250) DOES, so this check silently fails and allows myuid/mygid to be initialized to '-1'.
The next two if statements independently check uid and gid, and set them to the portage_uid/portage_gid value if they match. Since the first check was skipped over, myuid remains at '-1', while mygid gets set to '0', and thus get passed down to the lower functions (apply_secpass_permissions), and we end up running into the pdb.set_trace() function call.
This is probably the wrong fix, but changing the first if statement from 'and' to 'or:
if mystat.st_uid != portage_uid or \
mystat.st_gid != portage_gid:
continue
Avoids the exception happening in portage itself (and the pdb.set_trace() call). I still have to fix the permissions problems on my NFS server, but portage will at least fail gracefully with meaningful errors:
>>> Install sys-apps/debianutils-4.9.1 into /var/tmp/portage/sys-apps/debianutils-4.9.1/image
/usr/bin/install: cannot change ownership of '/var/tmp/portage/sys-apps/debianutils-4.9.1/image//bin/tempfile': Invalid argument
/usr/bin/install: cannot change ownership of '/var/tmp/portage/sys-apps/debianutils-4.9.1/image//bin/run-parts': Invalid argument
* ERROR: sys-apps/debianutils-4.9.1::gentoo failed (install phase):
* dobin failed
*
(In reply to Joshua Kinard from comment #1) > This is probably the wrong fix, but changing the first if statement from > 'and' to 'or: > > if mystat.st_uid != portage_uid or \ > mystat.st_gid != portage_gid: > continue The 'and' logic is a requirement of PMS section 13.2.1: https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-14100013.2.1 > The owner, group and mode (including set*id and sticky bits) > of the directory must be preserved, except as follows: > > Any directory owned by the user used to perform the build must > become owned by the root user. > > Any directory whose group is the primary group of the user > used to perform the build must have its group be that of the > root user. The most recent change to the logic was to make it comply with PMS here: https://gitweb.gentoo.org/proj/portage.git/commit/?id=294f1a18f3271f7d506d346c2a514a2afa6ce8ec (In reply to Zac Medico from comment #2) > (In reply to Joshua Kinard from comment #1) > > This is probably the wrong fix, but changing the first if statement from > > 'and' to 'or: > > > > if mystat.st_uid != portage_uid or \ > > mystat.st_gid != portage_gid: > > continue > > The 'and' logic is a requirement of PMS section 13.2.1: > > https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-14100013.2.1 > > > The owner, group and mode (including set*id and sticky bits) > > of the directory must be preserved, except as follows: > > > > Any directory owned by the user used to perform the build must > > become owned by the root user. > > > > Any directory whose group is the primary group of the user > > used to perform the build must have its group be that of the > > root user. > > The most recent change to the logic was to make it comply with PMS here: > > https://gitweb.gentoo.org/proj/portage.git/commit/ > ?id=294f1a18f3271f7d506d346c2a514a2afa6ce8ec This makes sense, thanks for the link. I found a setting I missed on my NFS server which fixed the underlying permission issue, and the existing portage logic seems to work fine now. What about the pdb.set_trace() call? Should that be there? I assumed it was a left over debugging thing, hence the bug. And what about a case for handling errno.EINVAL? This seems to be a common return code from chown/chgrp/install when failing to change permissions on some misconfigured NFS exports. The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=30150206fb0b3e013ef5b163b8d2f24c70a9d977 commit 30150206fb0b3e013ef5b163b8d2f24c70a9d977 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2020-03-02 04:56:49 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2020-03-02 05:01:06 +0000 AsyncioEventLoop: always die with SIGTERM in exception handler (bug 705910) Remove call to pdb.set_trace() in exception handler, since it's not very useful, and always die with a SIGTERM for unexpected exceptions here. Bug: https://bugs.gentoo.org/705910 Signed-off-by: Zac Medico <zmedico@gentoo.org> lib/portage/util/_eventloop/asyncio_event_loop.py | 31 +++++++---------------- 1 file changed, 9 insertions(+), 22 deletions(-) The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2f1811d916f1ece1c356cab3da4e95d30fa61a04 commit 2f1811d916f1ece1c356cab3da4e95d30fa61a04 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2020-03-02 05:18:56 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2020-03-02 05:20:50 +0000 sys-apps/portage: Bump to version 2.3.91 #705910 remove pdb.set_trace() from exception handler #711174 FEATURES=compress-build-logs EOFError regression #711178 emerge --getbinpkg event loop recursion regression Bug: https://bugs.gentoo.org/711148 Bug: https://bugs.gentoo.org/705910 Closes: https://bugs.gentoo.org/711174 Closes: https://bugs.gentoo.org/711178 Package-Manager: Portage-2.3.91, Repoman-2.3.20 Signed-off-by: Zac Medico <zmedico@gentoo.org> sys-apps/portage/Manifest | 2 +- sys-apps/portage/{portage-2.3.90.ebuild => portage-2.3.91.ebuild} | 0 2 files changed, 1 insertion(+), 1 deletion(-) I was just about to file a new bug and I found this, I feel it is the same, but trigger was different. I had a waiting for confirmation `emerge -a` (hours overnight) and I am guessing `emerge-sync` was ran by cron and removed some files. When I hit `y` this is what happened: 2020-05-02 22:29 $ emerge -Datuvq --keep-going y @world -a -N [snip] Would you like to merge these packages? [Yes/No] >>> Verifying ebuild manifests >>> Running pre-merge checks for dev-libs/libfilezilla-0.15.1 >>> Running pre-merge checks for sys-kernel/linux-firmware-20200421 >>> Running pre-merge checks for mail-client/thunderbird-68.7.0-r1 * Checking for at least 4 GiB disk space at "/tmp/portage/mail-client/thunderbird-68.7.0-r1/temp" ... [ ok ] >>> Running pre-merge checks for net-ftp/filezilla-3.39.0 >>> Running pre-merge checks for www-client/firefox-68.7.0-r1 * Checking for at least 4 GiB disk space at "/tmp/portage/www-client/firefox-68.7.0-r1/temp" ... [ ok ] >>> Running pre-merge checks for www-client/google-chrome-81.0.4044.129 >>> Emerging (1 of 40) app-arch/rpm2targz-9.0.0.5g-r2::gentoo >>> Jobs: 0 of 40 complete, 1 running Load avg: 0.16, 0.07, 0.03!!! aux_get(): ebuild for 'dev-libs/nettle-3.5.1' does not exist at: !!! /var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild Exception in callback AsynchronousTask.wait() handle: <Handle AsynchronousTask.wait()> Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/portage/eclass_cache.py", line 39, in __getattr__ self.mtime = obj = os.stat(self.location)[stat.ST_MTIME] File "/usr/lib/python3.7/site-packages/portage/__init__.py", line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) FileNotFoundError: [Errno 2] No such file or directory: b'/var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 556, in _pull_valid_cache ebuild_hash.mtime File "/usr/lib/python3.7/site-packages/portage/eclass_cache.py", line 42, in __getattr__ raise FileNotFound(self.location) portage.exception.FileNotFound: /var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run self._context.run(self._callback, *self._args) File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait self._wait_hook() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook self._exit_listener_stack.pop()(self) File "/usr/lib/python3.7/site-packages/_emerge/CompositeTask.py", line 95, in _default_final_exit return self.wait() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait self._wait_hook() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook self._exit_listener_stack.pop()(self) File "/usr/lib/python3.7/site-packages/_emerge/SequentialTaskQueue.py", line 57, in _task_exit self.schedule() File "/usr/lib/python3.7/site-packages/_emerge/SequentialTaskQueue.py", line 45, in schedule task.start() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 30, in start self._start() File "/usr/lib/python3.7/site-packages/_emerge/EbuildFetcher.py", line 45, in _start AsyncTaskFuture(future=self._fetcher_proc._async_uri_map()), File "/usr/lib/python3.7/site-packages/_emerge/EbuildFetcher.py", line 292, in _async_uri_map useflags=use, mytree=mytree, loop=self.scheduler) File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 811, in async_fetch_map mypkg, ["EAPI", "SRC_URI"], mytree=mytree, loop=loop) File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 681, in async_aux_get mydata, ebuild_hash = self._pull_valid_cache(mycpv, myebuild, mylocation) File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 561, in _pull_valid_cache raise PortageKeyError(cpv) portage.exception.PortageKeyError: 'dev-libs/nettle-3.5.1' --Return-- > /usr/lib/python3.7/site-packages/portage/util/_eventloop/asyncio_event_loop.py(81)_internal_caller_exception_handler()->None -> pdb.set_trace() (Pdb) 2020-05-03 11:03 $ emerge --version Portage 2.3.89 (python 3.7.7-final-0, default/linux/amd64/17.1, gcc-9.3.0, glibc-2.30-r8, 5.6.7-r13 x86_64) apparently 3.5.1 of dev-libs/nettle was replaced with -r1 So, handling better missing ebuilds should be the answer. Given that I was running with `--keep-going y` I'd expect this ebuild to be skipped with error and the process continue. I am looking at the linked commits, but not sure they will cover this trigger... hmm, I guess the test will be to run `emerge -a something`, wait till prompt, delete an ebuild in another session and press `y`. I am running stable portage and there is no upgrade for that yet. (In reply to Kalin KOZHUHAROV from comment #6) > I was just about to file a new bug and I found this, I feel it is the same, > but trigger was different. I had a waiting for confirmation `emerge -a` > (hours overnight) and I am guessing `emerge-sync` was ran by cron and > removed some files. When I hit `y` this is what happened: In order to safely run emerge --sync in a cron job, you need to enable sync-rcu and set sync-rcu-store-dir in repos.conf. It's documented in the emerge man page (this was bug 662070). (In reply to Zac Medico from comment #7) > It's documented in the emerge man page (this was bug 662070). I mean the portage man page. Search for sync-rcu: https://dev.gentoo.org/~zmedico/portage/doc/man/portage.5.html |