Backstory: Trying to run an emerge operation over an NFSv4 link to a NAS running FreeBSD. Still working bugs out on the uid/gid mapping for the NFSv4 exports. When attempting to update the sys-devel/gnuconfig package, portage tried changing the permissions on a file in the image directory and received "invalid argument" back from the NFS server. This somehow led to a pdb.set_trace() call in asyncio_event_loop:81, in the _internal_caller_exception_handler function. I assume this is a leftover from a debugging session that probably needs to be removed. Additionally, in `apply_permissions' in portage/util/__init__.py, around line 1097, portage really should handle an errno.EINVAL case from the filesystem, such as can be encountered on NFS mounts that are still a bit goofy with permission mappings. As far as I can determine, portage attempted to chown a directory, "/var/tmp/portage/sys-devel/gnuconfig-20190912/image/usr" with uid=-1/gid=0, which failed. It looks like '-1' is a default argument passed to uid, so my guess is the `_post_src_install_uid_fix' function in portage/package/ebuild/doebuild.py did not handle a particular scenario. I am only lightly skimming that function, but nothing immediately stands out as a probable cause right now. Traceback: # emerge sys-devel/gnuconfig Calculating dependencies... done! >>> Verifying ebuild manifests >>> Emerging (1 of 1) sys-devel/gnuconfig-20190912::gentoo * gnuconfig-20190912.tar.bz2 BLAKE2B SHA512 size ;-) ... [ ok ] >>> Unpacking source... >>> Unpacking gnuconfig-20190912.tar.bz2 to /var/tmp/portage/sys-devel/gnuconfig-20190912/work >>> Source unpacked in /var/tmp/portage/sys-devel/gnuconfig-20190912/work >>> Preparing source in /var/tmp/portage/sys-devel/gnuconfig-20190912/work ... * Applying 0001-add-ps2-targets-to-config.sub.patch ... [ ok ] * Applying 0002-Add-x32-support-to-config.guess.patch ... [ ok ] >>> Source prepared. >>> Configuring source in /var/tmp/portage/sys-devel/gnuconfig-20190912/work ... >>> Source configured. >>> Compiling source in /var/tmp/portage/sys-devel/gnuconfig-20190912/work ... >>> Source compiled. >>> Test phase [not enabled]: sys-devel/gnuconfig-20190912 >>> Install sys-devel/gnuconfig-20190912 into /var/tmp/portage/sys-devel/gnuconfig-20190912/image >>> Completed installing sys-devel/gnuconfig-20190912 into /var/tmp/portage/sys-devel/gnuconfig-20190912/image * Final size of build directory: 195 KiB * Final size of installed tree: 5 KiB /usr/lib/python3.7/site-packages/portage/util/__init__.py:1100: BytesWarning: str() on a bytes instance func_call = "chown('%s', %i, %i)" % (filename, uid, gid) Exception in callback AsynchronousTask.wait() handle: <Handle AsynchronousTask.wait()> Traceback (most recent call last): File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run self._context.run(self._callback, *self._args) File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait self._wait_hook() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook self._exit_listener_stack.pop()(self) File "/usr/lib/python3.7/site-packages/_emerge/EbuildPhase.py", line 205, in _ebuild_exit self._ebuild_exit_unlocked(ebuild_process) File "/usr/lib/python3.7/site-packages/_emerge/EbuildPhase.py", line 265, in _ebuild_exit_unlocked _post_src_install_uid_fix(settings, out) File "/usr/lib/python3.7/site-packages/portage/package/ebuild/doebuild.py", line 2195, in _post_src_install_uid_fix follow_links=False) File "/usr/lib/python3.7/site-packages/portage/util/__init__.py", line 1271, in apply_secpass_permissions stat_cached=stat_cached, follow_links=follow_links) File "/usr/lib/python3.7/site-packages/portage/util/__init__.py", line 1097, in apply_permissions portage.data.lchown(filename, uid, gid) File "/usr/lib/python3.7/site-packages/portage/__init__.py", line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 22] Invalid argument: b'/var/tmp/portage/sys-devel/gnuconfig-20190912/image/usr' --Return-- > /usr/lib/python3.7/site-packages/portage/util/_eventloop/asyncio_event_loop.py(81)_internal_caller_exception_handler()->None -> pdb.set_trace() (Pdb)
I think this bit of code is questionable. In 'portage/package/ebuild/doebuild.p', lines 2182 to 2195 (indents removed): if mystat.st_uid != portage_uid and \ mystat.st_gid != portage_gid: continue myuid = -1 mygid = -1 if mystat.st_uid == portage_uid: myuid = inst_uid if mystat.st_gid == portage_gid: mygid = inst_gid apply_secpass_permissions( _unicode_encode(fpath, encoding=_encodings['merge']), uid=myuid, gid=mygid, mode=mystat.st_mode, stat_cached=mystat, follow_links=False) The first if statement is intended to deal with cases where file_uid != portage_uid AND file_gid != portage_gid. In my case, the uid doesn't match (uid:0 != portage_uid:250), but gid (gid:250 == portage_gid:250) DOES, so this check silently fails and allows myuid/mygid to be initialized to '-1'. The next two if statements independently check uid and gid, and set them to the portage_uid/portage_gid value if they match. Since the first check was skipped over, myuid remains at '-1', while mygid gets set to '0', and thus get passed down to the lower functions (apply_secpass_permissions), and we end up running into the pdb.set_trace() function call. This is probably the wrong fix, but changing the first if statement from 'and' to 'or: if mystat.st_uid != portage_uid or \ mystat.st_gid != portage_gid: continue Avoids the exception happening in portage itself (and the pdb.set_trace() call). I still have to fix the permissions problems on my NFS server, but portage will at least fail gracefully with meaningful errors: >>> Install sys-apps/debianutils-4.9.1 into /var/tmp/portage/sys-apps/debianutils-4.9.1/image /usr/bin/install: cannot change ownership of '/var/tmp/portage/sys-apps/debianutils-4.9.1/image//bin/tempfile': Invalid argument /usr/bin/install: cannot change ownership of '/var/tmp/portage/sys-apps/debianutils-4.9.1/image//bin/run-parts': Invalid argument * ERROR: sys-apps/debianutils-4.9.1::gentoo failed (install phase): * dobin failed *
(In reply to Joshua Kinard from comment #1) > This is probably the wrong fix, but changing the first if statement from > 'and' to 'or: > > if mystat.st_uid != portage_uid or \ > mystat.st_gid != portage_gid: > continue The 'and' logic is a requirement of PMS section 13.2.1: https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-14100013.2.1 > The owner, group and mode (including set*id and sticky bits) > of the directory must be preserved, except as follows: > > Any directory owned by the user used to perform the build must > become owned by the root user. > > Any directory whose group is the primary group of the user > used to perform the build must have its group be that of the > root user. The most recent change to the logic was to make it comply with PMS here: https://gitweb.gentoo.org/proj/portage.git/commit/?id=294f1a18f3271f7d506d346c2a514a2afa6ce8ec
(In reply to Zac Medico from comment #2) > (In reply to Joshua Kinard from comment #1) > > This is probably the wrong fix, but changing the first if statement from > > 'and' to 'or: > > > > if mystat.st_uid != portage_uid or \ > > mystat.st_gid != portage_gid: > > continue > > The 'and' logic is a requirement of PMS section 13.2.1: > > https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-14100013.2.1 > > > The owner, group and mode (including set*id and sticky bits) > > of the directory must be preserved, except as follows: > > > > Any directory owned by the user used to perform the build must > > become owned by the root user. > > > > Any directory whose group is the primary group of the user > > used to perform the build must have its group be that of the > > root user. > > The most recent change to the logic was to make it comply with PMS here: > > https://gitweb.gentoo.org/proj/portage.git/commit/ > ?id=294f1a18f3271f7d506d346c2a514a2afa6ce8ec This makes sense, thanks for the link. I found a setting I missed on my NFS server which fixed the underlying permission issue, and the existing portage logic seems to work fine now. What about the pdb.set_trace() call? Should that be there? I assumed it was a left over debugging thing, hence the bug. And what about a case for handling errno.EINVAL? This seems to be a common return code from chown/chgrp/install when failing to change permissions on some misconfigured NFS exports.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=30150206fb0b3e013ef5b163b8d2f24c70a9d977 commit 30150206fb0b3e013ef5b163b8d2f24c70a9d977 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2020-03-02 04:56:49 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2020-03-02 05:01:06 +0000 AsyncioEventLoop: always die with SIGTERM in exception handler (bug 705910) Remove call to pdb.set_trace() in exception handler, since it's not very useful, and always die with a SIGTERM for unexpected exceptions here. Bug: https://bugs.gentoo.org/705910 Signed-off-by: Zac Medico <zmedico@gentoo.org> lib/portage/util/_eventloop/asyncio_event_loop.py | 31 +++++++---------------- 1 file changed, 9 insertions(+), 22 deletions(-)
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2f1811d916f1ece1c356cab3da4e95d30fa61a04 commit 2f1811d916f1ece1c356cab3da4e95d30fa61a04 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2020-03-02 05:18:56 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2020-03-02 05:20:50 +0000 sys-apps/portage: Bump to version 2.3.91 #705910 remove pdb.set_trace() from exception handler #711174 FEATURES=compress-build-logs EOFError regression #711178 emerge --getbinpkg event loop recursion regression Bug: https://bugs.gentoo.org/711148 Bug: https://bugs.gentoo.org/705910 Closes: https://bugs.gentoo.org/711174 Closes: https://bugs.gentoo.org/711178 Package-Manager: Portage-2.3.91, Repoman-2.3.20 Signed-off-by: Zac Medico <zmedico@gentoo.org> sys-apps/portage/Manifest | 2 +- sys-apps/portage/{portage-2.3.90.ebuild => portage-2.3.91.ebuild} | 0 2 files changed, 1 insertion(+), 1 deletion(-)
I was just about to file a new bug and I found this, I feel it is the same, but trigger was different. I had a waiting for confirmation `emerge -a` (hours overnight) and I am guessing `emerge-sync` was ran by cron and removed some files. When I hit `y` this is what happened: 2020-05-02 22:29 $ emerge -Datuvq --keep-going y @world -a -N [snip] Would you like to merge these packages? [Yes/No] >>> Verifying ebuild manifests >>> Running pre-merge checks for dev-libs/libfilezilla-0.15.1 >>> Running pre-merge checks for sys-kernel/linux-firmware-20200421 >>> Running pre-merge checks for mail-client/thunderbird-68.7.0-r1 * Checking for at least 4 GiB disk space at "/tmp/portage/mail-client/thunderbird-68.7.0-r1/temp" ... [ ok ] >>> Running pre-merge checks for net-ftp/filezilla-3.39.0 >>> Running pre-merge checks for www-client/firefox-68.7.0-r1 * Checking for at least 4 GiB disk space at "/tmp/portage/www-client/firefox-68.7.0-r1/temp" ... [ ok ] >>> Running pre-merge checks for www-client/google-chrome-81.0.4044.129 >>> Emerging (1 of 40) app-arch/rpm2targz-9.0.0.5g-r2::gentoo >>> Jobs: 0 of 40 complete, 1 running Load avg: 0.16, 0.07, 0.03!!! aux_get(): ebuild for 'dev-libs/nettle-3.5.1' does not exist at: !!! /var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild Exception in callback AsynchronousTask.wait() handle: <Handle AsynchronousTask.wait()> Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/portage/eclass_cache.py", line 39, in __getattr__ self.mtime = obj = os.stat(self.location)[stat.ST_MTIME] File "/usr/lib/python3.7/site-packages/portage/__init__.py", line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) FileNotFoundError: [Errno 2] No such file or directory: b'/var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 556, in _pull_valid_cache ebuild_hash.mtime File "/usr/lib/python3.7/site-packages/portage/eclass_cache.py", line 42, in __getattr__ raise FileNotFound(self.location) portage.exception.FileNotFound: /var/db/repos/gentoo/dev-libs/nettle/nettle-3.5.1.ebuild During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run self._context.run(self._callback, *self._args) File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait self._wait_hook() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook self._exit_listener_stack.pop()(self) File "/usr/lib/python3.7/site-packages/_emerge/CompositeTask.py", line 95, in _default_final_exit return self.wait() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 84, in wait self._wait_hook() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 195, in _wait_hook self._exit_listener_stack.pop()(self) File "/usr/lib/python3.7/site-packages/_emerge/SequentialTaskQueue.py", line 57, in _task_exit self.schedule() File "/usr/lib/python3.7/site-packages/_emerge/SequentialTaskQueue.py", line 45, in schedule task.start() File "/usr/lib/python3.7/site-packages/_emerge/AsynchronousTask.py", line 30, in start self._start() File "/usr/lib/python3.7/site-packages/_emerge/EbuildFetcher.py", line 45, in _start AsyncTaskFuture(future=self._fetcher_proc._async_uri_map()), File "/usr/lib/python3.7/site-packages/_emerge/EbuildFetcher.py", line 292, in _async_uri_map useflags=use, mytree=mytree, loop=self.scheduler) File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 811, in async_fetch_map mypkg, ["EAPI", "SRC_URI"], mytree=mytree, loop=loop) File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 681, in async_aux_get mydata, ebuild_hash = self._pull_valid_cache(mycpv, myebuild, mylocation) File "/usr/lib/python3.7/site-packages/portage/dbapi/porttree.py", line 561, in _pull_valid_cache raise PortageKeyError(cpv) portage.exception.PortageKeyError: 'dev-libs/nettle-3.5.1' --Return-- > /usr/lib/python3.7/site-packages/portage/util/_eventloop/asyncio_event_loop.py(81)_internal_caller_exception_handler()->None -> pdb.set_trace() (Pdb) 2020-05-03 11:03 $ emerge --version Portage 2.3.89 (python 3.7.7-final-0, default/linux/amd64/17.1, gcc-9.3.0, glibc-2.30-r8, 5.6.7-r13 x86_64) apparently 3.5.1 of dev-libs/nettle was replaced with -r1 So, handling better missing ebuilds should be the answer. Given that I was running with `--keep-going y` I'd expect this ebuild to be skipped with error and the process continue. I am looking at the linked commits, but not sure they will cover this trigger... hmm, I guess the test will be to run `emerge -a something`, wait till prompt, delete an ebuild in another session and press `y`. I am running stable portage and there is no upgrade for that yet.
(In reply to Kalin KOZHUHAROV from comment #6) > I was just about to file a new bug and I found this, I feel it is the same, > but trigger was different. I had a waiting for confirmation `emerge -a` > (hours overnight) and I am guessing `emerge-sync` was ran by cron and > removed some files. When I hit `y` this is what happened: In order to safely run emerge --sync in a cron job, you need to enable sync-rcu and set sync-rcu-store-dir in repos.conf. It's documented in the emerge man page (this was bug 662070).
(In reply to Zac Medico from comment #7) > It's documented in the emerge man page (this was bug 662070). I mean the portage man page. Search for sync-rcu: https://dev.gentoo.org/~zmedico/portage/doc/man/portage.5.html