Reproduces on ppc64le, ppc4, ppc (tested also on dist-kernel to make sure it's not a kernel config problem). Does not repro on amd64, arm64. All using 64k page size, so that's my bet. _______________________________________________________________ test_spawnE2big _______________________________________________________________ [gw3] linux -- Python 3.11.7 /usr/bin/python3.11 capsys = <_pytest.capture.CaptureFixture object at 0x7fff91f75290> tmp_path = PosixPath('/var/tmp/portage/sys-apps/portage-3.0.61-r1/temp/pytest-of-portage/pytest-0/popen-gw3/test_spawnE2big0') @pytest.mark.skipif(platform.system() != "Linux", reason="not Linux") def test_spawnE2big(capsys, tmp_path): env = dict() env["VERY_LARGE_ENV_VAR"] = "X" * 1024 * 256 logfile = tmp_path / "logfile" echo_output = "Should never appear" with capsys.disabled(): retval = portage.process.spawn( [BASH_BINARY, "-c", "echo", echo_output], env=env, logfile=logfile ) with open(logfile) as f: logfile_content = f.read() > assert ( "Largest environment variable: VERY_LARGE_ENV_VAR (262164 bytes)" in logfile_content ) E AssertionError: assert 'Largest environment variable: VERY_LARGE_ENV_VAR (262164 bytes)' in '\n' Reproducible: Always
Created attachment 883750 [details] build.log and emerge --info
(In reply to matoro from comment #0) > All using 64k page size, so that's my bet. Yeah, what does `getconf ARG_MAX` report? It's 2097152 by default. Maybe we should just have the test case mock _exec and raise a fake E2BIG exception.
(In reply to Zac Medico from comment #2) > (In reply to matoro from comment #0) > > All using 64k page size, so that's my bet. > > Yeah, what does `getconf ARG_MAX` report? It's 2097152 by default. > > Maybe we should just have the test case mock _exec and raise a fake E2BIG > exception. This is actually the same everywhere, doesn't vary with page size. loongdev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 16384 ppc64dev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 65536 ppc64ledev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 65536 storage.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 4096 ia64dev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 65536 sparcdev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 8192 mipsdev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 4096 hppadev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 4096 alphadev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 8192 riscvdev.matoro.tk | CHANGED | rc=0 >> ARG_MAX = 2097152, PAGE_SIZE = 4096
FTR logfile = PosixPath('/var/tmp/portage/sys-apps/portage-3.0.61-r1/temp/pytest-of-portage/pytest-0/popen-gw3/test_spawnE2big0/logfile') logfile_content = '\n' so it appears that the logfile only contains a single newline. Next step, find out what's going on :)
(In reply to Florian Schmaus from comment #4) > FTR > > logfile = > PosixPath('/var/tmp/portage/sys-apps/portage-3.0.61-r1/temp/pytest-of- > portage/pytest-0/popen-gw3/test_spawnE2big0/logfile') > logfile_content = '\n' > > so it appears that the logfile only contains a single newline. Next step, > find out what's going on :) It didn't raise E2BIG because of the difference in kernel config. The "Should never appear" message doesn't show because this line of the test case should only pass a single shell script argument to bash -c in order to avoid losing echo_output: > [BASH_BINARY, "-c", "echo", echo_output], env=env, logfile=logfile
(In reply to Zac Medico from comment #5) > It didn't raise E2BIG… Yes, I think we don't see the kernel behavior we are testing. > because of the difference in kernel config. However, I assume not due a different kernel config, but due the different page size. Basically the threshold we want to cross is a constant factor times page size. The constant factor being MAX_ARG_STRLEN in the kernel. On amd64 with 4KiB pages, this gives us a limit of 128KiB. The test creates an environment variable of 256 KiB to trigger E2BIG. However, on ppc'ish arches with 64KiB pages, the threshold is 2048KiB. Therefore, we kernel does not return E2BIG here. At least, that is the working theory I've been discussing with matoro over IRC. > The > "Should never appear" message doesn't show because this line of the test > case should only pass a single shell script argument to bash -c in order to > avoid losing echo_output: > > > [BASH_BINARY, "-c", "echo", echo_output], env=env, logfile=logfile Good point! This should be changed to [BASH_BINARY, "-c", f"echo {echo_output}"]
So based on discussion, I wrote this patch: https://paste.matoro.tk/ohwjd91 On amd64, this applies and everything still passes. But on the ppc side, this suddenly causes a completely unrelated test to start failing. I've made sure this isn't a one-off, but reproducible over multiple runs, including switching back and forth between applying/not applying the patch. Here's the log of the new failing test: https://paste.matoro.tk/5czws2c
Created attachment 883876 [details, diff] 923368.patch Swapping the order of closing the stream and killing the process fixes it. Final patch here.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=a308d831ceb1f1e7d0733700117cc18c8eca09c8 commit a308d831ceb1f1e7d0733700117cc18c8eca09c8 Author: Matoro Mahri <matoro_gentoo@matoro.tk> AuthorDate: 2024-01-31 20:26:06 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2024-02-09 07:08:22 +0000 tests: process: calculate size of E2BIG environment variable dynamically The size of the command line required to trigger E2BIG response from the kernel is defined as MAX_ARG_STRLEN, which is equal to 32 * PAGE_SIZE. Therefore the minimum size required to trigger the expected error response must be calculated dynamically based on PAGE_SIZE. Bug: https://bugs.gentoo.org/923368 Signed-off-by: Matoro Mahri <matoro_gentoo@matoro.tk> Closes: https://github.com/gentoo/portage/pull/1246 Signed-off-by: Sam James <sam@gentoo.org> lib/portage/tests/process/test_spawn_fail_e2big.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
The BrokePipeError reported in https://gitweb.gentoo.org/proj/portage.git/commit/?id=fba76a545f29fb8b529197c25c64700ef77413ae resembles https://github.com/python/cpython/issues/65818 with was fixed way back in python 3.4 and 3.5. In https://github.com/gentoo/portage/pull/1257 I have this patch that might help by only calling the kill method when the tar_stream_reader instance has not been successfully closed: From ad61940b03be2f24c0b54c1070a4923abe18e633 Mon Sep 17 00:00:00 2001 From: Zac Medico <zmedico@gentoo.org> Date: Fri, 9 Feb 2024 08:22:45 -0800 Subject: [PATCH 1/3] gpkg: Less aggressive subprocess.Popen kill in order to avoid BrokenPipeError Do not kill tar_stream_reader instances if we can successfully close them, since that can trigger BrokenPipeError during self.proc.stdin.close() calls, and this state is best avoided because it's unclear how the caller should handle the error. If a BrokenPipeError does occur then simply print a traceback and hope that the corresponding file descriptor is closed during garbage collection. Do not try to reverse the order of self.proc.kill() and self.proc.stdin.close() as in commit fba76a545f2 since that triggered pypy ci job hangs for which no reliable solution has been found so far. Bug: https://bugs.gentoo.org/923368 Signed-off-by: Zac Medico <zmedico@gentoo.org> --- lib/portage/gpkg.py | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/lib/portage/gpkg.py b/lib/portage/gpkg.py index 031b3f87cb..f1d8f97f8e 100644 --- a/lib/portage/gpkg.py +++ b/lib/portage/gpkg.py @@ -1,7 +1,8 @@ -# Copyright 2001-2020 Gentoo Authors +# Copyright 2001-2024 Gentoo Authors # Distributed under the terms of the GNU General Public License v2 import tarfile +import traceback import io import threading import subprocess @@ -151,7 +152,10 @@ def kill(self): if self.proc is not None: self.killed = True self.proc.kill() - self.proc.stdin.close() + try: + self.proc.stdin.close() + except BrokenPipeError: + traceback.print_exc() self.close() def _cmd_read_thread(self): @@ -213,7 +217,7 @@ def close(self): if self.proc is not None: self.proc.stdin.close() if self.proc.wait() != os.EX_OK: - if not self.error: + if not (self.killed or self.error): raise CompressorOperationFailed("compression failed") if self.read_thread.is_alive(): self.read_thread.join() @@ -349,7 +353,10 @@ def kill(self): if self.proc is not None: self.killed = True self.proc.kill() - self.proc.stdin.close() + try: + self.proc.stdin.close() + except BrokenPipeError: + traceback.print_exc() self.close() def read(self, bufsize=-1): @@ -986,11 +993,13 @@ def decompress(self, decompress_dir): try: image_safe = tar_safe_extract(image, "image") image_safe.extractall(decompress_dir) + image_tar.close() except Exception as ex: writemsg(colorize("BAD", "!!!Extract failed.")) raise finally: - image_tar.kill() + if not image_tar.closed: + image_tar.kill() def update_metadata(self, metadata, new_basename=None, force=False): """
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=ad61940b03be2f24c0b54c1070a4923abe18e633 commit ad61940b03be2f24c0b54c1070a4923abe18e633 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2024-02-09 16:22:45 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2024-02-09 23:52:11 +0000 gpkg: Less aggressive subprocess.Popen kill in order to avoid BrokenPipeError Do not kill tar_stream_reader instances if we can successfully close them, since that can trigger BrokenPipeError during self.proc.stdin.close() calls, and this state is best avoided because it's unclear how the caller should handle the error. If a BrokenPipeError does occur then simply print a traceback and hope that the corresponding file descriptor is closed during garbage collection. Do not try to reverse the order of self.proc.kill() and self.proc.stdin.close() as in commit fba76a545f2 since that triggered pypy ci job hangs for which no reliable solution has been found so far. Bug: https://bugs.gentoo.org/923368 Signed-off-by: Zac Medico <zmedico@gentoo.org> lib/portage/gpkg.py | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=77c44c46194922509bc4f2b5cfc099412a560a69 commit 77c44c46194922509bc4f2b5cfc099412a560a69 Author: Sam James <sam@gentoo.org> AuthorDate: 2024-02-22 07:23:40 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2024-02-22 07:23:50 +0000 sys-apps/portage: add 3.0.62 Closes: https://bugs.gentoo.org/663324 Closes: https://bugs.gentoo.org/728046 Closes: https://bugs.gentoo.org/891137 Closes: https://bugs.gentoo.org/906368 Closes: https://bugs.gentoo.org/916566 Closes: https://bugs.gentoo.org/921170 Closes: https://bugs.gentoo.org/921208 Closes: https://bugs.gentoo.org/921400 Closes: https://bugs.gentoo.org/922038 Closes: https://bugs.gentoo.org/922142 Closes: https://bugs.gentoo.org/923368 Closes: https://bugs.gentoo.org/923750 Closes: https://bugs.gentoo.org/923841 Closes: https://bugs.gentoo.org/923852 Closes: https://bugs.gentoo.org/923854 Closes: https://bugs.gentoo.org/924192 Closes: https://bugs.gentoo.org/924273 Closes: https://bugs.gentoo.org/924585 Closes: https://bugs.gentoo.org/921380 Signed-off-by: Sam James <sam@gentoo.org> sys-apps/portage/Manifest | 1 + sys-apps/portage/portage-3.0.62.ebuild | 246 +++++++++++++++++++++++++++++++++ 2 files changed, 247 insertions(+)