Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 923368 - sys-apps/portage-3.0.61-r1 fails tests on ppc64le, ppc64, ppc (64k kernel?): AssertionError: assert 'Largest environment variable: VERY_LARGE_ENV_VAR (262164 bytes)' in '\n'
Summary: sys-apps/portage-3.0.61-r1 fails tests on ppc64le, ppc64, ppc (64k kernel?): ...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: PPC64 Linux
: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS, PullRequest
Depends on: 921380
Blocks: 921933
  Show dependency tree
 
Reported: 2024-01-31 00:44 UTC by matoro
Modified: 2024-02-22 07:24 UTC (History)
5 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log and emerge --info (file_923368.txt,817.60 KB, text/plain)
2024-01-31 00:46 UTC, matoro
Details
923368.patch (file_923368.txt,1.81 KB, patch)
2024-01-31 20:13 UTC, matoro
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description matoro archtester 2024-01-31 00:44:39 UTC
Reproduces on ppc64le, ppc4, ppc (tested also on dist-kernel to make sure it's not a kernel config problem).  Does not repro on amd64, arm64.  All using 64k page size, so that's my bet.

_______________________________________________________________ test_spawnE2big _______________________________________________________________
[gw3] linux -- Python 3.11.7 /usr/bin/python3.11

capsys = <_pytest.capture.CaptureFixture object at 0x7fff91f75290>
tmp_path = PosixPath('/var/tmp/portage/sys-apps/portage-3.0.61-r1/temp/pytest-of-portage/pytest-0/popen-gw3/test_spawnE2big0')

    @pytest.mark.skipif(platform.system() != "Linux", reason="not Linux")
    def test_spawnE2big(capsys, tmp_path):
        env = dict()
        env["VERY_LARGE_ENV_VAR"] = "X" * 1024 * 256

        logfile = tmp_path / "logfile"
        echo_output = "Should never appear"
        with capsys.disabled():
            retval = portage.process.spawn(
                [BASH_BINARY, "-c", "echo", echo_output], env=env, logfile=logfile
            )

        with open(logfile) as f:
            logfile_content = f.read()
>           assert (
                "Largest environment variable: VERY_LARGE_ENV_VAR (262164 bytes)"
                in logfile_content
            )
E           AssertionError: assert 'Largest environment variable: VERY_LARGE_ENV_VAR (262164 bytes)' in '\n'

Reproducible: Always
Comment 1 matoro archtester 2024-01-31 00:46:03 UTC
Created attachment 883750 [details]
build.log and emerge --info
Comment 2 Zac Medico gentoo-dev 2024-01-31 16:26:38 UTC
(In reply to matoro from comment #0)
> All using 64k page size, so that's my bet.

Yeah, what does `getconf ARG_MAX` report? It's 2097152 by default.

Maybe we should just have the test case mock _exec and raise a fake E2BIG exception.
Comment 3 matoro archtester 2024-01-31 18:00:41 UTC
(In reply to Zac Medico from comment #2)
> (In reply to matoro from comment #0)
> > All using 64k page size, so that's my bet.
> 
> Yeah, what does `getconf ARG_MAX` report? It's 2097152 by default.
> 
> Maybe we should just have the test case mock _exec and raise a fake E2BIG
> exception.

This is actually the same everywhere, doesn't vary with page size.

loongdev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 16384
ppc64dev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 65536
ppc64ledev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 65536
storage.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 4096
ia64dev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 65536
sparcdev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 8192
mipsdev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 4096
hppadev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 4096
alphadev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 8192
riscvdev.matoro.tk | CHANGED | rc=0 >>
ARG_MAX = 2097152, PAGE_SIZE = 4096
Comment 4 Florian Schmaus gentoo-dev 2024-01-31 18:25:57 UTC
FTR

logfile    = PosixPath('/var/tmp/portage/sys-apps/portage-3.0.61-r1/temp/pytest-of-portage/pytest-0/popen-gw3/test_spawnE2big0/logfile')
logfile_content = '\n'

so it appears that the logfile only contains a single newline. Next step, find out what's going on :)
Comment 5 Zac Medico gentoo-dev 2024-01-31 18:50:36 UTC
(In reply to Florian Schmaus from comment #4)
> FTR
> 
> logfile    =
> PosixPath('/var/tmp/portage/sys-apps/portage-3.0.61-r1/temp/pytest-of-
> portage/pytest-0/popen-gw3/test_spawnE2big0/logfile')
> logfile_content = '\n'
> 
> so it appears that the logfile only contains a single newline. Next step,
> find out what's going on :)

It didn't raise E2BIG because of the difference in kernel config. The "Should never appear" message doesn't show because this line of the test case should only pass a single shell script argument to bash -c in order to avoid losing echo_output:

> [BASH_BINARY, "-c", "echo", echo_output], env=env, logfile=logfile
Comment 6 Florian Schmaus gentoo-dev 2024-01-31 19:10:54 UTC
(In reply to Zac Medico from comment #5)
> It didn't raise E2BIG…

Yes, I think we don't see the kernel behavior we are testing.

> because of the difference in kernel config.

However, I assume not due a different kernel config, but due the different page size. Basically the threshold we want to cross is a constant factor times page size. The constant factor being MAX_ARG_STRLEN in the kernel.

On amd64 with 4KiB pages, this gives us a limit of 128KiB. The test creates an environment variable of 256 KiB to trigger E2BIG. However, on ppc'ish arches with 64KiB pages, the threshold is 2048KiB. Therefore, we kernel does not return E2BIG here.

At least, that is the working theory I've been discussing with matoro over IRC. 

>  The
> "Should never appear" message doesn't show because this line of the test
> case should only pass a single shell script argument to bash -c in order to
> avoid losing echo_output:
> 
> > [BASH_BINARY, "-c", "echo", echo_output], env=env, logfile=logfile

Good point! This should be changed to

[BASH_BINARY, "-c", f"echo {echo_output}"]
Comment 7 matoro archtester 2024-01-31 19:56:11 UTC
So based on discussion, I wrote this patch:  https://paste.matoro.tk/ohwjd91

On amd64, this applies and everything still passes.  But on the ppc side, this suddenly causes a completely unrelated test to start failing.  I've made sure this isn't a one-off, but reproducible over multiple runs, including switching back and forth between applying/not applying the patch.

Here's the log of the new failing test:  https://paste.matoro.tk/5czws2c
Comment 8 matoro archtester 2024-01-31 20:13:57 UTC
Created attachment 883876 [details, diff]
923368.patch

Swapping the order of closing the stream and killing the process fixes it.  Final patch here.
Comment 9 Larry the Git Cow gentoo-dev 2024-02-09 07:08:26 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=a308d831ceb1f1e7d0733700117cc18c8eca09c8

commit a308d831ceb1f1e7d0733700117cc18c8eca09c8
Author:     Matoro Mahri <matoro_gentoo@matoro.tk>
AuthorDate: 2024-01-31 20:26:06 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-02-09 07:08:22 +0000

    tests: process: calculate size of E2BIG environment variable dynamically
    
    The size of the command line required to trigger E2BIG response from the
    kernel is defined as MAX_ARG_STRLEN, which is equal to 32 * PAGE_SIZE.
    Therefore the minimum size required to trigger the expected error
    response must be calculated dynamically based on PAGE_SIZE.
    
    Bug: https://bugs.gentoo.org/923368
    Signed-off-by: Matoro Mahri <matoro_gentoo@matoro.tk>
    Closes: https://github.com/gentoo/portage/pull/1246
    Signed-off-by: Sam James <sam@gentoo.org>

 lib/portage/tests/process/test_spawn_fail_e2big.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
Comment 10 Zac Medico gentoo-dev 2024-02-10 03:34:07 UTC
The BrokePipeError reported in https://gitweb.gentoo.org/proj/portage.git/commit/?id=fba76a545f29fb8b529197c25c64700ef77413ae resembles https://github.com/python/cpython/issues/65818 with was fixed way back in python 3.4 and 3.5.

In https://github.com/gentoo/portage/pull/1257 I have this patch that might help by only calling the kill method when the tar_stream_reader instance has not been successfully closed:

From ad61940b03be2f24c0b54c1070a4923abe18e633 Mon Sep 17 00:00:00 2001
From: Zac Medico <zmedico@gentoo.org>
Date: Fri, 9 Feb 2024 08:22:45 -0800
Subject: [PATCH 1/3] gpkg: Less aggressive subprocess.Popen kill in order to
 avoid BrokenPipeError

Do not kill tar_stream_reader instances if we can successfully
close them, since that can trigger BrokenPipeError during
self.proc.stdin.close() calls, and this state is best avoided
because it's unclear how the caller should handle the error.

If a BrokenPipeError does occur then simply print a traceback
and hope that the corresponding file descriptor is closed
during garbage collection. Do not try to reverse the order
of self.proc.kill() and self.proc.stdin.close() as in commit
fba76a545f2 since that triggered pypy ci job hangs for which
no reliable solution has been found so far.

Bug: https://bugs.gentoo.org/923368
Signed-off-by: Zac Medico <zmedico@gentoo.org>
---
 lib/portage/gpkg.py | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/lib/portage/gpkg.py b/lib/portage/gpkg.py
index 031b3f87cb..f1d8f97f8e 100644
--- a/lib/portage/gpkg.py
+++ b/lib/portage/gpkg.py
@@ -1,7 +1,8 @@
-# Copyright 2001-2020 Gentoo Authors
+# Copyright 2001-2024 Gentoo Authors
 # Distributed under the terms of the GNU General Public License v2
 
 import tarfile
+import traceback
 import io
 import threading
 import subprocess
@@ -151,7 +152,10 @@ def kill(self):
         if self.proc is not None:
             self.killed = True
             self.proc.kill()
-            self.proc.stdin.close()
+            try:
+                self.proc.stdin.close()
+            except BrokenPipeError:
+                traceback.print_exc()
             self.close()
 
     def _cmd_read_thread(self):
@@ -213,7 +217,7 @@ def close(self):
         if self.proc is not None:
             self.proc.stdin.close()
             if self.proc.wait() != os.EX_OK:
-                if not self.error:
+                if not (self.killed or self.error):
                     raise CompressorOperationFailed("compression failed")
             if self.read_thread.is_alive():
                 self.read_thread.join()
@@ -349,7 +353,10 @@ def kill(self):
         if self.proc is not None:
             self.killed = True
             self.proc.kill()
-            self.proc.stdin.close()
+            try:
+                self.proc.stdin.close()
+            except BrokenPipeError:
+                traceback.print_exc()
             self.close()
 
     def read(self, bufsize=-1):
@@ -986,11 +993,13 @@ def decompress(self, decompress_dir):
                     try:
                         image_safe = tar_safe_extract(image, "image")
                         image_safe.extractall(decompress_dir)
+                        image_tar.close()
                     except Exception as ex:
                         writemsg(colorize("BAD", "!!!Extract failed."))
                         raise
                     finally:
-                        image_tar.kill()
+                        if not image_tar.closed:
+                            image_tar.kill()
 
     def update_metadata(self, metadata, new_basename=None, force=False):
         """
Comment 11 Larry the Git Cow gentoo-dev 2024-02-10 06:06:06 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=ad61940b03be2f24c0b54c1070a4923abe18e633

commit ad61940b03be2f24c0b54c1070a4923abe18e633
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-02-09 16:22:45 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-02-09 23:52:11 +0000

    gpkg: Less aggressive subprocess.Popen kill in order to avoid BrokenPipeError
    
    Do not kill tar_stream_reader instances if we can successfully
    close them, since that can trigger BrokenPipeError during
    self.proc.stdin.close() calls, and this state is best avoided
    because it's unclear how the caller should handle the error.
    
    If a BrokenPipeError does occur then simply print a traceback
    and hope that the corresponding file descriptor is closed
    during garbage collection. Do not try to reverse the order
    of self.proc.kill() and self.proc.stdin.close() as in commit
    fba76a545f2 since that triggered pypy ci job hangs for which
    no reliable solution has been found so far.
    
    Bug: https://bugs.gentoo.org/923368
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/gpkg.py | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
Comment 12 Larry the Git Cow gentoo-dev 2024-02-22 07:24:18 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=77c44c46194922509bc4f2b5cfc099412a560a69

commit 77c44c46194922509bc4f2b5cfc099412a560a69
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-02-22 07:23:40 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-02-22 07:23:50 +0000

    sys-apps/portage: add 3.0.62
    
    Closes: https://bugs.gentoo.org/663324
    Closes: https://bugs.gentoo.org/728046
    Closes: https://bugs.gentoo.org/891137
    Closes: https://bugs.gentoo.org/906368
    Closes: https://bugs.gentoo.org/916566
    Closes: https://bugs.gentoo.org/921170
    Closes: https://bugs.gentoo.org/921208
    Closes: https://bugs.gentoo.org/921400
    Closes: https://bugs.gentoo.org/922038
    Closes: https://bugs.gentoo.org/922142
    Closes: https://bugs.gentoo.org/923368
    Closes: https://bugs.gentoo.org/923750
    Closes: https://bugs.gentoo.org/923841
    Closes: https://bugs.gentoo.org/923852
    Closes: https://bugs.gentoo.org/923854
    Closes: https://bugs.gentoo.org/924192
    Closes: https://bugs.gentoo.org/924273
    Closes: https://bugs.gentoo.org/924585
    Closes: https://bugs.gentoo.org/921380
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-3.0.62.ebuild | 246 +++++++++++++++++++++++++++++++++
 2 files changed, 247 insertions(+)