| Summary: | x11-libs/qwt:6[qt6]: sandbox causes qmake to fail: Project ERROR: Cannot run compiler 'g++' | | |
|---|---|---|---|
| Product: | Portage Development | Reporter: | Andrew Udvare <audvare> |
| Component: | Sandbox | Assignee: | Sandbox Maintainers <sandbox> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | gto7052, ionen, sam, sci |
| Priority: | Normal | | |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux | | |
| See Also: | https://bugs.gentoo.org/show_bug.cgi?id=908807 https://bugs.gentoo.org/show_bug.cgi?id=908816 https://bugs.gentoo.org/show_bug.cgi?id=913493 https://bugs.gentoo.org/show_bug.cgi?id=915695 | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
| Attachments: | build log, coredump, Dockerfile, fix sys-apps/sandbox-2.35 QA-notice | | |
Description
Andrew Udvare
2023-06-18 22:52:43 UTC
Created attachment 864179 [details]
build log
Was unable to reproduce, maybe an LTO problem?

I built qtbase, sandbox, and even glibc without LTO and the issue still occurs.

This seems like a sandbox issue: qmake6 at some point is running `/bin/sh -c 'g++ -E /usr/lib64/qt6/mkspecs/features/data/macros.cpp 2>/dev/null'`. This triggers sandbox to look up the execv symbol with sb_unwrapped_execv_DEFAULT(). dlvsym fails and _dl_catch_error is called to handle the error. The error handler eventually segfaults during a recursive call to do_lookup_x. The segfault is within glibc after its error handling (__GI__dl_catch_exception, etc).

dlvsym call in sandbox: https://gitweb.gentoo.org/proj/sandbox.git/tree/libsandbox/wrappers.c#n53
do_lookup_x: https://github.com/bminor/glibc/blob/master/elf/dl-lookup.c#L338

Created attachment 864419 [details]
coredump
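As a rough illustration of the kind of lookup described above, the sketch below resolves `execv` from the libraries already loaded into a process and shows what an unresolved name looks like to the caller. It is not sandbox's code: Python's `ctypes` goes through `dlopen(NULL)` and `dlsym()` rather than the versioned `dlvsym()` call that sandbox makes, and `lookup_symbol` and the missing symbol name are hypothetical names chosen for this example.

```python
import ctypes


def lookup_symbol(name: str):
    """Return a callable for `name` from the process's loaded libraries,
    or None when the dynamic linker cannot resolve it."""
    # CDLL(None) corresponds to dlopen(NULL): the global symbol scope
    # of the running process, which includes libc on a glibc system.
    process = ctypes.CDLL(None)
    try:
        # Attribute access performs the dlsym() lookup.
        return getattr(process, name)
    except AttributeError:
        # ctypes reports a failed lookup as AttributeError.
        return None


if __name__ == "__main__":
    print("execv resolved:", lookup_symbol("execv") is not None)
    print(
        "bogus resolved:",
        lookup_symbol("hypothetical_missing_symbol_908880") is not None,
    )
```

A failed lookup here is an ordinary, recoverable error. The report above is about the same failure path ending in a segfault inside glibc's error handling instead.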
Does the build succeed with FEATURES="-sandbox -usersandbox"?

(In reply to Mike Gilbert from comment #5)
> Does the build succeed with FEATURES="-sandbox -usersandbox"?

Yes.

I am unable to reproduce this.

This really only started happening after getting my machines to use LTO for every possible package (I set -flto in flags and then ran emerge -e @world). I only excluded ones that are not yet fixed.

My config (make.conf variables are already posted):

/etc/portage/env/no-lto:
CFLAGS="-O2 -ftree-vectorize -ggdb -march=native -mtune=native -pipe"
CXXFLAGS="${CFLAGS}"

/etc/portage/package.env/no-lto:
app-crypt/efitools no-lto # https://bugs.gentoo.org/908813
app-emulation/virtualbox no-lto # https://bugs.gentoo.org/908814
app-shells/mcfly no-lto # https://bugs.gentoo.org/854867
dev-python/cryptography no-lto # https://bugs.gentoo.org/903908
dev-qt/qtscript no-lto # https://bugs.gentoo.org/652158
media-sound/guitarix no-lto # https://bugs.gentoo.org/860861
media-sound/musescore no-lto # https://bugs.gentoo.org/908808
media-video/rav1e no-lto # https://bugs.gentoo.org/908815
sys-apps/ripgrep-all no-lto # https://bugs.gentoo.org/863626

Created attachment 864511 [details]
Dockerfile

I was able to reproduce this bug in a Docker container. The Dockerfile sets up the image and then builds a 'build.sh' script that should be run.

$ # download Dockerfile here, named Dockerfile
$ docker build -t bug908880 .
$ docker run -it bug908880:latest
$ ./build.sh

Also qtbase[cups] is *not* pulling in net-print/cups like it should be.

Thanks. I'll give it a spin, though I will need to adjust the cpu flags to work on my old AMD Phenom processor.

I ran similar steps in a systemd-nspawn container, and was unable to reproduce the problem. As previously mentioned, I did need to tweak CPU_FLAGS_X86 to match my CPU. As well, -march=native would enable different instructions on my system. It is possible this plays some role.
For the record, here is my make.conf from the container:

COMMON_FLAGS="-O2 -flto -ftree-vectorize -ggdb -march=native -mtune=native -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
ACCEPT_KEYWORDS="~amd64"
ACCEPT_LICENSE="*"
INPUT_DEVICES="evdev libinput"
L10N="en en_US"
LINGUAS="en en_US"
USE="lto qt6"
CPU_FLAGS_X86="3dnow 3dnowext mmx mmxext popcnt sse sse2 sse3 sse4a"
MAKEOPTS="-j6"
FEATURES="-network-sandbox"

This is definitely a CPU issue. The problem is figuring out which package and flag is causing the issue. On an older machine with these settings I did not get the same issue.

CFLAGS="-O2 -flto -ftree-vectorize -march=native -mtune=native -pipe"
CXXFLAGS="-O2 -flto -ftree-vectorize -march=native -mtune=native -pipe"
CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sse sse2 sse3 sse4_1 sse4_2 ssse3"

I successfully built sys-apps/sandbox, qwt:6[qt5,qt6] (in that order) with the above settings.

Renamed this ticket since the qmake6 crash was caused by -ftree-vectorize on sys-apps/sandbox. I am keeping that flag removed from sandbox permanently.

This bug can be considered somewhat of a duplicate of https://bugs.gentoo.org/show_bug.cgi?id=908807 (graphviz[qt5] having the same issue because it finds Qt 6's qmake first).

(In reply to Andrew Udvare from comment #13)
> Renamed this ticket since the qmake6 crash was caused by -ftree-vectorize on
> sys-apps/sandbox. I am keeping that flag removed from sandbox permanently.
>

It's unclear to me if that's the same issue as your original dlsym problem. In any case, that shouldn't happen.

> This bug can be considered somewhat of a duplicate of
> https://bugs.gentoo.org/show_bug.cgi?id=908807 (graphviz[qt5] having the
> same issue because it finds Qt 6's qmake first).

Let's leave that other bug purely for the Qt 6 automagic which is an issue in itself.

Here is a longer explanation of what is happening but I do not yet have a solution.
qmake6 is running its normal scripts to check the system. It lands here: https://github.com/qt/qtbase/blob/f4a80552c2784eaa0659acd12df7a865aeb7ebec/mkspecs/features/toolchain.prf#L38 which runs `g++ -E /usr/lib64/qt6/mkspecs/features/data/macros.cpp 2>/dev/null`. The expected output is generated (I tested this by setting its output to another file). Only when under sandbox, a cc1plus subprocess exits with SIGPIPE. This in turn makes GCC return non-zero (-1 when it gets back to qmake).

The cc1plus subprocess stops in c_common_finish (line 1311 for gcc-13.1.1_p20230527, https://github.com/gcc-mirror/gcc/blob/master/gcc/c-family/c-opts.cc#L1319 ). The call to fclose() fails. Stack trace:

Thread 5.1 "cc1plus" received signal SIGPIPE, Broken pipe.
0x00007ffff7b5db80 in write () from /usr/lib64/libc.so.6
(ins)(gdb) bt
#0 0x00007ffff7b5db80 in write () from /usr/lib64/libc.so.6
#1 0x00007ffff7ae5185 in _IO_file_write () from /usr/lib64/libc.so.6
#2 0x00007ffff7ae44c9 in ?? () from /usr/lib64/libc.so.6
#3 0x00007ffff7ae6229 in _IO_do_write () from /usr/lib64/libc.so.6
#4 0x00007ffff7ae5b50 in _IO_file_close_it () from /usr/lib64/libc.so.6
#5 0x00007ffff7ad892b in fclose () from /usr/lib64/libc.so.6
#6 0x000000000191f7cd in c_common_finish () at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/c-family/c-opts.cc:1311
#7 0x00000000018d3b23 in finalize (no_backend=true) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/toplev.cc:2031
#8 do_compile (no_backend=true) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/toplev.cc:2138
#9 toplev::main (this=this@entry=0x7fffffffd056, argc=<optimized out>, argv=<optimized out>) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/toplev.cc:2281
#10 0x00000000018d2dbb in main (argc=<optimized out>, argv=<optimized out>) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/main.cc:39

The disassembly where it stopped (I do not know why I cannot get glibc code to show in GDB despite having the source unpacked by Portage and built with -ggdb):

Dump of assembler code for function write:
   0x00007ffff7b5db70 <+0>: cmp BYTE PTR [rip+0xdda61],0x0 # 0x7ffff7c3b5d8 <__libc_single_threaded>
   0x00007ffff7b5db77 <+7>: je 0x7ffff7b5db90 <write+32>
   0x00007ffff7b5db79 <+9>: mov eax,0x1
   0x00007ffff7b5db7e <+14>: syscall
=> 0x00007ffff7b5db80 <+16>: cmp rax,0xfffffffffffff000
   0x00007ffff7b5db86 <+22>: ja 0x7ffff7b5dbe0 <write+112>

The return value from the write syscall is -32 (in $rax). This cmp instruction is from https://github.com/bminor/glibc/blob/4290aed05135ae4c0272006442d147f2155e70d7/sysdeps/unix/sysv/linux/sysdep.h#L27C1-L28C40 , assuming write is actually _dl_write https://github.com/bminor/glibc/blob/4290aed05135ae4c0272006442d147f2155e70d7/sysdeps/unix/sysv/linux/dl-write.c#L24 .

#define INTERNAL_SYSCALL_ERROR_P(val) \
  ((unsigned long int) (val) > -4096UL)

With proper symbols, the stack trace:

Thread 5.1 "cc1plus" received signal SIGPIPE, Broken pipe.
0x00007ffff7b68b80 in __GI___libc_write (fd=1, buf=0x26d8670, nbytes=378) at ../sysdeps/unix/sysv/linux/write.c:26
26 return SYSCALL_CANCEL (write, fd, buf, nbytes);
(ins)(gdb) bt
#0 0x00007ffff7b68b80 in __GI___libc_write (fd=1, buf=0x26d8670, nbytes=378) at ../sysdeps/unix/sysv/linux/write.c:26
#1 0x00007ffff7af0185 in _IO_new_file_write (f=0x7ffff7c3f780 <_IO_2_1_stdout_>, data=0x26d8670, n=378) at fileops.c:1180
#2 0x00007ffff7aef4c9 in new_do_write (fp=0x7ffff7c3f780 <_IO_2_1_stdout_>, data=0x26d8670 "# 0 \"/usr/lib64/qt6/mkspecs/features/data/macros.cpp\"\n# 0 \"<built-in>\"\n# 0 \"<command-line>\"\n# 1 \"/usr/include/stdc-predef.h\" 1 3 4\n# 0 \"<command-line>\" 2\n# 1 \"/usr/lib64/qt6/mkspecs/features/data/macr"..., to_do=to_do@entry=378) at /var/tmp/portage/sys-libs/glibc-2.37-r3/work/glibc-2.37/libio/libioP.h:946
#3 0x00007ffff7af1229 in _IO_new_do_write (fp=fp@entry=0x7ffff7c3f780 <_IO_2_1_stdout_>, data=<optimized out>, to_do=378) at fileops.c:425
#4 0x00007ffff7af0b50 in _IO_new_file_close_it (fp=fp@entry=0x7ffff7c3f780 <_IO_2_1_stdout_>) at fileops.c:135
#5 0x00007ffff7ae392b in _IO_new_fclose (fp=0x7ffff7c3f780 <_IO_2_1_stdout_>) at iofclose.c:53
#6 0x0000000000803bfa in c_common_finish () at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/c-family/c-opts.cc:1311
#7 0x00000000014931ff in finalize (no_backend=true) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/toplev.cc:2031
#8 do_compile (no_backend=true) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/toplev.cc:2138
#9 toplev::main (this=this@entry=0x7fffffffd056, argc=<optimized out>, argc@entry=11, argv=<optimized out>, argv@entry=0x7fffffffd188) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/toplev.cc:2281
#10 0x000000000149494b in main (argc=11, argv=0x7fffffffd188) at /var/tmp/portage/sys-devel/gcc-13.1.1_p20230527/work/gcc-13-20230527/gcc/main.cc:39

I cannot reproduce the issue by passing the GCC command to sandbox. It will exit normally. Only when sandbox is involved and the path is sandbox -> qmake -> gcc -> cc1plus does it fail.

When I put together a Dockerfile without -ftree-vectorize but with -flto and built @world, everything worked.

My flags are now: -O2 -flto=auto -ggdb -march=native -mtune=native -pipe

About 1/3 of my system was built with these flags (emerge -e1 qwt:6 media-gfx/graphviz), but the issue continues. I will consider a full rebuild, and possibly a full rebuild without -flto. Prior to this issue I did not have -flto set globally and that's when qwt:6 built successfully with sandbox enabled on my real machine (and not inside a container).

Core dump compressed with zstd: https://chunk.io/Tatsh/7a0714c9221f4cc194afba6233e38df6

Clang gets the same error:

Thread 4.1 "clang++" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff7dc7f40 (LWP 13229)]
0x00007fffebb20b80 in __GI___libc_write (fd=1, buf=0x555555658d10, nbytes=718) at ../sysdeps/unix/sysv/linux/write.c:26
26 return SYSCALL_CANCEL (write, fd, buf, nbytes);
(ins)(gdb) bt
#0 0x00007fffebb20b80 in __GI___libc_write (fd=1, buf=0x555555658d10, nbytes=718) at ../sysdeps/unix/sysv/linux/write.c:26
#1 0x00007fffed03f640 in llvm::raw_fd_ostream::write_impl (this=0x55555561ce80, Ptr=0x555555658d10 "# 1 \"/usr/lib64/qt6/mkspecs/features/data/macros.cpp\"\n# 1 \"<built-in>\" 1\n# 1 \"<built-in>\" 3\n# 438 \"<built-in>\" 3\n# 1 \"<command line>\" 1\n# 1 \"<built-in>\" 2\n# 1 \"/usr/include/gentoo/fortify.h\" 1\n# 3 \"/u"..., Size=718) at /var/tmp/portage/sys-devel/llvm-16.0.6/work/llvm/lib/Support/raw_ostream.cpp:768
#2 0x00007fffed03fefa in llvm::raw_ostream::flush (this=0x55555561ce80) at /var/tmp/portage/sys-devel/llvm-16.0.6/work/llvm/include/llvm/Support/raw_ostream.h:187
#3 llvm::raw_fd_ostream::~raw_fd_ostream (this=0x55555561ce80, __in_chrg=<optimized out>) at /var/tmp/portage/sys-devel/llvm-16.0.6/work/llvm/lib/Support/raw_ostream.cpp:668
#4 0x00007fffed03ffc9 in llvm::raw_fd_ostream::~raw_fd_ostream (this=0x55555561ce80, __in_chrg=<optimized out>)
...

We need to know what's on the other side of the pipe that dies; SIGPIPE is ok.

What else can I do in the debugger? My current general workflow to get data:

set follow-fork-mode child
set detach-on-fork no
b execv_DEFAULT
r

Revising title because USE=qt5 is not necessary to trigger the error.

I'd be interested to know whether the attached patch, which addresses sys-apps/sandbox-2.35, fixes this issue.

Installing sandbox-2.35 generated several possibly severe QA-notices here:

* /tmp/portage/sys-apps/sandbox-2.35/work/sandbox-2.35/src/environ.c:211:19: warning: the comparison will always evaluate as ‘true’ for the address of ‘work_dir’ will never be NULL [-Waddress]
* /tmp/portage/sys-apps/sandbox-2.35/work/sandbox-2.35/libsandbox/canonicalize.c:113:41: warning: passing argument 2 to 'restrict'-qualified parameter aliases with argument 1 [-Wrestrict]
* /tmp/portage/sys-apps/sandbox-2.35/work/sandbox-2.35/libsandbox/libsandbox.c:144:9: warning: passing argument 2 to 'restrict'-qualified parameter aliases with argument 1 [-Wrestrict]

Depending on cpu/compiler version, code like readlink(x,x,sizeof(x)) could lead to undefined behavior because source and destination buffers overlap.

Created attachment 865353 [details, diff]
fix sys-apps/sandbox-2.35 QA-notice
The proposed patch; it was also sent to the sandbox maintainers.
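The SIGPIPE and the -32 return value discussed in the debugging comments above can be reproduced in isolation. The sketch below is an illustration only, not code from sandbox, glibc, or GCC: `internal_syscall_error_p` is a Python rendering of the quoted `INTERNAL_SYSCALL_ERROR_P` macro assuming a 64-bit `unsigned long`, and `CHILD` and `run_child_with_closed_reader` are hypothetical names. The child stands in for cc1plus by writing 378 bytes to a stdout pipe whose reading side has already been closed.

```python
import errno
import signal
import subprocess
import sys

MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned long (assumption)


def internal_syscall_error_p(val: int) -> bool:
    """Python rendering of: (unsigned long int) (val) > -4096UL."""
    return (val & MASK64) > (-4096 & MASK64)


# Hypothetical child: restores the default SIGPIPE action (which a C
# program such as cc1plus normally has), waits for a go-ahead byte on
# stdin, then writes to stdout.
CHILD = (
    "import os, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "os.read(0, 1)\n"
    "os.write(1, b'x' * 378)\n"
)


def run_child_with_closed_reader() -> int:
    """Return the child's exit status when nobody reads its stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    proc.stdout.close()     # the reading side of the pipe goes away first
    proc.stdin.write(b"g")  # only now let the child attempt its write
    proc.stdin.close()
    return proc.wait()


if __name__ == "__main__":
    print("-32 counts as a syscall error:", internal_syscall_error_p(-32))
    print("errno 32 is:", errno.errorcode[32])
    print("child exit status:", run_child_with_closed_reader())
```

On Linux, errno 32 is EPIPE, and `subprocess` reports a child killed by a signal as the negated signal number, so the status here should be the negative of `signal.SIGPIPE`. That is the same way a parent such as gcc or qmake would learn that the write side failed because the reader was gone.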
Tried the above patch on sandbox and got the same result.

Tried with the newest sandbox 2.37 and the GCC subprocess (under qmake6) no longer crashes, but it still gives unexpected empty output.

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=9152c25f592db19e2d6f6ab0aab991a463503a34

commit 9152c25f592db19e2d6f6ab0aab991a463503a34
Author: Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2023-10-21 05:46:22 +0000
Commit: Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2023-10-21 06:21:50 +0000

    dev-qt/qtbase: fix qsb and qmake with sandbox

    Also add to 6.5.3, while the issue has been less prominent in 6.5.x,
    there has been users that ran into issues with older versions, and
    is needed for stable users.

    See bug #915695 for details, the others are essentially duplicates
    which are hopefully fixed too (please report if still issues given
    I could never reproduce myself and cannot confirm).

    Closes: https://bugs.gentoo.org/908809
    Closes: https://bugs.gentoo.org/908816
    Closes: https://bugs.gentoo.org/913493
    Closes: https://bugs.gentoo.org/915695
    Thanks-to: vowstar
    Thanks-to: Mike Gilbert <floppym@gentoo.org>
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 .../qtbase-6.5.3-forkfd-childstack-size.patch | 27 ++++++++++++++++++++++
 ...{qtbase-6.5.3.ebuild => qtbase-6.5.3-r1.ebuild} | 1 +
 ...{qtbase-6.6.0.ebuild => qtbase-6.6.0-r1.ebuild} | 1 +
 3 files changed, 29 insertions(+)