Summary: | dev-lang/python-3.12.0_beta3: build failure on ppc32 (miscompiled by gcc-13?) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Sam James <sam> |
Component: | Current packages | Assignee: | Python Gentoo Team <python> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | amonakov+bugs.gentoo, ppc, toolchain |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | PPC | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=880677 https://github.com/python/cpython/issues/106428 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99619 https://sourceware.org/bugzilla/show_bug.cgi?id=30697 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 865117, 915000 | ||
Attachments: | build.log |
Description
Sam James
2023-07-03 01:29:37 UTC
[this is on 'ppc32-testing' on timberdoodle, can get into it via 'machinectl shell ppc32-testing'].

    /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3 # LD_LIBRARY_PATH=/var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3 ./python -E -S -m sysconfig --generate-posix-vars
    Segmentation fault (core dumped)

    (gdb) r
    Starting program: /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/python -E -S -m sysconfig --generate-posix-vars
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/libthread_db.so.1".

    Program received signal SIGSEGV, Segmentation fault.
    0xf7c3d608 in sys_audit_tstate () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    (gdb) bt
    #0 0xf7c3d608 in sys_audit_tstate () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #1 0xf7c3e920 in _PySys_Audit () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #2 0xf7c267b0 in PyInterpreterState_New () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #3 0xf7c23a28 in pyinit_core.constprop () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #4 0xf7c23d74 in Py_InitializeFromConfig () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #5 0xf7c58438 in pymain_init () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #6 0xf7c59918 in pymain_main () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #7 0xf7c59a40 in Py_BytesMain () from /var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3/libpython3.12.so.1.0
    #8 0x00400534 in main ()
    (gdb)

(I can't look into this more tonight, sorry)

    Program received signal SIGSEGV, Segmentation fault.
    sys_audit_tstate (ts=0x312e0000, event=0xf7d5df0c "cpython.PyInterpreterState_New", argFormat=0x0, vargs=vargs@entry=0xffffd9d0) at ./Python/sysmodule.c:196
    196 PyInterpreterState *is = ts->interp;
    (gdb) bt
    #0 sys_audit_tstate (ts=0x312e0000, event=0xf7d5df0c "cpython.PyInterpreterState_New", argFormat=0x0, vargs=vargs@entry=0xffffd9d0) at ./Python/sysmodule.c:196
    #1 0xf7c3e920 in _PySys_Audit (tstate=tstate@entry=0x312e0000, event=event@entry=0xf7d5df0c "cpython.PyInterpreterState_New", argFormat=argFormat@entry=0x0) at ./Python/sysmodule.c:312
    #2 0xf7c267b0 in PyInterpreterState_New () at Python/pystate.c:709
    #3 0xf7c23a28 in pycore_create_interpreter (runtime=0xf7f6ff08 <_PyRuntime>, tstate_p=<synthetic pointer>, src_config=0xffffda68) at Python/pylifecycle.c:628
    #4 pyinit_config (runtime=0xf7f6ff08 <_PyRuntime>, config=0xffffda68, tstate_p=0xffffdbc8) at Python/pylifecycle.c:891
    #5 pyinit_core (src_config=src_config@entry=0xffffdc28, tstate_p=tstate_p@entry=0xffffdbc8, runtime=0xf7f6ff08 <_PyRuntime>) at Python/pylifecycle.c:1060
    #6 0xf7c23d74 in Py_InitializeFromConfig (config=0xffffdc28) at Python/pylifecycle.c:1256
    #7 Py_InitializeFromConfig (config=config@entry=0xffffdc28) at Python/pylifecycle.c:1241
    #8 0xf7c58438 in pymain_init (args=args@entry=0xffffddcc) at Modules/main.c:67
    #9 0xf7c59918 in pymain_main (args=args@entry=0xffffddcc) at Modules/main.c:710
    #10 0xf7c59a40 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:743
    #11 0x00400534 in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15
    (gdb) p ts
    $1 = (PyThreadState *) 0x312e0000
    (gdb) p *ts
    Cannot access memory at address 0x312e0000

mgorny says gcc-11.3.1_p20221209 works, gcc-13.1.1_p20230527 fails. He's building 12.
ubsan says

    Include/internal/pycore_pystate.h:118:18: runtime error: member access within null pointer of type 'struct PyThreadState'
    #0 0xf6cfa94c in _PyInterpreterState_GET Include/internal/pycore_pystate.h:118
    #1 0xf6cfa94c in PyDict_New Objects/dictobject.c:849
    #2 0xf6ee16ac in init_interned_dict Objects/unicodeobject.c:248
    #3 0xf6ee16ac in _PyUnicode_InitGlobalObjects Objects/unicodeobject.c:14659
    #4 0xf710a368 in pycore_init_global_objects Python/pylifecycle.c:678
    #5 0xf710a368 in pycore_interp_init Python/pylifecycle.c:826
    #6 0xf7110014 in pyinit_config Python/pylifecycle.c:897
    #7 0xf7110014 in pyinit_core Python/pylifecycle.c:1060
    #8 0xf7110388 in Py_InitializeFromConfig Python/pylifecycle.c:1256
    #9 0xf7110388 in Py_InitializeFromConfig Python/pylifecycle.c:1241
    #10 0xf719e23c in pymain_init Modules/main.c:67
    #11 0xf71a0f38 in pymain_main Modules/main.c:710
    #12 0xf71a1060 in Py_BytesMain Modules/main.c:743
    #13 0x8b0540 in main Programs/python.c:15
    #14 0x649bf0 (/usr/lib/libc.so.6+0x29bf0)
    #15 0x649e14 in __libc_start_main (/usr/lib/libc.so.6+0x29e14)

_PyInterpreterState_GET has a _Py_EnsureTstateNotNULL guard if Py_DEBUG, trying...

(In reply to Sam James from comment #6)
> _PyInterpreterState_GET has a _Py_EnsureTstateNotNULL guard if Py_DEBUG,
> trying...

    powerpc-unknown-linux-gnu-gcc -Wl,-O1 -Wl,--as-needed -Xlinker -export-dynamic -o python Programs/python.o -L. -lpython3.12d -ldl -lm
    powerpc-unknown-linux-gnu-gcc -Wl,-O1 -Wl,--as-needed -Xlinker -export-dynamic -o Programs/_testembed Programs/_testembed.o -L. -lpython3.12d -ldl -lm
    LD_LIBRARY_PATH=/var/tmp/portage/dev-lang/python-3.12.0_beta3/work/Python-3.12.0b3 ./python -E -S -m sysconfig --generate-posix-vars ;\
    if test $? -ne 0 ; then \
        echo "generate-posix-vars failed" ; \
        rm -f ./pybuilddir.txt ; \
        exit 1 ; \
    fi
    Fatal Python error: _PyInterpreterState_GET: the function must be called with the GIL held, after Python initialization and before Python finalization, but the GIL is released (the current Python thread state is NULL)
    Python runtime state: preinitialized

    Current thread 0xf793c020 (most recent call first):
      <no Python frame>
    make: *** [Makefile:949: pybuilddir.txt] Error 134
     * ERROR: dev-lang/python-3.12.0_beta3::gentoo failed (compile phase):
     *   emake failed

meh.

mgorny says sys-devel/gcc-12.3.1_p20230623:12::gentoo is ok.

At first, I couldn't reproduce it manually, but:

    (export CFLAGS_NODIST="-O2 -fwrapv -mcpu=powerpc" ; ./configure --enable-shared && make -j$(nproc) CPPFLAGS= CFLAGS= LDFLAGS= )

works, i.e. --enable-shared is required.

I bisected gcc in case it could give us a hint as to what's wrong (even if Python is to blame) and got:

    1d561e1851c466a4952081caef17747781609b00 is the first bad commit
    commit 1d561e1851c466a4952081caef17747781609b00
    Author: Artem Klimov <jakmobius@gmail.com>
    Date: Wed Jul 6 17:02:01 2022 +0300

        ipa-visibility: Optimize TLS access [PR99619]

        Fix PR99619, which asks to optimize TLS model based on visibility.
        The fix is implemented as an IPA optimization: this allows to take
        optimized visibility status into account (as well as avoid modifying
        all language frontends).

        2022-04-17 Artem Klimov <jakmobius@gmail.com>

        gcc/ChangeLog:

            PR middle-end/99619
            * ipa-visibility.cc (function_and_variable_visibility): Promote
            TLS access model afer visibility optimizations.
            * varasm.cc (have_optimized_refs): New helper.
            (optimize_dyn_tls_for_decl_p): New helper. Use it ...
            (decl_default_tls_model): ... here in place of 'optimize' check.

        gcc/testsuite/ChangeLog:

            PR middle-end/99619
            * gcc.dg/tls/vis-attr-gd.c: New test.
            * gcc.dg/tls/vis-attr-hidden-gd.c: New test.
            * gcc.dg/tls/vis-attr-hidden.c: New test.
            * gcc.dg/tls/vis-flag-hidden-gd.c: New test.
            * gcc.dg/tls/vis-flag-hidden.c: New test.
            * gcc.dg/tls/vis-pragma-hidden-gd.c: New test.
            * gcc.dg/tls/vis-pragma-hidden.c: New test.

        Co-Authored-By: Alexander Monakov <amonakov@gcc.gnu.org>
        Signed-off-by: Artem Klimov <jakmobius@gmail.com>

the python people seem to think TLS is broken, but that feels unlikely to me? but then again, the bisect result is related, so I don't know if this is a GCC thing or a Python thing yet.

amonakov, any ideas by chance?

I can reproduce it. It's a BFD linker bug. With another linker (CC='gcc -fuse-ld=gold') it works fine.

The linker seems to mishandle one of these relocations when GOT is big:

    addis 3,3,_Py_tss_tstate@dtprel@ha
    addi 3,3,_Py_tss_tstate@dtprel@l

Single-stepping in GDB shows that the resulting value of r3 differs from actual address of _Py_tss_tstate (as computed by GDB, which in principle could also be wrong) by 0x1000.

You can reproduce it with any GCC version by specifying TLS model in python source explicitly:

    diff --git a/Python/pystate.c b/Python/pystate.c
    index cdd975f..ef8b9f6 100644
    --- a/Python/pystate.c
    +++ b/Python/pystate.c
    @@ -63,6 +63,7 @@ extern "C" {
     #ifdef HAVE_THREAD_LOCAL
    +__attribute__((tls_model("local-dynamic")))
     _Py_thread_local PyThreadState *_Py_tss_tstate = NULL;
     #endif

(or you can specify the "global-dynamic" model explicitly to paper over the linker bug with new GCC)

Thank you for the analysis! Reported to binutils at https://sourceware.org/bugzilla/show_bug.cgi?id=30697.

Fixed upstream now by Alan!

    commit f3b1a0a2a8d1d13ba80a5b1b38f021b38abda220
    Author: Michał Górny <mgorny@gentoo.org>
    Date: Fri Aug 4 04:52:00 2023 +0200

        dev-lang/python: Backport binutils TLS workaround as 3.12.0_beta4_p2

        Signed-off-by: Michał Górny <mgorny@gentoo.org>

and

    commit 702277874f55fdee0c90b93949c07f967362d834
    Author: Andreas K. Hüttel <dilfridge@gentoo.org>
    Date: Fri Aug 4 13:02:27 2023 +0200

        sys-devel/binutils: 2.41 patchlevel 2 bump (2.41-r1)

        Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

Many thanks Alexander for the superb help.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=95805f7262242992d5200428925c75eaa7d910cb

    commit 95805f7262242992d5200428925c75eaa7d910cb
    Author: Sam James <sam@gentoo.org>
    AuthorDate: 2023-08-09 02:39:51 +0000
    Commit: Sam James <sam@gentoo.org>
    CommitDate: 2023-08-09 02:39:51 +0000

        sys-devel/binutils: keyword 2.40-r8

        This contains the fix for ppc TLS. We don't need to wait to stable this one
        to drop the patch from Python 3.12, I just didn't want the fix to solely
        be in the very-new 2.41 slot as users may not have changed to it yet and
        it felt likely to lead to confused users/bug reports if Python 3.12 was only
        buildable w/ binutils-2.41.

        Bug: https://bugs.gentoo.org/909544
        Signed-off-by: Sam James <sam@gentoo.org>

     sys-devel/binutils/binutils-2.40-r8.ebuild | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

    commit 676f06fc74cc7c4006706acedfff91a0dd6d554e
    Author: Michał Górny <mgorny@gentoo.org>
    Date: Thu Aug 10 04:18:17 2023 +0200

        dev-lang/python: Remove ppc hack and backport re fix to 3.12.0_rc1_p1

        Signed-off-by: Michał Górny <mgorny@gentoo.org>

All done.
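For readers unfamiliar with the `addis`/`addi` pair quoted in the linker analysis above, here is a minimal sketch in C of how the `@ha` and `@l` halves of a 32-bit offset are meant to recombine on PowerPC. It does not reproduce the BFD bug and makes no claim about which of the two relocations was mishandled. It only shows the arithmetic the linker is expected to honour when it fills in the two 16-bit immediates. All function names (`ppc_ha`, `ppc_lo`, `ppc_addis`, `ppc_addi`, `ppc_apply_pair`, `ppc_pair_selfcheck`) are hypothetical helpers invented for this illustration, and the offsets are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits, as addi/addis do. */
static uint32_t sext16(uint16_t v)
{
    return (v & 0x8000u) ? ((uint32_t)v | 0xffff0000u) : (uint32_t)v;
}

/* @l: the low 16 bits of the offset. */
static uint16_t ppc_lo(uint32_t off)
{
    return (uint16_t)(off & 0xffffu);
}

/* @ha: the high 16 bits, adjusted by 0x8000 so that the later
 * sign-extending addi of the low half lands on the right value. */
static uint16_t ppc_ha(uint32_t off)
{
    return (uint16_t)(((off + 0x8000u) >> 16) & 0xffffu);
}

/* addis rD,rA,imm : rD = rA + (sign-extended imm << 16) */
static uint32_t ppc_addis(uint32_t ra, uint16_t imm)
{
    return ra + (sext16(imm) << 16);
}

/* addi rD,rA,imm : rD = rA + sign-extended imm */
static uint32_t ppc_addi(uint32_t ra, uint16_t imm)
{
    return ra + sext16(imm);
}

/* Model of "addis 3,3,sym@ha ; addi 3,3,sym@l" applied to a base in r3. */
static uint32_t ppc_apply_pair(uint32_t base, uint32_t off)
{
    return ppc_addi(ppc_addis(base, ppc_ha(off)), ppc_lo(off));
}

/* With correct immediates the pair yields base + off (mod 2^32). */
static void ppc_pair_selfcheck(void)
{
    assert(ppc_apply_pair(0x10000000u, 0x00001234u) == 0x10001234u);
    assert(ppc_apply_pair(0x10000000u, 0x00018000u) == 0x10018000u);
}
```

Because the two immediates are resolved separately, the final address is only correct if both are computed from the same offset; an immediate filled in for a different offset shifts the result by the difference, which is the kind of discrepancy single-stepping in GDB would expose.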