Summary: | sys-apps/portage-2.3.16 segfaults due to dev-python/pyblake2-1.0.0 | ||
---|---|---|---|
Product: | Portage Development | Reporter: | NHO <jy6x2b32pie9> |
Component: | Core | Assignee: | Michał Górny <mgorny> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | dev-portage, mgorny, notify, perfect007gentleman, prefix, public+gentoo, python, steffen, tomboy64, unheatedgarage |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
emerge --info
build.log (with clang settings)
build.log (with gcc settings) |
On second thought, this may be due to gcc-7.2; will test later.

I can confirm I've been getting this problem... but only on some of my boxes.
[18855.000330] traps: emerge[6060] general protection ip:7f1a52d4215f sp:7ffcd7169900 error:0
[18855.000333] in pyblake2.cpython-34m.so[7f1a52d3d000+8000]

For me at least... the following *appears* to "fix" the problem:
mv /usr/lib64/python3.4/site-packages/pyblake2.cpython-34m.so /var/tmp/
emerge -1a pyblake2

Apologies, minor correction... got over-excited... that only fixed one of my installs. Removing pyblake2.cpython-34m.so certainly makes things work, but reinstalling brought the problem back on one box, though not on the others.

Any chance you have -O3 in CFLAGS as I do? :) I have a Skylake CPU running an ~amd64 system with gcc 7.2. Compiling dev-python/pyblake2 with -O2 solved the problem and it no longer segfaults.

CFLAGS="-march=native -O2 -ftree-vectorize -pipe", Kaby Lake.
Rebuilding does not help. Rebuilding with clang-5 does not help either. So at least it's not gcc-7.2.

It appears that -ftree-vectorize is to blame. Without it, pyblake2 works.

I'm on portage 2.3.15 and I have the same issue. I can't upgrade to 2.3.16 or downgrade to 2.3.14:
# emerge -1 portage
>>> Emerging (1 of 1) sys-apps/portage-2.3.16::gentoo
Segmentation fault
$ dmesg
[ 5964.956736] traps: emerge[15270] general protection ip:7f356f93de4c sp:7ffd2c8d46c8 error:0
[ 5964.956740] in pyblake2.cpython-34m.so[7f356f939000+8000]
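Rather than changing CFLAGS globally, the workaround reported in this thread (rebuilding dev-python/pyblake2 without -ftree-vectorize) can be limited to that one package through Portage's per-package environment files (/etc/portage/env/ plus /etc/portage/package.env). This is a minimal sketch, not an official recipe: the env file name no-tree-vectorize.conf and the function name are hypothetical, and the optional root argument exists only so the sketch can be tried in a scratch directory first.

```shell
#!/usr/bin/env bash
# scope_no_tree_vectorize [ROOT]  (hypothetical helper name)
# Writes a per-package env file that appends -fno-tree-vectorize for
# dev-python/pyblake2 only. ROOT defaults to "/" (the live system).
scope_no_tree_vectorize() {
    local root="${1:-/}"
    local envdir="${root%/}/etc/portage/env"
    mkdir -p "$envdir" || return 1

    # Quoted heredoc: ${CFLAGS} is written literally and expanded by Portage,
    # so the make.conf flags are kept and -fno-tree-vectorize is appended.
    # With GCC, the later -fno-tree-vectorize overrides an earlier -ftree-vectorize.
    cat > "$envdir/no-tree-vectorize.conf" <<'EOF'
CFLAGS="${CFLAGS} -fno-tree-vectorize"
CXXFLAGS="${CXXFLAGS} -fno-tree-vectorize"
EOF

    # Map the package to the env file (assumes package.env is a plain file).
    echo 'dev-python/pyblake2 no-tree-vectorize.conf' >> "${root%/}/etc/portage/package.env"
}
```

On a live system this would be run as root with `/` as the argument, followed by a rebuild such as `PORTAGE_CHECKSUM_FILTER="* -blake2b" emerge -1 dev-python/pyblake2`, reusing the checksum filter suggested in this thread so the crashing module is not exercised while it is being replaced. If /etc/portage/package.env is a directory on your system, put the mapping line in a file inside it instead.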
Yep, rebuilding dev-python/pyblake2 without -ftree-vectorize fixes it.

I fixed it by building dev-python/pyblake2 with -O2 instead of -O3.

I'd appreciate some help debugging it. For a start, does the following segfault for you:
$ git clone https://github.com/dchest/pyblake2
$ cd pyblake2
$ python -m venv _venv
$ . _venv/bin/activate
$ export CFLAGS='${your-cflags-that-break-stuff}'
$ pip install -U .
$ python test/test.py
I ask because I feel like it should segfault for me, and it doesn't. So that may also be -march-related.

The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=5dddb56946409beae23b9cfa513f81bcb77c531c

commit 5dddb56946409beae23b9cfa513f81bcb77c531c
Author: Michał Górny <mgorny@gentoo.org>
AuthorDate: 2017-11-22 13:21:05 +0000
Commit: Michał Górny <mgorny@gentoo.org>
CommitDate: 2017-11-22 13:21:45 +0000

dev-python/pyblake2: Try to disable -ftree-vectorize to avoid segv

Bug: https://bugs.gentoo.org/638428

dev-python/pyblake2/pyblake2-1.0.0.ebuild | 5 +++++
1 file changed, 5 insertions(+)

Since I can't reproduce it, please let me know if that commit solved it with CFLAGS that were failing previously.

(In reply to Michał Górny from comment #12)
> Since I can't reproduce it, please let me know if that commit solved it with
> CFLAGS that were failing previously.

It appears that your commit is a solution. At least, it fixes the problem for me. I have also opened an issue upstream: https://github.com/dchest/pyblake2/issues/13

(In reply to Michał Górny from comment #12)
> Since I can't reproduce it, please let me know if that commit solved it with
> CFLAGS that were failing previously.

Upstream pushed an equivalent commit and bumped the version, so updating to 1.0.1 would make your patch redundant.
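To check whether a particular build of the module miscomputes or crashes, independent of emerge and of the upstream test suite, one can hash a published test vector in a throwaway interpreter. The sketch below uses the BLAKE2b-512 digest of "abc" from RFC 7693, Appendix A. The function name check_blake2b is hypothetical; it assumes the chosen interpreter (python3 by default) can import the module under test, and hashlib only provides blake2b on Python 3.6 and later.

```shell
#!/usr/bin/env bash
# check_blake2b MODULE [PYTHON]  (hypothetical helper name)
# Hashes b"abc" with MODULE.blake2b and compares the hex digest with the
# BLAKE2b-512 test vector from RFC 7693, Appendix A.
# MODULE is e.g. "pyblake2" (the C extension) or "hashlib" (Python >= 3.6).
check_blake2b() {
    local module="$1" python="${2:-python3}"
    local expected="ba80a53f981c4d0d6a2797b69f12f6e94c212f14685ac4b74b12bb6fdbffa2d17d87c5392aab792dc252d5de4533cc9518d38aa8dbf1925ab92386edd4009923"
    local actual status

    # Hash in a child interpreter so that a segfault only kills that process.
    actual=$("$python" -c 'import importlib, sys; m = importlib.import_module(sys.argv[1]); print(m.blake2b(b"abc").hexdigest())' "$module")
    status=$?

    if [ "$status" -ne 0 ]; then
        # A status of 139 (128 + SIGSEGV) means the interpreter segfaulted.
        echo "${module}: ${python} exited with status ${status}" >&2
        return 1
    fi
    if [ "$actual" != "$expected" ]; then
        echo "${module}: unexpected digest ${actual}" >&2
        return 1
    fi
    echo "${module}: OK"
}
```

For example, `check_blake2b pyblake2 python3.4` against an affected build would be expected to fail with a non-zero status (or a wrong digest), while `check_blake2b hashlib python3.6` exercises the built-in CPython implementation instead.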
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4d5f5debfda38d14b37b25a1758be56a5e28f8b4

commit 4d5f5debfda38d14b37b25a1758be56a5e28f8b4
Author: Michał Górny <mgorny@gentoo.org>
AuthorDate: 2017-11-22 16:42:28 +0000
Commit: Michał Górny <mgorny@gentoo.org>
CommitDate: 2017-11-22 16:48:56 +0000

dev-python/pyblake2: Bump to 1.0.1, with upstream -fno-tree-vectorize

Bug: https://bugs.gentoo.org/638428

dev-python/pyblake2/Manifest | 2 +-
dev-python/pyblake2/{pyblake2-1.0.0.ebuild => pyblake2-1.0.1.ebuild} | 5 -----
2 files changed, 1 insertion(+), 6 deletions(-)

As a temporary workaround, set PORTAGE_CHECKSUM_FILTER="* -blake2b" in /etc/portage/make.conf.

The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1a8386e80340387722af8bb40ce3c412cff0259c

commit 1a8386e80340387722af8bb40ce3c412cff0259c
Author: Michał Górny <mgorny@gentoo.org>
AuthorDate: 2017-11-22 17:24:42 +0000
Commit: Michał Górny <mgorny@gentoo.org>
CommitDate: 2017-11-22 17:29:15 +0000

dev-python/pyblake2: Backport -fno-tree-vectorize to stable version

Bug: https://bugs.gentoo.org/638428

.../pyblake2/{pyblake2-0.9.3.ebuild => pyblake2-0.9.3-r1.ebuild} | 5 +++++
1 file changed, 5 insertions(+)

(In reply to Zac Medico from comment #16)
> For temporary workaround, set PORTAGE_CHECKSUM_FILTER="* -blake2b" in
> /etc/portage/make.conf.

Running the following got Portage back on track for me:
PORTAGE_CHECKSUM_FILTER="* -blake2b" emerge -1 pyblake2
Thanks!

I am getting the same segfaults with clang-5 and pyblake2-1.0.1.

Created attachment 506258
build.log (with clang settings)

As suggested on https://wiki.gentoo.org/wiki/Clang, I am using separate config files to differentiate between GCC (7.2.0) and Clang (5.0.0). Currently the segfaults occur every time a BLAKE2B hash is computed, if pyblake2 was compiled with Clang. With GCC it works.
In particular, when fetching a source file, Portage simply reports
 * Fetch failed for 'app-admin/rsyslog-8.29.0'
instead of exposing the segfault. As my current setup might be considered volatile, here are the values used for Clang and GCC:

# Clang configuration as shown in the attached build.log
CC="clang"
CXX="clang++"
CFLAGS="-flto=thin -march=broadwell -O2 -pipe"
CXXFLAGS="-flto=thin ${CFLAGS}"
LDFLAGS="-Wl,-O2 -Wl,--as-needed"
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"

Created attachment 506260
build.log (with gcc settings)
# GCC configuration as shown in the attached build.log
CC="gcc"
CXX="g++"
CFLAGS="-flto=thin -mabm -frecord-gcc-switches ${CFLAGS}"
CXXFLAGS="-flto=thin -mabm -frecord-gcc-switches ${CXXFLAGS}"
FFLAGS="${FFLAGS} -frecord-gcc-switches"
FCFLAGS="${FCFLAGS} -frecord-gcc-switches"
AR="ar"
NM="nm"
RANLIB="ranlib"
Try using -march=ivybridge or dropping to -O1 with clang for pyblake2; your issue sounds like https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fc0ba0fc7c93d56aeb421fbb8154fd57e7695623

OK, I have an additional request for the people who can reproduce this. Could you please try building python:3.6 (with 'breaking' CFLAGS), unmerging pyblake2 and testing Portage via python3.6? It has a built-in BLAKE2 implementation, and I'd like to check whether it's affected.

(In reply to Michał Górny from comment #23)
> Ok, I have an additional request to the people who can reproduce this. Could
> you please try building python:3.6 (with 'breaking' CFLAGS), unmerging
> pyblake2 and testing Portage via python3.6? It has a built-in BLAKE2
> implementation, and I'd like to check if it's affected.

It does not appear to be affected:
~ # cd /usr/portage/sys-apps/portage
portage # PYTHON_TARGETS="python3_6" ebuild portage-2.3.16.ebuild unpack
 * portage-2.3.16.tar.bz2 BLAKE2B SHA512 size ;-) ... [ ok ]
 * checking ebuild checksums ;-) ... [ ok ]
 * checking auxfile checksums ;-) ... [ ok ]
 * checking miscfile checksums ;-) ... [ ok ]
>>> Unpacking source...
>>> Unpacking portage-2.3.16.tar.bz2 to /var/tmp/portage/sys-apps/portage-2.3.16/work
>>> Source unpacked in /var/tmp/portage/sys-apps/portage-2.3.16/work
portage # emerge -aC pyblake2
 * This action can remove important packages! In order to be safer, use
 * `emerge -pv --depclean <atom>` to check for reverse dependencies before
 * removing packages.
>>> These are the packages that would be unmerged:
--- Couldn't find 'pyblake2' to unmerge.
>>> No packages selected for removal by unmerge
portage # PYTHON_TARGETS="python3_6" ebuild portage-2.3.16.ebuild compile
...
>>> Source compiled.
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=95bff9cb86a241fcbb6702350b8472aca78ac3f0

commit 95bff9cb86a241fcbb6702350b8472aca78ac3f0
Author: Michał Górny <mgorny@gentoo.org>
AuthorDate: 2017-11-24 17:49:43 +0000
Commit: Michał Górny <mgorny@gentoo.org>
CommitDate: 2017-11-24 17:53:57 +0000

dev-python/pyblake2: Bump to 1.1.0 (with impl from CPython)

Update the package to the new release that features implementation copied from CPython git. This will hopefully solve all the optimization problems reported.

Bug: https://bugs.gentoo.org/show_bug.cgi?id=638428

dev-python/pyblake2/Manifest | 1 +
dev-python/pyblake2/pyblake2-1.1.0.ebuild | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)

Please test 1.1.0 now, in all failing environments. Hopefully reusing the code from CPython solves all the issues.

v1.1.0 passes the test described in this Gentoo bug on my machine. Thank you.

Now three to five other people need to test it. Waiting.

dev-python/pyblake2-1.1.0 fixes this issue for me. Nice to run into a problem and immediately discover it was resolved an hour ago. Thank you, everyone, for your work on this!
 * Upgraded pyblake2-1.0.0 -> pyblake2-1.1.0 (all tests pass)
 * Now able to upgrade portage-2.3.14 -> portage-2.3.16 (all tests pass)

FWIW, I use gcc-6.4 and have "-O2 -ftree-vectorize -fno-tree-loop-vectorize" in my CFLAGS, so -ftree-slp-vectorize is probably the trigger on gcc. I think I'll just avoid the whole -ftree-vectorize family from now on; it doesn't do much without PGO anyway. Thanks again!

Thanks for testing. Could someone with appropriate hardware also test with clang, please? Fabian, could you test whether the OS X issue is resolved as well?

@Fabian: tl;dr: this issue cannot be resolved by changing the flags as you suggested. In particular, I
 - switched to -march=ivybridge, kept -O2
 - switched to -O1, kept -march=broadwell
 - switched to both -march=ivybridge and -O1

All variations exhibited the previous behavior.
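The gcc-6.4 comment above reasons that, because -ftree-vectorize on GCC turns on both -ftree-loop-vectorize and -ftree-slp-vectorize (unless either is set explicitly), adding -fno-tree-loop-vectorize leaves only the SLP vectorizer active. GCC can report which of the two passes a given flag set actually enables through `-Q --help=optimizers`. A minimal sketch; the wrapper name show_vectorizers is hypothetical, it assumes `gcc` on PATH is real GCC rather than a clang alias, and the exact column layout of the output varies between GCC versions.

```shell
#!/usr/bin/env bash
# show_vectorizers [FLAGS...]  (hypothetical helper name)
# Prints GCC's enabled/disabled state of the loop and SLP tree vectorizers
# after applying FLAGS.
show_vectorizers() {
    # -Q --help=optimizers lists every optimization option together with its
    # state ([enabled]/[disabled]) for the flags given on the command line.
    gcc -Q "$@" --help=optimizers | grep -E -- '-ftree-(loop|slp)-vectorize'
}
```

For instance, `show_vectorizers -O2 -ftree-vectorize -fno-tree-loop-vectorize` should list -ftree-loop-vectorize as disabled and -ftree-slp-vectorize as enabled, which is the combination that commenter was running.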
@Michał: I confirmed that portage-2.3.16 can successfully emerge 2 packages (which have BLAKE2b hashes and previously failed for me) without pyblake2 installed and with FEATURES=python3_6. So it seems the built-in BLAKE2 implementation works fine. *thumbs up*
The same is true for pyblake2 with clang and my usual use-flags. Works like a charm as well. Thanks.

@mgorny: the issue was resolved for me after the change of feature flags (I suppose it disabled stuff for me), so I removed the fugly workaround code immediately afterwards.

OK, thanks. We'll reopen if anybody else hits this.

*** Bug 638978 has been marked as a duplicate of this bug. ***

(In reply to M. B. from comment #21)
> Created attachment 506260
> build.log (with gcc settings)
>
> # GCC configuration as shown in the attached build.log
> CC="gcc"
> CXX="g++"
> CFLAGS="-flto=thin -mabm -frecord-gcc-switches ${CFLAGS}"
> CXXFLAGS="-flto=thin -mabm -frecord-gcc-switches ${CXXFLAGS}"
> FFLAGS="${FFLAGS} -frecord-gcc-switches"
> FCFLAGS="${FCFLAGS} -frecord-gcc-switches"
> AR="ar"
> NM="nm"
> RANLIB="ranlib"

What? -flto=thin with GCC? Really? Since when has that been applicable?

amd64, compiled with clang.
I found that deleting/renaming /usr/lib64/python3.4/site-packages/pyblake2.cpython-34m.so fixed the core dump issue. Unmerging pyblake2 also fixes it, until your next world update, at which point it will be reinstalled. FYI, a gdb stack trace:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3e1a58c in blake2b_init_param () from /usr/lib64/python3.4/site-packages/pyblake2.cpython-34m.so
(gdb) where
#0  0x00007ffff3e1a58c in blake2b_init_param () from /usr/lib64/python3.4/site-packages/pyblake2.cpython-34m.so
#1  0x00007ffff3e19bd5 in ?? () from /usr/lib64/python3.4/site-packages/pyblake2.cpython-34m.so
#2  0x000000305491899a in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#3  0x000000305491b70e in ?? () from /usr/lib64/libpython3.4m.so.1.0
#4  0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#5  0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#6  0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#7  0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#8  0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#9  0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#10 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#11 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#12 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#13 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#14 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#15 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#16 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#17 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#18 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#19 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#20 0x000000305491b70e in ?? () from /usr/lib64/libpython3.4m.so.1.0
#21 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#22 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#23 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#24 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#25 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#26 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#27 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#28 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#29 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#30 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#31 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#32 0x000000305491b786 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#33 0x00000030549184cb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#34 0x0000003054912ba7 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#35 0x0000003054911ec5 in PyEval_EvalCode () from /usr/lib64/libpython3.4m.so.1.0
#36 0x000000305493e70f in PyRun_FileExFlags () from /usr/lib64/libpython3.4m.so.1.0
#37 0x000000305493dc0e in PyRun_SimpleFileExFlags () from /usr/lib64/libpython3.4m.so.1.0
#38 0x00000030549556ac in Py_Main () from /usr/lib64/libpython3.4m.so.1.0
#39 0x0000000000400a86 in main ()
(gdb)

For a start, please explicitly specify the pyblake2 version.

(In reply to Michał Górny from comment #36)
> For a start, please explicitly specify pyblake2 version.

It was pyblake2-0.9.3-r1, but I keyworded it ~amd64 and that brings it up to 1.1.0. 1.1.0 did not core dump. More info is available in a forum post: https://forums.gentoo.org/viewtopic-t-1073600.html |
Created attachment 505600
emerge --info

Relevant errors:
zsh: segmentation fault emerge -1 portage
traps: emerge[17903] general protection ip:7f73c81d4ec6 sp:7ffcab4f2668 error:0 in pyblake2.cpython-35m-x86_64-linux-gnu.so[7f73c81d0000+8000]

Portage works when the .so file is manually removed.
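Moving the compiled extension aside, as in this report and several comments in the thread, requires knowing its path, and the file name depends on the Python version (pyblake2.cpython-34m.so versus pyblake2.cpython-35m-x86_64-linux-gnu.so). A minimal sketch that asks a given interpreter where a module would be loaded from, using importlib.util.find_spec, which locates a top-level module without executing it. The function name locate_module is hypothetical.

```shell
#!/usr/bin/env bash
# locate_module MODULE [PYTHON]  (hypothetical helper name)
# Prints the file MODULE would be loaded from by PYTHON (default: python3),
# or an empty line if the module cannot be found. The module is not executed.
locate_module() {
    local module="$1" python="${2:-python3}"
    "$python" -c 'import importlib.util, sys; spec = importlib.util.find_spec(sys.argv[1]); print(spec.origin if spec and spec.origin else "")' "$module"
}
```

A possible use, mirroring the manual workaround above: `so=$(locate_module pyblake2 python3.5)`, then `[ -n "$so" ] && mv "$so" /var/tmp/` before rebuilding dev-python/pyblake2.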