Summary: | =sys-devel/gcc-11.2.1_p20211127[lto] fails on any ARCH != amd64: ICE (lto1: internal compiler error: in read_cgraph_and_symbols, at lto/lto-common.c:2739) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | matoro <matoro_bugzilla_gentoo> |
Component: | Current packages | Assignee: | Gentoo Toolchain Maintainers <toolchain> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | matoro_bugzilla_gentoo, sam |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 732706, 822036 | ||
Attachments: | build logs; build.log.gz; build logs from arm64; build.log.gz from arm64; 0001-11.3.0-fix-CET-patch.patch; 26_all_enable-cet.patch (fixed) |
Description
matoro
2021-12-05 04:22:41 UTC
Created attachment 757445 [details]
build logs
(Note that it does build fine with PGO + LTO on arm* for me).

I'd also be interested in a few other things:
- emerge --info gcc
- the build.log of when it fails (I can't _see_ it in the tarball, but it's late, and I can see the eclass tries to include it, so if it's in there, forgive me!)
- whether it happens with more vanilla CFLAGS (although most problematic things should be stripped out already)
- ideally a log from another machine which crashes (e.g. maybe an arm64 one to compare with the ppc64 one).

I've been running with this version of GCC PGO + LTO'd without any issues on arm and arm64 for quite some time, but trying it on sparc now.

Created attachment 757446 [details]
build.log.gz
Created attachment 757447 [details]
build logs from arm64
Created attachment 757448 [details]
build.log.gz from arm64
Sorry about that - here's the build.log and also the same from the arm64 box.

Yes those are from https://github.com/InBetweenNames/gentooLTO.git but they should be stripped out because I am not using USE=custom-cflags. Just to test I tried disabling them, using this minimal set:

CFLAGS="-O3 -mcpu=native -mtune=native -pipe"

but got the same result. I have also tried enabling/disabling PGO to no effect - it works fine if LTO is disabled, does not work if LTO is enabled.

=================================================================
                        Package Settings
=================================================================

sys-devel/gcc-11.2.0::gentoo was built with the following:
USE="(cxx) graphite lto nls nptl openmp pch pgo (pie) sanitize ssp (-ada) -custom-cflags -d -debug -doc (-fixed-point) -fortran -go (-hardened) -jit (-libssp) (-multilib) -objc -objc++ -objc-gc -systemtap -test -valgrind -vanilla (-vtv) -zstd"
CFLAGS="-mcpu=native -mtune=native -pipe -Wl,-O1 -Wl,--as-needed -O2"
CXXFLAGS="-mcpu=native -mtune=native -pipe -Wl,-O1 -Wl,--as-needed -O2"
FEATURES="pid-sandbox sandbox assume-digests parallel-fetch splitdebug preserve-libs binpkg-docompress unmerge-orphans distlocks network-sandbox multilib-strict buildpkg usersandbox ipc-sandbox parallel-install binpkg-logs config-protect-if-modified userpriv strict xattr compressdebug ebuild-locks news userfetch merge-sync unmerge-logs protect-owned unknown-features-warn qa-unresolved-soname-deps usersync compress-build-logs fixlafiles binpkg-dostrip sfperms"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -mcpu=native -mtune=native -pipe"

Note, I have just received my new sparc host and have confirmed that this bug also affects sparc (running under 64-bit userland), so that's at least 3 arches that cannot build this gcc patchset with LTO.
(In reply to matoro from comment #9)
It definitely can for me, but of course, I don't deny it's happening for you.

Will note that your last *FLAGS don't have -O2.

Can you report this upstream? I'm struggling to get any debugging information out of this, sadly.

(In reply to Sam James from comment #10)
So, out of respect for upstream I figured I should submit details with USE=vanilla, but when I tested just now it surprisingly worked with this flag set (tested on ppc64, with LTO on). If I'm understanding what this means, doesn't that basically confirm that a Gentoo-specific patch is the cause? Given that it also does not appear on ~sys-devel/gcc-11.2.0, what are the new patches added that I can attempt to narrow it down to?

(In reply to matoro from comment #11)
Here's the list of patches: https://gitweb.gentoo.org/proj/gcc-patches.git/tree/11.3.0/gentoo.

We generally don't patch GCC (or glibc) in any interesting way to avoid problems like this. That's why I didn't even think to suggest it!

The only new patches were CET related which shouldn't affect non-amd64.

We can compare the dir with https://gitweb.gentoo.org/proj/gcc-patches.git/tree/11.2.0/gentoo. But this sounds really odd!

(In reply to Sam James from comment #12)
So what I've done is created a reverse patch using interdiff for each of the patches in question and dumped them in my user-patches folder. Using that, I got a successful compile. Now, I will remove each reverse-patch one at a time after each build until I can identify which one is causing the problem. Will update with findings...

(In reply to matoro from comment #13)
Thank you for your work here so far & sorry I've not been able to be that helpful so far.

If you have any questions which need to be answered a bit quicker which don't fit in the bug, feel free to come to #gentoo-toolchain on libera.chat (IRC).

(In reply to Sam James from comment #14)
Something else I'm interested in (because none of the patches are that interesting other than say, the CET one, which I'd really hope isn't related here (and shouldn't be)): this is happening for you on every one of your machines, and you've just setup a new one.

Can you walk me through _anything_ interesting about this? The fact it's on *every one* of your !amd64 boxes is suspicious given I've not managed to hit it. My gut is that there's some additional environmental factor, possibly something you do naturally so you've forgotten.

(I'd note again that not all of these flags are safe globally and are generally likely to trigger compiler bugs at least on random arches).

Was binutils and the rest of it also built with vanilla *FLAGS when testing?

Does it happen if you fire up a chroot on one of these machines from a vanilla stage3? How much can you vary a chroot before it starts to fail like the host does?

(In reply to Sam James from comment #15)
> Was binutils and the rest of it also built with vanilla *FLAGS when testing?
(also, please verify at least @system, including stuff like gmp, mpfr, ... is built with normal flags too when testing)

I've reproduced on ppc64le system

lto1: internal compiler error: in read_cgraph_and_symbols, at lto/lto-common.c:2739
Please submit a full bug report, with preprocessed source if appropriate.
See <https://bugs.gentoo.org/> for instructions.
lto-wrapper: fatal error: /var/tmp/portage/sys-devel/gcc-11.2.1_p20211127/work/build/./prev-gcc/xgcc returned 1 exit status
compilation terminated.
/usr/powerpc64le-unknown-linux-gnu/bin/ld: error: lto-wrapper failed

ok it's not LTO to blame, it's graphite/isl
without USE=graphite xgcc works for me.
so probably another weird graphite ricerbug =)

actually NVM, build still failed later. not graphite.

ok I pinpointed it down to -flto=jobserver
trying jobserver is what triggers an ice in read_cgraph_and_symbols on this line
gcc_assert (num_objects == nfiles);

will dump it here for history
<trofi> lto/lto-common.c:2739 is an assert in mismatch on number of objects seen in .res file and amount of objects collected by linker in it's input arguments. you probably want to compare actual file list in .res file and compare it to '**fnames' of read_cgraph_and_symbols() to find the discrepancy.
<+gyakovlev> trofi: hey! thanks
<+gyakovlev> yeah I slowly crawling thru this.
<+gyakovlev> but my gdb crashes with SIGABRT itself, does not help at all.
<+gyakovlev> I collected all temps output here, just in case:
<+gyakovlev> https://dpaste.com/BLMSMAM56
<trofi> my guess is that number of collected files is collected and written by plugin at https://github.com/gcc-mirror/gcc/blob/master/lto-plugin/lto-plugin.c#L557 (which read then fails to reconcile with plugin's state)
<+gyakovlev> interesting thing is that -flto=<int> does not trigger it, just -flto=jobserver
<+gyakovlev> so I'm guessing it's related to parallel/auto_parallel handling
<trofi> empty '' parameter looks slightly suspicious
<+gyakovlev> seeing that
<+gyakovlev> else if (parallel > 1)
<+gyakovlev> {
<+gyakovlev> char buf[256];
<+gyakovlev> sprintf (buf, "-fwpa=%i", parallel);
<+gyakovlev> set -fwpa
<+gyakovlev> and it's 32 for me ( number of cores)
<+gyakovlev> it takes that codepath, I guess.
<+gyakovlev> hmm. it started failing with -flto=1 too
<+gyakovlev> I need to get some sleep and jump at it with working gdb and fresh brain.
<+gyakovlev> ok I compared args to working gcc and borked one.
<+gyakovlev> difference is on borked gcc
<+gyakovlev> instead of
<+gyakovlev> -fcf-protection=none
<+gyakovlev> I have ""
<+gyakovlev> here's normal flags file
<+gyakovlev> ==> tempswpa.args.0 <==
<+gyakovlev> tempst.o
<+gyakovlev> and here's borked
<+gyakovlev> ==> tempswpa.args.0 <==
<+gyakovlev> ""
<+gyakovlev> tempst.o
<+gyakovlev> so I guess gcc interprets that "" that came out of nowhere as a file and number of files does not mach actual objects
<+gyakovlev> manually adding '-fcf-protection=none' to args solves the ICE...
<+gyakovlev> now need to figure out why it's empty on borked gcc =)
<+gyakovlev> sam_: we have winner!
<+gyakovlev> 26_all_enable-cet.patch messes up with fcf-protection
<+gyakovlev> so the args gets emptied somehow
<+gyakovlev> this leads to this lto weirdness
<+gyakovlev> note it breaks even if USE=cet is not in use

My testing has also confirmed that 26_all_enable-cet.patch is the cause. I suppose that also explains why it appears only on non-amd64. For now I have deployed the reverse patch to my hosts. Will this patch just be reverted entirely?

yeah it will be fixed or reverted ofc, package will be bumped when it happens.
working on finding best way to do it.

Created attachment 760602 [details, diff]
0001-11.3.0-fix-CET-patch.patch
Attached a patch-to-the-patch.
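For anyone reading this later, here is a rough Python sketch of the guess made in the IRC log: that a stray "" in the args file is counted as an input file, so the number of files no longer matches the number of objects. This is only a toy model and not GCC's code. The function names (read_response_file, input_files, check) and the "anything not starting with '-' is a file" rule are assumptions made up for illustration; the two sample inputs are modelled on the tempswpa.args.0 listings quoted above.

```python
import shlex


def read_response_file(text):
    """Split response-file text into arguments, keeping empty quoted ones."""
    # shlex.split in its default POSIX mode turns a quoted empty string ("")
    # into an empty token instead of dropping it.
    return shlex.split(text)


def input_files(args):
    """Toy rule (an assumption): anything not starting with '-' is an input file."""
    return [a for a in args if not a.startswith("-")]


def check(text, num_objects):
    """Compare the number of file-like arguments with the expected object count."""
    files = input_files(read_response_file(text))
    return {
        "files": files,
        "empty_args": files.count(""),
        "consistent": len(files) == num_objects,
    }


# Contents modelled on the two tempswpa.args.0 listings from the IRC log.
WORKING = "tempst.o\n"
BROKEN = '""\ntempst.o\n'

if __name__ == "__main__":
    for name, text in (("working", WORKING), ("broken", BROKEN)):
        print(name, check(text, num_objects=1))
```

In this model the broken input yields two file-like arguments for one expected object, which is the same kind of mismatch that the gcc_assert (num_objects == nfiles) check mentioned above guards against. Running something similar over the saved temps files may be a quick way to spot an empty argument on an affected box.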
Created attachment 760603 [details, diff]
26_all_enable-cet.patch (fixed)
matoro, could you test this for me please? Revert the old patch first / drop it.
(In reply to Sam James from comment #25)
Confirmed working on ppc64, thanks for the effort!

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/gcc-patches.git/commit/?id=2b36f3ad2ba0114eae1d32bae5e395e098b3714b

commit 2b36f3ad2ba0114eae1d32bae5e395e098b3714b
Author: Sam James <sam@gentoo.org>
AuthorDate: 2021-12-28 03:44:47 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2021-12-28 03:55:44 +0000

    11.3.0: fix CET patch

    Our patch was causing unhandled state to leak into the LTO metadata
    writer, it shouldn't have got that far though.

    Instead of messing about with GCC's option handling, use the macro
    they provide for purposes like this, which makes things far simpler
    (and less fragile).

    Bug: https://bugs.gentoo.org/828400
    Bug: https://bugs.gentoo.org/822036
    Thanks-to: Sergei Trofimovich <slyich@gmail.com> (debugging help in #gentoo-toolchain)
    Thanks-to: Georgy Yakovlev <gyakovlev@gentoo.org> (debugging)
    Reported-by: matoro <matoro@airmail.cc>
    Signed-off-by: Sam James <sam@gentoo.org>

 11.3.0/gentoo/26_all_enable-cet.patch | 65 +++++------------------------------
 1 file changed, 9 insertions(+), 56 deletions(-)

(In reply to matoro from comment #26)
Thank you!

I'm going to see if soap wants to take a newer snapshot (there's not that many changes) or not before deciding if we should just revbump the current version.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=b96fd11e3e5626181c32c38381f814aba21fb9f0

commit b96fd11e3e5626181c32c38381f814aba21fb9f0
Author: Sam James <sam@gentoo.org>
AuthorDate: 2022-01-18 13:16:42 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2022-01-18 13:19:30 +0000

    sys-devel/gcc: add 11.2.1_p20220115

    Fairly minor changes upstream since the last snapshot of the 11 stable
    branch.

    Includes more CET fixes and the upstream cross-compile patch. Also, the
    PCH ICE fix, although we've since masked PCH globally due to its
    instability.

    Bug: https://bugs.gentoo.org/822036
    Closes: https://bugs.gentoo.org/803371
    Closes: https://bugs.gentoo.org/828400
    Closes: https://bugs.gentoo.org/822690
    Signed-off-by: Sam James <sam@gentoo.org>

 profiles/base/package.use.mask | 2 +-
 sys-devel/gcc/Manifest | 3 +++
 sys-devel/gcc/gcc-11.2.1_p20220115.ebuild | 26 ++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-) |