Suspected gcc bug. With gcc-13.2.1_p20240113-r1 I've seen systemd and python fail to build with an error message like:
> 1 {standard input}: Assembler messages:
> 1285 {standard input}:3629: Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'
Created attachment 883486 [details] emerge --info
Created attachment 883487 [details] build.log from systemd
Can you run that 'FAILED:' command for compiling timesyncd-manager.c manually, but with -save-temps? Then upload timesyncd-manager.i and timesyncd-manager.s. Can you also then try `as timesyncd-manager.s` and see if that fails? For bonus points, you can try cvise on timesyncd-manager.i: use gcc as a sanity check in the script, and have it run `as` and grep for the error you're getting.
I can't reproduce it yet :(
Created attachment 883506 [details] timesyncd-manager.c.i
Created attachment 883507 [details] timesyncd-manager.c.s
`as timesyncd-manager.c.s` reproduces the issue. grep shows there are other immediate values used in vmov.f64 instructions, but #6.:e+0 is the only one that contains a `:`
> $ grep 'vmov.f64.*#' timesyncd-manager.c.s
> vmov.f64 d4, #5.0e-1
> vmov.f64 d6, #5.0e-1
> vmov.f64 d0, #6.:e+0
> vmov.f64 d5, #3.0e+0
I can reproduce it with `as ...` but I can't yet reproduce it from your preprocessed source.
Created attachment 883508 [details] timesyncd-manager.c.s from 13.2.1_p20230826
Running the same command with gcc-13.2.1_p20230826 works -- and with -save-temps it produces the attached file. The diff between the good and bad .s files is
> --- timesyncd-manager.c.s.bad 2024-01-28 18:28:28.516049319 -0500
> +++ timesyncd-manager.c.s.good 2024-01-28 18:36:08.637596567 -0500
> @@ -3626,7 +3626,7 @@
>   vsub.f64 d7, d7, d5
>   vmla.f64 d6, d7, d7
>   bne .L406
> - vmov.f64 d0, #6.:e+0
> + vmov.f64 d0, #7.0e+0
>   str r2, [fp, #-628]
>   str r3, [fp, #-636]
>   str r1, [fp, #-624]
> @@ -5538,5 +5538,5 @@
>   .size __func__.28, 24
>   __func__.28:
>   .ascii "manager_set_server_name\000"
> - .ident "GCC: (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113"
> + .ident "GCC: (Gentoo 13.2.1_p20230826 p7) 13.2.1 20230826"
>   .section .note.GNU-stack,"",%progbits
I wasn't able to reproduce it on jiji in my arm chroot either, even with the same CFLAGS. Unsure why.
Can you try cvise? It works over SSH too but then you can't have any parallelism (-> -n 1). Something like this script, adapted from https://wiki.gentoo.org/wiki/GCC_ICE_reporting_guide#.5Bbonus.5D_minimize_self-contained_source_using_cvise:
```
#!/usr/bin/env bash
gcc -c foo.i || exit 1 # make sure it's not totally junk C
gcc -S foo.i || exit 1 # writes foo.s
as foo.s || exit 1 # make sure it assembles fine w/o the -march & -mtune
# compile only (-c) with the failing flags and capture the diagnostics
gcc -O2 -march=armv7-a -mtune=marvell-pj4 -c foo.i >gcc_out.txt 2>&1
# exit 0 (interesting) only if the assembler error is present
grep "garbage following instruction" gcc_out.txt >/dev/null 2>&1
```
Will do.
Created attachment 883511 [details] cvise'd timesyncd-manager.c.i
Reproduces the issue. Bafflingly, there's a syntax error in the file...?
> $ ssh cubox gcc -O2 -march=armv7-a -mtune=marvell-pj4 -x c -c - < foo.i
> <stdin>:5:1: warning: no semicolon at end of struct or union
> <stdin>:7:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
> /tmp/ccr2Q3q9.s: Assembler messages:
> /tmp/ccr2Q3q9.s:37: Error: garbage following instruction -- `vmov.f64 d7,#6.:e+0'

> struct Manager {
>   int samples[8];
>   double samples_jitter
> } manager_send_request() {
> }

Adding a semicolon after `samples_jitter` makes the "garbage following instruction" error go away, so I think gcc must have an out-of-bounds access or something.
Can you try run it under Valgrind? I'm reluctant to report it upstream myself yet (normally I would've by now) because I can't really give much actionable information to them yet.
(In reply to Sam James from comment #14)
> Can you try run it under Valgrind?

Just tried and valgrind-3.21.0-r1 fails with
> disInstr(arm): unhandled instruction: 0xECECA102
> cond=14(0xE) 27:20=206(0xCE) 4:4=0 3:0=2(0x2)
> ...
> ==32164== Process terminating with default action of signal 4 (SIGILL): dumping core
> ==32164== Illegal opcode at address 0x401D620
> ==32164== at 0x401D620: __sigsetjmp (in /usr/lib/ld-linux-armhf.so.3)
> ==32164== by 0x4000F4F: _dl_catch_exception (in /usr/lib/ld-linux-armhf.so.3)

I see some hits on Google. Looks like this is an IWMMXT instruction in glibc. (This CPU has IWMMXT.) Going to try with the latest unstable. Alternatively, I wonder if there's an environment variable I can use to disable IWMMXT support at runtime in glibc.
Same result with valgrind-3.22.0-r2.
Doesn't look like there's a way to disable IWMMXT support at runtime, and the particular code is saving/restoring IWMMXT state -- not a fast path that could be disabled. Going to rebuild glibc with -DARM_ASSUME_NO_IWMMXT and see if that lets things work.
After rebuilding glibc with `#define ARM_ASSUME_NO_IWMMXT 1` stuffed into sysdeps/arm/arm-features.h I can run valgrind but it doesn't find anything wrong /o\
I was thinking about
> - vmov.f64 d0, #6.:e+0
> + vmov.f64 d0, #7.0e+0
and why and how a colon could appear in a floating-point literal.

`:` is ASCII 58. `0` is ASCII 48. Lots of integer-digit to character conversions do `i + '0'` to convert an integer 0 to 9 to a character '0' to '9'. If the integer was 10 (how? dunno yet), then 10 + '0' would produce ':'.

And that kind of makes sense in that the double value *should* be 7.0. That is, 7 and 0 tenths. But if the ':' indicates a 10, then the broken value is sort of... 6 and 10 tenths.

So printf is doing something bad? I went to write a program that just prints the 10 double values immediately below 7.0... and it failed to compile with the same error I've been chasing down!
> /tmp/ccvMdDbH.s: Assembler messages:
> /tmp/ccvMdDbH.s:24: Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'

Turns out, I can reproduce the problem with just:
double wtf(void) { return 7.0; }
I'm feeling like something's going wrong in gcc's real_to_decimal_for_mode().
(In reply to Matt Turner from comment #20)
> I'm feeling like something's going wrong in gcc's real_to_decimal_for_mode().

Is your mpfr (or maybe gmp) miscompiled, I wonder?
I tried compiling dev-libs/{mpc,gmp,mpfr} with -O0. Still fails. I tried compiling dev-libs/gmp with USE="-asm -cpudetection". Still fails. I'm going to rebuild gcc-13.2.1_p20230826 and see if it works. If a clean build of it works, then I can be reasonably sure it's something with gcc. If a clean build of it fails, I can be reasonably sure that it's something outside of gcc.
Works with 13.2.1_p20230826.

To summarize:
- gcc-13.2.1_p20230826 is good
- gcc-13.2.1_p20240113-r1 is bad
- gcc-13.2.1_p20240210 is bad

I have not tried gcc-13.2.1_p20231216 yet. I'll do that next.
(In reply to Matt Turner from comment #23)
> Works with 13.2.1_p20230826.
>
> To summarize:
>
> - gcc-13.2.1_p20230826 is good
> - gcc-13.2.1_p20240113-r1 is bad
> - gcc-13.2.1_p20240210 is bad
>
> I have not tried gcc-13.2.1_p20231216 yet. I'll do that next.

- gcc-13.2.1_p20231216 is bad.
If I understand correctly, these two snapshots are
https://gcc.gnu.org/pub/gcc/snapshots/13-20230826/
https://gcc.gnu.org/pub/gcc/snapshots/13-20231216/
and those pages helpfully list the SHA1s they're generated from:
13-20230826 - c17326381802cc8b999ba07f1c9d8559b1371aab
13-20231216 - 3d00aa17c7b8981e75e5f8719ce1d40a0d017cee

git log --pretty=oneline c17326381802cc8b999ba07f1c9d8559b1371aab..3d00aa17c7b8981e75e5f8719ce1d40a0d017cee | grep -v 'Daily bump' | wc -l

... says 296.
(In reply to Matt Turner from comment #25)
> If I understand correctly, these two snapshots are
>
> https://gcc.gnu.org/pub/gcc/snapshots/13-20230826/
> https://gcc.gnu.org/pub/gcc/snapshots/13-20231216/
>
> and those pages helpfully list the SHA1s they're generated from:
> https://github.com/thesamesam/sam-gentoo-scripts/tree/main/gcc
> 13-20230826 - c17326381802cc8b999ba07f1c9d8559b1371aab
I've tried and tried and tried to bisect this and have not been successful. Nothing I've built with gcc-13.3.9999 has reproduced the issue. Nothing I've built with Sam's bisect-gcc script has reproduced the issue, regardless of whether I've built gcc with portage or without. I've tried gcc from stage1 (--disable-bootstrap), stage2 (make bootstrap2), and stage3. I took a binary package of gcc that reproduces the issue on my ARM system (a Solid Run CuBox with a Marvell Armada 510 CPU) and installed it in my 32-bit arm chroot on jiji — the issue doesn't reproduce there. I took a binary package of gcc that I built in the 32-bit arm chroot on jiji (that does not reproduce the issue on jiji) and installed it on my ARM system and the issue reproduces. That tells me that the issue isn't in gcc itself. I don't have any further ideas for debugging this.
I updated to gcc-14.1.1_p20240518 on the CuBox and it doesn't show the bug.
Bizarre. I'm sorry I couldn't give you any better suggestions and I'm still baffled.