923154 – sys-apps/systemd and dev-lang/python fail to build on arm with "Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'"

Status:	RESOLVED CANTFIX

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Toolchain Maintainers

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	915000

Reported:	2024-01-28 16:44 UTC by Matt Turner
Modified:	2024-05-26 22:58 UTC
CC List:	3 users

See Also:	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109765
Package list:
Runtime testing required:	---

Attachments
emerge --info (emerge-info,5.50 KB, text/plain) 2024-01-28 16:46 UTC, Matt Turner
build.log from systemd (build.log.gz,39.31 KB, application/gzip) 2024-01-28 16:47 UTC, Matt Turner
timesyncd-manager.c.i (timesyncd-manager.c.i,925.66 KB, text/plain) 2024-01-28 23:30 UTC, Matt Turner
timesyncd-manager.c.s (timesyncd-manager.c.s,97.46 KB, text/plain) 2024-01-28 23:30 UTC, Matt Turner
timesyncd-manager.c.s from 13.2.1_p20230826 (timesyncd-manager.c.s,97.46 KB, text/plain) 2024-01-28 23:39 UTC, Matt Turner
cvise'd timesyncd-manager.c.i (foo.i,525 bytes, text/plain) 2024-01-29 04:57 UTC, Matt Turner

Description Matt Turner gentoo-dev

2024-01-28 16:44:19 UTC

Suspected gcc bug. With gcc-13.2.1_p20240113-r1 I've seen systemd and python fail to build with an error message like:

>      1 {standard input}: Assembler messages:
>   1285 {standard input}:3629: Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'

Comment 1 Matt Turner gentoo-dev

2024-01-28 16:46:22 UTC

Created attachment 883486
emerge --info

Comment 2 Matt Turner gentoo-dev

2024-01-28 16:47:02 UTC

Created attachment 883487
build.log from systemd

Comment 3 Sam James archtester

2024-01-28 16:54:01 UTC

Can you run that 'FAILED:' command for compiling timesyncd-manager.c manually, but with -save-temps? Then upload timesyncd-manager.i and timesyncd-manager.s.

Can you also then try `as timesyncd-manager.s` and see if that fails?

For bonus points, you can try running cvise on timesyncd-manager.i: use gcc as a sanity check in the script, then have it run `as` and grep for the error you're getting.

Comment 4 Sam James archtester

2024-01-28 19:37:01 UTC

I can't reproduce it yet :(

Comment 5 Matt Turner gentoo-dev

2024-01-28 23:30:08 UTC

Created attachment 883506
timesyncd-manager.c.i

Comment 6 Matt Turner gentoo-dev

2024-01-28 23:30:26 UTC

Created attachment 883507
timesyncd-manager.c.s

Comment 7 Matt Turner gentoo-dev

2024-01-28 23:35:01 UTC

`as timesyncd-manager.c.s` reproduces the issue.

grep shows that other immediate values are used in vmov.f64 instructions, but #6.:e+0 is the only one that contains a `:`.

> $ grep 'vmov.f64.*#' timesyncd-manager.c.s
> 	vmov.f64	d4, #5.0e-1
> 	vmov.f64	d6, #5.0e-1
> 	vmov.f64	d0, #6.:e+0
> 	vmov.f64	d5, #3.0e+0

Comment 8 Sam James archtester

2024-01-28 23:38:48 UTC

I can reproduce it with `as ...` but I can't yet reproduce it from your preprocessed source.

Comment 9 Matt Turner gentoo-dev

2024-01-28 23:39:02 UTC

Created attachment 883508
timesyncd-manager.c.s from 13.2.1_p20230826

Running the same command with gcc-13.2.1_p20230826 works -- and with -save-temps it produces the attached file.

The diff between the good and bad .s files is

> --- timesyncd-manager.c.s.bad	2024-01-28 18:28:28.516049319 -0500
> +++ timesyncd-manager.c.s.good	2024-01-28 18:36:08.637596567 -0500
> @@ -3626,7 +3626,7 @@
>  	vsub.f64	d7, d7, d5
>  	vmla.f64	d6, d7, d7
>  	bne	.L406
> -	vmov.f64	d0, #6.:e+0
> +	vmov.f64	d0, #7.0e+0
>  	str	r2, [fp, #-628]
>  	str	r3, [fp, #-636]
>  	str	r1, [fp, #-624]
> @@ -5538,5 +5538,5 @@
>  	.size	__func__.28, 24
>  __func__.28:
>  	.ascii	"manager_set_server_name\000"
> -	.ident	"GCC: (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113"
> +	.ident	"GCC: (Gentoo 13.2.1_p20230826 p7) 13.2.1 20230826"
>  	.section	.note.GNU-stack,"",%progbits

Comment 10 Matt Turner gentoo-dev

2024-01-28 23:39:46 UTC

I wasn't able to reproduce it on jiji in my arm chroot either, even with the same CFLAGS. Unsure why.

Comment 11 Sam James archtester

2024-01-28 23:45:39 UTC

Can you try cvise? It works over SSH too but then you can't have any parallelism (-> -n 1).

Something like this adapted script from https://wiki.gentoo.org/wiki/GCC_ICE_reporting_guide#.5Bbonus.5D_minimize_self-contained_source_using_cvise:
```
#!/usr/bin/env bash
gcc -c foo.i || exit 1 # make sure it's not totally junk C

gcc -S foo.i || exit 1
as foo.s || exit 1 # foo.s is what gcc -S wrote; make sure it assembles fine w/o the -march & -mtune

gcc -O2 -march=armv7-a -mtune=marvell-pj4 foo.i >gcc_out.txt 2>&1
grep "garbage following instruction" gcc_out.txt >/dev/null 2>&1
```

Comment 12 Matt Turner gentoo-dev

2024-01-28 23:55:08 UTC

Will do.

Comment 13 Matt Turner gentoo-dev

2024-01-29 04:57:50 UTC

Created attachment 883511
cvise'd timesyncd-manager.c.i

Reproduces the issue.

Bafflingly, there's a syntax error in the file...?

> $ ssh cubox gcc -O2 -march=armv7-a -mtune=marvell-pj4 -x c -c - < foo.i
> <stdin>:5:1: warning: no semicolon at end of struct or union
> <stdin>:7:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
> /tmp/ccr2Q3q9.s: Assembler messages:
> /tmp/ccr2Q3q9.s:37: Error: garbage following instruction -- `vmov.f64 d7,#6.:e+0'

> struct Manager {
>   int samples[8];
>   double samples_jitter
> } manager_send_request() {
> }

Adding a semicolon after `samples_jitter` makes the "garbage following instruction" error go away, so I think gcc must have an out-of-bounds access or something.

Comment 14 Sam James archtester

2024-01-29 08:23:14 UTC

Can you try run it under Valgrind?

I'm reluctant to report it upstream myself yet (normally I would've by now) because I can't really give much actionable information to them yet.

Comment 15 Matt Turner gentoo-dev

2024-01-29 15:32:36 UTC

(In reply to Sam James from comment #14)
> Can you try run it under Valgrind?

Just tried and valgrind-3.21.0-r1 fails with

> disInstr(arm): unhandled instruction: 0xECECA102
>                  cond=14(0xE) 27:20=206(0xCE) 4:4=0 3:0=2(0x2)
> ...
> ==32164== Process terminating with default action of signal 4 (SIGILL): dumping core
> ==32164==  Illegal opcode at address 0x401D620
> ==32164==    at 0x401D620: __sigsetjmp (in /usr/lib/ld-linux-armhf.so.3)
> ==32164==    by 0x4000F4F: _dl_catch_exception (in /usr/lib/ld-linux-armhf.so.3)

I see some hits on Google. It looks like this is an IWMMXT instruction in glibc. (This CPU has IWMMXT.)

Going to try with the latest unstable. Alternatively, I wonder if there's an environment variable I can use to disable IWMMXT support at runtime in glibc.

Comment 16 Matt Turner gentoo-dev

2024-01-29 17:12:08 UTC

Same result with valgrind-3.22.0-r2.

Comment 17 Matt Turner gentoo-dev

2024-01-29 17:21:21 UTC

Doesn't look like there's a way to disable IWMMXT support at runtime, and the particular code is saving/restoring IWMMXT state -- not a fast path that could be disabled.

Going to rebuild glibc with -DARM_ASSUME_NO_IWMMXT and see if that lets things work.

Comment 18 Matt Turner gentoo-dev

2024-01-29 19:57:21 UTC

After rebuilding glibc with `#define ARM_ASSUME_NO_IWMMXT 1` stuffed into sysdeps/arm/arm-features.h I can run valgrind but it doesn't find anything wrong /o\

Comment 19 Matt Turner gentoo-dev

2024-01-29 21:51:34 UTC

I was thinking about

> -	vmov.f64	d0, #6.:e+0
> +	vmov.f64	d0, #7.0e+0

and why and how a colon could appear in a floating-point literal.

`:` is ASCII 58. `0` is ASCII 48. Lots of digit-to-character conversions compute `i + '0'` to turn an integer 0 through 9 into the character '0' through '9'.

If the integer was 10 (how? dunno yet), then 10 + '0' would produce ':'.

And that kind of makes sense in that the double value *should* be 7.0. That is, 7 and 0 tenths. But if the ':' indicates a 10, then the broken value is sort of... 6 and 10 tenths.

So printf is doing something bad? I went to write a program that just prints the 10 double values immediately below 7.0... and it failed to compile with the same error I've been chasing down!

> /tmp/ccvMdDbH.s: Assembler messages:
> /tmp/ccvMdDbH.s:24: Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'

Turns out, I can reproduce the problem with just:

double wtf(void) { return 7.0; }

Comment 20 Matt Turner gentoo-dev

2024-01-29 22:51:19 UTC

I'm feeling like something's going wrong in gcc's real_to_decimal_for_mode().

Comment 21 Sam James archtester

2024-01-30 06:47:34 UTC

(In reply to Matt Turner from comment #20)
> I'm feeling like something's going wrong in gcc's real_to_decimal_for_mode().

Is your mpfr (or maybe gmp) miscompiled, I wonder?

Comment 22 Matt Turner gentoo-dev

2024-03-23 03:49:50 UTC

I tried compiling dev-libs/{mpc,gmp,mpfr} with -O0. Still fails.

I tried compiling dev-libs/gmp with USE="-asm -cpudetection". Still fails.

I'm going to rebuild gcc-13.2.1_p20230826 and see if it works. If a clean build of it works, then I can be reasonably sure it's something with gcc. If a clean build of it fails, I can be reasonably sure that it's something outside of gcc.

Comment 23 Matt Turner gentoo-dev

2024-03-24 19:16:30 UTC

Works with 13.2.1_p20230826.

To summarize:

- gcc-13.2.1_p20230826    is good
- gcc-13.2.1_p20240113-r1 is bad
- gcc-13.2.1_p20240210    is bad

I have not tried gcc-13.2.1_p20231216 yet. I'll do that next.

Comment 24 Matt Turner gentoo-dev

2024-03-26 15:51:49 UTC

(In reply to Matt Turner from comment #23)
> Works with 13.2.1_p20230826.
> 
> To summarize:
> 
> - gcc-13.2.1_p20230826    is good
> - gcc-13.2.1_p20240113-r1 is bad
> - gcc-13.2.1_p20240210    is bad
> 
> I have not tried gcc-13.2.1_p20231216 yet. I'll do that next.

- gcc-13.2.1_p20231216 is bad.

Comment 25 Matt Turner gentoo-dev

2024-03-27 03:20:56 UTC

If I understand correctly, these two snapshots are

https://gcc.gnu.org/pub/gcc/snapshots/13-20230826/
https://gcc.gnu.org/pub/gcc/snapshots/13-20231216/

and those pages helpfully list the SHA1s they're generated from:

13-20230826 - c17326381802cc8b999ba07f1c9d8559b1371aab
13-20231216 - 3d00aa17c7b8981e75e5f8719ce1d40a0d017cee

git log --pretty=oneline c17326381802cc8b999ba07f1c9d8559b1371aab..3d00aa17c7b8981e75e5f8719ce1d40a0d017cee | grep -v 'Daily bump' | wc -l

... says 296.

Comment 26 Sam James archtester

2024-03-27 03:41:10 UTC

(In reply to Matt Turner from comment #25)
> If I understand correctly, these two snapshots are
> 
> https://gcc.gnu.org/pub/gcc/snapshots/13-20230826/
> https://gcc.gnu.org/pub/gcc/snapshots/13-20231216/
> 
> and those pages helpfully list the SHA1s they're generated from:
> 

https://github.com/thesamesam/sam-gentoo-scripts/tree/main/gcc

> 13-20230826 - c17326381802cc8b999ba07f1c9d8559b1371aab

Comment 27 Matt Turner gentoo-dev

2024-04-07 03:42:46 UTC

I've tried and tried and tried to bisect this and have not been successful.

Nothing I've built with gcc-13.3.9999 has reproduced the issue. Nothing I've built with Sam's bisect-gcc script has reproduced the issue, regardless of whether I've built gcc with portage or without. I've tried gcc from stage1 (--disable-bootstrap), stage2 (make bootstrap2), and stage3.

I took a binary package of gcc that reproduces the issue on my ARM system (a SolidRun CuBox with a Marvell Armada 510 CPU) and installed it in my 32-bit arm chroot on jiji, and the issue doesn't reproduce there.

I took a binary package of gcc that I built in the 32-bit arm chroot on jiji (that does not reproduce the issue on jiji) and installed it on my ARM system and the issue reproduces.

That tells me that the issue isn't in gcc itself.

I don't have any further ideas for debugging this.

Comment 28 Matt Turner gentoo-dev

2024-05-26 22:29:57 UTC

I updated to gcc-14.1.1_p20240518 on the CuBox and it doesn't show the bug.

Comment 29 Sam James archtester

2024-05-26 22:58:43 UTC

Bizarre. I'm sorry I couldn't give you any better suggestions and I'm still baffled.