895866 – app-emulation/qemu-9999: qemu-x86_64 segfaults on ppc64le (4K page size) when downloading go dependencies: unexpected fault address 0x0

Status:	UNCONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	PPC64 Linux

Importance:	Normal normal
Assignee:	Virtualization Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2023-02-22 11:03 UTC by darkbasic
Modified:	2025-03-02 01:00 UTC
CC List:	2 users

See Also:	https://gitlab.com/qemu-project/qemu/-/issues/1494
Package list:
Runtime testing required:	---

Description darkbasic 2023-02-22 11:03:01 UTC

qemu-x86_64 segfaults when trying to compile yay inside an Arch Linux x86_64 chroot from a Gentoo Linux ppc64le (4K page size) host. Hardware is a Raptor CS Talos 2 Power 9.
It works with qemu-7.2 so this is a regression in git master.

Upstream issue: https://gitlab.com/qemu-project/qemu/-/issues/1494

Unfortunately it's not 100% reproducible, and I have already failed to bisect it correctly twice.

This is my latest bisecting setup: https://gitlab.com/qemu-project/qemu/-/issues/1494#note_1288167270

Since my last attempt I plan to increase the repetitions from 10 to at least 30. I also want to move the mount/umount steps out of the script (when a run fails it doesn't umount cleanly, and there is no reason to do it every time since I can simply keep it mounted).
I also want to killall -9 qemu-x86_64 each time (it keeps running when it fails), but I'm not sure how to do so without potentially triggering an unsuccessful exit status if the process does not exist (which would mark the commit as bad automatically).
How can I stop the error code from propagating? Something like:
killall -9 qemu-x86_64 | enforce-it-to-always-return-success

Last but not least, I want to re-check the known good commit (which is the 7.2 git tag) using the live ebuild instead of the 7.2 one: chances are slim, but there might be something different in the live ebuild which makes it fail compared to the 7.2 ebuild. I already did this, but only with 10 repetitions, which apparently is still too few to reproduce the issue reliably.

I'm open to other ideas as well; I would really hate to see this bug in the next stable version (qemu user is already a regression hell).

Comment 1 Sam James archtester 2023-02-22 11:05:38 UTC

Just some rough thoughts to start:
- https://wiki.gentoo.org/wiki/Bisecting_with_live_ebuilds is how I normally do bisecting with live ebuilds (your setup seems similar)
- For a bug that we're in the middle of, we're doing this:
```
[...]
for x in $(seq 1 10); do
        # We deliberately exit *0* if it fails because we're bisecting for the bad commit, not good
        timeout 1m bin/llvm-tblgen \
                -I /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/lib/TargetPowerPC \
                -I /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/include/ \
                -I /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/lib/Target/PowerPC/ \
                /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/lib/Target/PowerPC/PPC.td \
                -o /dev/null \
                --gen-dag-isel \
                -d /dev/null \
                --time-phases \
                --write-if-changed
        timeout_result=$?

        case ${timeout_result} in
                124)
                        # Timed out
                        exit 1
                        ;;
                0)
                        ;;
                *)
                        # Something else happened, skip
                        exit 125
                        ;;
        esac
done

exit 0
```

You might want to adjust it so that a bad exit code is *also* a failure, but you get the idea.

Given you're automating the bisect, I'd say it's fine to give every run/commit several attempts to ensure it's definitely bad.

Comment 2 Sam James archtester 2023-02-22 11:07:55 UTC

(In reply to darkbasic from comment #0)
> How can I stop the error code from propagating? Something like:
> killall -9 qemu-x86_64 | enforce-it-to-always-return-success
> 

Try:
# Ignore the exit status of killall (it is non-zero when no qemu-x86_64 process exists)
killall -9 qemu-x86_64 || true
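
A minimal sketch of how that could sit inside a test script, assuming a hypothetical helper name `kill_stragglers` (not part of the reporter's actual setup). It only shows that `|| true` keeps a "no process found" result from killall from turning into a failing exit status, even when the script runs under `set -e`.

```shell
# Hypothetical helper: kill leftover emulator processes without failing the script.
kill_stragglers() {
    # killall exits non-zero when no process matches the name;
    # "|| true" replaces that status with success.
    killall -9 "$1" 2>/dev/null || true
}

# Usage in the bisect test script, after each attempt:
#   kill_stragglers qemu-x86_64
```

Because the function always returns 0, it can safely be the last command before the script decides its own exit status.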

Comment 3 Sam James archtester 2023-02-22 11:15:37 UTC

(In reply to Sam James from comment #1)
> You might want to adjust it so that a bad exit code is *also* a failure, but
> you get the idea.
> 
i.e. change 'exit 125' to 'exit 1' too.
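
As a rough sketch of that adjustment, assuming a hypothetical wrapper name `run_attempt`: each attempt runs under `timeout`, and both a timeout (status 124) and any other non-zero status count as a failed attempt. The command being run is a placeholder, not the reporter's actual reproducer.

```shell
# Hypothetical wrapper: run one attempt with a time limit.
# Returns 0 if the command succeeded, 1 if it timed out or failed in any other way.
run_attempt() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    case $status in
        0)
            return 0
            ;;
        124)
            # timeout(1) exits with 124 when the time limit was reached
            echo "attempt timed out after $limit" >&2
            return 1
            ;;
        *)
            # Any other failure is treated as bad as well, rather than skipped
            echo "attempt failed with status $status" >&2
            return 1
            ;;
    esac
}

# Usage (placeholder command):
#   run_attempt 1m ./reproduce.sh || exit 1
```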

Comment 4 darkbasic 2023-02-22 12:07:14 UTC

> i.e. change 'exit 125' to 'exit 1' too.

Now that I think about it, it might be wise to return 125 if the ebuild compilation fails for whatever reason, so as to skip the commit without marking it bad.
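
A hedged sketch of how those pieces could fit together, relying on `git bisect run` treating exit status 0 as good, 125 as skip, and 1 as bad. The helper names `build_or_skip` and `repeat_test` and the commands in the usage comment are hypothetical; the real build and reproducer steps are in the linked upstream issue, not in this report.

```shell
# Hypothetical helper: run a build step; any failure becomes 125 ("skip this commit").
build_or_skip() {
    "$@" || return 125
}

# Hypothetical helper: run the reproducer up to $1 times.
# Returns 1 (bad) on the first failing attempt, 0 (good) if every attempt passes.
repeat_test() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}

# Usage inside the script passed to "git bisect run" (placeholder commands):
#   build_or_skip ./build.sh || exit $?
#   repeat_test 30 ./reproduce.sh
#   exit $?
```

Stopping at the first failing attempt keeps bad commits cheap to classify; only good commits pay for all repetitions.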

Comment 5 darkbasic 2023-02-23 11:23:03 UTC

Finally bisected it: https://gitlab.com/qemu-project/qemu/-/issues/1494#note_1289797177

It took over 16 hours this time, thanks for your help!

Comment 6 Sam James archtester 2023-02-23 11:35:30 UTC

(In reply to darkbasic from comment #5)
> Finally bisected it:
> https://gitlab.com/qemu-project/qemu/-/issues/1494#note_1289797177
> 
> It took over 16 hours this time, thanks for your help!

No problem! What I usually do as a sanity check is then revert that bad commit on top of master and verify that the problem is gone.
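
A minimal sketch of that sanity check, assuming a hypothetical helper name `try_revert`; the commit ID is a placeholder, since the bisected commit is only named in the linked upstream issue. It stages the revert without committing, so the tree can be rebuilt and retested straight away.

```shell
# Hypothetical helper: try to revert a suspect commit on the current checkout.
# Returns 0 if the revert applied cleanly (changes are staged, not committed),
# 1 if it conflicted; conflicts are left in the tree for manual resolution.
try_revert() {
    if git revert --no-commit "$1"; then
        return 0
    fi
    echo "revert of $1 did not apply cleanly" >&2
    return 1
}

# Usage on an up-to-date master checkout (placeholder commit ID):
#   try_revert SUSPECT_COMMIT && ./build-and-test.sh
```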

Comment 7 darkbasic 2023-02-23 12:35:54 UTC

> What I usually do as a sanity check is then revert that bad commit on top of master and verify that the problem is gone.

I've already given it a quick try, but reverting that commit on top of git master is not trivial and would require some digging into subsequent changes.

Comment 8 Sam James archtester 2023-02-23 13:51:28 UTC

(In reply to darkbasic from comment #7)
> > What I usually do as a sanity check is then revert that bad commit on top of master and verify that the problem is gone.
> 
> I've already given it a quick try, but it's not trivial to revert that commit
> on top of git master and would require some digging into subsequent changes.

ack :)

Comment 9 Anthony Brown 2023-08-13 21:37:25 UTC Comment hidden (obsolete)

https://en.wikipedia.org/wiki/Dell_Dimension#/media/File:Dell_Dimension_433SV_486.jpg

I am using a MacBook Pro here and remember something
about whether PowerPC is dependent on static libraries or not:

I See #Freescale lets look into their board here:
Uses ARM: ok well anyway : Using Static Libraries:

I see on gentoo it starts to bring in kernel headers :
->: so you need probably the entire kernel configured i
can post mine:

Its an Intel Macintosh so sorry if its non-conformant;

im just waiting on the build to actually start since
if your not using any specific set of useflags it seems to build
when i go to rebuild it it starts erroring: "but whatever i need
to correct my system anyway from trying to downgrade bluetooth to use alsa:
-> You can switch the EAPI it doesnt mean that itll Build the Binary:

https://pastebin.com/CNDvibHS
The Amiga PowerPC G4: ah ok thats what i was loooking for:
OK : Jaguar Web Browser sees macOS Using Linux-Kernel Translation:

ok as im typing im taking a ram hit so hmm

include a disk partition of about 32GB for the TMPFS
->: ok and wherever your swaps going lets say another 8GB:
im theorizing when the sysfs is used up itll start swapping not RAM:
k could be why these builds are failing:
as well as Compiler Flag Options: -Os --jobs=1 --load-average=1

and sitting out the ram hit its successfully compiled:

Comment 10 Anthony Brown 2023-08-13 21:39:19 UTC Comment hidden (obsolete)

and dont subtract useflags except for test -test 
and include whichever things you need

Comment 11 Anthony Brown 2023-08-13 21:44:17 UTC Comment hidden (obsolete)

and for windows you need the bochs tools iso for a standard emulation
installation:

Comment 12 Anthony Brown 2023-08-13 22:13:43 UTC Comment hidden (obsolete)

I hope im getting this right i havent really tried doing it:
->: #not using #gtk or #virtiolibvirtmanager:
->: is what QEMU does not the Linux Kernel: is scheduled translation vs
a actual to be scheduled in the scheduler translation?:

//////////Running QEMU in BOCHS-Like-Mode: Just RUN it:
What it Uses: a Virtual SEABIOS: the SEABIOS is like a Dell Dimension Bios and
hardware Configured Emulator based on the CPU: it doesnt particularly have any
form of acceleration is what bochs is doing:

/////////What QEMU is allowing it to do:
use LSPCI or view Hardware and Device on WINDOWS:\\
-device i82801h11,bus=pci.00,addr=00,id=Intel DRAM Controller \
-device ioh3420

Comment 13 Sam James archtester 2023-08-14 09:51:58 UTC

(In reply to Anthony Brown from comment #12)
This is a Gentoo bug for a specific problem in QEMU. This is not the right venue for whatever your problem is.

Comment 14 Andreas K. Hüttel archtester 2025-03-02 01:00:03 UTC

Is this still happening?