Bug 895866 - app-emulation/qemu-9999: qemu-x86_64 segfaults on ppc64le (4K page size) when downloading go dependencies: unexpected fault address 0x0
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: PPC64 Linux
Importance: Normal normal
Assignee: Virtualization Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-22 11:03 UTC by darkbasic
Modified: 2023-08-14 09:51 UTC

See Also:
Package list:
Runtime testing required: ---

Description darkbasic 2023-02-22 11:03:01 UTC
qemu-x86_64 segfaults when trying to compile yay inside an Arch Linux x86_64 chroot from a Gentoo Linux ppc64le (4K page size) host. Hardware is a Raptor CS Talos 2 Power 9.
It works with qemu-7.2 so this is a regression in git master.

Upstream issue: https://gitlab.com/qemu-project/qemu/-/issues/1494

Unfortunately, it's not 100% reproducible, and I have already failed to bisect it correctly twice.

This is my latest bisecting setup: https://gitlab.com/qemu-project/qemu/-/issues/1494#note_1288167270

After my last attempt, I plan to increase the repetitions from 10 to at least 30. I also want to move the mount/umount steps out of the script: when a run fails it doesn't umount cleanly, and there is no reason to remount every time when I can simply keep it mounted.
I also want to killall -9 qemu-x86_64 each time (it keeps running when it fails), but I'm not sure how to do so without potentially triggering an unsuccessful exit status if the process does not exist (which would mark the commit as bad automatically).
How can I stop the error code from propagating? Something like:
killall -9 qemu-x86_64 | enforce-it-to-always-return-success

Last but not least, I want to re-check the known good commit (the 7.2 git tag) using the live ebuild instead of the 7.2 one: chances are slim, but something in the live ebuild might differ from the 7.2 ebuild and make it fail. I already did this, but only with 10 repetitions, which is apparently still too few to reproduce the issue reliably.
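
To see why the repetition count matters, here is a rough sketch. If a bad commit fails on a fraction p of runs, all n runs pass with probability (1 - p)^n, and the commit is then wrongly marked good. The 10% per-run failure rate below is purely illustrative (the real rate is unknown), and `miss_probability` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: probability that a bad commit passes all N runs,
# given an assumed per-run failure rate P (0 < P < 1).
miss_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

# Illustrative numbers only: assume 10% of runs crash on a bad commit.
miss_probability 0.1 10   # prints 0.349
miss_probability 0.1 30   # prints 0.042
```

Under that assumption, 10 repetitions would misclassify a bad commit roughly one time in three, while 30 repetitions would do so about one time in twenty-four.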

I'm open to other ideas as well; I would really hate to see this bug in the next stable version (qemu user is already a regression hell).
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-22 11:05:38 UTC
Just some rough thoughts to start:
- https://wiki.gentoo.org/wiki/Bisecting_with_live_ebuilds is how I normally do bisecting with live ebuilds (your setup seems similar)
- For a bug that we're in the middle of, we're doing this:
```
[...]
for x in $(seq 1 10); do
        # We deliberately exit *0* if it fails because we're bisecting for the bad commit, not good
        timeout 1m bin/llvm-tblgen \
                -I /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/lib/TargetPowerPC \
                -I /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/include/ \
                -I /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/lib/Target/PowerPC/ \
                /var/tmp/portage/sys-devel/llvm-16.0.0.9999/work/llvm/lib/Target/PowerPC/PPC.td \
                -o /dev/null \
                --gen-dag-isel \
                -d /dev/null \
                --time-phases \
                --write-if-changed
        timeout_result=$?

        case ${timeout_result} in
                124)
                        # Timed out
                        exit 1
                        ;;
                0)
                        ;;
                *)
                        # Something else happened, skip
                        exit 125
                        ;;
        esac
done

exit 0
```

You might want to adjust it so that a bad exit code is *also* a failure, but you get the idea.

Given you're automating the bisect, I'd say it's fine to give every run/commit several attempts to ensure it's definitely bad.
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-22 11:07:55 UTC
(In reply to darkbasic from comment #0)
> How can I stop the error code from propagating? Something like:
> killall -9 qemu-x86_64 | enforce-it-to-always-return-success
> 

Try:
# Ignore the exit status of killall (non-zero when no matching process exists)
killall -9 qemu-x86_64 || true
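
As a minimal sketch of why this works: `||` runs its right-hand side only when the left-hand command fails, and `true` always exits 0, so the compound command succeeds either way. The helper name `ignore_status` is hypothetical, and `false` stands in for a `killall` that found no process to kill.

```shell
#!/bin/sh
# Hypothetical helper: run a command and discard its exit status.
# `cmd || true` evaluates to 0 whether or not cmd succeeds.
ignore_status() {
    "$@" || true
}

# `false` stands in for `killall -9 qemu-x86_64` when nothing matches.
ignore_status false
echo "status after failing cleanup: $?"

ignore_status true
echo "status after successful cleanup: $?"
```

Both `echo` lines report a status of 0, so a cleanup step written this way cannot mark a commit as bad by itself.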
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-22 11:15:37 UTC
(In reply to Sam James from comment #1)
> You might want to adjust it so that a bad exit code is *also* a failure, but
> you get the idea.
> 
i.e. change 'exit 125' to 'exit 1' too.
Comment 4 darkbasic 2023-02-22 12:07:14 UTC
> i.e. change 'exit 125' to 'exit 1' too.

Now that I think about it, it might be wise to return 125 if the ebuild compilation fails for whatever reason, so as to skip the commit without marking it bad.
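
A minimal sketch of that idea, assuming a script driven by `git bisect run`: exit 0 marks the commit good, exit 125 skips it, and any other status from 1 to 127 marks it bad. The function name `bisect_step` and its three arguments (a build command, a test command, a repetition count) are hypothetical placeholders, not taken from the linked setup.

```shell
#!/bin/sh
# Sketch of a `git bisect run` helper (names and arguments are hypothetical).
#   return 0   -> commit is good
#   return 1   -> commit is bad
#   return 125 -> commit cannot be tested, skip it
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    runs=$3

    # A commit that does not build tells us nothing: skip it.
    sh -c "$build_cmd" || return 125

    i=0
    while [ "$i" -lt "$runs" ]; do
        # Any failing run marks the commit bad. Normalising to 1 matters:
        # a status above 127 (e.g. 139 after SIGSEGV) would abort the
        # bisect instead of marking the commit bad.
        sh -c "$test_cmd" || return 1
        i=$((i + 1))
    done

    # Every run passed: call the commit good.
    return 0
}
```

A bisect script could end by calling `bisect_step` with its own build and test commands and then running `exit $?`, so that the function's return value becomes the script's exit status.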
Comment 5 darkbasic 2023-02-23 11:23:03 UTC
Finally bisected it: https://gitlab.com/qemu-project/qemu/-/issues/1494#note_1289797177

It took over 16 hours this time, thanks for your help!
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-23 11:35:30 UTC
(In reply to darkbasic from comment #5)
> Finally bisected it:
> https://gitlab.com/qemu-project/qemu/-/issues/1494#note_1289797177
> 
> It took over 16 hours this time, thanks for your help!

No problem! What I usually do as a sanity check is then revert that bad commit on top of master and verify that the problem is gone.
Comment 7 darkbasic 2023-02-23 12:35:54 UTC
> What I usually do as a sanity check is then revert that bad commit on top of master and verify that the problem is gone.

I've already given it a quick try, but it's not trivial to revert that commit on top of git master; it would require some digging into subsequent changes.
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-23 13:51:28 UTC
(In reply to darkbasic from comment #7)
> > What I usually do as a sanity check is then revert that bad commit on top of master and verify that the problem is gone.
> 
> I've already given it a quick try, but it's not trivial to revert that commit
> on top of git master and would require some digging into subsequent changes.

ack :)
Comment 9 Anthony Brown 2023-08-13 21:37:25 UTC Comment hidden (obsolete)
Comment 10 Anthony Brown 2023-08-13 21:39:19 UTC Comment hidden (obsolete)
Comment 11 Anthony Brown 2023-08-13 21:44:17 UTC Comment hidden (obsolete)
Comment 12 Anthony Brown 2023-08-13 22:13:43 UTC Comment hidden (obsolete)
Comment 13 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-08-14 09:51:58 UTC
(In reply to Anthony Brown from comment #12)
This is a Gentoo bug for a specific problem in QEMU. This is not the right venue for whatever your problem is.