Bug 619628 - sys-boot/refind-0.10.7 fails to boot with gnu-efi >= 3.0.5
Summary: sys-boot/refind-0.10.7 fails to boot with gnu-efi >= 3.0.5
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Stéphane Veyret
URL: https://sourceforge.net/p/gnu-efi/bug...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-25 02:18 UTC by anoteros
Modified: 2017-07-13 13:51 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (file_619628.txt,6.25 KB, text/plain)
2017-05-25 02:18 UTC, anoteros

Description anoteros 2017-05-25 02:18:21 UTC
Created attachment 474150
emerge --info

I installed (upgraded to, really) sys-boot/refind-0.10.7 and ran refind-install, which reported no errors, only to be stuck at a black screen at the next boot.

I tried launching refind_x64.efi from a UEFI shell and I also got a hang with no error messages.

The only way I was able to boot back into my system was to use an Arch Linux live CD and copy refind_x64.efi from /usr/share/refind to my ESP, over the version installed by this package. That booted just fine (and its version is also 0.10.7).

I tried building with and without gnuefi (since it seems the Arch Linux version is built with gnuefi); it made no difference.

Since there were no error messages, I don't have much more detail to report, but I would be glad to help troubleshoot this.
Comment 1 Sam Jorna (wraeth) gentoo-dev 2017-05-31 02:19:08 UTC
What was the previous version of refind you successfully booted from (did you jump any versions)? And can you show what USE flags you built refind with please?
Comment 2 anoteros 2017-06-01 15:26:27 UTC
(In reply to Sam Jorna (wraeth) from comment #1)
> What was the previous version of refind you successfully booted from (did
> you jump any versions)? And can you show what USE flags you built refind
> with please?

I was using the Arch Linux build of refind beforehand as well. I never had a version of refind from Gentoo.

I built refind with the USE flags btrfs ext2 ext4 gnuefi iso9660.
Comment 3 Matthias Dahl 2017-06-29 18:30:13 UTC
I can confirm this. I spent the better part of the day tracking the problem down, so let's hope we can zero in on it:

I tested with rEFInd 1.0.8 which builds fine with recent gnu-efi versions, so I bumped the ebuild and removed the appropriate patch.

At first sight, the problem actually lies with gnu-efi >= 3.0.5. I tested everything from 3.0.3 to 3.0.6 (inclusive). Just to be sure, I tested against gcc 5.4 and 6.3 as well as clang 4.0.1, since both gnu-efi and rEFInd are a little oversensitive when it comes to compilers. ;-)

The compiler did not make any difference, but with gnu-efi >= 3.0.5 I also got the black screen while booting. Arch Linux is still on 3.0.3, which is why their rEFInd works just fine (I downloaded their rEFInd 1.0.8 archive and tested it on my system as well).

I don't know if this is a problem specifically with gnu-efi or if it is a problem between rEFInd and gnu-efi... or both. I am not familiar with the source code of either and I figured diving in now blindly would not make any sense at all. I wanted to do a git bisect but ran out of time, unfortunately.

So, for now, I would suggest at least putting in a block in the rEFInd ebuild for gnu-efi >= 3.0.5 until this is figured out, so others will not run into this problem as well.
Comment 4 Matthias Dahl 2017-06-30 08:49:49 UTC
A git bisect on gnu-efi revealed this as the bad commit:

commit b2c4db065f594fd453be0e39c1c213b0c73fb513 (refs/bisect/bad)
Author: Nigel Croxon <nigel.croxon@hpe.com>
Date:   Fri Jun 17 10:07:25 2016 -0400

    I did a quick review of the MS x86_64 calling convention for floating
    point and as far as I can tell it agrees with the UEFI spec.
    The attached patch removes -mno-mmx and -mno-sse for x86_64 and adds
    a new Print target, "%f", to print float and double types.

    It seems to compile for ia32, although I'm not sure why - shouldn't
    it be throwing errors because the new function FloatToStr() in print.c
    accepts a float, yet I left -no-sse for ARCH=ia32? A better solution
    might be to add -msoft-float for targets where the floating point
    calling convention doesn't match the UEFI spec. As I'm not familiar
    with UEFI on ia32, I didn't make any changes to it.

    Signed-off-by: Nathan Blythe <nblythe@lgsinnovations.com>
    Signed-off-by: Nigel Croxon <nigel.croxon@hpe.com>

I will see if I can get in contact with both the gnu-efi and rEFInd maintainers to get this further investigated and fixed.
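
For readers unfamiliar with the procedure, the search that git bisect performs can be sketched in a few lines of Python. This is only an illustration of the idea, not what git runs internally: the commit names below are hypothetical placeholders, and is_bad stands in for the manual "build gnu-efi, rebuild rEFInd, try to boot" check. It assumes the history is linear and that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and badness never disappears once introduced.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history between the 3.0.4 and 3.0.5 releases.
history = ["3.0.4", "commit-a", "commit-b", "commit-c", "3.0.5"]
boots_black_screen = {"commit-b", "commit-c", "3.0.5"}
culprit = first_bad(history, lambda c: c in boots_black_screen)
```

Each round halves the remaining range, so even a long history needs only a handful of build-and-boot cycles.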
Comment 5 Matthias Dahl 2017-06-30 13:44:46 UTC
Ok, since I cannot post a ticket on the Project page over at SourceForge (I haven't managed to successfully register an account), I have contacted both maintainers via mail with the appropriate information.

As soon as there is something new to report, I will update this bug.
Comment 6 Matthias Dahl 2017-06-30 13:52:42 UTC
Oh well: the mail to gnu-efi's maintainer, Nigel Croxon, bounced; apparently that mail address no longer exists?!

So either someone else posts a ticket on the project page, or I will do it once my account is properly activated (which is out of my hands, as I am waiting for feedback from their support).
Comment 7 Matthias Dahl 2017-06-30 14:09:33 UTC
Finally got the account activated, posted the ticket and added it to this bug.
Comment 8 Matthias Dahl 2017-07-04 08:28:25 UTC
Ok, I tracked the problem down:

If gnu-efi >= 3.0.5 is compiled with -march=native on an AVX-capable CPU, the compiler will usually generate AVX instructions, which naturally blow up at boot and result in a non-functional gnu-efi build. Prior to 3.0.5, gnu-efi set "-mno-sse", which also implied that no AVX would be used. I double-checked the generated binaries, and they contained several AVX instructions.
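
One rough way to repeat that check is to scan a disassembly listing for AVX usage. The sketch below is a heuristic, not the method used in this bug: it assumes AT&T-syntax output such as "objdump -d" produces, and it only flags lines that reference a ymm register or pair a "v"-prefixed mnemonic with an xmm/ymm operand, so operand-less instructions such as vzeroupper are missed. The function name and the sample listing are made up for illustration.

```python
import re
from typing import List

# Heuristic: a line looks like AVX if it mentions a ymm register, or if a
# mnemonic starting with "v" is followed by an xmm/ymm operand.
_AVX_LINE = re.compile(r"\bv[a-z0-9]+\s+.*%[xy]mm\d+|%ymm\d+")


def find_avx_lines(disassembly: str) -> List[str]:
    """Return the disassembly lines that appear to contain AVX instructions."""
    return [
        line.strip()
        for line in disassembly.splitlines()
        if _AVX_LINE.search(line)
    ]


# Made-up sample in the style of AT&T-syntax disassembly.
sample = (
    "  401000:\tc5 f8 28 c1\tvmovaps %xmm1,%xmm0\n"
    "  401004:\t0f 28 c1\tmovaps %xmm1,%xmm0\n"
    "  401007:\tc5 fc 28 00\tvmovaps (%rax),%ymm0\n"
)
hits = find_avx_lines(sample)
```

An empty result is not proof that a binary is AVX-free, but any hit in a gnu-efi or rEFInd build is a strong hint that the flags discussed here leaked into the build.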

I updated the upstream bug report accordingly.

I opened two pull requests:

1) https://github.com/gentoo/gentoo/pull/5039

This basically adds a custom-cflags USE flag to gnu-efi, which is disabled by default. I don't think it is wise and/or necessary to build gnu-efi with any custom flags, as doing so carries an unusually high risk of causing trouble for the average Joe.

If custom-cflags is set, though, it will append -mno-avx to prevent the exact bug we are talking about here.

2) https://github.com/gentoo/gentoo/pull/5038

This bumps rEFInd to 1.0.8 which contains the official fixes for gnu-efi >= 3.0.5 so we can drop our custom patch here.

If nobody minds, I will leave this bug open until this is fixed in the tree.
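
The flag handling proposed in the first pull request can be modeled roughly as follows. This is a Python model for illustration only, not the ebuild code: the function name is hypothetical, and the assumption that user flags are dropped entirely when custom-cflags is off is mine, inferred from the description above.

```python
from typing import List


def gnu_efi_cflags(user_cflags: List[str], custom_cflags: bool) -> List[str]:
    """Model which user-supplied CFLAGS reach the gnu-efi build."""
    if not custom_cflags:
        # Assumption: with the USE flag off, user flags are ignored and
        # only the build system's own defaults apply.
        return []
    flags = list(user_cflags)
    if "-mno-avx" not in flags:
        # Keep -march=native and friends from emitting AVX instructions.
        flags.append("-mno-avx")
    return flags


default_build = gnu_efi_cflags(["-O2", "-march=native"], custom_cflags=False)
custom_build = gnu_efi_cflags(["-O2", "-march=native"], custom_cflags=True)
```

Because GCC and clang generally let a later option of this kind override an earlier one, appending -mno-avx after -march=native is expected to win.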
Comment 9 Matthias Dahl 2017-07-13 07:52:07 UTC
Ok, gnu-efi now has a 'custom-cflags' USE flag as well, which is masked (along with refind's) to stop users from running into issues like this.

Since this has been merged into the tree, I am closing this bug, but I will continue trying to get a fix merged upstream for this particular issue (AVX).

Thanks to everyone involved!
Comment 10 Matthias Dahl 2017-07-13 13:51:04 UTC
For completeness: A fix has been merged upstream such that "-mno-avx" is appended automatically if necessary. See commit 99d94682de590719f9333fcf091910a9581b44c0.