Bug 601014 - >=sys-devel/gcc-4.9.4 generates broken kernel on ia64, leading to unbootable/uninstallable systems
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: IA64 Linux
Importance: Normal blocker
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://gcc.gnu.org/PR60465
Whiteboard: gs-4.11.7,gs-4.9.34
Keywords: InVCS, REGRESSION
Depends on:
Blocks: 915000 595560 gcc-5-stable
Reported: 2016-11-27 16:44 UTC by Émeric Maschino
Modified: 2023-10-01 06:28 UTC (History)
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch (gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch,1.75 KB, patch)
2017-03-18 11:43 UTC, Sergei Trofimovich (RETIRED)
linux-kernel-ia64-fix-module-loading-for-gcc-5.4.0.patch (linux-kernel-ia64-fix-module-loading-for-gcc-5.4.0.patch,1.73 KB, patch)
2017-03-18 21:07 UTC, Sergei Trofimovich (RETIRED)

Description Émeric Maschino 2016-11-27 16:44:30 UTC
Hi,

Following upgrade to now stable gcc 4.9.4, I've rebuilt my system packages, including gentoo-sources. Upon restart, I'm left with an unbootable system. The following errors are displayed:

>> Loading modules
   :: Loading from pata:
   :: Loading from sata:
   :: Loading from scsi:
   :: Loading from usb: usb_storage: invalid slot number 1 for IMM64

   :: Loading from firewire:
   :: Loading from waitscan:
   :: Loading from dmraid:
   :: Loading from mdadm: raid0: invalid slot number 1 for IMM64
raid1: invalid slot number 1 for IMM64
async_tx: invalid slot number 1 for IMM64
async_tx: invalid slot number 1 for IMM64
async_tx: invalid slot number 1 for IMM64
raid10: invalid slot number 1 for IMM64

   :: Loading from fs: jbd2: invalid slot number 1 for IMM64
jbd2: invalid slot number 1 for IMM64
jbd2: invalid slot number 1 for IMM64
sunrpc: invalid slot number 1 for IMM64
fuse: invalid slot number 1 for IMM64

   :: Loading from net: libphy:

   :: Loading from iscsi:
>> Initializing root device...
>> Mounting /dev/sdb3 as root...
>> Detected fstype: ext4
>> Using mount fstype: ext4
>> Using mount opts: -o ro
jbd2: invalid slot number 1 for IMM64
mount: mounting /dev/sdb3 on /newroot failed: No such device
!! Cannot mount /dev/sdb3, trying with -t auto
UDF-fs: warning (device sdb3): udf_fill_super: No partition found (2)
mount: mounting /dev/sdb3 on /newroot failed: Invalid argument
!! Cannot mount /dev/sdb3 with -t auto, giving up
!! Could not mount specified ROOT, try again
!! Could not find the root block device in /dev/sdb3.
!! Please specify another value or:
!! - press Enter for the same
!! - type "shell" for a shell
!! - type "q" to skip...
root block device(/dev/sdb3) ::

Nothing helps here.

As a workaround, I've downloaded and burnt the current install-ia64-minimal-20161122.iso. Depending on your hardware configuration, you may be able to boot your ia64 system from it, but you will hardly be able to install Gentoo, since modules (such as network drivers) fail with "invalid slot number 1 for IMM64" errors. Please have a look at the attached screenshot of install-ia64-minimal-20161122.iso booted on my zx6000 ia64 workstation.

This is a regression from gcc 4.9.3. Last working Gentoo ISO image was install-ia64-minimal-20161111.iso that came with gcc 4.9.3 as confirmed by releng.

     Émeric
Comment 1 Émeric Maschino 2016-11-27 16:48:48 UTC
An upload limit prevents me from attaching the photo of install-ia64-minimal-20161122.iso booted on my zx6000 ia64 workstation. Basically, it shows the output of dmesg | grep IMM64: all modules fail to load with "invalid slot number 1 for IMM64" errors.

     Émeric
Comment 2 Anthony Basile gentoo-dev 2017-01-01 22:32:52 UTC
Is this a problem in =sys-devel/gcc-4.9.3?  I'm wondering if it's a regression or not.
Comment 3 Émeric Maschino 2017-03-03 23:26:31 UTC
(In reply to Anthony Basile from comment #2)
> Is this a problem in =sys-devel/gcc-4.9.3?  I'm wondering if it's a
> regression or not.

Just found some spare time to check :oP

So gcc 4.9.3 is fine, while gcc 4.9.4 produces broken binaries. So yes, it's a regression from gcc 4.9.3. Thanks for pointing this out, I initially thought this was a regression from gcc 4.8.

Any idea what has gone wrong between gcc 4.9.3 and 4.9.4? Will git bisect help here, or am I better off trying a newer gcc version (5.x/6.x) directly?

     Émeric
Comment 4 SpanKY gentoo-dev 2017-03-07 03:57:28 UTC
(In reply to Émeric Maschino from comment #3)

that is odd that 4.9.3 works but not 4.9.4.  looking at the patchsets, i don't see anything significant on our end.

does USE=vanilla w/gcc-4.9.3 work ?  how about USE=vanilla w/gcc-4.9.4 ?
Comment 5 Émeric Maschino 2017-03-08 20:29:49 UTC
(In reply to SpanKY from comment #4)
> that is odd that 4.9.3 works but not 4.9.4.  looking at the patchsets, i
> don't see anything significant on our end.
> 
> does USE=vanilla w/gcc-4.9.3 work ?  how about USE=vanilla w/gcc-4.9.4 ?

USE=vanilla w/gcc-4.9.3 still works, as w/o USE=vanilla.
USE=vanilla w/gcc-4.9.4 still doesn't work, as w/o USE=vanilla.
Comment 6 SpanKY gentoo-dev 2017-03-08 20:52:57 UTC
(In reply to Émeric Maschino from comment #5)

man, that really really sucks

before we spend time trying to bisect this, can you try gcc-5.4 ?  we want to stabilize that version soon anyways, so if it fixes things here, we might as well jump up to it now.
Comment 7 Émeric Maschino 2017-03-11 01:07:13 UTC
(In reply to SpanKY from comment #6)
> 
> man, that really really sucks
> 
> before we spend time trying to bisect this, can you try gcc-5.4 ?  we want
> to stabilize that version soon anyways, so if it fixes things here, we might
> as well jump up to it now.

It does, as you say, because gcc-5.4.0-r3 (I didn't check with 5.4.0) is affected too :-(
Comment 8 SpanKY gentoo-dev 2017-03-11 05:59:29 UTC
unfortunately i don't have an ia64 system locally i can bisect down kernel builds on.  you could try it using gcc's git.
Comment 9 Émeric Maschino 2017-03-16 20:16:17 UTC
(In reply to SpanKY from comment #8)
> unfortunately i don't have an ia64 system locally i can bisect down kernel
> builds on.  you could try it using gcc's git.

OK. Here's git bisect result:

17cf10c6dd2cb018df19f635371d2f13b326e42a is the first bad commit
commit 17cf10c6dd2cb018df19f635371d2f13b326e42a
Author: vapier <vapier@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Jan 19 23:15:12 2016 +0000

    ia64: don't use dynamic relocations for local symbols
    
    Backported from trunk for PR other/60465.
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch@232595 138bc75d-0d04-0410-961f-82ee72b054a4

:040000 040000 042be1f8db20e14efad5f09a7506b6a1f394f410 8cdd3cf017b494c5c9a51e17be0a66acc5b9107e M	gcc

Does it make sense? The invalid slot number 1 for IMM64 errors that this commit triggers come from arch/ia64/kernel/module.c:

static int
apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
{
	if (slot(insn) != 2) {
		printk(KERN_ERR "%s: invalid slot number %d for IMM64\n",
		       mod->name, slot(insn));
		return 0;
	}
	ia64_patch_imm64((u64) insn, val);
	return 1;
}

     Émeric
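
For context on the message itself: on ia64 a relocation that targets an instruction names a 16-byte bundle plus a slot index (0, 1 or 2) carried in the low bits of the address. The userspace sketch below mimics how a loader might split such an address and apply the slot == 2 precondition quoted above. The helper names (insn_slot, insn_bundle, strict_imm64_ok) and the sample addresses are made up for illustration; this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: an ia64 bundle is 16 bytes and holds three slots. */

/* The slot index is carried in the two lowest bits of the address. */
static inline int insn_slot(uint64_t insn_addr)
{
	return (int)(insn_addr & 0x3);
}

/* The bundle address is the instruction address with the low four bits cleared. */
static inline uint64_t insn_bundle(uint64_t insn_addr)
{
	return insn_addr & ~(uint64_t)0xf;
}

/* Mirrors the precondition in the quoted apply_imm64(): only slot 2 passes. */
static inline int strict_imm64_ok(uint64_t insn_addr)
{
	int s = insn_slot(insn_addr);

	assert(s <= 2);		/* a bundle has no slot 3 */
	return s == 2;
}
```

With these helpers, an address such as 0x1001 (bundle 0x1000, slot 1) is rejected by strict_imm64_ok(), which is the situation the "invalid slot number 1" message reports.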
Comment 10 Émeric Maschino 2017-03-17 08:45:18 UTC
(In reply to Émeric Maschino from comment #9)
> 
> OK. Here's git bisect result:
> 
> 17cf10c6dd2cb018df19f635371d2f13b326e42a is the first bad commit
> commit 17cf10c6dd2cb018df19f635371d2f13b326e42a
> Author: vapier <vapier@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Jan 19 23:15:12 2016 +0000
> 
>     ia64: don't use dynamic relocations for local symbols
>     
>     Backported from trunk for PR other/60465.
>     
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch@232595
> 138bc75d-0d04-0410-961f-82ee72b054a4
> 
> :040000 040000 042be1f8db20e14efad5f09a7506b6a1f394f410
> 8cdd3cf017b494c5c9a51e17be0a66acc5b9107e M	gcc

Confirmed. Simply reverting this patch from Gentoo's gcc 4.9.4 source yields a working gcc binary that produces a bootable kernel.

I didn't have time to try reverting this patch from gcc-5.4.0-r3 yet, but I bet that the results will be the same.

Problem is that reverting this patch is probably not the right answer, as upstream PR 60465 was addressing a real issue with gcc on ia64.

     Émeric
Comment 11 SpanKY gentoo-dev 2017-03-18 03:16:28 UTC
ugh, we get to choose between a bootable kernel or a non-crashing glibc :)

maybe Sergei can offer some insight here since he authored that patch
Comment 12 Émeric Maschino 2017-03-18 10:17:11 UTC
(In reply to SpanKY from comment #11)
> ugh, we get to choose between a bootable kernel or a non-crashing glibc :)
> 
> maybe Sergei can offer some insight here since he authored that patch

Which really puzzles me, as even with PR 60465 reverted from my local Gentoo gcc 4.9.4 source code, I was able to rebuild a working =sys-libs/glibc-2.23-r3 using =sys-devel/binutils-2.25.1-r1. I thought that PR 60465 was there to allow building glibc with gcc >= 4.8, which doesn't seem to be required right now. Unless the workaround from [1] that you've applied to glibc is still in place for =sys-libs/glibc-2.23-r3. If I understand correctly, this workaround allowed building glibc with gcc >= 4.8 even if PR 60465 is missing from gcc >= 4.8. Am I right?

Anyway, as a final check, I'm currently rebuilding my entire system (well, dependencies of libc.so.6.1 in fact) using my locally rebuilt gcc 4.9.4 that is lacking PR 60465. Everything's fine at the moment.

     Émeric

[1] https://bugs.gentoo.org/show_bug.cgi?id=503838#c16
Comment 13 Sergei Trofimovich (RETIRED) gentoo-dev 2017-03-18 11:36:19 UTC
(In reply to SpanKY from comment #11)
> ugh, we get to choose between a bootable kernel or a non-crashing glibc :)
> 
> maybe Sergei can offer some insight here since he authored that patch

My theory is:

Kernel's dynamic elf linker does not support all relocation types and
implements only things that appear in the wild.

My guess is that gcc did not previously generate gprel64 relocations for the kernel.

We have a few options:
- [will attach gcc patch in a minute] disable gprel64 generation on the gcc side only when the kernel is being compiled (say, when -mconstant-gp is specified)
- extend kernel code to handle gprel64 relocation

As a very crude workaround: does the kernel work if you compile it without module support?
That way all the gprel64 relocations should be resolved by the binutils linker.

I'll try to look at ia64_patch_imm64 to see if it's simple enough.
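
As background for the gprel64 discussion: a gp-relative reference is the symbol's address minus the global pointer. The ia64 addl instruction carries a 22-bit signed immediate, while movl carries a full 64-bit immediate, so an offset that cannot be assumed to fit in 22 bits needs the 64-bit form. The sketch below only illustrates that arithmetic; the function names and addresses are invented, and it does not claim to reproduce how gcc actually chooses between the two forms.

```c
#include <assert.h>
#include <stdint.h>

/* gp-relative offset of a symbol: its address minus the global pointer. */
static inline int64_t gprel_offset(uint64_t sym_addr, uint64_t gp)
{
	return (int64_t)(sym_addr - gp);
}

/* True if the offset fits a 22-bit signed immediate (-2 MiB .. 2 MiB - 1). */
static inline int fits_imm22(int64_t off)
{
	return off >= -(INT64_C(1) << 21) && off < (INT64_C(1) << 21);
}

/* Usage example with made-up addresses. */
static inline void gprel_example(void)
{
	uint64_t gp = 0xa000000000100000ULL;

	/* A symbol 4 KiB below gp is reachable with the short form. */
	assert(fits_imm22(gprel_offset(gp - 0x1000, gp)));
	/* A symbol 16 MiB above gp needs the 64-bit form. */
	assert(!fits_imm22(gprel_offset(gp + 0x1000000, gp)));
}
```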
Comment 14 Sergei Trofimovich (RETIRED) gentoo-dev 2017-03-18 11:43:55 UTC
Created attachment 467436 [details, diff]
gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch

gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch is a patch that disables gprel64 relocations only in -mconstant-gp mode.
The Linux kernel is supposed to use that mode.

The patch is against gcc master. Applies cleanly on gentoo's gcc-4.9.4 and 5.4.0-r3.
Comment 15 Sergei Trofimovich (RETIRED) gentoo-dev 2017-03-18 11:51:35 UTC
(In reply to Émeric Maschino from comment #12)
> Unless the workaround from [1] that you've applied
> to glibc is still in place for =sys-libs/glibc-2.23-r3. If I understand
> correctly, this workaround allowed to build glibc with >= gcc 4.8, even if
> PR 60465 is missing from gcc >= 4.8. Am I right?
>
> [1] https://bugs.gentoo.org/show_bug.cgi?id=503838#c16

Yes, the workaround is still in place and is the likely reason why it works for you:

https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-libs/glibc/glibc-2.23-r3.ebuild#n170
Comment 16 Sergei Trofimovich (RETIRED) gentoo-dev 2017-03-18 21:07:48 UTC
Created attachment 467472 [details, diff]
linux-kernel-ia64-fix-module-loading-for-gcc-5.4.0.patch

Fixing kernel also appeared to be trivial.

Kernel already has advanced relocation loader
code for ia64. It already handles many types
of 64-bit absolute and relative relocations.

Kernel was very defensive against MLX instructions:
kernel's loader assumed that relocation should always
point to slot=2 instruction in an instruction bundle.
LX takes both slot=1 and slot=2.

binutils is slightly inconsistent and points to slot=1
in some cases.

Kernel loader was fixed to handle both slot=1 and slot=2 cases
upstream in https://www.sourceware.org/bugzilla/show_bug.cgi?id=1433
but it did not lift the precondition.

This patch drops the precondition.

I'm not very confident it works in all cases.

But i did test cross-built kernel in ski:
compiled 'fuse' and 'btrfs' as modules and tried to use btrfs filesystem.

Nothing broke so far.
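
To make the slot discussion concrete: in an MLX bundle, slot 0 holds an M-unit instruction and slots 1 and 2 together hold a single long instruction such as movl, so a relocation naming either slot refers to the same 64-bit immediate. Below is a minimal sketch of a relaxed check, assuming the patching routine first normalises the address to the start of the bundle. The names relaxed_imm64_ok and imm64_bundle are hypothetical, and the attached patch may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Accept a relocation that points at either half of the long instruction. */
static inline int relaxed_imm64_ok(uint64_t insn_addr)
{
	int s = (int)(insn_addr & 0x3);	/* slot index lives in the low bits */

	return s == 1 || s == 2;
}

/* Normalise to the 16-byte bundle so both slot encodings patch the same place. */
static inline uint64_t imm64_bundle(uint64_t insn_addr)
{
	return insn_addr & ~(uint64_t)0xf;
}

/* Usage example: both encodings of one movl resolve to the same bundle. */
static inline void imm64_example(void)
{
	uint64_t bundle = 0x1000;	/* made-up bundle address */

	assert(relaxed_imm64_ok(bundle + 1) && relaxed_imm64_ok(bundle + 2));
	assert(imm64_bundle(bundle + 1) == imm64_bundle(bundle + 2));
	assert(!relaxed_imm64_ok(bundle));	/* slot 0 is still rejected */
}
```

Slot 0 stays rejected in this sketch because it can never be part of the long instruction in an MLX bundle.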
Comment 17 Sergei Trofimovich (RETIRED) gentoo-dev 2017-03-19 22:41:31 UTC
Sent kernel patch (#c16) upstream as: https://lkml.org/lkml/2017/3/19/173

I'm hesitant to send gcc workaround (#c14) upstream. But it's still
an option for Gentoo if backporting kernel patch is too tedious.
Comment 18 SpanKY gentoo-dev 2017-03-20 07:26:03 UTC
(In reply to Émeric Maschino from comment #12)

if the kernel patch is accepted upstream, then that's fine enough for us

thanks !
Comment 19 Émeric Maschino 2017-03-21 08:37:11 UTC
(In reply to Sergei Trofimovich from comment #13)
> 
> <snip>
>  
> As a very crude workaround: does kernel work if you compile it without
> module support?

System rebuild 100% complete.

But as you made great progress in the meantime (thanks!), I didn't check with a non-modular kernel.

     Émeric
Comment 20 Émeric Maschino 2017-03-21 08:39:28 UTC
(In reply to Sergei Trofimovich from comment #14)
> Created attachment 467436 [details, diff]
> gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch
> 
> gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch is a patch
> that disabled gprel64 relocations only for -mconstant-gp mode.
> linux kernel is supposed to use that.
> 
> The patch is against gcc master. Applies cleanly on gentoo's gcc-4.9.4 and
> 5.4.0-r3.

Confirmed that this patch yields a bootable kernel with untouched (i.e. not reverting PR 60465) Gentoo's gcc 4.9.4.

     Émeric
Comment 21 Émeric Maschino 2017-03-21 08:44:35 UTC
(In reply to Émeric Maschino from comment #20)
> 
> Confirmed that this patch yields a bootable kernel with untouched (i.e.
> not reverting PR 60465) Gentoo's gcc 4.9.4.
> 
>      Émeric

Sorry, I've replied to the wrong comment! I meant to reply to comment #16; I didn't test gcc-ia64-don-t-generate-gprel64-relocations-for-linux-ke.patch. My bad.

     Émeric
Comment 22 Émeric Maschino 2017-03-21 08:45:25 UTC
(In reply to Sergei Trofimovich from comment #16)
> Created attachment 467472 [details, diff]
> linux-kernel-ia64-fix-module-loading-for-gcc-5.4.0.patch
> 
> Fixing kernel also appeared to be trivial.
> 
> Kernel already has advanced relocation loader
> code for ia64. It already handles many types
> of 64-bit absolute and relative relocations.
> 
> Kernel was very defensive against MLX instructions:
> kernel's loader assumed that relocation should always
> point to slot=2 instruction in an instruction bundle.
> LX takes both slot=1 and slot=2.
> 
> binutils is slightly inconsistent and points to slot=1
> in some cases.
> 
> Kernel loader was fixed to handle both slot=1 and slot=2 cases
> upstream in https://www.sourceware.org/bugzilla/show_bug.cgi?id=1433
> but it did not lift the precondition.
> 
> This patch drops the precondition.
> 
> I'm not very confident it works in all cases.
> 
> But i did test cross-built kernel in ski:
> compiled 'fuse' and 'btrfs' as modules and tried to use btrfs filesystem.
> 
> Nothing broke so far.

Confirmed that this patch yields a bootable kernel with untouched (i.e. not reverting PR 60465) Gentoo's gcc 4.9.4.

     Émeric
Comment 23 Émeric Maschino 2017-03-21 08:46:22 UTC
(In reply to SpanKY from comment #18)
> (In reply to Émeric Maschino from comment #12)
> 
> if the kernel patch is accepted upstream, then that's fine enough for us
> 
> thanks !

Totally agree here.

Thanks Sergei :-)

     Émeric
Comment 24 Sergei Trofimovich (RETIRED) gentoo-dev 2017-05-02 20:09:27 UTC
> if the kernel patch is accepted upstream, then that's fine enough for us

Patch found its way upstream as: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a25fb8508c1b80dce742dbeaa4d75a1e9f2c5617
Comment 25 Sergei Trofimovich (RETIRED) gentoo-dev 2017-06-21 21:22:57 UTC
kernel@, please apply the patch from #c24 to supported kernels.
Should apply cleanly on a wide range without any problems.
Comment 26 Mike Pagano gentoo-dev 2017-06-21 23:50:08 UTC
Queued up for the next gentoo-sources-4.11 release. I will be walking this down.
Comment 27 Sergei Trofimovich (RETIRED) gentoo-dev 2017-11-12 09:46:39 UTC
Fixed kernels have been stable on ia64 for a while.

Thanks all!