Bug 579278 - ia64 can't boot recent InstallCD
Summary: ia64 can't boot recent InstallCD
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Release Media
Classification: Unclassified
Component: InstallCD
Hardware: IA64 Linux
Importance: Normal critical
Assignee: Gentoo Release Team
URL:
Whiteboard:
Keywords:
Depends on: 518130 575300
Blocks:
Reported: 2016-04-07 19:34 UTC by stanton_arch
Modified: 2018-05-02 06:59 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg serial output (file_579278.txt,22.94 KB, text/plain)
2018-02-11 18:15 UTC, stanton_arch

Description stanton_arch 2016-04-07 19:34:55 UTC
I am trying to install a recent version of Gentoo on an ia64
HP9000/rp3410/rx2600 dual Itanium machine and am not having any luck.  Two
iLO system logs are below, but basically I couldn't boot any install
image released after 2008.

My machine works with freebsd 10.1, but since freebsd ia64 is about to
be unsupported I thought I'd switch to gentoo.

I've tried various versions of the install images:

 2007.0 - default elilo seems to boot, but no serial console.
    "gentoo-serial console=ttyS1,9600" booted but no serial console message
    until livecd root prompt, but then could proceed with the install.

 2008.0
     "gentoo-serial console=ttyS1,9600" booted, but after kernel message
      ERROR: cannot start nfsmount as rpc.statd could not start
     "Video Card"
     "Starting local ..."
     the system seemed to hang.

     booting with default values was ok, (no serial, but vga works)

 20160405
     booting with default and with
     "gentoo-ilo console=ttyS1,9600": the kernel uncompressed itself and
     looked like it tried to boot, then the system light flashed red and
     the machine reset itself

With recent images it looks like elilo works OK and the kernel uncompresses itself:

 ELILO boot: gentoo-serial console=ttyS1,9600
 Uncompressing Linux... done
 Loading file \efi\boot\gentoo.igz...done

but my system reboots itself without any boot messages (serial or
video) right after.  So I'd say something early on in kernel boot is
causing a problem, and it looks like it has been an issue since
sometime after 2008.

I tried to install with 2007.0 version, but so many things were out of
date and masked, and I couldn't upgrade the system or even emerge
portage.

Here are some iLO logs captured right after elilo exits (I think):

install-ia64-minimal-2007.0.iso (works and boots to livecd shell):

SFW  0   0  0x1600030700E00000 0000000000000002 EFI_SYSTEM_STATE_RUNNING_OK
SFW  0   1  0x2000020800E00000 0000000000000000 EFI_EXIT_BOOT_SERVICES
SFW  0   0  0x0000001600E00000 0000000000000000 BOOT_CELL_VIRTUALIZE_EFI
SFW  0   0  0x0000001800E00000 0000000000000000 BOOT_CELL_VIRTUALIZE_PAL
SFW  0   0  0x0000001A00E00000 0000000000000000 BOOT_CELL_VIRTUALIZE_SAL

install-ia64-minimal-20160405.iso (system reboots itself shortly after
"Loading file \efi\boot\gentoo.igz...done"):

SFW  0   0  0x1600030700E00000 0000000000000002 EFI_SYSTEM_STATE_RUNNING_OK
SFW  0   1  0x2000020800E00000 0000000000000000 EFI_EXIT_BOOT_SERVICES
SFW  1   2  0x5680028501E007C0 0000000000000000 MC_INITIALIZED_RSE
SFW  0   2  0x5680028500E007E0 0000000000000000 MC_INITIALIZED_RSE
SFW     *7  0xC157062FA2020800 013FA17000130300 Type-02 137001 1273857
SFW     *7  0xC157062FA2020810 003FA17000130300 Type-02 137001 1273857
SFW  1  *7  0xF680009801E00820 000000000000000B MC_INITIATED
SFW  0  *7  0xF680009800E00840 000000000000000B MC_INITIATED
SFW  1   2  0x568002A101E00860 20000000FFF21320 MC_PSP
SFW  0   2  0x568002A100E00880 A8000000FFF21330 MC_PSP
SFW  1   2  0x5680011501E008A0 0000000000000000 UNCORRECTED_MC
SFW  0   2  0x5680011500E008C0 0000000000000000 UNCORRECTED_MC
SFW  1  *3  0x7680010701E008E0 0000000000000000 OS_MCA_NOT_REGISTERED
BMC      2  0x2057062FA2020900 FFFF027000120300 Type-02 127002 1208322
SFW      2  0xC157062FA4020910 FFFF000A001D0300 Type-02 1d0a00 1903104
SFW  1   1  0x5680006301E00920 0000000000000000 BOOT_START
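
For comparing the two logs, here is a minimal Python sketch that splits each event line into columns and picks out the machine-check entries. The column meanings (source, CPU, alert level, keyword, data, event name) and the reading of MC/MCA as "machine check" are assumptions inferred from the lines above, not taken from iLO documentation; `parse_ilo_line` and `machine_check_events` are hypothetical helper names.

```python
def parse_ilo_line(line):
    """Split one iLO event line into fields; return None if it does not match."""
    tokens = line.split()
    # Assumption: the keyword column is the first token that starts with "0x".
    idx = next((i for i, t in enumerate(tokens) if t.startswith("0x")), None)
    # Expect "source [cpu] level" before the keyword, and data + name after it.
    if idx not in (2, 3) or len(tokens) < idx + 3:
        return None
    return {
        "source": tokens[0],
        "cpu": int(tokens[1]) if idx == 3 else None,  # CPU column may be absent
        "level": int(tokens[idx - 1].lstrip("*")),     # "*7" -> 7
        "keyword": tokens[idx],
        "data": tokens[idx + 1],
        "event": " ".join(tokens[idx + 2:]),
    }


def machine_check_events(lines):
    """Return event names that contain an MC or MCA component."""
    names = []
    for line in lines:
        event = parse_ilo_line(line)
        if event and {"MC", "MCA"} & set(event["event"].split("_")):
            names.append(event["event"])
    return names


# A few lines copied from the 20160405 log above.
log_20160405 = [
    "SFW  0   1  0x2000020800E00000 0000000000000000 EFI_EXIT_BOOT_SERVICES",
    "SFW  1   2  0x5680028501E007C0 0000000000000000 MC_INITIALIZED_RSE",
    "SFW     *7  0xC157062FA2020800 013FA17000130300 Type-02 137001 1273857",
    "SFW  1  *7  0xF680009801E00820 000000000000000B MC_INITIATED",
    "SFW  1   2  0x5680011501E008A0 0000000000000000 UNCORRECTED_MC",
    "SFW  1  *3  0x7680010701E008E0 0000000000000000 OS_MCA_NOT_REGISTERED",
]
print(machine_check_events(log_20160405))
```

Run against the 2007.0 log, the same function returns an empty list, which matches the observation that only the 20160405 boot raises MC_* events right after EFI_EXIT_BOOT_SERVICES.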

Please let me know if I can provide any other or more info.

Thanks
Comment 1 SpanKY gentoo-dev 2016-04-14 00:36:09 UTC
we have remote servers, but i'm not sure anyone has local desktops anymore.
i tried to get one a while back from another dev, but that stalled.
Comment 2 stanton_arch 2016-04-14 09:13:47 UTC
I received an email saying kernel version 3.14.14 runs fine

Linux guppy 3.14.14-gentoo #2 SMP Tue Aug 12 08:25:24 GMT 2014 ia64
Dual-Core Intel(R) Itanium(R) Processor 9040 GenuineIntel GNU/Linux

but I don't know about the install.  I originally thought it might be a kernel issue because debian was giving me the same error.

Just for info

gentoo 2007.0 kernel 2.6.18 - boots ok and gets to install prompt
gentoo 2008.0 kernel 2.6.24 - boots ok and gets to install prompt
20160405 kernel 4.1.15 - crashes

debian 7 wheezy kernel 3.2.78-1 - same error as gentoo 20160405
debian 6.0 squeeze kernel 2.6.32 - boots ok
debian 5.0 lenny kernel 2.6.26 - boots ok
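
The boot results above narrow down where the breakage was introduced. A small Python sketch of that reasoning follows; it assumes the regression is monotonic (every kernel after the first bad one is also bad on this machine), and `kernel_key` and `regression_window` are hypothetical names. The 3.14.14 report is left out because it came from a different system.

```python
def kernel_key(version):
    """Turn a version such as '3.2.78-1' into (3, 2, 78) for numeric comparison."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def regression_window(results):
    """Return (newest kernel that booted, oldest kernel that crashed)."""
    good = [version for version, booted in results if booted]
    bad = [version for version, booted in results if not booted]
    return max(good, key=kernel_key), min(bad, key=kernel_key)


# (kernel version, booted OK?) taken from the two lists above.
results = [
    ("2.6.18", True),     # gentoo 2007.0
    ("2.6.24", True),     # gentoo 2008.0
    ("4.1.15", False),    # gentoo 20160405
    ("3.2.78-1", False),  # debian 7 wheezy
    ("2.6.32", True),     # debian 6.0 squeeze
    ("2.6.26", True),     # debian 5.0 lenny
]
print(regression_window(results))
```

Under that assumption the problem appeared somewhere after 2.6.32 and no later than 3.2.78.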

Would it be difficult to cross-compile current install image with another kernel version, say 3.14.14 ?
Comment 3 Sergei Trofimovich (RETIRED) gentoo-dev 2017-11-11 22:42:45 UTC
Xeha on #gentoo-ia64 found the forums thread https://forums.gentoo.org/viewtopic-p-8066280.html
where the 20160702 ISO happened to boot on at least an rx3600.

I've recovered that ISO at: https://dev.gentoo.org/~slyfox/isos/install-ia64-minimal-20160702.iso
Comment 4 Sergei Trofimovich (RETIRED) gentoo-dev 2018-01-27 18:29:07 UTC
We have a potential fix in bug #575300. It might take a while to build repaired .isos.
Comment 5 Sergei Trofimovich (RETIRED) gentoo-dev 2018-02-08 07:33:43 UTC
Bug #575300 was fixed but now I think it was unrelated. Autobuilt .iso files never contained broken elilo as they are still from 2006.

What I think happens in this bug is an incorrectly configured serial console in the ISO boot options. The EFI menu allows you to configure the serial console. What do you have set up there?

Here is how I got shell booting from Gentoo's live CD on rx2600:

Our rx2600 has /dev/ttyS1 configured as 115200n8. I don't know if that's the default or a custom setup.

I've dropped into read/write EFI shell accessed via iLO ('CO Ctrl-E f c' commands, picked "[EFI shell]" at boot menu) and loaded kernel directly from CDROM manually as:

    fs0:\efi\boot\bootia64.efi -i gentoo.igz gentoo initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=ttyS1,115200n8

[ shamelessly copied from https://wiki.gentoo.org/wiki/Project:Infrastructure/Developer_Machines/ia64#Recovery_notes ]

Can you try newer .iso and report back?
    http://distfiles.gentoo.org/releases/ia64/current-iso/
Comment 6 stanton_arch 2018-02-08 18:19:29 UTC
I tried the newer .iso without any luck.  I pretty much get the same
behavior as before no matter what serial options I select. If I type
in your suggested boot args, or pretty much any boot command it seems,
this is printed out

 ELILO boot: gentoo console=ttyS1,9600n8
 Uncompressing Linux... done
 Loading file \efi\boot\gentoo.igz...done

then nothing seems to happen for about 10 seconds, then the System
light on the front panel flashes red for a few seconds, and then the
system reboots itself.

Not sure if this is useful or not, but this is some of dmesg from
FreeBSD about the serial and video devices:

uart0: <HP Auxiliary Diva Serial Port> mem 0xf4051000-0xf405100f irq 82 at device 1.0 on pci224
puc0: <HP Diva Serial [GSP] Multiport UART - Everest SP2> mem 0xf4050000-0xf4050fff,0xf4020000-0xf403ffff irq 82 at device 1.1 on pci224
uart1: <Non-standard ns8250 class UART with FIFOs> at port 1 on puc0
uart1: console (9600,n,8,1)
uart2: <Non-standard ns8250 class UART with FIFOs> at port 2 on puc0
uart3: <Non-standard ns8250 class UART with FIFOs> at port 3 on puc0
vgapci0: <VGA-compatible display> port 0xe000-0xe0ff mem 0xf0000000-0xf3ffffff,0xf4040000-0xf404ffff at device 2.0 on pci224
vgapci0: Boot video device
uart4: <16550 or compatible> iomem 0xff5e0000-0xff5e0007 irq 34 on acpi0
uart5: <16550 or compatible> iomem 0xff5e2000-0xff5e2007 irq 35 on acpi0

If other info would be useful, let me know.

Thanks
Comment 7 Sergei Trofimovich (RETIRED) gentoo-dev 2018-02-08 23:03:08 UTC
(In reply to stanton_arch from comment #6)
> I tried the newer .iso without any luck.  I pretty much get the same
> behavior as before no matter what serial options I select. If I type
> in your suggested boot args, or pretty much any boot command it seems,
> this is printed out

You don't have access to a real serial port to inspect the actual output and are using the iLO console only, right? Which .iso version was that exactly?

>  ELILO boot: gentoo console=ttyS1,9600n8
>  Uncompressing Linux... done
>  Loading file \efi\boot\gentoo.igz...done

I know very little about elilo syntax. How does it apply that 'console=ttyS1,9600n8' setting? Does it just append it to the already built kernel command line?

Linux kernel also has magic 'console=hcdp' EFI console support:
  https://elixir.free-electrons.com/linux/latest/source/drivers/firmware/pcdp.c#L101
Worth a try.

> then nothing seems to happen for about 10 seconds, then the System
> light on the front panel flashes red for a few seconds, and then the
> system reboots itself.

That's unfortunate. It's quite hard to debug without any output.

> Not sure if this is useful or not, but this is some of dmesg from
> FreeBSD about the serial and video devices:
> 
> uart0: <HP Auxiliary Diva Serial Port> mem 0xf4051000-0xf405100f irq 82 at
> device 1.0 on pci224
> puc0: <HP Diva Serial [GSP] Multiport UART - Everest SP2> mem
> 0xf4050000-0xf4050fff,0xf4020000-0xf403ffff irq 82 at device 1.1 on pci224
> uart1: <Non-standard ns8250 class UART with FIFOs> at port 1 on puc0
> uart1: console (9600,n,8,1)
> uart2: <Non-standard ns8250 class UART with FIFOs> at port 2 on puc0
> uart3: <Non-standard ns8250 class UART with FIFOs> at port 3 on puc0
> vgapci0: <VGA-compatible display> port 0xe000-0xe0ff mem
> 0xf0000000-0xf3ffffff,0xf4040000-0xf404ffff at device 2.0 on pci224
> vgapci0: Boot video device
> uart4: <16550 or compatible> iomem 0xff5e0000-0xff5e0007 irq 34 on acpi0
> uart5: <16550 or compatible> iomem 0xff5e2000-0xff5e2007 irq 35 on acpi0

I'm not familiar with FreeBSD output but it seems to agree that 'ttyS1,9600n8' is the correct mode. Perhaps something else is crashing the kernel early in boot.
Comment 8 stanton_arch 2018-02-09 18:12:51 UTC
> --- Comment #7 from Sergei Trofimovich <slyfox@gentoo.org> ---
>
> You don't have access to real serial to inspect actual output and using iLO
> console only, right? Which .iso version was that exactly?
>

I tried with a real serial port (com0) and iLO both.  Both behaved
pretty much the same.  Although, when I was just plugged into the serial
port, the System light didn't flash red.  But it still rebooted itself.

>
> I know very little about elilo syntax. How does it apply that
> 'console=ttyS1,9600n8' setting? Just appends it to already built kernel
> commandline?

I believe so.

>
> Linux kernel also has magic 'console=hcdp' EFI console support:
>  
> https://elixir.free-electrons.com/linux/latest/source/drivers/firmware/pcdp.c#L101
> Worth a try.
>

I tried that console option, and many others (debug earlyprintk ...), as
well as disabling stuff in the "Hardware options" listed here
 https://wiki.gentoo.org/wiki/Handbook:IA64/Installation/Media
but none seemed to have any effect.

>
> That's unfortunate. It's quite hard to debug without any output.
>

Yep.

>
> I'm not familiar with FreeBSD output but it seems to agree that 'ttyS1,9600n8'
> is the correct mode. Perhaps something else is crashing kernel early enough.

I agree, it still feels to me like the kernel is crashing before console
output is ready.

Thanks
Comment 9 stanton_arch 2018-02-09 18:17:00 UTC
> Which .iso version was that exactly?

I tried the one in the link: install-ia64-minimal-20180201T031003Z.iso
Comment 10 Sergei Trofimovich (RETIRED) gentoo-dev 2018-02-10 12:01:16 UTC
(In reply to stanton_arch from comment #8)
> I agree, it still feels to me like the kernel is crashing before console
> output is ready.

Yeah, sounds like it. We have at least one known yet unfixed bug that reads random memory at very early bootup stage: https://bugs.gentoo.org/show_bug.cgi?id=518130

I've patched it locally and built the kernel on an ia64 machine. Can you try the same kernel to see if it shows you anything? These kernels don't require an initramfs:
    https://dev.gentoo.org/~slyfox/ia64-kernels/

Try config-4.9.72-gentoo from there. If it happens to print anything for you, you can double-check against config-4.9.72-gentoo.broken (this one lacks the single patch).

I think you only need to drop this file onto the EFI partition and tweak elilo.conf accordingly. The machine that runs this kernel now has the following entry:
    image=/boot/vmlinuz-4.9.72-gentoo
        label=gentoo
        root=/dev/cciss!c0d0p3
        read-only
        append="console=ttyS1,115200n8"

I had a feeling, though, that text appears on the iLO console only once the machine has fully booted and reached the operating system. I'll try to play a bit more with console parameters to see the difference.
Comment 11 Sergei Trofimovich (RETIRED) gentoo-dev 2018-02-10 13:20:16 UTC
(In reply to Sergei Trofimovich from comment #10)
> I'll try to play a bit more with console parameters to see the difference.

TL;DR: To get an early console you need to:

- remove all 'console=' options (including existing ones) from the kernel command line.

  I've tested it from the EFI shell as 'Shell> fs1:\EFI\gentoo\elilo.efi boot\vmlinuz-4.9.72-gentoo root=/dev/cciss!c0d0p3'

- set up a single primary serial console in EFI. I did not touch the existing config. For me the configuration is 'P Serial Acpi(HWP0002,PNP0A03,0)/Pci(1|2) Vt100+ 115200'.

I got to this by rereading https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ia64/serial.txt and experimenting with a few options on guppy (rx3600 box).

Now elilo.conf looks like:
    image=/boot/vmlinuz-4.9.72-gentoo
        label=gentoo
        root=/dev/cciss!c0d0p3
        read-only
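
The first step above (dropping every 'console=' option) can be sketched in Python for anyone who wants to clean up an existing command line mechanically. This is only an illustration; `strip_console_args` is a hypothetical helper, and it assumes options are separated by whitespace with no quoting.

```python
def strip_console_args(cmdline):
    """Return the kernel command line with every console=... option removed."""
    kept = [arg for arg in cmdline.split() if not arg.startswith("console=")]
    return " ".join(kept)


# Example: the live CD arguments used earlier in this bug.
cmdline = (
    "root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs "
    "loop=/image.squashfs cdroot console=ttyS1,115200n8"
)
print(strip_console_args(cmdline))
```

With no 'console=' left on the command line, the kernel is expected to fall back to the primary serial console configured in EFI, as described above.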
Comment 12 stanton_arch 2018-02-11 18:15:13 UTC
Created attachment 519112 [details]
dmesg serial output
Comment 13 stanton_arch 2018-02-11 18:16:19 UTC
The kernel vmlinuz-4.9.72-gentoo booted for me successfully with
serial output (the port labeled "SERIAL A" on the back of my rp3410),
so I'd say the patch is a success :).

I copied all the stuff in https://dev.gentoo.org/~slyfox/ia64-kernels/
to a spare msdos partition on my 2nd disk, and then copied elilo.efi
from the 20180201T031003Z .iso image to that same partition.

I booted to the EFI shell, then did

Shell> fs2:
fs2:\> cd gentoo

fs2:\gentoo> elilo.efi
near line 3: Unkown option boot
near line 1: Unkown option boot
forcing interactive mode due to config file error(s)

ELILO boot: vmlinuz-4.9.72-gentoo
Loading vmlinuz-4.9.72-gentoo...Loading Linux... ..done

And then kernel boot messages are displayed.  I've attached the boot
log.  It panics at the end because I don't have a root file system it
understands, which I think is expected.  I got pretty much the same
panic with root=/dev/cciss!c0d0p3 and didn't try messing with any
other args.

vmlinuz-4.9.72-gentoo.broken didn't work for me.  It behaved the same
as other gentoo installs with no serial output, and rebooting itself a
few seconds after the "Loading Linux .. done"

I don't have a linux machine I can build stuff on at the moment but if
someone wants to build an iso with the patch I will try to do a gentoo
install with the patched kernel.

Thanks
Comment 14 Sergei Trofimovich (RETIRED) gentoo-dev 2018-02-11 18:46:22 UTC
(In reply to stanton_arch from comment #13)
> The kernel vmlinuz-4.9.72-gentoo booted for me successfully with
> serial output (the port labeled "SERIAL A" on the back of my rp3410),
> so I'd say the patch is a success :).

Woohoo! Thanks so much for the test!

> I don't have a linux machine I can build stuff on at the moment but if
> someone wants to build an iso with the patch I will try to do a gentoo
> install with the patched kernel.

Once the patch gets accepted upstream we'll push it to the gentoo kernel package and a new .iso will be rebuilt automatically. I'll try to post an update to this bug when all of the above happens.
Comment 15 Larry the Git Cow gentoo-dev 2018-03-18 23:26:05 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ff2c7b91695f91aa82e5cba3e10e56086f4ab74f

commit ff2c7b91695f91aa82e5cba3e10e56086f4ab74f
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2018-03-18 23:25:41 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2018-03-18 23:25:41 +0000

    sys-kernel/gentoo-sources: ia64 stable, bug #518130
    
    Stabilize kernel with ptrace() fix as it fixes boot
    for some types fo ia64 machines.
    
    Bug: https://bugs.gentoo.org/579278
    Bug: https://bugs.gentoo.org/518130
    Package-Manager: Portage-2.3.24, Repoman-2.3.6

 sys-kernel/gentoo-sources/gentoo-sources-4.9.85.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 16 Sergei Trofimovich (RETIRED) gentoo-dev 2018-03-27 21:51:50 UTC
I've stabled 4.9.85 with the fix ahead of time and kicked catalyst to build an iso. Can you try this one (or newer):
    http://distfiles.gentoo.org/releases/ia64/autobuilds/20180326T064016Z/

and check if it still boots for you?
Comment 17 stanton_arch 2018-03-28 16:26:19 UTC
I burned the iso from
 http://distfiles.gentoo.org/releases/ia64/autobuilds/20180326T064016Z/
to a CD and booted it with the "gentoo-serial" option.

I believe all the dmesg output was displayed and it made it all the way to the livecd ~ #
prompt, so I'd say it is working fine.

I may try to go through the install process later when I have some more time.

Thanks!
Comment 18 Sergei Trofimovich (RETIRED) gentoo-dev 2018-03-28 19:44:36 UTC
(In reply to stanton_arch from comment #17)
> I burned the iso from
>  http://distfiles.gentoo.org/releases/ia64/autobuilds/20180326T064016Z/
> to a CD and booted it with the "gentoo-serial" option.
> 
> I believe all the dmesg output was displayed and it made it all the way to
> the livecd ~ #
> prompt, so I'd say it is working fine.
> 
> I may try to go through the install process later when I have some more time.
> 
> Thanks!

Yay! Thank you!
Comment 19 Larry the Git Cow gentoo-dev 2018-05-02 06:59:18 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e4408fdd30e54ec17b4be7596d27a8afe9225bd1

commit e4408fdd30e54ec17b4be7596d27a8afe9225bd1
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2018-05-02 06:58:49 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2018-05-02 06:59:11 +0000

    sys-boot/elilo: stable 3.16-r2 for ia64, bug #579278
    
    Bug: https://bugs.gentoo.org/579278
    Package-Manager: Portage-2.3.31, Repoman-2.3.9
    RepoMan-Options: --include-arches="ia64"

 sys-boot/elilo/elilo-3.16-r2.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=9357d74a31434c004128c0a6582b3daf2b35a3d8

commit 9357d74a31434c004128c0a6582b3daf2b35a3d8
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2018-05-02 06:58:42 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2018-05-02 06:59:11 +0000

    sys-boot/gnu-efi: stable 3.0.6-r2 for ia64, bug #579278
    
    Bug: https://bugs.gentoo.org/579278
    Package-Manager: Portage-2.3.31, Repoman-2.3.9
    RepoMan-Options: --include-arches="ia64"

 sys-boot/gnu-efi/gnu-efi-3.0.6-r2.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)