Bug 87154

Summary: Segmentation fault in grub
Product: Gentoo Linux
Reporter: Henrique Dias <hdias>
Component: [OLD] Core system
Assignee: AMD64 Project <amd64>
Status: RESOLVED NEEDINFO    
Severity: critical    
Priority: High    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Henrique Dias 2005-03-29 11:22:44 UTC
When I try to configure grub, I get a segmentation fault, and the system does not boot ("Grub Error 2").

Reproducible: Always
Steps to Reproduce:
1. grub

2. grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd

3. grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd1)"...  22 sectors are embedded.
succeeded
Segmentation fault

Actual Results:  
Segmentation fault

Expected Results:  
nothing

Portage 2.0.51.19 (default-linux/amd64/2004.3, gcc-3.4.3,
glibc-2.3.4.20041102-r1, 2.6.10 x86_64)
=================================================================
System uname: 2.6.10 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.4-r1 [2.3.4 (#1, Feb  9 2005, 12:59:08)]
dev-lang/python:     2.3.4-r1
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.5, 1.6.3, 1.7.9-r1, 1.4_p6, 1.9.4, 1.8.5-r3
sys-devel/binutils:  2.15.92.0.2-r1
sys-devel/libtool:   1.5.10-r4
virtual/os-headers:  2.6.8.1-r4
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CFLAGS="-O2"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config
/usr/share/config /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks"
GENTOO_MIRRORS="http://distfiles.gentoo.org
http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 acpi alsa berkdb bitmap-fonts crypt font-server fortran gif gpm ipv6
jp2 jpeg lzw lzw-tiff mbox milter mp3 multilib ncurses nls opengl oss pam perl
png python readline samba sasl ssl tcpd tiff truetype truetype-fonts type1-fonts
usb userlocales xml2 xpm xrandr xv zlib"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, PORTDIR_OVERLAY
Comment 1 Danny van Dyk (RETIRED) gentoo-dev 2005-03-30 00:23:41 UTC
Please comment out the line "unset CFLAGS" in the ebuild and reinstall grub via

  FEATURES="nostrip" CFLAGS="-O -ggdb" emerge grub

Then run grub via

  gdb grub
  > run

and try to reproduce the segfault. When it happens, type "backtrace" in gdb and
attach the output of the complete gdb session here.
Comment 2 Henrique Dias 2005-03-30 07:43:44 UTC
# gdb grub
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
 
(gdb) run
Starting program: /sbin/grub
Warning:
Cannot insert breakpoint -2.
Error accessing memory address 0xbc20: Input/output error.
 
(gdb) backtrace
#0  0x00000000 in ?? ()
(gdb)
Comment 3 Simon Stelling (RETIRED) gentoo-dev 2005-03-30 08:08:24 UTC
An I/O error on RAM? I guess your RAM has some errors, so this is probably hardware-related.
Comment 4 Henrique Dias 2005-03-30 08:30:41 UTC
No, the RAM is OK. I get these messages from dmesg:
# dmesg
end_request: I/O error, dev fd0, sector 0
grub[14302]: segfault at 00000000556f2ac0 rip 00000000556f2ac0 rsp 00000000556f29ec error 15
end_request: I/O error, dev fd0, sector 0
grub[14369]: segfault at 00000000556f2ac0 rip 00000000556f2ac0 rsp 00000000556f29ec error 15
end_request: I/O error, dev fd0, sector 0
grub[14446]: segfault at 00000000556f2ac0 rip 00000000556f2ac0 rsp 00000000556f29ec error 15
Comment 5 Ricardo Correia 2005-04-03 21:50:18 UTC
I can confirm the bug - the exact same thing happens here (although no fd errors, I'm running from the hard disk).

I was booting with grub-static, but now I wanted to install the "normal" grub.
I'm trying to install it on a running system, but it always segfaults.

The kernel logs also show a similar line:
Apr  4 01:24:35 [kernel] grub[12462]: segfault at 000000005570dac0 rip 000000005570dac0 rsp 000000005570d9ec error 15

Maybe it is related to the amd64 noexec protection?
I always boot the kernel with the 'noexec=on noexec32=all' parameters.
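
The kernel log line quoted above carries enough information to sanity-check the noexec hypothesis. The following is a minimal Python sketch, not part of the original report, that parses such a line and decodes the page-fault error code. It assumes that the kernel prints the error code in hexadecimal and that its bits follow the standard x86 page-fault layout (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The names parse_segfault and decode_error are illustrative only.

```python
import re

# Standard x86 page-fault error-code bits (assumed layout, see the text above).
PF_BITS = {
    0: "protection violation (page was present)",
    1: "write access",
    2: "user mode",
    3: "reserved bit set in page table",
    4: "instruction fetch",
}

# Matches lines such as:
#   grub[14302]: segfault at 00000000556f2ac0 rip ... rsp ... error 15
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def parse_segfault(line):
    """Return the numeric fields of a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "prog": m["prog"],
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),
        "rip": int(m["rip"], 16),
        "rsp": int(m["rsp"], 16),
        "err": int(m["err"], 16),  # assumption: the kernel prints this as hex
    }


def decode_error(err):
    """List the descriptions of all error-code bits that are set."""
    return [name for bit, name in PF_BITS.items() if err & (1 << bit)]


line = ("grub[14302]: segfault at 00000000556f2ac0 rip 00000000556f2ac0 "
        "rsp 00000000556f29ec error 15")
info = parse_segfault(line)
flags = decode_error(info["err"])
print(flags)
print("fault address == rip:", info["addr"] == info["rip"])
print("rip - rsp =", info["rip"] - info["rsp"], "bytes")
```

Under these assumptions, the code decodes to a user-mode instruction fetch that hit a protection violation, with the faulting address equal to rip and only 212 bytes above rsp. That pattern is what one would expect if grub tried to execute code placed on a non-executable stack, which is consistent with the noexec theory but does not prove it.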
Comment 6 Ricardo Correia 2005-04-03 21:57:57 UTC
Correction: there's a fd0 error after all, but it happens the moment I do 'grub> root(hd0,1)', so it doesn't seem to be related.

The segfault itself happens when I run setup:

grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/xfs_stage1_5" exists... yes
 Running "embed /boot/grub/xfs_stage1_5 (hd0)"...  25 sectors are embedded.
succeeded
Segmentation fault
Comment 7 Henrique Dias 2005-04-04 04:10:00 UTC
Remove grub and install grub-static.
I don't have any problems with grub-static.
Comment 8 Robert Moss (RETIRED) gentoo-dev 2005-04-24 09:27:52 UTC
Does this still happen if you use the "--no-floppy" argument when installing grub?
Comment 9 Henrique Dias 2005-04-26 11:01:15 UTC
Yes, I get a segmentation fault with or without the "--no-floppy" option.
Comment 10 Olivier Crete (RETIRED) gentoo-dev 2005-05-23 20:37:52 UTC
Is there an fd0 entry in /boot/grub/device.map, and is there a floppy drive in
your computer?
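
One way to answer the first half of that question is to list the device.map entries and pick out the floppy ones. The sketch below is a hedged Python illustration, not something posted in the original thread. It assumes the usual GRUB legacy device.map layout of one "(name) /dev/path" pair per line. The sample contents and the helper names parse_device_map and floppy_entries are hypothetical.

```python
import re

# One entry per line, e.g. "(hd0)   /dev/hda" (assumed GRUB legacy layout).
DEVICE_MAP_LINE = re.compile(r"^\((?P<grub>[a-z]+\d+)\)\s+(?P<dev>\S+)$")


def parse_device_map(text):
    """Map GRUB device names (fd0, hd0, ...) to OS device paths."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = DEVICE_MAP_LINE.match(line)
        if m:
            entries[m["grub"]] = m["dev"]
    return entries


def floppy_entries(entries):
    """Return only the entries whose GRUB name denotes a floppy drive."""
    return {name: dev for name, dev in entries.items() if name.startswith("fd")}


# Hypothetical sample; on a real system, read /boot/grub/device.map instead.
sample = """\
(fd0)\t/dev/fd0
(hd0)\t/dev/hda
(hd1)\t/dev/sda
"""
entries = parse_device_map(sample)
print(floppy_entries(entries))
```

If the result is non-empty on a machine without a floppy drive, that would explain the "I/O error, dev fd0" messages in dmesg, although not necessarily the segfault itself.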
Comment 11 Joshua Schmidlkofer 2005-06-07 14:48:02 UTC
I had the same problem on an Intel EM64T Xeon. I got around it by compiling grub-0.96-r2. Also, a friend of mine insists that he added USE='xfs XFS' and that fixed it. I have no clue; all I know is that I am using XFS, and I can see no way in which that USE flag could have any effect. In either case, try 0.96-r2, and if that doesn't work, try USE="xfs XFS" with it.
Comment 12 Danny van Dyk (RETIRED) gentoo-dev 2005-06-25 04:07:18 UTC
(In reply to comment #2)
> # gdb grub
> (gdb) run
> Starting program: /sbin/grub
> Warning:
> Cannot insert breakpoint -2.
> Error accessing memory address 0xbc20: Input/output error.
>  
> (gdb) backtrace
> #0  0x00000000 in ?? ()
^^^ There is no debug information here, so this output is useless to us. I
explicitly asked you to
1. comment out the line "unset CFLAGS" in your grub ebuild
2. remerge grub by running:
  FEATURES="nostrip" CFLAGS="-O -ggdb" emerge grub
Please do this and attach the information here. I really want this one to be
resolved.

(In reply to comment #11)
> I had the same problem on an Intel EM64T Xeon. I got around it by compiling
> grub-0.96-r2. Also, a friend of mine insists that he added USE='xfs XFS' and
> that fixed it. I have no clue; all I know is that I am using XFS, and I
> can see no way in which that USE flag could have any effect. In either case,
> try 0.96-r2, and if that doesn't work, try USE="xfs XFS" with it.
This is nonsense. USE flags are case insensitive.
Comment 13 Jakub Moc (RETIRED) gentoo-dev 2005-07-01 05:40:07 UTC
(In reply to comment #6)
> The segfault itself happens when I run setup:
> 
> grub> setup (hd0)
> setup (hd0)
>  Checking if "/boot/grub/stage1" exists... yes
>  Checking if "/boot/grub/stage2" exists... yes
>  Checking if "/boot/grub/xfs_stage1_5" exists... yes
>  Running "embed /boot/grub/xfs_stage1_5 (hd0)"...  25 sectors are embedded.
> succeeded
> Segmentation fault
> 

If you are using xfs on your boot partition, it is Bug 90845. Fixed in 0.96-r2
(and 0.97 upstream).
Comment 14 Simon Stelling (RETIRED) gentoo-dev 2005-08-01 03:29:03 UTC
Ricardo: can you reboot your system without these options? I think that's the
simplest way to find out whether noexec is the problem or not. Also, can you
please do what Danny asked in comment 1? This will hopefully give us some useful
debug information.
Comment 15 Simon Stelling (RETIRED) gentoo-dev 2005-08-31 08:36:33 UTC
We can't fix this without debug information, sorry.