Bug 216213 - sys-boot/grub-0.97-r5 fails to start
Summary: sys-boot/grub-0.97-r5 fails to start
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-04 17:39 UTC by Davide Pesavento
Modified: 2008-04-07 21:31 UTC

See Also:
Package list:
Runtime testing required: ---


Description Davide Pesavento (RETIRED) gentoo-dev 2008-04-04 17:39:58 UTC
After upgrading to grub-0.97-r5, I ran:
grub-install --recheck --no-floppy /dev/sda3

At next reboot, grub took a slightly longer time to start, but right after the "Loading stage 1.5 ..." screen, the machine suddenly rebooted. I couldn't even see the screen with the list of kernels to choose from.
The system has only one reiserfs partition.

Reproducible: Always

Steps to Reproduce:
1. emerge "=sys-boot/grub-0.97-r5"
2. grub-install
3. reboot

Actual Results:  
Machine reboots endlessly.

Expected Results:  
grub loads itself and a linux kernel.
Comment 1 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-05 00:52:46 UTC
Run the grub shell manually from the OS, and do root/setup yourself.
It sounds like the MBR copy isn't looking at the right offset for the stage2.
Comment 2 Chris Smith 2008-04-06 01:43:34 UTC
(In reply to comment #1)
> Run the grub shell manually from the OS, and do root/setup yourself.
> It sounds like the MBR copy isn't looking at the right offset for the stage2.
> 

That worked on my x86_64 system but not on my x86 system. Had to mask r5 on x86.
Comment 3 Davide Pesavento (RETIRED) gentoo-dev 2008-04-06 12:25:55 UTC
Running "root (hd0,2)" and "setup (hd0)" from the grub shell does work; the machine boots fine again. However, I now get two separate entries for Linux in rEFIt's menu, one of which does not boot.
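
For readers following along, here is a minimal sketch of how the /dev/sda3 from the description relates to the "(hd0,2)" used above. GRUB legacy counts both disks and partitions from zero. The helper names linux_to_grub and grub_setup_script are made up for this illustration, and the sketch assumes the disk letter maps straight onto the BIOS drive number (sda is hd0, sdb is hd1), which a real device.map may contradict.

```shell
#!/bin/sh
# Hypothetical helper: convert /dev/sdXN to GRUB legacy notation.
# Assumption: sda -> hd0, sdb -> hd1, ... (check /boot/grub/device.map).
linux_to_grub() {
    dev=${1#/dev/}                          # e.g. sda3
    case $dev in
        [sh]d[a-z]*) ;;
        *) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    letter=$(printf '%s' "$dev" | cut -c3)  # e.g. a
    part=${dev#???}                         # e.g. 3, empty for a whole disk
    case $part in
        *[!0-9]*) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    disk=$(( $(printf '%d' "'$letter") - 97 ))   # a=0, b=1, ...
    if [ -n "$part" ]; then
        echo "(hd$disk,$((part - 1)))"      # partitions count from zero
    else
        echo "(hd$disk)"
    fi
}

# Hypothetical helper: print the grub shell commands for a given
# partition holding /boot/grub ($1) and a target disk for stage1 ($2).
grub_setup_script() {
    printf 'root %s\nsetup %s\nquit\n' \
        "$(linux_to_grub "$1")" "$(linux_to_grub "$2")"
}

grub_setup_script /dev/sda3 /dev/sda
```

The script only prints the commands. After checking them by eye, they could be typed into the grub shell, or fed to it on standard input with "grub --batch --no-floppy" as root.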
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-06 18:06:07 UTC
davide: rEFIt is outside the scope of this bug. I don't know how it works; consult its documentation to see how it creates 'linux' entries.

chris: on your x86 box, could you make sure that the /boot/grub/ files are updated to the -r5 series? I'm seeing other folks where the stage files aren't being updated, and that's causing all sorts of trouble in booting.
Comment 5 Chris Smith 2008-04-06 18:35:13 UTC
(In reply to comment #4)
> chris: on your x86 box, could you make sure that the /boot/grub/ files are
> updated to the -r5 series? I'm seeing other folks where the stage files aren't
> being updated, and that's causing all sorts of trouble in booting.
>

The timestamps of the files get updated. I emerged r5 twice in a row with /boot mounted, and the timestamps were updated each time. Then I ran grub with "root (hd0,0)" and "setup (hd0)" before rebooting. It still failed. Moving back to r4 works fine.

Comment 6 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-06 18:54:31 UTC
chris: post some more details about your hardware, and if you're capable enough, do a binary search on testing with and without the following patches:

patch/016_all_grub-0.97-multiboot-memory-amount.patch
patch/550_all_grub-0.97-long-commandline.patch
patch/600_all_grub-0.97-gpt-partition-table.patch
patch/810_all_grub-0.97-ext3_256byte_inode.patch
Comment 7 Chris Smith 2008-04-06 19:25:57 UTC
I also deleted the files in /boot/grub, then emerged r5, then ran grub, and it still failed.

Hardware is pretty normal, old P4 based Celeron on Genuine Intel board:
==========================================================
# lspci
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
01:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Celeron(R) CPU 1.70GHz
stepping        : 3
cpu MHz         : 1699.985
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pebs bts sync_rdtsc
bogomips        : 3402.22
clflush size    : 64

# free
             total       used       free     shared    buffers     cached
Mem:        513996     151792     362204          0       7096      96704
-/+ buffers/cache:      47992     466004
Swap:       979924          0     979924

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             236M   62M  174M  27% /
udev                   10M  104K  9.9M   2% /dev
/dev/sda3             192M     0  192M   0% /tmp
/dev/sda5             2.8G  1.3G  1.6G  46% /usr
/dev/sdb2              19G  4.2G   15G  23% /var
/dev/sdb3             479M   46M  433M  10% /var/log
/dev/sdb5             1.4G  1.2G  252M  83% /usr/portage
/dev/sdb6             236M   33M  204M  14% /opt
/dev/sda6              19G  1.2G   18G   6% /home
shm                   192M     0  192M   0% /dev/shm

/dev/sda1               /boot           ext2            noauto,noatime  1 2
/dev/sda2               /               reiserfs        noatime         0 1
/dev/sda3               /tmp            ext2            noatime         0 1
/dev/sda5               /usr            reiserfs        noatime         0 0
/dev/sdb2               /var            reiserfs        noatime         0 0
/dev/sdb3               /var/log        reiserfs        noatime         0 0
/dev/sdb5               /usr/portage    reiserfs        noatime         0 0
/dev/sdb6               /opt            reiserfs        noatime         0 0
/dev/sda6               /home           reiserfs        noatime,acl,user_xattr  0 0
/dev/sdb1               none            swap            sw              0 0
/dev/cdrom1             /mnt/cdrom      iso9660         noauto,ro       0 0

proc                    /proc           proc            defaults        0 0

shm                     /dev/shm        tmpfs           nodev,nosuid,noexec,size=192m   0 0
==========================================================
As a note, the only not so ordinary thing I'm doing is running the following in /etc/conf.d/local.start

mkdir /dev/shm/tmp
chmod 1777 /dev/shm/tmp
mount --bind /dev/shm/tmp /tmp

(on both systems)
==========================================================

A binary search in what for what?

Comment 8 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-06 20:43:27 UTC
consider the patches A,B,C,D.

1. Test with just A+B.
2. Test with just C+D.

3. If the problem exists at BOTH of them, then either none of them is at fault, or a combination thereof is at fault. Report back here.
4. If the problem only exists at 1, then A or B is at fault.
5. If the problem only exists at 2, then C or D is at fault.

6-7. On the next two boots, test with just a single patch each from the pair at fault.
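
The procedure above can be sketched as a small script. This is only an illustration: boot_fails_with is a hypothetical stand-in for the real work (rebuild grub with only the given patches, reinstall, reboot, and see whether it boots), and here it merely simulates a single bad patch named in CULPRIT. The letters A to D are placeholders for the four patch files.

```shell
#!/bin/sh
# Hypothetical stand-in for a manual test cycle. Returns 0 when the
# failure is reproduced with the given set of patches applied.
boot_fails_with() {
    for p in "$@"; do
        [ "$p" = "$CULPRIT" ] && return 0
    done
    return 1
}

# Test A+B, then C+D, then the single patches of the pair at fault.
find_bad_patch() {
    a=$1 b=$2 c=$3 d=$4
    boot_fails_with "$a" "$b" && fail_ab=1 || fail_ab=0
    boot_fails_with "$c" "$d" && fail_cd=1 || fail_cd=0
    if [ "$fail_ab" = "$fail_cd" ]; then
        # Both halves fail, or neither does: step 3, report back.
        echo "inconclusive: none of them, or a combination"
        return 2
    fi
    if [ "$fail_ab" = 1 ]; then set -- "$a" "$b"; else set -- "$c" "$d"; fi
    if boot_fails_with "$1"; then
        echo "$1"
    elif boot_fails_with "$2"; then
        echo "$2"
    else
        echo "inconclusive: only the pair $1 + $2 fails"
        return 2
    fi
}

CULPRIT=C                # simulated bad patch, for demonstration only
find_bad_patch A B C D
```

With four patches this needs at most four test boots, the same as testing each patch alone, but the two pair tests also reveal the case where no single patch is responsible.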
Comment 9 Chris Smith 2008-04-07 14:58:41 UTC
(In reply to comment #8)
> consider the patches A,B,C,D.
> 
> 1. Test with just A+B.
> 2. Test with just C+D.
> 
> 3. If the problem exists at BOTH of them, then either none of them is at fault,
> or a combination thereof is at fault. Report back here.
> 4. If the problem only exists at 1, then A or B is at fault.
> 5. If the problem only exists at 2, then C or D is at fault.
> 
> 6-7. On the two boots, test the just the single patch from the above.
> 

Went to do this today but noticed that the patchset had changed. So I first remerged with the new patchset and all was OK. Looks like the change in the commandline patch resolved my problem on x86.
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-07 19:06:01 UTC
chris: what's your kernel commandline from grub?
please paste it here, preserving the whitespace exactly.
Comment 11 Chris Smith 2008-04-07 20:08:29 UTC
(In reply to comment #10)
> chris: what's your kernel commandline from grub?
> please paste it here, preserving the whitespace exactly.
> 

"kernel /boot/kernel-2.6.24-gentoo-r4 root=/dev/sda2"

without the quotes (of course - just to preserve whitespace at bol and eol)
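
A minimal sketch of one way to show the whitespace without adding quotes by hand: print each kernel line wrapped in square brackets, so that any leading or trailing blanks become visible. The function name show_kernel_lines and the sample file contents are made up for this illustration; on a real system the argument would be the GRUB legacy config file (assumed here to be /boot/grub/grub.conf).

```shell
#!/bin/sh
# Print every "kernel" line of the given config file as NR:[line],
# so blanks at the beginning and end of the line can be seen.
show_kernel_lines() {
    awk '/^[ \t]*kernel[ \t]/ { printf "%d:[%s]\n", NR, $0 }' "$1"
}

# Demonstration on a throwaway file with a deliberate trailing space.
tmp=$(mktemp)
printf 'title Gentoo\nkernel /boot/kernel-2.6.24-gentoo-r4 root=/dev/sda2 \n' > "$tmp"
show_kernel_lines "$tmp"
rm -f "$tmp"
```

In the demonstration the closing bracket is separated from "sda2" by one blank, which is the trailing space written into the sample file.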
Comment 12 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-07 20:20:15 UTC
ok, that's really weird then, because the change in the patchset was specifically to fix corruption at EOL - which in itself should not have caused any instant reboots like you saw, because the kernel didn't even boot yet in your case.
Comment 13 Chris Smith 2008-04-07 21:00:54 UTC
I must apologize for the confusion. My bad, as the symptoms I had were those of bug #216307 (which could be worked around by adding a space at the end of the kernel command line). I guess the reason I posted here was because of Comment #1, which drew my attention and did fix the issue on my x86_64 box, but not on my x86 box, and that advice did not appear in bug #216307.
Comment 14 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-04-07 21:22:16 UTC
Ok, closing this as invalid, since the failure happened because the MBR copy was not updated as the warning instructed. This was the cause both for the original poster and for chris's x86_64 box.
Comment 15 Davide Pesavento (RETIRED) gentoo-dev 2008-04-07 21:31:17 UTC
I used grub-install to update the MBR copy the first time, so the boot failure might mean that the grub-install script has a bug...