Bug 212165 - sys-boot/grub-0.97-r4 makes platform reboot suddenly since GPT patch
Summary: sys-boot/grub-0.97-r4 makes platform reboot suddenly since GPT patch
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
Reported: 2008-03-03 10:01 UTC by Thibault Hild
Modified: 2008-03-14 17:49 UTC
Description Thibault Hild 2008-03-03 10:01:14 UTC
The platform uses an Intel S5000PAL motherboard with a 2 GB disk-on-module as the master on the primary IDE channel (the first partition is FAT32 and contains the necessary grub files). Grub is installed this way:
grub> root (hd0,0)
grub> embed /boot/grub/stage1_5 (hd0)
grub> install /boot/grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/boot/grub/stage2 /boot/grub/menu.lst
grub> quit

Grub is installed this way because (hd0,0) is FAT32 without long file name support, and the default name of the stage 1.5 file is fat_stage1_5 (renamed to the short name "stage1_5" so that the filesystem can handle it).

Before the grub GPT patch, the platform behaved correctly. After the GPT patch, grub does not display anything and the platform reboots suddenly (over and over, since hd0 is the first boot drive).

I say that grub does not display anything, but I need to test with a serial line configuration to be 100% sure. I'm almost sure that grub doesn't even have time to read the configuration file, so I guess it won't help. I'll try anyway in my spare time.

The only difference between the two configurations is the grub ebuild: 0.97-r4 instead of 0.97-r3. I've tested each configuration twice, repeating the rebuild/install steps and just emerging the needed grub version first.
Comment 1 Thibault Hild 2008-03-04 10:49:07 UTC
I still haven't done the serial line test. In the meantime, here is the output of 'emerge --info':

Portage 2.1.4.4 (default-linux/x86/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.23-gentoo-r8-20080212 i686)
=================================================================
System uname: 2.6.23-gentoo-r8-20080212 i686 Intel(R) Xeon(TM) CPU 3.20GHz
Timestamp of tree: Fri, 29 Feb 2008 05:00:01 +0000
app-shells/bash:     3.2_p17-r1
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.61-r1
sys-devel/automake:  1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://ftp.free.fr/mirrors/ftp.gentoo.org/"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://portage/gentoo-portage"
USE="minimal symlink x86" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1   emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m        maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="cfontz ncurses" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i740 i810 imstt        mach64 mga neomagic nsc nv r128 radeon rendition s3 s3virge savage       siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware         voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
Comment 2 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-03-04 20:15:50 UTC
OK, that's bizarre. Does your grub config use the parttype or hide/unhide commands? I know there are bugs in those, which I'm going to patch in r5.

The GPT changes are basically:
foreach(partition) {
 if(GPT signature) {
  goto GPT-scan;
 }
 if(valid) {
  valid-partition = $THIS;
  goto boot;
 }
}
goto failure;
GPT-scan:
 foreach(GPT partition) {
  if(valid) {
   valid-partition = $THIS;
   goto boot;
  }
 }
boot:
 use valid-partition to boot;
failure:
 we fucked up.
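
For readers who prefer code without gotos, the control flow outlined above can be sketched in Python. This is only one reading of the pseudocode, not the actual patch: the function name, the dictionary keys "gpt_signature" and "valid", and the choice to return None on failure are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_boot_partition(
    mbr_entries: Sequence[dict],
    gpt_entries: Sequence[dict],
) -> Optional[dict]:
    """Return the first usable partition, or None on failure.

    Hypothetical data model: each entry is a dict with an optional
    boolean "gpt_signature" (legacy table entries only) and an
    optional boolean "valid".
    """
    for entry in mbr_entries:
        if entry.get("gpt_signature"):
            # Equivalent of "goto GPT-scan": stop walking the legacy
            # table and walk the GPT entries instead.
            for gpt_entry in gpt_entries:
                if gpt_entry.get("valid"):
                    return gpt_entry
            # Assumption: an exhausted GPT scan counts as failure.
            return None
        if entry.get("valid"):
            return entry
    # Equivalent of "goto failure".
    return None
```

With no GPT signature present, the function returns the first valid legacy entry, which matches the behavior expected before the patch.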
Comment 3 Thibault Hild 2008-03-11 10:47:08 UTC
Instead of doing the whole build/install process with grub-0.97-r4, I have done this:
- platform with old build/install (grub 0.97-r3): OK
- replaced stage1 and stage1_5 by the ones from r4 (by using grub r4 shell): OK
- replaced stage2 by the one from r4 (copy to the boot partition): KO

But this time, instead of the reboot behavior, the platform acts as described in bug #211584.
So perhaps the reboot behavior is just another side effect of using "unhide".
I'll wait for the r5 release and will test again with the whole build/install procedure.
For now, though, I haven't spotted what makes the difference between "endless reboot" and "Error 22: No such partition".
Comment 4 Thibault Hild 2008-03-11 11:01:03 UTC
Oops! Silly me.
During the previous tests, I forgot to copy stage1 and stage1_5 before the grub shell part.
I'm currently restarting the tests from scratch...
Comment 5 Thibault Hild 2008-03-11 11:20:42 UTC
Ok, I've reproduced the "endless reboot" and identified the faulty grub component.

To summarize my tests:
stage1[r3], fat_stage1_5[r3], stage2[r3]: OK
stage1[r3], fat_stage1_5[r3], stage2[r4]: bug #211584
stage1[r4], fat_stage1_5[r3], stage2[r3]: OK
stage1[r4], fat_stage1_5[r4], stage2[r3]: endless reboot
stage1[r4], fat_stage1_5[r4], stage2[r4]: endless reboot

So, from my point of view, the component from grub-0.97-r4 that induces the "endless reboot" bug is the FAT stage 1.5.
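
The same elimination can be checked mechanically. The sketch below encodes the five test rows from this comment and looks for the component whose r4 version is present in exactly the runs that show a given outcome; the names RESULTS and components_matching are made up for this illustration.

```python
# The five combinations tested above, with the observed outcome.
RESULTS = [
    ({"stage1": "r3", "fat_stage1_5": "r3", "stage2": "r3"}, "OK"),
    ({"stage1": "r3", "fat_stage1_5": "r3", "stage2": "r4"}, "bug #211584"),
    ({"stage1": "r4", "fat_stage1_5": "r3", "stage2": "r3"}, "OK"),
    ({"stage1": "r4", "fat_stage1_5": "r4", "stage2": "r3"}, "endless reboot"),
    ({"stage1": "r4", "fat_stage1_5": "r4", "stage2": "r4"}, "endless reboot"),
]


def components_matching(outcome, results=RESULTS):
    """List components that are r4 in exactly the runs showing `outcome`."""
    names = results[0][0].keys()
    return [
        name
        for name in names
        if all((combo[name] == "r4") == (seen == outcome)
               for combo, seen in results)
    ]


print(components_matching("endless reboot"))
```

Only fat_stage1_5 is r4 in every rebooting run and r3 in every other run, which is consistent with the conclusion above.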
Comment 6 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-03-11 18:50:37 UTC
Instead of using the manual embed+install, please try using 'setup (hd0)'.

Also, could you try to make your first partition ext2 instead and see if the bug persists?
Comment 7 Thibault Hild 2008-03-14 17:49:40 UTC
Robin, you put me on the right track by telling me to use the "setup" command.
When I migrated to grub-0.97-r4, I didn't notice that the fat_stage1_5 size had increased by one sector.
So when calling the "install" command, I was still using "(hd0)1+15" instead of "(hd0)1+16".
So this is not a bug but a misuse of grub. Sorry for the hassle.
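
For anyone else doing a manual embed+install, the sector count in the block list can be derived from the file size instead of being hard-coded. Here is a minimal sketch, assuming 512-byte sectors and embedding that starts at sector 1 as in the commands from the description; the helper names are invented for this example, and the byte counts in the usage lines are hypothetical rather than the real file sizes.

```python
SECTOR_SIZE = 512  # assumption: classic 512-byte disk sectors


def sectors_needed(size_bytes, sector_size=SECTOR_SIZE):
    """Round a file size up to a whole number of sectors."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return (size_bytes + sector_size - 1) // sector_size


def embed_blocklist(size_bytes, drive="(hd0)", first_sector=1):
    """Build a block list string in the offset+length form, e.g. (hd0)1+16."""
    return f"{drive}{first_sector}+{sectors_needed(size_bytes)}"


# Hypothetical sizes: a file that fills 15 sectors, then one byte more.
print(embed_blocklist(15 * 512))      # (hd0)1+15
print(embed_blocklist(15 * 512 + 1))  # (hd0)1+16
```

In practice the size could be read with os.path.getsize() on the stage 1.5 file that is about to be embedded, so the block list follows the file when a new revision grows it.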

A nice improvement would be to change the default name of fat_stage1_5 so that it fits on a FAT partition without long file name support (that way the "setup" command would work).
I'll send you a patch for this if you find the "feature" useful and if I have some more spare time.