| Summary: | sys-boot/grub-0.97-r5 floating point exception when re-installing stage1 | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Alexandru Toma <flash3001> |
| Component: | [OLD] Core system | Assignee: | Gentoo's Team for Core System packages <base-system> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | aoz.syn, gentoo, KevinOfOz, sergey.zhelnin |
| Priority: | High | | |
| Version: | unspecified | | |
| Hardware: | x86 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
| Attachments: | patch 810 w/proper revision checking | | |
Description
Alexandru Toma
2008-05-06 22:10:19 UTC
Right now I can't enter my Gentoo installation anymore. This morning when I started the computer it booted and when it got to loading grub it restarted. If I don't turn the computer off it keeps restarting, booting until it starts to load grub and then restarting again in a loop. I don't know what happened. I didn't do anything out of the ordinary, apart from the commands I mentioned in the previous post.

(In reply to comment #0)
Exactly the same error after upgrading grub to sys-boot/grub-0.97-r5. Also adding this information:

cat /etc/mtab:

    /dev/hda8 / reiserfs rw,noatime 0 0
    proc /proc proc rw 0 0
    sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
    udev /dev tmpfs rw,nosuid 0 0
    devpts /dev/pts devpts rw,nosuid,noexec 0 0
    /dev/hda2 /mnt/disk_c vfat rw,noatime,iocharset=utf8,codepage=866 0 0
    /dev/hda5 /mnt/disk_d vfat rw,noatime,iocharset=utf8,codepage=866 0 0
    none /dev/shm tmpfs rw 0 0
    usbfs /proc/bus/usb usbfs rw,noexec,nosuid,devmode=0664,devgid=85 0 0
    securityfs /sys/kernel/security securityfs rw,noexec,nosuid,nodev 0 0
    /dev/hda6 /boot ext2 rw,noatime 0 0

cat /etc/fstab:

    /dev/hda6 /boot ext2 noauto,noatime 1 1
    /dev/hda8 / reiserfs noatime 0 0
    /dev/hda7 none swap sw 0 0
    /dev/cdroms/cdrom0 /mnt/cdrom iso9660 user,noauto,ro 0 0
    #/dev/fd0 /mnt/floppy auto noauto 0 0
    /dev/hda2 /mnt/disk_c vfat noatime,iocharset=utf8,codepage=866 0 0
    /dev/hda5 /mnt/disk_d vfat noatime,iocharset=utf8,codepage=866 0 0
    /dev/sda1 /mnt/usb auto rw,user 0 0
    # NOTE: The next line is critical for boot!
    none /proc proc defaults 0 0
    none /dev/shm tmpfs defaults 0 0

(In reply to comment #0)
> grub> setup (hd0)
> Floating point exception

I have exactly the same problem. Tried to delete the boot partition and create it again - didn't help. In the grub shell it can execute 'cat' for any other partition, but fails at 'boot'.

I fixed the problem by downloading and installing the original grub 0.97 from ftp://alpha.gnu.org/gnu/grub/

(In reply to comment #3)
...
> I fixed the problem by downloading and installing the original grub 0.97 from
> ftp://alpha.gnu.org/gnu/grub/

You can use (almost) any live cd - the update should've saved your old boot stuff so all you need to do is copy /boot/grub/stage2.old to /boot/grub/stage2

(In reply to comment #4)
> (In reply to comment #3)
> You can use (almost) any live cd - the update should've saved your old boot
> stuff so all you need to do is copy /boot/grub/stage2.old to /boot/grub/stage2

Thank you! That worked. Still, I wonder why this happened in the first place.

I have a "me too" on this setup:

    newton ~ # emerge --info
    Portage 2.1.4.4 (default-linux/x86/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r7 i686)
    =================================================================
    System uname: 2.6.24-gentoo-r7 i686 Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
    Timestamp of tree: Sun, 11 May 2008 16:45:01 +0000
    distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
    ccache version 2.4 [enabled]
    app-shells/bash: 3.2_p33
    dev-java/java-config: 1.3.7, 2.1.6
    dev-lang/python: 2.4.4-r9
    dev-python/pycrypto: 2.0.1-r6
    dev-util/ccache: 2.4-r7
    sys-apps/baselayout: 2.0.0
    sys-apps/openrc: 0.2.3
    sys-apps/sandbox: 1.2.18.1-r2
    sys-devel/autoconf: 2.13, 2.61-r1
    sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
    sys-devel/binutils: 2.18-r1
    sys-devel/gcc-config: 1.4.0-r4
    sys-devel/libtool: 1.5.26
    virtual/os-headers: 2.6.23-r3
    ACCEPT_KEYWORDS="x86"
    CBUILD="i686-pc-linux-gnu"
    CFLAGS="-mtune=prescott -O2 -pipe -fomit-frame-pointer"
    CHOST="i686-pc-linux-gnu"
    CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config"
    CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
    CXXFLAGS="-mtune=prescott -O2 -pipe -fomit-frame-pointer"
    DISTDIR="/usr/portage/distfiles"
    FEATURES="ccache distcc distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
    GENTOO_MIRRORS="ftp://mirror.internode.on.net/pub/gentoo http://mirror.pacific.net.au/linux/Gentoo"
    LANG="en_US.UTF-8"
    MAKEOPTS="-j5"
    PKGDIR="/usr/portage/packages"
    PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
    PORTAGE_TMPDIR="/mnt/data/tmp"
    PORTDIR="/usr/portage"
    PORTDIR_OVERLAY="/usr/local/portage"
    SYNC="rsync://ptolemy/gentoo-portage"
    USE="3dnow 3dnowext X a52 acl alsa amr apache2 arts audiofile berkdb bzip2 cdr cli cracklib crypt css cups curl dri dts dvb dvd dvdr encode ethereal exif expat fam ffmpeg flac fortran gd gdbm gif glut gmp gphoto2 gpm gtk guile hal iconv idn imagemagick imap isdnlog java jpeg kde lcms lirc mhash midi mmx mmxext mng moznocompose moznoirc moznomail mp3 mpeg mplayer mudflap mysql ncurses networking nls nptl nptlonly nsplugin nvidia ogg opengl openmp pam pcre perl php png ppds pppd python qt3 quicktime readline reflection regex samba session slang speex spl sse sse2 ssl svg tcltk tcpd tetex theora tiff transcode truetype unicode usb v4l v4l2 vorbis win32codecs x264 x86 xine xml2 xorg xv xvid xvmc zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIRC_DEVICES="devinput" USERLAND="GNU" VIDEO_CARDS="vesa nv nvidia v4l"
    Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

In case it is pertinent my /boot partition is EXT2 at (hd0,1).

Same problem here with grub-0.97-r6. In my case /boot is an ext2 file system on /dev/sda3. I had a closer look by recompiling with -g and running it under gdb.

    # gdb grub
    gdb> run
    ...
    grub> root (hd0,2)
    Filesystem type is ext2fs, partition type 0x83
    grub> setup (hd0)
    Program received signal SIGFPE, Arithmetic exception.
    0x0804ee44 in ext2fs_dir (dirname=0xf75ceebc "/boot/grub/stage1") at fsys_ext2fs.c:293
    293 __asm__ ("bsfl %1,%0"
    (gdb) bt
    #0 0x0804ee44 in ext2fs_dir (dirname=0xf75ceebc "/boot/grub/stage1") at fsys_ext2fs.c:293
    #1 0x0804e313 in grub_open (filename=0xf75ceebc "/boot/grub/stage1") at disk_io.c:1728
    #2 0x08059b2c in check_file (file=0xf75ceebc "/boot/grub/stage1") at builtins.c:4008
    #3 0x0805ccda in setup_func (arg=0xf757dc5a "(hd0)", flags=1) at builtins.c:4155
    #4 0x0805ddf8 in enter_cmdline (heap=0xf757dc54 "setup (hd0)", forever=1) at cmdline.c:172
    #5 0x08057a7e in cmain () at stage2.c:1079
    #6 0x0804c389 in init_bios_info () at common.c:337
    #7 0x0804a7e9 in doit () at asmstub.c:180
    #8 0x0804a96b in grub_stage2 () at asmstub.c:263
    #9 0x08049700 in main (argc=1, argv=0xffaf2d14) at main.c:264
    (gdb)

The line numbers seem to be misleading due to inlined and macro-expanded code. The offending instruction is 'idiv %ecx' with %ecx == 0.
When stepping through ext2fs_dir (fsys_ext2fs.c) the trap occurs at line 616.

    raw_inode = (struct ext2_inode *)((char *)INODE +
        ((current_ino - 1) & (EXT2_INODES_PER_BLOCK (SUPERBLOCK) - 1)) *
        EXT2_INODE_SIZE (SUPERBLOCK));

The folded macro EXT2_INODES_PER_BLOCK has a division.

    #define EXT2_INODE_SIZE(s) (SUPERBLOCK->s_inode_size)
    #define EXT2_INODES_PER_BLOCK(s) (EXT2_BLOCK_SIZE(s)/EXT2_INODE_SIZE(s))

I do not know what an ext2 superblock has to look like. However, patch 810_all_grub-0.97-ext3_256byte_inode.patch affects this line. I remember it used to work before, i.e., r5 broke it and r5 came with that patch. Workaround: convert ext2 into ext3 using "tune2fs -j".

Can you please try r6 with 810_all_grub-0.97-ext3_256byte_inode.patch reverted?

Unfortunately, it seems that I am unable to reproduce the bug after migrating to ext3. I rebuilt the ext2 file system from scratch (mke2fs), but without success: the bug no longer occurs. Surprisingly, "fsck.ext2 -vfc" did not report any error earlier while the bug occurred. The file system must have been corrupted in a way that fsck could not detect. I have no other explanation available.

I did a little further debugging and found the problem:

    #define EXT2_INODE_SIZE(s) (SUPERBLOCK->s_inode_size)
    #define EXT2_INODES_PER_BLOCK(s) (EXT2_BLOCK_SIZE(s)/EXT2_INODE_SIZE(s))

s_inode_size == 0, so we get a divide by 0. It looks like s_inode_size isn't always present, so I made this change to the EXT2_INODE_SIZE() macro:

    #define EXT2_INODE_SIZE(s) ((SUPERBLOCK->s_inode_size) ? \
        SUPERBLOCK->s_inode_size : \
        sizeof(struct ext2_inode))

Rebuilt, installed, ran grub setup (hd0), no problems :-)

In general, I think there's a better way to check whether the new fields added by 810_all_grub-0.97-ext3_256byte_inode.patch are valid than to check if they're 0, but I didn't dig any deeper. The kernel source would be a good reference I assume.
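For anyone following along, the zero fallback described above can be tried outside of GRUB with a small standalone sketch. Everything named `sketch_*` is hypothetical and only stands in for GRUB's real structures; the 128-byte inode and the `1024 << s_log_block_size` block size are assumptions about the classic ext2 layout, not code taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not GRUB's real definitions. */
struct sketch_inode { uint8_t raw[128]; };  /* assumed classic inode size: 128 bytes */

struct sketch_superblock {
    uint32_t s_log_block_size;  /* block size is 1024 << s_log_block_size */
    uint16_t s_inode_size;      /* reads as 0 when the field was never written */
};

/* Fall back to the classic inode size when the superblock field is 0. */
static size_t sketch_inode_size(const struct sketch_superblock *sb)
{
    return sb->s_inode_size ? sb->s_inode_size : sizeof(struct sketch_inode);
}

/* The division that trapped in the unguarded version. */
static size_t sketch_inodes_per_block(const struct sketch_superblock *sb)
{
    size_t block_size = (size_t)1024 << sb->s_log_block_size;
    size_t inode_size = sketch_inode_size(sb);
    assert(inode_size != 0);  /* holds because of the fallback above */
    return block_size / inode_size;
}
```

With a zeroed `s_inode_size` the divisor becomes 128 instead of 0, so the division no longer traps. As noted above, testing for 0 is only a heuristic and says nothing about whether the field is actually valid.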
James Jones: thanks, but why didn't you read comment 7, three spots above yours, where it was already traced to that exact code? Then in comment 8 I asked somebody to test with the 810 patch dropped, and rene couldn't because he had applied a workaround in the meantime.

I did read that comment, that's why I knew to look at that particular patch specifically... I tested with r6 with the patch applied. Dropping the patch would obviously fix the problem, but don't you want the functionality of that patch?

Dropping the 810 patch works for r6, but I'd think doing so would do more harm than good, given what it does. This is just a bad upstream/borrowed patch - the first Fedora patch (http://cvs.fedora.redhat.com/viewcvs/devel/grub/grub-fedora-9.patch) assumes the filesystem is of type EXT2_DYNAMIC_REV instead of checking s_rev_level at run-time to confirm that. Of course, the old revision just padded the remainder of the block w/zeroes, so we get where we are today. Whether or not it was made in a vacuum, James' patch above pretty closely matches what linux/ext2_fs.h does.

Attached is patch 810 re-done with code lifted straight from linux/ext2_fs.h - it properly checks the filesystem's revision before blindly grabbing a presumed inode_size. I've tested it against an ext2 image build of OpenWRT, where this bug has been ghosting me ever since -r5 made it into the tree.

Created attachment 160981
patch 810 w/proper revision checking
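As a rough illustration of the revision check described above (not the contents of the attachment), here is a minimal standalone sketch modelled on the way linux/ext2_fs.h picks the inode size. The `rev_*` and `REV_*` names are made up for this example; the values 0, 1 and 128 mirror the kernel's EXT2_GOOD_OLD_REV, EXT2_DYNAMIC_REV and EXT2_GOOD_OLD_INODE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the constants in linux/ext2_fs.h. */
#define REV_GOOD_OLD            0    /* original format, fixed inode size */
#define REV_DYNAMIC             1    /* format with variable inode sizes */
#define REV_GOOD_OLD_INODE_SIZE 128

/* Trimmed-down superblock holding only the two fields the check needs. */
struct rev_superblock {
    uint32_t s_rev_level;
    uint16_t s_inode_size;  /* only meaningful when s_rev_level is REV_DYNAMIC */
};

/* Trust s_inode_size only on dynamic-revision filesystems. */
static unsigned rev_inode_size(const struct rev_superblock *sb)
{
    assert(sb != NULL);
    return (sb->s_rev_level == REV_GOOD_OLD)
        ? REV_GOOD_OLD_INODE_SIZE
        : sb->s_inode_size;
}
```

Unlike a test for zero, this ignores whatever happens to sit in `s_inode_size` on an old-revision filesystem, so padding or stale bytes in that slot cannot end up as the divisor.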
Please test 0.97-r8 that is in package.mask very carefully (aka have a livecd handy), but this should now be fixed.

50% tested (against workstation w/dynamic inodes). Will be a few days before I can test on an embedded system (static inodes), but since 810 is the same patch I submitted, it should be okay.