
Bug 94097

Summary: Machine Check Exception Causing System to freeze on Boot
Product: Gentoo Linux
Component: [OLD] Core system
Status: RESOLVED FIXED
Severity: critical
Priority: High
Version: unspecified
Hardware: AMD64
OS: Linux
Reporter: Bharath Ramesh <krosswindz>
Assignee: Daniel Drake (RETIRED) <dsd>
CC: kernel
URL: http://bugme.osdl.org/show_bug.cgi?id=4861
Whiteboard:
Package list:
Runtime testing required: ---

Description Bharath Ramesh 2005-05-26 10:19:16 UTC
I have a dual-Opteron storage server. I just upgraded it to the gentoo-2.6.11-r7
kernel, and the system does not boot: it always freezes with a machine check
exception. The error displayed on the console is the following:

CPU 0: Machine Check Exception                 7 Bank 4: b400000000a13
RIP 10:<ffffffff8010c444> {default_idle+0x24/0x30}
TSC fc233b962 ADDR 3d520070
Kernel panic - not syncing : Uncorrected machine check

I tried booting with the nomce option, but the system keeps rebooting. Currently it's
running the

Reproducible: Always
Steps to Reproduce:
1. Reboot machine

Actual Results:  
System freezes with a machine check exception.

Expected Results:  
System should boot normally

Portage 2.0.51.19 (default-linux/amd64/2004.3, gcc-3.3.4,
glibc-2.3.4.20041102-r1, 2.6.5-gentoo-r1 x86_64)
=================================================================
System uname: 2.6.5-gentoo-r1 x86_64 AMD Opteron(tm) Processor 244
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.5 [2.3.5 (#1, May 12 2005, 20:35:16)]
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
dev-lang/python:     2.3.5
sys-apps/sandbox:    [Not Present]
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.9.5, 1.5, 1.8.5-r3, 1.6.3, 1.7.9-r1, 1.4_p6
sys-devel/binutils:  2.15.92.0.2-r8
sys-devel/libtool:   1.5.16
virtual/os-headers:  2.6.8.1-r4
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CFLAGS="-pipe -O2"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-pipe -O2"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.mirrors.pair.com/
http://mirror.datapipe.net/gentoo ftp://mirrors.tds.net/gentoo"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 acpi alsa berkdb bitmap-fonts cdr crypt cups font-server fortran gdbm
gif gpm imlib ipv6 jp2 jpeg ldap lzw lzw-tiff motif mp3 ncurses nls opengl oss
pam perl png python readline slang ssl tcpd tiff truetype truetype-fonts
type1-fonts usb userlocales xml2 xpm xrandr xv zlib userland_GNU kernel_linux
elibc_glibc"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
Comment 1 Daniel Drake (RETIRED) gentoo-dev 2005-05-31 16:38:34 UTC
Please test development-sources-2.6.12_rc5
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2005-06-13 15:43:14 UTC
See comment #1
Comment 3 Bharath Ramesh 2005-06-13 21:39:40 UTC
I installed 2.6.12-rc6 instead of rc5. I still see the machine check exception;
the TSC and ADDR values differ, but the bank and RIP values are the same. I
downloaded the machine check exception parser from

http://www.codemonkey.org.uk/parsemce/

but I couldn't get it to parse the input, and I'm not sure how to use it. Any help
is appreciated.
Comment 4 Daniel Drake (RETIRED) gentoo-dev 2005-06-19 03:35:44 UTC
So it is not possible to boot the machine to a console at all, even with the
nomce option?
Comment 5 Bharath Ramesh 2005-06-19 06:41:47 UTC
When I use the nomce option and boot either 2.6.11-r7 or 2.6.12-rc6, the system
goes into a reboot loop. It doesn't reboot after a fixed amount of time, which
makes this difficult for me. If I don't use the nomce option, it keeps freezing
with the machine check exception, each time with different TSC and ADDR values.
This freeze also doesn't occur after a fixed time. I don't see the problem with
the 2.6.5-r1 kernel, which has been working fine for over a year. I am not sure
how to use the parsemce tool at the link I posted in comment #3.
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2005-06-19 07:23:56 UTC
This doesn't really answer my question, since you haven't said at which point
the reboot/freeze actually occurs.
Comment 7 Bharath Ramesh 2005-06-19 10:29:39 UTC
The reboot occurs at random points. Sometimes it happens while loading modules,
sometimes while mounting the filesystems, and sometimes anywhere in between while
the services are being started. There is no fixed place where it freezes or reboots.
Comment 8 Daniel Drake (RETIRED) gentoo-dev 2005-06-26 02:38:34 UTC
OK, if you boot with init=/bin/bash as a kernel parameter (without nomce), do
you get some time on the console before it reboots? I think we should be able to
parse the MCEs that way.
Comment 9 Bharath Ramesh 2005-06-26 11:55:08 UTC
The machine still reboots, but I managed to capture a better example to run
through parsemce. The output of parsemce is as follows:

jeeves ~ # ./parsemce -b 4 -s b442200000000a13 -e 0000000000000007 -a 0
Status: (7) Machine Check in progress.
Error IP valid
Restart IP valid.
parsebank(4): b442200000000a13 @ 0
        External tag parity error
        Uncorrectable ECC error
        Address in addr register valid
        Error enabled in control register
        Error not corrected.
        Bus and interconnect error
        Participation: Local processor responded to request
        Timeout: Request did not timeout
        Request: Generic error
        Transaction type : Instruction
        Memory/IO : Other
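The flag lines in the parsemce output can be cross-checked by hand from the raw values. Below is a minimal Python sketch that assumes the architectural MCG_STATUS and MCi_STATUS bit layouts documented by Intel and AMD. The function names are hypothetical, and this is not how parsemce itself works. It decodes only the architectural fields. It does not cover AMD model-specific bits such as the one behind the "External tag parity error" line.

```python
"""Decode architectural fields of MCG_STATUS and an MCi_STATUS bank value.

Hypothetical illustration only: not parsemce, and AMD model-specific
bits (e.g. the extended error code) are deliberately not decoded.
"""

# MCG_STATUS flag bits (architectural).
MCG_FLAGS = {
    0: "RIPV (restart IP valid)",
    1: "EIPV (error IP valid)",
    2: "MCIP (machine check in progress)",
}

# MCi_STATUS flag bits 63..57 (architectural).
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC"}

# PP field of a bus/interconnect MCA error code.
PP_MEANING = {
    0b00: "local processor originated request",
    0b01: "local processor responded to request",
    0b10: "local processor observed error as third party",
    0b11: "generic",
}


def decode_mcg_status(value: int) -> list[str]:
    """Return the names of the MCG_STATUS flags that are set."""
    return [name for bit, name in MCG_FLAGS.items() if (value >> bit) & 1]


def decode_bank_status(value: int) -> dict:
    """Decode flag bits and the low 16-bit MCA error code of MCi_STATUS."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (value >> bit) & 1]
    code = value & 0xFFFF  # MCA error code, bits 15:0
    result = {"flags": flags, "mca_code": code, "bus_error": False}
    # Bus/interconnect errors follow the pattern 0000 1PPT RRRR IILL,
    # i.e. bits 15:11 equal 00001.
    if code >> 11 == 0b00001:
        result["bus_error"] = True
        result["participation"] = PP_MEANING[(code >> 9) & 0b11]
        result["timeout"] = bool((code >> 8) & 1)
    return result


if __name__ == "__main__":
    # Values taken from the parsemce invocation above (-e 7, -s b442200000000a13).
    print(decode_mcg_status(0x7))
    print(decode_bank_status(0xB442200000000A13))
```

The sketch can be run on the values from this comment. For MCG_STATUS it reports RIPV, EIPV and MCIP. For the bank it reports the VAL, UC, EN and ADDRV flags, plus a bus/interconnect error in which the local processor responded to the request without a timeout. This agrees with the corresponding parsemce lines.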
Comment 10 Daniel Drake (RETIRED) gentoo-dev 2005-07-07 14:35:05 UTC
Not sure what to make of this. Could you please report it at
http://bugzilla.kernel.org as it seems to be an upstream bug. Please post the
new bug URL here.
Comment 11 Bharath Ramesh 2005-07-07 15:32:26 UTC
Filed bug upstream. 

http://bugme.osdl.org/show_bug.cgi?id=4861
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2005-08-05 08:01:02 UTC
If you have time, please test with 2.6.13_rc6 (soon to be released) and update
the upstream bug as appropriate. If it is still an issue, I will attempt to get
the bug listed on Andrew Morton's to-be-fixed list :)
Comment 13 Daniel Drake (RETIRED) gentoo-dev 2005-08-20 05:11:18 UTC
Fixed by an upstream patch; it will be included in the next gentoo-sources release.
Comment 14 Daniel Drake (RETIRED) gentoo-dev 2005-08-29 08:53:24 UTC
Fixed in gentoo-sources-2.6.13