Bug 68641 - Hal or DBUS Causes Kernel Panic on Boot
Summary: Hal or DBUS Causes Kernel Panic on Boot
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High major
Assignee: foser (RETIRED)
Reported: 2004-10-23 08:00 UTC by Mark Duckworth
Modified: 2005-12-27 11:36 UTC

Description Mark Duckworth 2004-10-23 08:00:07 UTC
This report is not complete yet; I will add more information as I have it. The basic scenario: I run Gentoo with the ~amd64 keyword and recently installed gcc 3.4 and updated glibc and libstdc++. Since then my system fails on reboot. A good while after hal/dbus load (actually when sshd loads, so a long time after hal and/or dbus), I get a Machine Check Exception, a bank address is printed, and then the kernel panics. This is on kernel 2.6.9-rc3. I wanted to file this bug in case someone else sees this and suspects their hardware. It only happens with hal and dbus enabled at boot; if I disable them, I boot just fine, with no kernel panic and a rock-solid system. Memtest86 reports no problems with my RAM. I will try a number of things later to narrow the problem down to hal or dbus specifically and attach my results. The first thing I tried to remedy the situation was to re-emerge hald and dbus, which did not help.

Reproducible: Always
Steps to Reproduce:
1. install gcc 3.4
2. rebuild glibc and libstdc++
3. crash




Portage 2.0.51-r2 (gcc34-amd64-2004.1, gcc-3.4.2, glibc-2.3.4.20041006-r0,
2.6.9-rc3 x86_64)
=================================================================
System uname: 2.6.9-rc3 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.5.3
Autoconf: sys-devel/autoconf-2.59-r5
Automake: sys-devel/automake-1.8.5-r1
Binutils: sys-devel/binutils-2.15.92.0.2-r1
Headers:  sys-kernel/linux26-headers-2.6.8.1-r1
Libtools: sys-devel/libtool-1.5.2-r5
ACCEPT_KEYWORDS="amd64 ~amd64"
AUTOCLEAN="yes"
CFLAGS="-march=k8 -O2 -pipe -fomit-frame-pointer -frename-registers -fPIC"
CHOST="x86_64-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config
/usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown
/usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=k8 -O2 -pipe -fomit-frame-pointer -frename-registers -fPIC"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks"
GENTOO_MIRRORS="http://gentoo.mirrors.pair.com"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acpi alsa amd64 apm arts avi berkdb bitmap-fonts bonobo cdr crypt curl
dga dvd encode esd f77 fbcon foomaticdb gdbm gif gnome gpm gtk gtk2 gtkhtml
guile imlib jpeg jpg libg++ libwww mikmod motif mpeg multilib mysql ncurses nls
oggvorbis opengl oss pam pdflib perl png python qt quicktime readline samba sasl
sdl slang spell ssl svg tcltk tcpd theora truetype xinerama xml2 xmms xprint xv
zlib"
Comment 1 foser (RETIRED) gentoo-dev 2004-10-25 06:11:10 UTC
i'd suggest at least trying other kernel versions to see whether this is 2.6.9 related.

Also, have you added hald/dbus to the boot runlevel? It shouldn't matter, but they probably belong in default.

Comment 2 Mark Duckworth 2004-10-25 06:59:57 UTC
I don't think they are in the boot runlevel; that is, they don't start if I boot into single-user mode. Unfortunately I cannot really try other kernels: 2.6.8 is not only unstable for me, but several drivers for my motherboard are missing. That's not to say I couldn't see whether the thing actually boots, so I'll try it. The 2.6.9-rc series are the ones that work; I have not compared rc4 against rc3, for instance, but I will. I have an nForce3 250Gb board (Asus K8N-E Deluxe) and an AMD64 3000+, which I figure is a combination others are likely to have.
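
The runlevel question can also be checked directly instead of being inferred from single-user mode. The following is a minimal sketch, assuming the usual Gentoo layout where each runlevel is a directory under /etc/runlevels containing one entry per enabled service, and assuming the init scripts are named hald and dbus. The function name runlevels_for and its runlevel_root parameter are made up for this example.

```python
import os


def runlevels_for(service, runlevel_root="/etc/runlevels"):
    """Return the sorted names of runlevels whose directory contains `service`.

    Assumes the Gentoo convention of /etc/runlevels/<runlevel>/<service>.
    `runlevel_root` is a parameter only so the function can be pointed at a
    scratch directory for testing.
    """
    found = []
    if not os.path.isdir(runlevel_root):
        return found
    for level in sorted(os.listdir(runlevel_root)):
        level_dir = os.path.join(runlevel_root, level)
        # lexists() also counts dangling symlinks, which still show up as entries.
        if os.path.isdir(level_dir) and os.path.lexists(os.path.join(level_dir, service)):
            found.append(level)
    return found


if __name__ == "__main__":
    # Service names are assumptions; adjust to the init scripts actually installed.
    for name in ("hald", "dbus"):
        print(name, "->", runlevels_for(name) or "not in any runlevel")
```

If either service shows up under boot rather than default, that would answer the question above.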
Comment 3 foser (RETIRED) gentoo-dev 2005-01-16 12:43:22 UTC
any further news here ? judging by the lack of reports i got, i'd say it's not a common issue (on amd64).
Comment 4 ziegs 2005-04-27 00:39:01 UTC
i have the exact same problem...well almost.

i can actually run hal/dbus, however upon inserting a cd into my dvd drive, i get the error described.  perhaps the error is a combination of hardware and software issues?

i'm running a nvidia nforce4 mobo, amd64 3200+.  i have tried different kernels.
Comment 5 Mark Duckworth 2005-04-27 01:20:15 UTC
Unfortunately this problem has not gotten better.  I keep up with all the kernels and am running 2.6.11 now.  No improvement.  Oh well :-/  I guess my CPU is bugged.  I don't have any problems with *anything* else.  This isn't a Gentoo issue either; hal on Fedora takes the system down in an identical way.
Comment 6 foser (RETIRED) gentoo-dev 2005-04-27 03:49:14 UTC
can any of the amd64 devs confirm (i guess it might be very hardware dependent)? Or can the kernel team give some insight?

Could anyone give the exact kernel panic error message ?

I think this should go upstream. On the other hand, we might first have to check how it behaves with the unstable hal/dbus tree (0.5), which i will probably add in the near future when i have some spare time.
Comment 7 Carlos Silva (RETIRED) gentoo-dev 2005-04-27 04:01:10 UTC
I suggest he tries the ~amd64 versions of everything: kernel and hal/dbus.
If the problem doesn't go away, this should be sent upstream, but I really don't know where... :/ kernel or hal/dbus
Comment 8 Herbie Hopkins (RETIRED) gentoo-dev 2005-04-27 04:32:46 UTC
I can only confirm that I've never had a problem with hal/dbus on amd64. Mark, are you using an nforce4 chipset also?
Comment 9 Marcus D. Hanwell (RETIRED) gentoo-dev 2005-04-27 04:42:28 UTC
I have not encountered this bug and have been using ~amd64 hal/dbus with 2.6.10 and 2.6.11 gentoo-sources kernels. I have the nforce3 250 chipset on these two systems. I have been through several versions of libs with these too.
Comment 10 Daniel Gryniewicz (RETIRED) gentoo-dev 2005-04-27 08:35:40 UTC
I have not encountered this problem either, including when burning DVDs.  I have a VIA VT82xx.
Comment 11 Mark Duckworth 2005-04-27 11:11:31 UTC
For those that asked, my chipset is NForce3 250GB (asus board with 6 SATA ports).  And my CPU is AMD64 3000+.  I am running ~amd64 everything, and if it pleases the court, I'll enable hal, reboot and write down the MCE.
Comment 12 Mark Duckworth 2005-04-27 11:22:06 UTC
Strange but pleasing.  It now works.  The only difference between now and way back then that I can think of, besides some updates, is that I no longer run the VGA console.  Whatever fixed it must be recent, too, since the latest Fedora Core AMD64 still doesn't work.  Chances are it was an unrelated kernel bug that just happened to manifest itself this way for me...
Comment 13 ziegs 2005-04-27 21:20:58 UTC
ok the exact error message is:

CPU 0: Machine Check Exception:         4 Bank 4: b20000000000070f70f
TSC 2afbb09ee227
Kernel Panic - not syncing: Machine check

(I think i may have put an extra 0 in that long hex value... it's hard to count)
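
Because the hex value is hard to transcribe by hand, a small script can at least check whether a copied line is plausible. The sketch below assumes the line has the shape quoted above (a machine-check global status word, a bank number, then the bank's status register, all but the CPU and bank numbers in hex) and uses the architectural x86 bit layout of a 64-bit machine-check status register. The names parse_mce_line and decode_status are made up for this example, and it does not map bank numbers to hardware units, since that is CPU-specific.

```python
import re

# Shape of the panic line as quoted in this report (an assumption about the format):
#   CPU <n>: Machine Check Exception: <global status> Bank <n>: <bank status>
MCE_LINE = re.compile(
    r"CPU\s+(?P<cpu>\d+):\s+Machine Check Exception:\s*(?P<mcg>[0-9a-fA-F]+)"
    r"\s+Bank\s+(?P<bank>\d+):\s*(?P<status>[0-9a-fA-F]+)"
)

# Architectural flag bits in the upper part of a bank status register.
STATUS_FLAGS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,
    "enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "context_corrupt": 57,
}


def decode_status(status):
    """Split a 64-bit bank status value into its flags and error codes."""
    if status < 0 or status >> 64:
        raise ValueError("status does not fit in 64 bits; likely a transcription error")
    fields = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    fields["mca_error_code"] = status & 0xFFFF
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    return fields


def parse_mce_line(line):
    """Extract CPU, bank and status words from a panic line, or return None."""
    m = MCE_LINE.search(line)
    if m is None:
        return None
    mcg = int(m.group("mcg"), 16)
    status_hex = m.group("status")
    return {
        "cpu": int(m.group("cpu")),
        "bank": int(m.group("bank")),
        "mcg_status": mcg,
        "mce_in_progress": bool(mcg & 0x4),  # bit 2 of the global status word
        "status_hex": status_hex,
        # A 64-bit register prints as at most 16 hex digits.
        "fits_64_bits": len(status_hex.lstrip("0")) <= 16,
    }


if __name__ == "__main__":
    reported = "CPU 0: Machine Check Exception:         4 Bank 4: b20000000000070f70f"
    print(parse_mce_line(reported))
    # Synthetic value built from individual bits, purely to exercise the decoder.
    print(decode_status((1 << 63) | (1 << 61) | (1 << 57) | 0x0135))
```

Run against the line as typed above, the parser reports that the status value is longer than 16 hex digits, which is consistent with the suspicion that extra digits crept in during transcription; it cannot say which digits are wrong.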

i'll try to turn off vga in the console and see what that gets me
Comment 14 ziegs 2005-04-27 21:25:05 UTC
turning off the vga console had no effect.  the latest versions of dbus and hal from freedesktop.org cvs don't compile/run correctly straight out of cvs, so i suppose i'll await the ebuild
Comment 15 Carlos Silva (RETIRED) gentoo-dev 2005-04-28 06:29:31 UTC
@ziegs: Do you also have this problem? If so, did you try running vanilla-sources-2.6.12_rc3, hal-0.4.7-r2 and dbus-0.23.4?
Comment 16 Mark Duckworth 2005-04-28 07:15:04 UTC
ziegs, isn't that annoying?  That's pretty much identical to what I got, except the address was different.  If I recall correctly, though, it was always "4 Bank 4".  I thought I had a RAM issue or something, but if someone else has this, there must be a bug somewhere... perhaps in a certain batch of AMD64 CPUs or god knows what ;-)  Hald definitely knows which wrong button to push though ;-)  As for it currently working for me: I'm running kernel 2.6.11-gentoo-r5.  I also realize that I'm no longer using the kernel NIC driver; I'm using the nvidia binary driver, nvnet.  I thought I was using their audio driver too, but nope, I'm using the standard ALSA intel8x0.  Honestly, the net driver, or a driver in general, seems more likely to affect hal than the VGA console, but it was worth a shot.
Comment 17 ziegs 2005-04-28 09:08:18 UTC
strangely, using nvnet causes the error to be held off as long as i was in the console, however upon starting gnome-volume-manager the system KPs again.

i'm compiling 2.6.12_rc2 now carlos
Comment 18 ziegs 2005-04-28 09:44:34 UTC
nope, using the .12_rc2 kernel has no effect.  the only thing that seems to be working so far kinda is using the nvnet but then it still crashes once HAL is actually used for something...
Comment 19 Mark Duckworth 2005-04-28 12:24:28 UTC
Interesting.  It seems we found a legitimate bug?  I wonder if there are that many Gentoo AMD64 users who never bothered with HAL.  I used it today.  I plugged my digicam in and it popped right up on my desktop, asking me if I wanted to import pics.  My Zaurus was also recognized, with no crash.  It did precisely what it's supposed to.  My CD-ROM drives are functioning right too, so hal seems to be doing its thing right for me.  Maybe now we have a difference between nForce3 and nForce4.
Comment 20 ziegs 2005-04-28 22:38:18 UTC
the odd thing is that hal works fine for all my other devices (a bunch of usb sticks, firewire hard drive, etc), as well as for the cd drive set as slave (a burner).  could it be that hal has issues with specific pieces of hardware?

i'm ordering a new dvd burner soon, but i'm going to try swapping the position of the drives on the motherboard to see if that has any effect...
Comment 21 Mark Duckworth 2005-05-02 00:00:20 UTC
The plot thickens!  Boot with a DVD in the drive -> system crash.  Remove it -> system boots.  Once the system is booted, insert a DVD -> instant crash.  So in MY case, having a DVD in the drive while not using nvnet is what causes the MCE.  However, if I have a CD in my CD burner all is well, and all the other devices and things hal supports seem to work too; only DVDs fail for me.
Comment 22 Mark Duckworth 2005-05-02 00:02:13 UTC
ziegs, do you use serial ATA?  Perhaps it has something to do with hda not existing?  That's probably somewhat uncommon on a desktop Linux system.
Comment 23 ziegs 2005-05-04 15:51:47 UTC
mark: i use sata for my hard disk, but the dvd drive and burner are on the secondary IDE, with 2 other hard disks on the primary.

i have that same deal with the insert dvd-> crash, remove->boot insert->crash, however even with nvnet it crashes...just it takes longer

my new dvd-rw drive is gonna arrive tomorrow so i held off on playing with ide cables until that gets here, then i'll do some playing to see if it's the drive or the controller or whatever.  btw the current dvd drive is (according to dmesg) a _NEC DV-5700A.  it was ripped from an old dell (it's the stock OEM drive they used to put in back in the day)
Comment 24 ziegs 2005-05-05 10:23:37 UTC
okay i installed my new dvd drive and...it works perfectly.  it reads dvds (both video and data), data cds, and audio cds, and i haven't gotten a panic yet.  i'm using the nvnet module so that could be it...it could also be the newer hardware.

perhaps if you have a newer drive lying around try that mark?
Comment 25 Mark Duckworth 2005-05-05 12:38:06 UTC
Grr, maybe I should have paid more than $12 for my DVD drive.  Some cheapo generic 4x I bought at a computer show YEARS ago.
Comment 26 Joseph 2005-07-21 14:45:56 UTC
I'm getting a similar message on a new box (AMD64, SATA drive):
 
Kernel panic - not syncing: Aiee, killing interrupt handler!
I'm only able to compile two or three packages (from kde-base) before it crashes.

#Joseph
Comment 27 Simon Stelling (RETIRED) gentoo-dev 2005-12-26 02:08:26 UTC
this bug was last changed >5 months ago, anybody still suffering from it with newer versions of hal/dbus/kernel? 
Comment 28 Doug Goldstein (RETIRED) gentoo-dev 2005-12-27 11:36:06 UTC
Anyone still having issues with the dbus-0.50 series or hal-0.5.x series stuff please comment and reopen this bug.