System totally locks up at random.
2.6.28* The last stable for me on x86.
2.6.29* Encountered suspend/hibernate issues with net.eth0 started.
2.6.30* System randomly freezes.
Steps to Reproduce:
Created attachment 194467
My Kernel Config
For kicks, here's my kernel config.
Since system log isn't showing much, all (I) can do is wait. :-/
Seems this is more of a bug using the =nvidia-driver-180.60 driver and =sys-kernel/gentoo-sources-2.6.30-r1 sources.
Using kernel 2.6.29, booting with framebuffer enabled did not hinder Xorg at all with the binary nvidia driver (although it has been known to do so on previous versions).
However, 2.6.30* booting with framebuffer enabled conflicts with X/Xorg usage causing a hard system freeze!
(To the best of my knowledge, this is the source of my bug, as such, am renaming the title more appropriately!)
If somebody is on the NVidia mailing list, please feel free to report upstream!
Well, I was chugging along for several hours on 2.6.30, and it froze without framebuffer enabled.
The only thing I had done within the past few minutes was change the symlink /usr/src/linux to point to /usr/src/linux-2.6.30-gentoo-r1.
(Changing the title back to a general hard system freeze on this kernel. This can probably be further troubleshot by running without any tainted modules. Past experience dictates it usually is the nvidia binary driver just acting up at very peculiar times!)
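For anyone wanting to confirm whether a kernel is currently tainted before testing without the binary driver, here is a minimal sketch. It assumes the taint bitmask is exposed at /proc/sys/kernel/tainted and decodes only the two lowest bits (proprietary module, forced module load); the function names are hypothetical and other bits are ignored.

```python
from pathlib import Path

# Assumption: only the two lowest taint bits are decoded here.
TAINT_BITS = {
    0: "proprietary module loaded (P)",
    1: "module force-loaded (F)",
}


def decode_taint(value):
    """Return descriptions of the known taint bits set in value."""
    return [desc for bit, desc in sorted(TAINT_BITS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the kernel's taint bitmask as an integer (0 means untainted)."""
    return int(Path(path).read_text().strip())
```

On a running system, `decode_taint(read_taint())` should come back empty once the nvidia module is no longer loaded after a fresh boot; a non-empty result means the test run is still tainted.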
Please reopen this bug report once you have figured out what causes your system to hang. Right now it looks like it could be anything, including hardware failures induced by newer kernels' features.
1) What really /is/ a "hard freeze"?
2) Severity should certainly not be "Blocker".
3) You omitted posting `emerge --info'.
> Please reopen this bug report once you have figured out what causes your system
> to hang. Right now it looks like it could be anything, including hardware
> failures induced by newer kernels' features.
Right now, ruled-out with the use of kernel-2.6.29*. I'm doing just fine with 2.6.29. 2.6.30* *is* crashing. I'll gladly follow-up stating I'm wrong, but 2.6.29* was up all last night compiling.
New kernel features are why I filed this bug. If that is the cause, it can be a very, very long time until I track it down! (So, file it and make some noise so others are aware something is going on.)
> Extra hints:
> 1) What really /is/ a "hard freeze"?
A "hard freeze" is brief shorthand for a computer freezing in a way that cannot be averted or stopped. Basically, the SysRq keys stop functioning, as do the ttys (usually including external serial terminals). However, I think it's a lot easier for experienced Linux users to just say something like "Hard Freeze".
From numerous past experiences with the NVidia binary driver, these hard system freezes with sparse debug info are usually caused by the proprietary binary drivers, which makes them a very good starting point for debugging.
Leaving as "NEEDINFO" as I have no additional relevant info besides being able to theorize it's likely the NVidia driver -- especially due to the numerous kernel changes, as well as past history of the NVidia binary driver!
*NOTE* this bug occurs without framebuffer enabled. However, it makes itself more readily apparent at the start of X with it enabled.
You can check the following bug report to see if there is some similarity.
[Bug 13933] System lockup on dual Pentium-3 with kernel
I've run this down to two bugs with kernel-2.6.30
If the user has data=ordered within /etc/fstab for ext3 mount options, booting will halt because the kernel cannot mount the filesystem as the kernel feature is now optional and considered not desirable. <shrugs> I prefer stability rather than corrupt photos.
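To see which ext3 entries would be affected by the data=ordered issue described above, a minimal sketch follows. It assumes the usual whitespace-separated fstab layout (device, mount point, type, options, ...); the function name is hypothetical, and it only reports entries with an explicit data= option rather than changing anything.

```python
def ext3_data_modes(fstab_text):
    """Return (mount_point, mode) pairs for ext3 entries with an explicit data= option."""
    results = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Need at least device, mount point, type and options.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                results.append((fields[1], opt[len("data="):]))
    return results
```

Feeding it the contents of /etc/fstab (for example via `open("/etc/fstab").read()`) lists the mount points worth double-checking before booting the new kernel.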
Intel e100 module is locking the kernel randomly. Tough to debug because even serial console freezes. But this can be related to the other numerous PCI bus issues, such as the bug you suggested.
Ditto, I do have dual P3's, but I got 16+ hours of uptime before rebooting without loading e100 (and blacklisting it from loading). So I'm pointing my finger at the e100 module, or maybe a PCI bus related issue. But with the numerous changes to e100.c within the past kernel version, bets are on e100.c as the cause.
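When testing with e100 blacklisted, it helps to confirm the module really stayed unloaded for the whole run. A minimal sketch, assuming the usual /proc/modules layout where the module name is the first field on each line (helper names are hypothetical):

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def is_loaded(name, proc_modules_text):
    """Check whether a module is loaded; dashes are normalised to underscores."""
    return name.replace("-", "_") in loaded_modules(proc_modules_text)
```

For example, `is_loaded("e100", open("/proc/modules").read())` should be false on a boot where the blacklist took effect.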
I've examined http://bugzilla.kernel.org/show_bug.cgi?id=13933
and it appears to be a bug being spawned on SMP only systems.
A workaround (hack) is to boot with the "nosmp" kernel boot parameter. If it resolves the freezes, it's then related to Linux Kernel Bug 13933. As of now, the only detection of this bug is with the "nosmp" parameter as they haven't been able to get standard debugging working with the kernel yet. It looks like they're using GIT to hack in patching or something from what I've scanned over.
For the next few nights, I'll simply be testing the "nosmp" flag, along with determining if the e100.c code is in any way involved.
<shrugs> Would rather get decent GDB/KGDB output rather than playing games with the kernel. ;-)
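Before trusting an overnight "nosmp" run, it's worth confirming the flag actually reached the kernel. A minimal sketch, assuming the boot command line is readable from /proc/cmdline (the helper name is hypothetical):

```python
def has_boot_param(cmdline, param):
    """Return True if param appears as a standalone word on the kernel command line."""
    return param in cmdline.split()
```

Something like `has_boot_param(open("/proc/cmdline").read(), "nosmp")` should be true on the test boot; if it is false, the bootloader entry was not picked up and the uptime result says nothing about the SMP bug.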
I can add some further confirmation of this problem.
I have been experiencing random freezes on a machine with two AMD Athlon MP processors (i.e. an SMP system) and with an Intel 100-megabit Ether Express NIC chipset. These problems only began when I upgraded to a 2.6.30 kernel, specifically gentoo-sources-2.6.30-r4.
As of now, I have reverted to a 2.6.29 kernel.
David, you need to see the Linux Kernel Bug I stated in Comment #9. Please go to the actual kernel.org url. The wiki incorrectly assumes the shortened Bug # as a gentoo.org bug. This should be a kernel.org bug.
And, this is fixed upstream in kernel.org.
Gentoo should probably backport the 2.6.31 fix to 2.6.30, as they have stated 2.6.30 as being rock stable (even after all of us SMP'ers complaining -- but will probably wait like me until 2.6.31 is out).
(In reply to comment #11)
> David, you need to see the Linux Kernel Bug I stated in Comment #9. Please go
> to the actual kernel.org url. The wiki incorrectly assumes the shortened Bug #
> as a gentoo.org bug. This should be a kernel.org bug.
Thanks, Roger. That bug reads like a whodunnit!
I'll wait and see what happens with the Gentoo kernels.
In the meantime, I have reopened this bug as it hasn't yet been fixed on Gentoo (downstream for 2.6.30) as it is a severe/critical issue on all SMP boxes.
I'll wait to mark it fixed/resolved, as gentoo-sources still has 2.6.30 as stable, until either 1) the patch is backported or 2) -- more likely -- gentoo-sources releases >2.6.30 with the patch incorporated.
I too am now sticking with either 2.6.29 (or git-sources-2.6.31*).
Thanks for tracking down this bug and verifying.
That upstream SMP bug does look very relevant. Assigning report to kernel team, I think they will want to see this...
(In reply to comment #14)
> That upstream SMP bug does look very relevant. Assigning report to kernel team,
> I think they will want to see this...
I've been experiencing this problem ever since I tried to update from 2.6.28-r5 straight to the 2.6.30-rx series. All the 30's seem to have this instability and I haven't been able to keep a machine running more than a day at best. I'm using old Dell server boxes (PIII's) with a frame-buffer, but I think it's an ATI Rage 128 video adapter, so I think the problem probably is not an nVidia issue. I do wonder about the e100 net card driver though - I am using that. On the 2.6.28 build I used the eepro100 driver and I did have to delve into the kernel config to bring e100 into play after it appears eepro100 was deleted (humph).
I wish I could shed more light on this, but there's sod-all in the logs so I'm as much in the dark as everyone else really. I only post this comment in the hope that some thread emerges from our respective experiences.
This is a KERNEL BUG! It has been resolved upstream!
See for more info:
(FYI: This bug is still not fixed in 2.6.30* series. It is fixed in GIT versions.)
I'm going to change the title of this bug to show it's resolved upstream.
The fix for this bug:
x86: don't call '->send_IPI_mask()' with an empty mask
... has now been officially incorporated in 184.108.40.206, released today, Sep 09, 2009.
... so we just need to wait for this release to be merged into Portage now. ;-)
The patch is now released with the new genpatches-2.6.30-8 release.
I'm presuming a new gentoo-sources with this patch included will not be released for some time???
From what I'm seeing gentoo-sources-2.6.30-r6 still is *only* at patch "genpatches-2.6.30-7 release"!
As such, I think it's a bit premature to close the bug since the release hasn't been published yet.
(In reply to comment #19)
> I'm presuming a new gentoo-sources with this patch included will not be
> released for some time???
> From what I'm seeing gentoo-sources-2.6.30-r6 still is *only* at patch
> "genpatches-2.6.30-7 release"!
gentoo-sources-2.6.30-r7 contains genpatches-2.6.30-8. Not sure what you're looking at.
Latest gentoo-source ChangeLog entry:
*gentoo-sources-2.6.30-r7 (16 Sep 2009)
16 Sep 2009; Mike Pagano <firstname.lastname@example.org>
Linux patch versions 220.127.116.11 and 18.104.22.168. Header fix for sysrq.h to
(In reply to comment #21)
> Latest gentoo-source ChangeLog entry: