Summary: | sys-kernel/gentoo-sources-2.6.25-r6: soft lockup on phenom while compiling glibc | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Marcin Deranek <marcin.deranek> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED INVALID | ||
Severity: | critical | CC: | loki_val |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Attachments: | kernel config, CPU info, lspci | ||
Description
Marcin Deranek
2008-07-12 19:34:22 UTC
Created attachment 160212 [details]
kernel config
Debugging has been done with the same config + netconsole + debugging options
http://lkml.org/lkml/2008/3/8/128 Short version: See if nmi_watchdog=1 fixes this. If so, it's probably a BIOS bug and upgrading it will solve your problem.

Although some descriptions look very similar, nmi_watchdog=1 did not help in my case (I own an Asus M2N32 SLI DELUXE) - the system got stuck at some point during glibc compilation, as usual.

Created attachment 160217 [details]
CPU info
Created attachment 160219 [details]
lspci
Is your BIOS up to date? http://support.asus.com/download/download_item.aspx?product=1&model=M2N32-SLI%20Deluxe

Yes, I'm running the latest BIOS, 2001.

I have tried vanilla kernel 2.6.26 with the same result. I have also tried another BIOS revision (the first one that supports Phenoms) with no luck either.

With nmi_watchdog=1 enabled the NMI counter does not increase, so I tried nmi_watchdog=2, but that did not help either, although the NMI counter was increasing.

For the last couple of weeks I have been trying different setups (different gcc, kernel, glibc etc.), but none of them worked in the end. The worst part is that the lockups are random - sometimes they happen just a couple of minutes after the computer starts. Although I have already replaced the PSU, the chassis (better cooling) etc., they still persist. Initially I compared this with 32-bit WinXP, which did not have the problem. Recently I installed WinXP x64, which has this problem as well. This would indicate that this is a hardware problem - I suspect that it has something to do with the USB 2.0 subsystem (USB 1.1 works fine) and possibly a broken BIOS. In that case I'll close this bug, as the problem most probably lies somewhere else, and sorry for bothering :-)

Hi all. I am using debian/testing, but I have been experiencing the same lockups described here. I have been able to avoid the lockups (so far) by running 2.6.27-rc4 with the notsc boot flag. My current working hypothesis is that the problem is related to AMD erratum 280: "Time Stamp Counter May Yield an Incorrect Value". It seems that the time stamp counter on Phenom occasionally (once/day?) returns a bogus value. This problem confuses the softlockup detector, which incorrectly concludes that a task is stuck. It may also confuse the scheduler, but I have not been able to prove this. Kernel 2.6.27 accepts the notsc flag to ignore the time-stamp counter. Note that 2.6.26 and 2.6.25 accept this flag in 32-bit mode and ignore it in 64-bit mode. I haven't looked at earlier kernels. It seems like other people have reported similar lockups but nobody has a solution. If my theory turns out to be correct, the kernel should be patched to autodetect Phenoms with the erratum and disable the time-stamp counter automatically.

Thanx for the tip - I'll give it a try and let you know about the outcome. In the meantime I found the following post: http://www.overclock.net/amd-general/319031-phenom-9850-system-freezing-5.html which might explain why I did not have any problems under 32-bit Windows while 64-bit Linux and Windows were freezing.

I tried it and it doesn't seem to help, as the system was locking up as usual. I saw that there was a new BIOS available on the ASUS website, so I downloaded and installed it. The only visible change in the BIOS was the possibility to enable/disable the AMD C1E feature (it's disabled by default). After the upgrade the system does not lock up as often, but it still happens (it locked up during the 6th system recompilation - every round took a couple of hours to re-compile ~300 packages). I need more time to investigate it further, but due to the random nature of this problem I'm rather skeptical.

It turned out to be a faulty CPU. I got the replacement a couple of days ago and since then I have not experienced any problems.
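
For anyone retracing the checks discussed in this thread (whether the NMI counter is increasing, and whether a boot flag such as notsc or nmi_watchdog actually reached the kernel), here is a minimal Python sketch. It is not from the original reports. The helper names `nmi_total`, `has_boot_flag` and `report` are hypothetical, and it assumes the usual Linux locations `/proc/interrupts`, `/proc/cmdline` and `/sys/devices/system/clocksource/clocksource0/current_clocksource`, which may be missing or laid out differently on some kernels.

```python
from pathlib import Path


def nmi_total(interrupts_text: str) -> int:
    """Sum the per-CPU NMI counts found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Per-CPU counters follow the label; any trailing description
            # words are non-numeric and are skipped.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    return 0  # no NMI line present


def has_boot_flag(cmdline_text: str, flag: str) -> bool:
    """True if `flag` appears as a bare word or as flag=value."""
    return any(
        tok == flag or tok.startswith(flag + "=")
        for tok in cmdline_text.split()
    )


def _read(path: str):
    """Return the file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def report() -> dict:
    """Collect the values discussed in the thread (assumed paths, see lead-in)."""
    cmdline = _read("/proc/cmdline") or ""
    interrupts = _read("/proc/interrupts") or ""
    clocksource = _read(
        "/sys/devices/system/clocksource/clocksource0/current_clocksource"
    )
    return {
        "notsc": has_boot_flag(cmdline, "notsc"),
        "nmi_watchdog": has_boot_flag(cmdline, "nmi_watchdog"),
        "nmi_total": nmi_total(interrupts),
        "clocksource": clocksource.strip() if clocksource else None,
    }


if __name__ == "__main__":
    print(report())
```

Running `report()` twice a few seconds apart and comparing `nmi_total` shows whether the NMI counter is increasing. If `notsc` is reported as present but `clocksource` still reads `tsc`, the flag was probably ignored, which would match the 64-bit behaviour described above for 2.6.25 and 2.6.26.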