I have a 2.5GHz PowerMac (7,3 architecture) that frequently turns itself off under load when running the 2.6.12 kernel. Operations that trigger this include calculating the module dependencies during boot and the command 'emerge --update --deep --pretend world'. In the latter case it crashes while calculating dependencies. The crash occurs just as the machine starts doing something CPU-intensive: the fans get louder and then the machine turns itself off. This problem does not occur with the 2.6.11 kernel, so I suspect that there is a problem with the thermal support for this machine in the newer kernel.

Reproducible: Sometimes

Steps to Reproduce:
1. Install the latest stable kernel on a 2.5GHz PowerMac
2. Boot, run 'emerge --sync', and 'emerge --update --deep --pretend world'

Actual Results:
The machine turns itself off when the fans start up

Expected Results:
Should run to completion

The problem has also been mentioned in the forums: http://forums.gentoo.org/viewtopic-t-332504-highlight-ppc64+g5.html
Can you tell me exactly which kernel you're using (gentoo-sources, vanilla-sources, etc.)?
I had this problem when running both gentoo-sources 2.6.12-r6 and gentoo-sources 2.6.12-r4. The problem hasn't occurred when I'm running gentoo-sources 2.6.11-r8. In both cases my configuration has been the default config file (from arch/ppc64/configs) with the addition of support for the HFS+ filesystem. I hope that this helps.
Does this occur on other machines (I'm trying to isolate whether this may be a hardware issue)? Also, is there anything in your system log that you would be able to share? Depending on how soon the machine powers off, there may be some interesting tidbits.
I am also running a PowerMac7,3 with kernel 2.6.12 and have no problems so far. Can you please provide your emerge --info output? Maybe you have some bad kernel headers, CFLAGS, or other "not so common" installation.
Created attachment 66042 [details] Kernel config
Created attachment 66043 [details] Syslog messages from boot to crash.
Created attachment 66044 [details] Syslog messages from boot to crash.
Comment on attachment 66043 [details] Syslog messages from boot to crash. Sorry for the double post
I have finally had a chance to perform some more testing on this problem. Alas, I do not have access to another G5, but I haven't had similar problems when running MacOS 10.3 or 10.4. When running gentoo-sources-2.6.11-r8 this problem has happened once since I posted this bug, and the machine has had quite active loads applied to it in that time. In contrast, I can get it to crash within 5 minutes using gentoo-sources-2.6.12-r8. Attached are my config file for gentoo-sources-2.6.12-r8 and the syslog output from boot to crash.

The execution sequence was:
- Boot kernel
- Let X bomb out because /dev/mouse wasn't present (this is the cause of the X errors in the syslog; I didn't see this with 2.6.11-r4 and haven't investigated it further)
- Login as root
- Run 'emerge --update --deep --pretend world' multiple times (about 7) until the crash occurs

The contents of 'emerge --info' are:

Portage 2.0.51.22-r2 (default-linux/ppc64/2005.0, gcc-3.4.4, glibc-2.3.4.20041102-r1, 2.6.11-gentoo-r8 ppc64)
=================================================================
System uname: 2.6.11-gentoo-r8 ppc64 PPC970FX, altivec supported
Gentoo Base System version 1.6.13
dev-lang/python: 2.3.5
sys-apps/sandbox: 1.2.11
sys-devel/autoconf: 2.13, 2.59-r6
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils: 2.15.90.0.3-r5
sys-devel/libtool: 1.5.18-r1
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="ppc64"
AUTOCLEAN="yes"
CBUILD="powerpc64-unknown-linux-gnu"
CFLAGS="-O2 -pipe -mcpu=970 -mtune=970"
CHOST="powerpc64-unknown-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O2 -pipe -mcpu=970 -mtune=970"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://malcolmpurvis:smokey@premium.ftp.planetmirror.com/pub/gentoo/"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.au.gentoo.org/gentoo-portage"
USE="X altivec bash-completion berkdb bitmap-fonts bzip2 cups eds emacs emboss esd fam fortran gdbm gif gnome gpm gstreamer gtk gtk2 imlib java jpeg kde libwww motif mozilla ncurses nls opengl pam perl png postgres ppc64 python qt readline ssl tcpd tetex tiff truetype truetype-fonts type1-fonts unicode xml2 xv zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
Do you get the same results with vanilla-sources?
Since the last update to this bug I've tried a number of different kernels and configurations with pretty much the same result. This includes vanilla-sources-2.6.13 and gentoo-sources-2.6.13. The problem does seem to be related to increasing load. A common crash point is during the calculation of module dependencies at boot. At this point the fans are at their loudest since startup. However, if the root file system is force-checked during boot (because the machine has been rebooted so often :-) then this reduces the CPU load, the fans slow down, and the calculation of module dependencies then passes without incident. This problem has also occurred while using the 2005.1 install CD. The installation kernel turned the machine off while I was decompressing the stage3 tarball using the tar command line listed in the handbook. I'm beginning to suspect that the suggestion above, that it might be a hardware problem, is correct. My only hesitation is that I do not see this problem at all when running MacOS X (10.3 and 10.4). I'm now going to try installing some other distributions to see if that will provide some useful data on the problem.
Examining the source code of the G5 fan driver in vanilla 2.6.13.2, I have discovered a possible cause. When the machine first gets too hot (line therm_pm72.c:1664) or stays too hot for too long (line 1675), the machine can be powered off, which is the symptom that I'm seeing. In the first case it is shut off because an attempt to run the userland program /sbin/critical_overtemp has failed. This program is not present on the 2005.1 minimal CD, nor in the stage3 tarball, so if this code path is taken the machine will shut down.
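To make the two power-off paths described above easier to follow, here is a minimal, simplified sketch in C. It is not the driver's code: the names (overtemp_tick, MAX_CRITICAL_TICKS, helper_ok) are hypothetical, the 30-tick limit is an assumed value, and resetting the counter when the temperature drops is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical limit: how many consecutive "too hot" ticks are tolerated. */
#define MAX_CRITICAL_TICKS 30

enum overtemp_action { ACTION_NONE, ACTION_POWER_OFF };

struct overtemp_state {
    int critical_ticks; /* consecutive ticks spent above the critical limit */
};

/*
 * One pass of a (hypothetical) thermal control loop.
 * too_hot:   the current reading is above the critical temperature.
 * helper_ok: launching the userland helper (for example
 *            /sbin/critical_overtemp) succeeded; only consulted on
 *            the first critical tick.
 */
enum overtemp_action overtemp_tick(struct overtemp_state *st,
                                   bool too_hot, bool helper_ok)
{
    if (!too_hot) {
        st->critical_ticks = 0; /* assumption: cooling down clears the state */
        return ACTION_NONE;
    }

    st->critical_ticks++;

    /* Path 1: first critical reading and the helper could not be run. */
    if (st->critical_ticks == 1)
        return helper_ok ? ACTION_NONE : ACTION_POWER_OFF;

    /* Path 2: still too hot after the allowed number of ticks. */
    if (st->critical_ticks > MAX_CRITICAL_TICKS)
        return ACTION_POWER_OFF;

    return ACTION_NONE;
}
```

With helper_ok set to false, which corresponds to /sbin/critical_overtemp being absent, the very first critical reading already returns ACTION_POWER_OFF. That matches the behaviour seen on the minimal CD and a fresh stage3.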
I found that on my PowerMac 7,3 2x2.3GHz the fan just wasn't getting up to speed in time to cool the processors down. The fan is supposed to idle at 300rpm, but drops as low as 150rpm. It can take up to 2 seconds to get to full speed (approx. 3000rpm), but the CPU can reach a critical temperature in a much shorter time frame. The following change to the kernel, which raises the minimum fan speed from 300rpm to 1500rpm, keeps the fan at a decent speed whilst keeping noise at a minimum. It eliminated the problem for me at a room temperature of 25C. (The diff was generated in the reverse direction, from the modified file to the original, so the '-' lines hold the new values.)

--- linux/drivers/macintosh/therm_pm72.c 2005-10-02 17:38:44.000000000 +1000
+++ linux/drivers/macintosh/therm_pm72.c.original 2005-10-02
@@ -512,8 +511,8 @@
 	if (id == FCU_FAN_ABSENT_ID)
 		return -EINVAL;
 
-	if (rpm < 1500)
-		rpm = 1500;
+	if (rpm < 300)
+		rpm = 300;
 	else if (rpm > 8191)
 		rpm = 8191;
 	buf[0] = rpm >> 5;

Increasing the minimum rpm to 2500 eliminated the problem at temperatures up to 29C, but drastically increased the noise level.
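For anyone who wants to experiment with different minimum speeds, the clamp touched by the patch above can be sketched as a standalone function. This is only an illustration: clamp_fan_rpm, fan_rpm_high_byte and FAN_RPM_MAX are hypothetical names, and only the 8191 upper bound and the `rpm >> 5` expression are taken from the driver excerpt.

```c
/* Upper bound used in the driver excerpt above (8191 = 2^13 - 1). */
#define FAN_RPM_MAX 8191

/*
 * Clamp a requested fan speed to [min_rpm, FAN_RPM_MAX].
 * min_rpm is the value being tuned: 300 in the original code,
 * 1500 or 2500 in the experiments described above.
 */
int clamp_fan_rpm(int rpm, int min_rpm)
{
    if (rpm < min_rpm)
        return min_rpm;
    if (rpm > FAN_RPM_MAX)
        return FAN_RPM_MAX;
    return rpm;
}

/* High byte as computed in the excerpt: buf[0] = rpm >> 5. */
unsigned char fan_rpm_high_byte(int rpm)
{
    return (unsigned char)(rpm >> 5);
}
```

Because the clamped value never exceeds 8191, the result of `rpm >> 5` always fits in a single byte.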
I forgot to mention: the critical temperature appears to be about 85C on either processor. If you get above that, the kernel calls "/sbin/critical_overtemp" (which I'm guessing should be a script that runs '/sbin/shutdown -h now'), waits 30 seconds, and then turns the power off. I've seen the CPU temperature reach 84.3C on one CPU in MacOS 10.4.2 without triggering a shutdown. If your kernel configuration contains "# CONFIG_IOMMU_VMERGE is not set" then the kernel cannot turn the power off, which prevents the pesky unprompted shutdowns (at the cost of "Shutdown timed out, power off now !" printks and the risk of turning your shiny PowerMac into an expensive stone).
Created attachment 70270 [details, diff] therm_pm72.c.patch

I contacted benh and he gave me this patch, along with a good explanation of what is going on here:
_______________________________________________________________________________
Hi Markus !

Here's what i posted to other people with the same problem recently, feel free to copy that to the gentoo bug, I'm still waiting for feedback on the proposed patch.

Ben.

-------- Forwarded Message --------
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Problems with overtemp conditions on a PowerMac G5
Date: Fri, 07 Oct 2005 11:09:50 +1000

Hi!

I'm doing this group mailing to people who have reported so far issues with the machine shutting down abruptly after putting some load on the CPUs.

I've investigated the issue, and came up with a couple of facts here: One is that some machines seem to have either an incorrect thermal calibration data, or simply a defective CPU<->heatsink connection, and the other one is that Darwin/OS X makes this problem "invisible" by silently slowing the CPU down when it heats up too much (thus they can claim the problem doesn't exist and don't have to service the faulty machines I suppose).

I have made a patch to the linux thermal driver that may help. The idea is that if the driver detects a critical thermal condition, it doesn't shut down right away, but gives a few seconds with fans at full speed for the condition to clear up instead of shutting down.

Please let me know if that helps
_______________________________________________________________________________
If you test this patch, then please leave a comment and/or contact benh (benh at kernel dot crashing dot org) directly.
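The idea Ben describes (give the fans a few seconds at full speed before shutting down) can be sketched as a small state machine. This is not the attached patch, only an illustration of the approach: grace_tick, GRACE_TICKS and the action names are hypothetical, and the 5-tick grace period is an assumed value.

```c
#include <stdbool.h>

/* Hypothetical grace period, in control-loop ticks. */
#define GRACE_TICKS 5

enum thermal_action { RUN_NORMAL, FANS_FULL, SHUT_DOWN };

struct grace_state {
    int hot_ticks; /* consecutive ticks in the critical condition */
};

/*
 * One pass of a (hypothetical) control loop with a grace period.
 * While the condition is critical the fans are forced to full speed;
 * shutdown is requested only if it has not cleared after GRACE_TICKS.
 */
enum thermal_action grace_tick(struct grace_state *st, bool critical)
{
    if (!critical) {
        st->hot_ticks = 0; /* condition cleared: back to normal control */
        return RUN_NORMAL;
    }

    st->hot_ticks++;
    if (st->hot_ticks > GRACE_TICKS)
        return SHUT_DOWN;

    return FANS_FULL;
}
```

A short temperature spike that clears within the grace period therefore never reaches SHUT_DOWN, while a sustained critical condition still does.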
Created attachment 75071 [details, diff] therm_pm72.c.patch

Now here we go! This patch solves the problem completely for me! Benh has already sent this patch upstream, so it will be included in the next kernel release!
@kernel herd: Would you mind adding this to gentoo-sources?
Looks good, but we'll wait for it to hit Linus' tree first.
it hit Linus' tree: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6ee7fb7e363aa8828b3920422416707c79f39007
Great, thanks
As the OP of this bug let me add my confirmation of Ben's comments in his git log. This latest patch doesn't eliminate the crashes entirely but reduces them so significantly that my machine is stable enough for my purposes. I'd encourage this latest patch be added to gentoo-sources as soon as possible.
Fixed in genpatches-2.6.14-7 (gentoo-sources-2.6.14-r6)