Bug 100023 - 2.5GHz PowerMac frequently powers off under load.
Summary: 2.5GHz PowerMac frequently powers off under load.
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: PPC64 Linux
Importance: High major
Assignee: Daniel Drake (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-07-23 06:04 UTC by Malcolm Purvis
Modified: 2005-12-29 11:58 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Kernel config (config-2.6.12-gentoo-r8,28.16 KB, text/plain)
2005-08-15 19:32 UTC, Malcolm Purvis
Syslog messages from boot to crash. (messages,86.09 KB, text/plain)
2005-08-15 19:33 UTC, Malcolm Purvis
Syslog messages from boot to crash. (messages,86.09 KB, text/plain)
2005-08-15 19:36 UTC, Malcolm Purvis
therm_pm72.c.patch (therm_pm72.c.patch,1.48 KB, patch)
2005-10-10 01:36 UTC, Markus Rothe (RETIRED)
therm_pm72.c.patch (therm_pm72.c.patch,611 bytes, patch)
2005-12-19 02:27 UTC, Markus Rothe (RETIRED)

Description Malcolm Purvis 2005-07-23 06:04:38 UTC
I have a 2.5GHz PowerMac (7,3 architecture) that frequently turns itself off   
under load when running the 2.6.12 kernel.
   
Operations that will make this happen include calculating the module 
dependencies during boot and the command 'emerge --update --deep --pretend 
world'.  In the latter case it crashes while calculating dependencies.   
   
When the crash does occur, the machine is starting to do something CPU   
intensive. The fans start getting louder and then the machine turns itself off.  
  
This problem does not occur with the 2.6.11 kernel, so I suspect that there is  
a problem with the thermal support for this machine in the newer kernel.

Reproducible: Sometimes
Steps to Reproduce:
1. Install the latest stable kernel on a 2.5GHz PowerMac  
2. Boot, run 'emerge --sync',  and 'emerge --update --deep --pretend world' 
  
Actual Results:  
The machine turns itself off when the fans start up.

Expected Results:  
Should run to completion  

The problem has also been mentioned in the forums: 
http://forums.gentoo.org/viewtopic-t-332504-highlight-ppc64+g5.html
Comment 1 Omkhar Arasaratnam (RETIRED) gentoo-dev 2005-07-28 08:58:32 UTC
can you tell me exactly which kernel you're using (gentoo-sources,
vanilla-sources etc..)?
Comment 2 Malcolm Purvis 2005-07-29 05:06:35 UTC
I had this problem when running both gentoo-sources 2.6.12-r6 and 
gentoo-sources 2.6.12-r4.  The problem hasn't occurred when I'm running 
gentoo-sources 2.6.11-r8. 
 
In both cases my configuration has been the default config file (from 
arch/ppc64/configs) with the addition of support for the HFS+ filesystem. 
 
I hope that this helps. 
Comment 3 Omkhar Arasaratnam (RETIRED) gentoo-dev 2005-07-30 10:05:03 UTC
Does this occur on other machines (trying to isolate whether this may be a hardware
issue)?

Also, is there anything in your system log that you would be able to share?
Depending on how soon the machine powers off, there may be some interesting tidbits.
Comment 4 Markus Rothe (RETIRED) gentoo-dev 2005-08-08 12:42:48 UTC
I am also running a PowerMac7,3 with kernel 2.6.12 and have no problems so far.
Can you please provide your emerge --info output? Maybe you have some bad kernel
headers, CFLAGS, or other "not so common" installation.
Comment 5 Malcolm Purvis 2005-08-15 19:32:28 UTC
Created attachment 66042
Kernel config
Comment 6 Malcolm Purvis 2005-08-15 19:33:19 UTC
Created attachment 66043
Syslog messages from boot to crash.
Comment 7 Malcolm Purvis 2005-08-15 19:36:21 UTC
Created attachment 66044
Syslog messages from boot to crash.
Comment 8 Malcolm Purvis 2005-08-15 19:39:31 UTC
Comment on attachment 66043
Syslog messages from boot to crash.

Sorry for the double post
Comment 9 Malcolm Purvis 2005-08-15 20:10:11 UTC
I have finally had a chance to perform some more testing on this problem.  Alas  
I do not have access to another G5, but I haven't had similar problems when  
running MacOS 10.3 or 10.4.   When running gentoo-sources-2.6.11-r8 this 
problem has happened once since I posted this bug, and the machine has had 
quite active loads applied to it in that time.  In contrast I can get it to 
crash within 5 minutes using gentoo-sources-2.6.12-r8. 
 
Attached is my config file for gentoo-sources-2.6.12-r8 and the syslog output 
from boot to crash. 
 
The execution sequence was: 
 
- Boot kernel 
- Let X bomb out because /dev/mouse wasn't present (this is the cause of the X 
errors in the syslog.  I didn't see this with 2.6.11-r4 and haven't investigated 
this further). 
- Login as root 
- Run 'emerge --update --deep --pretend world' multiple times (about 7) until 
the crash occurs.   
   
The contents of 'emerge --info' are:   
   
Portage 2.0.51.22-r2 (default-linux/ppc64/2005.0, gcc-3.4.4,   
glibc-2.3.4.20041102-r1, 2.6.11-gentoo-r8 ppc64)   
=================================================================   
System uname: 2.6.11-gentoo-r8 ppc64 PPC970FX, altivec supported   
Gentoo Base System version 1.6.13   
dev-lang/python:     2.3.5   
sys-apps/sandbox:    1.2.11   
sys-devel/autoconf:  2.13, 2.59-r6   
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5   
sys-devel/binutils:  2.15.90.0.3-r5   
sys-devel/libtool:   1.5.18-r1   
virtual/os-headers:  2.6.11-r2   
ACCEPT_KEYWORDS="ppc64"   
AUTOCLEAN="yes"   
CBUILD="powerpc64-unknown-linux-gnu"   
CFLAGS="-O2 -pipe -mcpu=970 -mtune=970"   
CHOST="powerpc64-unknown-linux-gnu"   
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"   
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"   
CXXFLAGS="-O2 -pipe -mcpu=970 -mtune=970"   
DISTDIR="/usr/portage/distfiles"   
FEATURES="autoconfig distlocks sandbox sfperms strict"   
GENTOO_MIRRORS="ftp://malcolmpurvis:smokey@premium.ftp.planetmirror.com/pub/gentoo/"   
MAKEOPTS="-j3"   
PKGDIR="/usr/portage/packages"   
PORTAGE_TMPDIR="/var/tmp"   
PORTDIR="/usr/portage"   
SYNC="rsync://rsync.au.gentoo.org/gentoo-portage"   
USE="X altivec bash-completion berkdb bitmap-fonts bzip2 cups eds emacs emboss   
esd fam fortran gdbm gif gnome gpm gstreamer gtk gtk2 imlib java jpeg kde   
libwww motif mozilla ncurses nls opengl pam perl png postgres ppc64 python qt   
readline ssl tcpd tetex tiff truetype truetype-fonts type1-fonts unicode xml2   
xv zlib userland_GNU kernel_linux elibc_glibc"   
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY   
   
   
Comment 10 Brent Baude (RETIRED) gentoo-dev 2005-08-16 06:36:02 UTC
Do you get the same results with vanilla-sources?
Comment 11 Malcolm Purvis 2005-09-18 05:34:14 UTC
Since the last update to this bug I've tried a number of different kernels and
configurations with pretty much the same result.  This includes
vanilla-sources-2.6.13 and gentoo-sources-2.6.13.

The problem does seem to be related to increasing load.  A common crash point is
during the calculation of module dependencies at boot.  At this point the fans
are at their loudest since startup.  However, if the root file system is
force-checked during boot (because the machine has been rebooted so often :-)
then this reduces the CPU load, the fans slow down and then the calculation of
module dependencies passes without incident.

This problem has also occurred while using the 2005.1 install CD.  With the
installation kernel, the machine turned off while I was decompressing the stage3
tarball using the tar command line listed in the handbook.

I'm beginning to suspect the suggestion above that it might be a hardware
problem.  My only hesitation is that I do not see this problem at all when
running MacOS X (10.3 and 10.4).

I'm now going to try installing some other distributions to see if that will
provide some useful data on the problem.
Comment 12 Malcolm Purvis 2005-10-01 06:23:37 UTC
Examining the source code of the G5 fan driver in vanilla 2.6.13.2, I have
discovered a possible cause.

When the machine first gets too hot (line therm_pm72.c:1664) or stays too hot for
too long (line 1675) then the machine could be powered off, which is the symptom
that I'm seeing.

In the first case it is shut off because an attempt to run the userland program
/sbin/critical_overtemp has failed.  This program is not present on the 2005.1
minimal CD, nor in the stage3 tar ball so if this code path is taken the machine
will shut down.
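To check whether a given system (for example the 2005.1 minimal CD or a fresh stage3) would end up on this immediate power-off path, a minimal userland sketch can test whether the helper is present and executable. This is only an approximation: it uses POSIX access() with X_OK as a stand-in for "the kernel could run /sbin/critical_overtemp", and the enum and overtemp_action_for() are hypothetical names, not code from therm_pm72.c.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical outcome of a critical-temperature event (sketch only). */
enum overtemp_action {
    OVERTEMP_WAIT_FOR_HELPER, /* helper can run: let it shut down cleanly */
    OVERTEMP_POWER_OFF_NOW    /* helper missing or not executable: hard power-off */
};

/*
 * Mirrors the decision described above: if the helper cannot be run,
 * the only remaining option is an immediate power-off. access(X_OK) is
 * a userland stand-in for the kernel's attempt to execute the helper.
 */
static enum overtemp_action overtemp_action_for(const char *helper_path)
{
    assert(helper_path != NULL);
    if (access(helper_path, X_OK) != 0)
        return OVERTEMP_POWER_OFF_NOW;
    return OVERTEMP_WAIT_FOR_HELPER;
}
```

On an install that lacks the helper, overtemp_action_for("/sbin/critical_overtemp") returns OVERTEMP_POWER_OFF_NOW, which corresponds to the abrupt shutdowns reported here.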

Comment 13 David Craig 2005-10-05 02:58:02 UTC
I found that on my PowerMac 7,3 2x2.3GHz the fan just wasn't getting up to
speed in time to cool the processors down.  The fan is supposed to idle at
300rpm, but drops as low as 150rpm.  It can take up to 2 seconds to get to full
speed (approx. 3000rpm), but the CPU can reach a critical temperature in a much
shorter time frame.  The following change to the kernel keeps the fan at a
decent speed whilst keeping noise to a minimum.  It eliminated the problem
for me at a room temperature of 25C.

--- linux/drivers/macintosh/therm_pm72.c.original       2005-10-02 
+++ linux/drivers/macintosh/therm_pm72.c        2005-10-02 17:38:44.000000000 +1000
@@ -511,8 +512,8 @@
        if (id == FCU_FAN_ABSENT_ID)
                return -EINVAL;
 
-       if (rpm < 300)
-               rpm = 300;
+       if (rpm < 1500)
+               rpm = 1500;
        else if (rpm > 8191)
                rpm = 8191;
        buf[0] = rpm >> 5;


Increasing the min rpm to 2500 eliminated the problem at temperatures up to 29C,
but drastically increased the noise level.
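The quoted hunk clamps the requested fan speed before it is written to the fan control unit. Below is a minimal C sketch of that clamp-and-encode step. It assumes the 13-bit speed field implied by the 8191 upper bound. buf[0] = rpm >> 5 comes from the quoted code. Placing the remaining low 5 bits at the top of buf[1] is an assumption about the register layout. clamp_fan_rpm() and encode_fan_rpm() are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound used in the quoted driver code (largest 13-bit value). */
#define FAN_RPM_MAX 8191

/* Clamp a requested speed into [min_rpm, FAN_RPM_MAX]; min_rpm is 300
 * before the change above and 1500 after it. */
static int clamp_fan_rpm(int rpm, int min_rpm)
{
    assert(min_rpm >= 0 && min_rpm <= FAN_RPM_MAX);
    if (rpm < min_rpm)
        return min_rpm;
    if (rpm > FAN_RPM_MAX)
        return FAN_RPM_MAX;
    return rpm;
}

/* Pack a clamped speed into two bytes: the high 8 bits go in buf[0]
 * (as in the quoted code); the low 5 bits are assumed to sit in the
 * top of buf[1]. */
static void encode_fan_rpm(int rpm, uint8_t buf[2])
{
    assert(rpm >= 0 && rpm <= FAN_RPM_MAX);
    buf[0] = (uint8_t)(rpm >> 5);
    buf[1] = (uint8_t)((rpm << 3) & 0xFF);
}
```

With min_rpm = 300 a request for 150rpm becomes 300, and with min_rpm = 1500 it becomes 1500. The fan therefore never idles below the patched floor and has less distance to cover when the CPU heats up quickly.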
Comment 14 David Craig 2005-10-05 05:58:20 UTC
I forgot to mention, the critical temperature appears to be about 85C on either
processor.  If you get above that, the kernel will call "/sbin/critical_overtemp"
(which I'm guessing should be a script that runs '/sbin/shutdown -h now'), waits
30 seconds, then turns the power off.
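The timeline described here can be modelled roughly as follows. This is a sketch under stated assumptions, not driver code. It assumes one temperature sample per second and a critical limit of about 85C (the value observed above, not one read from the driver). It also assumes that the critical state stays latched once entered, so power is cut 30 samples later even if the CPU cools down in the meantime. poweroff_index() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * temps holds one CPU temperature sample per second (degrees C).
 * Returns the index of the sample at which power would be cut, or -1
 * if that does not happen within the trace. The first sample at or
 * above crit_c latches the critical state (the helper would be started
 * there); power is cut timeout_s samples later regardless of recovery.
 */
static long poweroff_index(const double *temps, size_t n,
                           double crit_c, long timeout_s)
{
    assert(temps != NULL || n == 0);
    assert(timeout_s >= 0);
    for (size_t i = 0; i < n; i++) {
        if (temps[i] >= crit_c) {
            long off = (long)i + timeout_s;
            return off < (long)n ? off : -1;
        }
    }
    return -1;
}
```

Consider a 40-sample trace at 70C with a single 85.5C sample at index 2: poweroff_index(trace, 40, 85.0, 30) returns 32. A peak of 84.3C, like the one seen under MacOS 10.4.2, never triggers it.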

I've seen the cpu temperature reach 84.3C on one CPU in MacOS 10.4.2 without
triggering a shutdown.

If your kernel configuration contains "# CONFIG_IOMMU_VMERGE is not set" then
the kernel cannot turn the power off preventing the pesky unprompted shutdowns
(at a cost of "Shutdown timed out, power off now !" printks and the risk of
turning your shiny PowerMac into an expensive stone).  
Comment 15 Markus Rothe (RETIRED) gentoo-dev 2005-10-10 01:36:21 UTC
Created attachment 70270
therm_pm72.c.patch

I contacted benh and he gave me this patch with a good comment what is going on
here:
_______________________________________________________________________________


Hi Markus! Here's what I posted to other people with the same problem
recently; feel free to copy that to the Gentoo bug. I'm still waiting
for feedback on the proposed patch.

Ben.


-------- Forwarded Message --------
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Problems with overtemp conditions on a PowerMac G5
Date: Fri, 07 Oct 2005 11:09:50 +1000

Hi!

I'm doing this group mailing to people who have so far reported issues
with the machine shutting down abruptly after putting some load on the
CPUs.

I've investigated the issue, and came up with a couple of facts here:
One is that some machines seem to have either incorrect thermal
calibration data, or simply a defective CPU<->heatsink connection, and
the other one is that Darwin/OS X makes this problem "invisible" by
silently slowing the CPU down when it heats up too much (thus they can
claim the problem doesn't exist and don't have to service the faulty
machines I suppose).

I have made a patch to the linux thermal driver that may help. The idea
is that if the driver detects a critical thermal condition, it doesn't
shut down right away, but gives a few seconds with fans at full speed
for the condition to clear up instead of shutting down.

Please let me know if that helps.
_______________________________________________________________________________


If you test this patch, then please leave a comment and/or contact benh (benh
at kernel dot crashing dot org) directly.
Comment 16 Markus Rothe (RETIRED) gentoo-dev 2005-12-19 02:27:59 UTC
Created attachment 75071
therm_pm72.c.patch

Now here we go! This patch solves the problem completely for me! Benh has already sent this patch upstream, so it will be included in the next kernel release!
Comment 17 Markus Rothe (RETIRED) gentoo-dev 2005-12-19 02:29:13 UTC
@kernel herd: Would you mind adding this to gentoo-sources?
Comment 18 Daniel Drake (RETIRED) gentoo-dev 2005-12-19 09:27:29 UTC
Looks good, but we'll wait for it to hit Linus' tree first.
Comment 20 Daniel Drake (RETIRED) gentoo-dev 2005-12-20 12:43:07 UTC
Great, thanks
Comment 21 Malcolm Purvis 2005-12-26 04:10:51 UTC
As the OP of this bug let me add my confirmation of Ben's comments in his git log.  This latest patch doesn't eliminate the crashes entirely but reduces them so significantly that my machine is stable enough for my purposes.

I'd encourage this latest patch be added to gentoo-sources as soon as possible.
Comment 22 Daniel Drake (RETIRED) gentoo-dev 2005-12-29 11:58:59 UTC
Fixed in genpatches-2.6.14-7 (gentoo-sources-2.6.14-r6)