Bug 248820 - sys-power/athcool causes massive filesystem corruption; upstream was informed but did not respond
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
Reported: 2008-11-25 20:29 UTC by Jorge Peixoto de Morais Neto
Modified: 2008-11-27 22:55 UTC
CC List: 1 user

Description Jorge Peixoto de Morais Neto 2008-11-25 20:29:20 UTC
Hi. Athcool sets the "Disconnect enable when STPGNT detected" bit in the Northbridge. This drastically reduces CPU temperature but, when using a PixelView PV-M4900 FM.RC analog TV capture board, results in erratic system behavior, including crashes. The instability seems to be worse when I capture TV (with mencoder) than when I just watch it (with mplayer). Twice, the crashes resulted in serious filesystem corruption that was only fixed with reiserfsck --rebuild-tree. On one of these occasions, reiserfsck was unable to recover some of the data, resulting in data loss.

I reported this to the upstream author through the email jacobi <ATSIMBOL> jcom DOT home DOT ne DOT jp on 2008-08-20, but he did not respond, did not update his software, and did not include this vital information on the software's web page. In fact, the simple fact that the last athcool release is from 2007-11-05 suggests it is not maintained.

The software already warns that its use may cause
    * noisy or distorted sound playback
    * a slowdown in harddisk performance
    * system locks or instability 
But that list should include "massive filesystem corruption", and cite that this is particularly true for users of PixelView PV-M4900 FM.RC.
In fact, when executing the software there is a warning of "this can cause massive filesystem corruption (rare, but observed at least once)". This information should be in the webpage and in the ewarn, and, after my bug report, the phrasing "rare, but observed at least once" should be replaced by "observed at least twice".

And in my opinion, athcool is absolutely not x86 stable. It should be at most ~x86, and possibly hardmasked. 

My problem occurred in September 2007 (sorry for taking a year to report) and I don't remember what kernel version I used at the time. I don't remember whether the athcool version was 0.3.11 or 0.3.11-r1, but this is irrelevant given the small difference between the two ebuilds. Also, this is a case of hardware instability caused by the "Disconnect enable when STPGNT detected" bit; indeed, if this bit is set manually, the problem is the same.
Here is my call-for-help in the Gentoo forums http://forums.gentoo.org/viewtopic-t-580472-highlight-.html

More importantly, here is a copy of the original email I sent to the author:

Hi. I have used athcool 0.3.11-r1 on Gentoo. With power saving mode
on, I get a significant decrease in CPU temperature, and the system
seems stable as long as I do not use my TV capture card.

But when I use the TV card with power saving mode on, the system behaves
erratically. It eventually led to serious filesystem corruption. So
I suggest you warn people not to use athcool on a computer similar
to mine (details below).

The filesystem used was Reiserfs. I was able to recover most of my
data with the command reiserfsck --rebuild-tree.
But I did lose some data and a lot of time. I do not use athcool
anymore, to be on the safe side.

Now some detailed information:

The analog TV capture card is named
Prolink PixelView PlayTV MPEG2 PV-M4900

# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
00:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0b.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
00:0b.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
00:0c.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 440] (rev a3)

In the lspci output above, the TV card corresponds to the lines

00:0b.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
00:0b.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)


# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2600+
stepping        : 1
cpu MHz         : 2165.454
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4330.90
clflush size    : 32
power management: ts


Output from athcool:

# athcool on
athcool version 0.3.11 - control power-saving mode on AMD Athlon/Duron CPUs

!!!WARNING!!!
Depending on your motherboard and/or hardware components,
enabling Athlon powersaving mode may cause:
 * noisy or distorted sound playback
 * a slowdown in harddisk performance
 * system locks or instability
 * massive filesystem corruption (rare, but observed at least once)

Before use athcool, you must recognize these potential DANGERS.
Please use athcool AT YOUR OWN RISK.

athcool is supplied "as is". The author disclaims all warranties,
expressed or implied. The author and any other persons assume
no liability for damages, direct or consequential, which may
result from the use of athcool.

VIA KT400[A]/KT600 (1106 3189) found
enabling 'Disconnect when STPGNT Detected' bit ...  done
       Address 0xD2 : 0x69 -> 0xE9
enabling 'HALT Command Detection' bit ...  done
       Address 0xD5 : 0x1C -> 0x1E


More information about my motherboard:
The cover from my motherboard manual reads
KT4 Ultra
MS-6590 (v1.X) ATX Mainboard

According to the manual, these are some of the various chips the
motherboard uses:
VIA KT400 and VIA VT8235 (chipset)

Winbond W83697HF (hardware monitoring chip).
C-media CMI8738MX (audio)

Please reply to this email. I would like to know if my information has
helped.

Reproducible: Always
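
The register changes reported in the athcool output quoted above can be checked with a little bit arithmetic. The following is only a minimal sketch: the offsets and values are copied from that output, while the helper names `changed_bits` and `describe` are hypothetical and not part of athcool. It only inspects the numbers and does not touch any hardware.

```python
# Register changes copied from the athcool output quoted in this report.
# Each entry: (config-space address, value before, value after).
CHANGES = [
    (0xD2, 0x69, 0xE9),  # 'Disconnect when STPGNT Detected'
    (0xD5, 0x1C, 0x1E),  # 'HALT Command Detection'
]


def changed_bits(before: int, after: int) -> list:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = before ^ after
    return [bit for bit in range(8) if diff & (1 << bit)]


def describe(address: int, before: int, after: int) -> str:
    """Format one register change together with the bits it flipped."""
    bits = changed_bits(before, after)
    return f"0x{address:02X}: 0x{before:02X} -> 0x{after:02X}, bits changed: {bits}"


if __name__ == "__main__":
    for change in CHANGES:
        print(describe(*change))
```

For the values in this report, the sketch shows that each step changes exactly one bit: bit 7 at address 0xD2 and bit 1 at address 0xD5.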
Comment 1 Jorge Peixoto de Morais Neto 2008-11-25 21:32:54 UTC
(In reply to comment #0)
> The software already warns that its use may cause
>     * noisy or distorted sound playback
>     * a slowdown in harddisk performance
>     * system locks or instability 
> But that list should include "massive filesystem corruption", and cite that
> this is particularly true for users of PixelView PV-M4900 FM.RC.
> In fact, when executing the software there is a warning of "this can cause
> massive filesystem corruption (rare, but observed at least once)". This
> information should be in the webpage and in the ewarn, and, after my bug
> report, the phrasing "rare, but observed at least once" should be replaced by
> "observed at least twice".

To be clearer, the phrasing "this can cause massive filesystem corruption (rare, but observed at least once)" does not appear on the website and does not appear when installing the ebuild; it only appears when executing the software.
I suggest:
1) The phrasing should be changed to "this can cause massive filesystem corruption (observed at least twice)"
2) This should be in the ewarn
3) Ideally, we should inform people that at least one case of massive filesystem corruption involved a PixelView PV-M4900 FM.RC analog TV capture board
4) This software is absolutely not x86-stable
Comment 2 SpanKY gentoo-dev 2008-11-27 18:22:01 UTC
not really sure why you're surprised.  the documentation everywhere says this tool may cause instability based on your setup.  instability means anything (including file system corruption).  i dont see the need to make any changes on our end ...
Comment 3 Jorge Peixoto de Morais Neto 2008-11-27 19:14:38 UTC
(In reply to comment #2)
> not really sure why you're surprised.  the documentation everywhere says this
> tool may cause instability based on your setup.  instability means anything
> (including file system corruption).  i dont see the need to make any changes on
> our end ...
> 
I assumed that with a journaling filesystem a crash would be unlikely to lead to serious filesystem corruption. I also think many (most?) users have the same impression. In fact, my system has been abruptly shut down (due to power loss or unstable software) several times, but only these two athcool-related crashes have caused *any* significant filesystem corruption *at all*, and these were very serious filesystem corruptions.

Also, the fact that this software is x86-stable (and Gentoo is usually conservative when declaring software stable), and that filesystem corruption is not mentioned on the website or in the ewarn, led me to believe it was safe.
When I actually executed the software and it said "this can cause filesystem corruption - rare, but observed at least once", I thought that filesystem corruption was indeed rare, based on all the other evidence. This is why I ask you:

1) Change the phrasing "this can cause massive filesystem corruption (rare, but observed at least once)" to "this can cause serious filesystem corruption (observed at least twice)"
2) This should be in the ewarn (presently it is only shown when executing the software)
3) Ideally, we should inform people that at least one case of massive
filesystem corruption involved a PixelView PV-M4900 FM.RC analog TV capture
board
4) This software is not x86-stable. Don't you think it is ~x86, at best?

Please consider.
Comment 4 SpanKY gentoo-dev 2008-11-27 19:30:34 UTC
it's a tool whose stability is based on the system.  it having the capability to destroy it isnt really a blocker to stable.  that'd be like saying `cat /dev/null > /dev/sda` is dangerous.

your understanding of the role of the journal is a bit incomplete.  it maintains transactions to the file system.  if the communication with the hardware itself is unstable, then having a journal on the same exact device is meaningless.  garbage out (regardless of the garbage being journaled) is still garbage.  that is what system instability implies.

but to be explicit, i added "file system corruption" to the warning list.
http://sources.gentoo.org/sys-power/athcool/athcool-0.3.11-r1.ebuild?r1=1.1&r2=1.2
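
The point about journaling made in the comment above can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how ReiserFS or any real filesystem is implemented: `ToyJournaledDisk`, `reliable`, `flaky` and `crash_and_recover` are hypothetical names, and the "faulty hardware" is simulated by flipping a single bit. The journal record travels over the same path as the data, so replaying it after a crash faithfully restores whatever was stored, garbage included.

```python
from typing import Callable, Dict, List, Tuple

Channel = Callable[[bytes], bytes]


def reliable(data: bytes) -> bytes:
    """Hardware path that stores exactly what it was given."""
    return data


def flaky(data: bytes) -> bytes:
    """Hypothetical faulty path: flips the low bit of the first byte."""
    if not data:
        return data
    return bytes([data[0] ^ 0x01]) + data[1:]


class ToyJournaledDisk:
    """Toy model: every write is logged to a journal, then applied in place."""

    def __init__(self, channel: Channel) -> None:
        self.channel = channel
        self.blocks: Dict[int, bytes] = {}
        self.journal: List[Tuple[int, bytes]] = []

    def write(self, block: int, data: bytes,
              crash_before_in_place: bool = False) -> None:
        # Step 1: log the intended write; it goes through the same channel.
        self.journal.append((block, self.channel(data)))
        if crash_before_in_place:
            return  # simulated crash between journal write and in-place write
        # Step 2: apply the write in place and retire the journal entry.
        self.blocks[block] = self.channel(data)
        self.journal.clear()

    def recover(self) -> None:
        """Replay the journal as stored (the toy copies records as-is)."""
        for block, data in self.journal:
            self.blocks[block] = data
        self.journal.clear()


def crash_and_recover(channel: Channel) -> bytes:
    """Write one block, crash before the in-place write, then recover."""
    disk = ToyJournaledDisk(channel)
    disk.write(0, b"superblock", crash_before_in_place=True)
    disk.recover()
    return disk.blocks[0]


if __name__ == "__main__":
    print(crash_and_recover(reliable))
    print(crash_and_recover(flaky))
```

With the `reliable` channel, recovery after the simulated crash yields the intended block contents. With the `flaky` channel, recovery completes just as "successfully" but restores the corrupted bytes, because the journal never held a good copy.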
Comment 5 Jorge Peixoto de Morais Neto 2008-11-27 19:55:36 UTC
(In reply to comment #4)
> it's a tool whose stability is based on the system.  it having the capability
> to destroy it isnt really a blocker to stable.  that'd be like saying `cat
> /dev/null > /dev/sda` is dangerous.
That is a user mistake that could be avoided by a reasonable understanding of the behavior of output redirection and the meaning of /dev/sda. The behavior of cat and output redirection is predictable.
athcool's behavior is unpredictable and possibly very harmful.

> 
> your understanding of the role of the journal is a bit incomplete.  it
> maintains transactions to the file system.  if the communication with the
> hardware itself is unstable, then having a journal on the same exact device is
> meaningless.  garbage out (regardless of the garbage being journaled) is still
> garbage.  that is what system instability implies.
It's not journaling I misunderstood, it is "instability". I associate "instability" with "possibility of crashes". But in the case of athcool, it means undefined behavior, like comp.lang.c's "demons fly out of your nose".

> 
> but to be explicit, i added "file system corruption" to the warning list.
> http://sources.gentoo.org/sys-power/athcool/athcool-0.3.11-r1.ebuild?r1=1.1&r2=1.2
I can't see that due to a ViewCVS error. But I ask that you mention that filesystem corruption occurred at least twice.
What do you think of this: replace the line
" * system locks or instability"
with a more specific
" * system locks or unpredictable behavior, "
"   including serious filesystem corruption "
"   (observed at least twice)"
It would be nice to mention the PixelView PV-M4900 FM.RC analog TV capture
board. If you think this would make the ewarn too verbose, you can add it to the ebuild as a comment, and mention this bug report.

And (here you are more likely to disagree) I still consider that software with unpredictable behavior cannot be considered stable.
Comment 6 SpanKY gentoo-dev 2008-11-27 20:07:21 UTC
yes, instability here certainly means "demons fly out of your nose" and not "system randomly reboots"

the viewvc machine has delayed updates with actual cvs server, so you'll have to wait a bit to click the link

i'd prefer to not maintain a hardware database of things known to break.  that is for upstream to deal with, especially considering the exact combinations of hardware may be very exact as well as change over time wrt software versions.
Comment 7 Luca Santarelli 2008-11-27 20:19:49 UTC
(In reply to comment #5)
> And (here you are more likely to disagree) I still consider that software with
> unpredictable behavior cannot be considered stable.

I understand that you are very upset because of your data loss and I am sorry for you (having had two myself, both with reiserfs and jfs), but I cannot agree with your claim of "cannot be considered stable".

As you mention in your initial report, it's not a bug in the software since you can reproduce it by setting the bits manually. It's a bug in your hardware, cope with it as I cope with my broken mobo-raid-controller.

I've been using athcool on 3 computers for more than 3 years (in fact I don't even remember when I started using it, it may as well have been 5 years by now) without any issue.

If I were to declare software "unstable" due to a single hardware incompatibility, I'd have to declare unstable each and every release of the linux kernel. :-)
Comment 8 Jorge Peixoto de Morais Neto 2008-11-27 21:03:12 UTC
(In reply to comment #6)
> yes, instability here certainly means "demons fly out of your nose" and not
> "system randomly reboots"
Could you then replace the word "instability" with the expression "unpredictable behavior" or "undefined behavior"?
Please consider. I believe many (most?) users associate "instability" with "system sometimes crashes". Like Windows 95.

> the viewvc machine has delayed updates with actual cvs server, so you'll have
> to wait a bit to click the link
Thank you for the explanation. I then looked it up on packages.gentoo.org and found it.

> i'd prefer to not maintain a hardware database of things known to break.  that
> is for upstream to deal with, especially considering the exact combinations of
> hardware may be very exact as well as change over time wrt software versions.

Could you please then contact upstream? Just send him an email and mention this bug report. Hopefully he will pay more attention to a reputable Gentoo developer than to a random guy on the internet (me). If he still maintains the software, that is.
Comment 9 SpanKY gentoo-dev 2008-11-27 21:24:33 UTC
ive refined "instability" as suggested, cheers

http://sources.gentoo.org/sys-power/athcool/athcool-0.3.11-r1.ebuild?r1=1.2&r2=1.3
Comment 10 Jorge Peixoto de Morais Neto 2008-11-27 22:55:08 UTC
(In reply to comment #9)
> ive refined "instability" as suggested, cheers
> 
> http://sources.gentoo.org/sys-power/athcool/athcool-0.3.11-r1.ebuild?r1=1.2&r2=1.3
> 
Thank you.