Bug 113156 - [doc] mention app-admin/mcelog in amd64 docs
Summary: [doc] mention app-admin/mcelog in amd64 docs
Status: RESOLVED FIXED
Alias: None
Product: [OLD] Docs-user
Classification: Unclassified
Component: Handbook
Hardware: AMD64 Linux
Importance: High normal
Assignee: Shyam Mani (RETIRED)
Reported: 2005-11-21 06:04 UTC by Andreas Arens
Modified: 2006-02-26 06:40 UTC

Description Andreas Arens 2005-11-21 06:04:55 UTC
Newer X86_64 2.6 kernels require mcelog, as provided by the app-admin/mcelog
package, to be run regularly.
While this works after emerging said package, it should be an unconditional part
of the distribution, and therefore the core system should depend on it.
Details can be found in the lkml post by Andi Kleen (X86_64 kernel maintainer):
http://marc.theaimsgroup.com/?l=linux-kernel&m=113121225914384&w=2



Reproducible: Always
Comment 1 Tres 'RiverRat' Melton 2005-11-21 06:17:15 UTC
Well, I don't have it running on mine, so clearly it is not required.  However,
the official x86-64.org site does suggest running it from a cron job on a regular
basis.  And they aren't very ambiguous about it either.

ftp://ftp.x86-64.org/pub/linux/tools/mcelog/README
Comment 2 Patrick McLean gentoo-dev 2005-11-21 09:11:35 UTC
At the moment mcelog is marked ~amd64, so we would have to stabilize this before
putting it in the system profile.

What would we do about the case where someone is running a cron-less system?

This seems to compile and run error-free on all my amd64 systems, though it
hasn't produced any output on any of them.
Comment 3 Simon Stelling (RETIRED) gentoo-dev 2005-11-21 09:54:43 UTC
Besides the dependency problem (a package in system would depend on
virtual/cron, so cron has to be in system, which is a non-trivial change), I
fail to see why this package has to be in system at all. The worst thing that
can happen to you is that you never read all the messages. I've been running my
system for about 2 years without mcelog and didn't notice any problems, so I
don't think it belongs in system. Do you have another source that clearly
states that not running mcelog will break your system or cause any problems?
Comment 4 Andreas Arens 2005-11-21 10:41:29 UTC
Well, the problem is not so much that the system won't work or will even fail
without this running, but that you will fail to notice that your system is
possibly building up hardware issues. x86_64 kernels are special in this way, as
they do not log such events to the usual syslog, but use the mcelog facility.
Here's a more detailed package description from the author (SuSE package
description):
"Linux x86-64 kernels since 2.6.4 don't print recoverable machine check errors
to the kernel log anymore. Instead they are saved into a special kernel buffer
accessible using /dev/mcelog. mcelog reads /dev/mcelog and prints the stored
machine check records to stdout. Then the stored machine check records in the
kernel buffer are deleted."

Of course this is a borderline case, since it only becomes important when the
hardware becomes unstable, but for the reasonable admin there should at least be
a hint in the documentation. If integrating this requires the core to depend on
virtual/cron, it's probably not worth the fuss.
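
As a rough illustration of the behaviour described in the quoted package description, the following Python sketch runs mcelog only when the device node is present and appends whatever it prints to a log file. The device path /dev/mcelog comes from the description above; the assumption that an `mcelog` binary is on the PATH, the log location /var/log/mcelog, and the function name are hypothetical choices made for this example, not something the package prescribes.

```python
import os
import subprocess


def collect_mce_records(device="/dev/mcelog",
                        command=("mcelog",),
                        log_path="/var/log/mcelog"):
    """Append any pending machine check records to log_path.

    Returns True if the command was run, False if the device node is
    missing (for example, on a kernel built without MCE support).
    The command name and log path are assumptions for this sketch.
    """
    if not os.path.exists(device):
        return False

    # mcelog prints the stored records to stdout, and the kernel buffer is
    # cleared once they are read, so the output is appended to a file
    # instead of being thrown away.
    result = subprocess.run(list(command), stdout=subprocess.PIPE, check=True)
    if result.stdout:
        with open(log_path, "ab") as log:
            log.write(result.stdout)
    return True
```

A script like this would typically be run as root from a periodic job, which is where the virtual/cron dependency discussed above comes in.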
Comment 5 Jory A. Pratt 2005-11-21 10:46:38 UTC
The problem still comes down to the fact that the USER has to add support in the
kernel for this to even work. I do not see a reason to install a package that is
not even going to work unless kernel support is there to begin with.
Comment 6 Simon Stelling (RETIRED) gentoo-dev 2005-11-21 10:48:14 UTC
Andreas: Good idea, we should mention it in our docs.
Comment 7 Marcus D. Hanwell (RETIRED) gentoo-dev 2005-11-22 16:50:51 UTC
This seems like the best solution to me too - add a section to our docs as we 
cannot force kernel configurations on users as other distros do. 
Comment 8 Simon Stelling (RETIRED) gentoo-dev 2005-12-26 07:28:55 UTC
just FYI: i stabilized 0.4-r1 a few days ago
Comment 9 Tom Martin (RETIRED) gentoo-dev 2006-01-15 14:39:34 UTC
Okay, so we need to have something like:

...

The x86_64 kernel maintainer strongly recommends users enable MCE features so that they are able to be notified of any hardware problems. This requires the app-admin/mcelog package.

(Choose appropriate)
Processor type and features --->
  [*] Intel MCE features
  [*] AMD MCE features

...

In the 'Configuring the Kernel' section.

Have I missed anything?
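
One way to check whether a given kernel configuration already has these features enabled is sketched below in Python. It assumes the menu entries above correspond to the symbols CONFIG_X86_MCE, CONFIG_X86_MCE_INTEL and CONFIG_X86_MCE_AMD; those names and the helper function are assumptions made for illustration and should be verified against the kernel sources in use.

```python
# Assumed config symbols for "Machine check support" and the Intel/AMD
# MCE feature entries; verify these against your kernel's Kconfig.
MCE_OPTIONS = ("CONFIG_X86_MCE", "CONFIG_X86_MCE_INTEL", "CONFIG_X86_MCE_AMD")


def enabled_mce_options(config_text):
    """Return the MCE-related options set to 'y' in a kernel .config."""
    enabled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments ("# CONFIG_FOO is not set").
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in MCE_OPTIONS and value == "y":
            enabled.append(name)
    return enabled
```

The function takes the text of a .config file, so it can be fed the contents of whichever configuration file the running kernel was built from (the location of that file varies by setup).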
Comment 10 Simon Stelling (RETIRED) gentoo-dev 2006-01-16 08:54:03 UTC
> The x86_64 kernel maintainer strongly recommends users enable MCE features so
> that they are able to be notified of any hardware problems. This requires the
> app-admin/mcelog package.

This sounds like 'kernel maintainers are watching you' to me :D You probably meant '... that *you* are able to ...', right? You should probably explain that AMD64 users need mcelog because those messages aren't printed to dmesg but to /dev/mcelog. I think this is important to know because if you think your hardware is buggy, the first place you look is probably the output of dmesg and /var/log/messages.

Comment 11 Tom Martin (RETIRED) gentoo-dev 2006-01-16 10:45:26 UTC
(In reply to comment #10)
> > The x86_64 kernel maintainer strongly recommends users enable MCE features so
> > that they are able to be notified of any hardware problems. This requires the
> > app-admin/mcelog package.
> 
> This sounds like 'kernel maintainers are watching you' to me :D You probably
> meant '... that *you* are able to ...', right? 

I sort of thought about this, but I wasn't sure. I think you're probably right.

> You should probably explain that
> AMD64 users need mcelog because those messages aren't printed to dmesg but to
> /dev/mcelog. I think this is important to know because if you think your
> hardware is buggy, first place you look into is probably the output of dmesg
> and /var/log/messages

Yeah, very true.

Comment 12 Shyam Mani (RETIRED) gentoo-dev 2006-02-13 11:43:16 UTC
Will include this in the latest handbook
Comment 13 Shyam Mani (RETIRED) gentoo-dev 2006-02-13 12:25:34 UTC
Fixed for 2006.0.

I'm guessing we need to backport this as well to the other handbooks?
Comment 14 Simon Stelling (RETIRED) gentoo-dev 2006-02-13 12:44:20 UTC
not sure if it's worth the hassle as it's really not a critical issue, but it would certainly be nice to have :)
Comment 15 Shyam Mani (RETIRED) gentoo-dev 2006-02-26 06:40:51 UTC
Done :)