I recently upgraded my kernel from 2.6.33 to 2.6.37 & began to receive multiple messages in every terminal (Konsole & XFCE's Terminal) saying

kernel: [Hardware Error]: No human readable MCE decoding support on this CPU type.
kernel: [Hardware Error]: Run the message through 'mcelog --ascii' to decode.

These messages also occur in the Syslog file, with the added line

kernel: [Hardware Error]: Machine check events logged

This 3rd line also used to occur in Syslog with 2.6.33, but the other two did not, & none of the lines was written to any terminal I was running.

The messages can be eradicated by booting with 'append="nomce"', but that appears to stop the whole MCE process, not just the messages. Looking through the kernel configuration with 'make menuconfig' suggests that CONFIG_EDAC_DECODE_MCE is what is needed, but its help text states "Decode MCEs in human-readable form (only on AMD for now)". My machine has an Intel Core2 Duo processor.

Reproducible: Always

Steps to Reproduce:
1. Install kernel 2.6.37 configured with MCE enabled on a machine with an Intel processor
2. Wait a few minutes & check any virtual terminals which are running.

Actual Results:
The 2 messages above will appear several times in the terminal.

Expected Results:
No such messages should occur in any terminal.

I don't know whether this is a bug in the Linux kernel, whose code for MCE was updated in the 2.6.36 release, which I haven't tested, or whether the unwanted messages can be suppressed by some step available elsewhere in Gentoo. It should not be necessary for users to suppress the whole MCE process, which may be necessary on some hardware to prevent overheating or data corruption, simply in order to avoid the nuisance messages in terminals. If it is a kernel bug, I'm not sure whether an ordinary user can report it to the kernel developers, & in any case it's more likely to be taken seriously by them if it's reported by the devs of a well-established distro like Gentoo.
Please attach your kernel .config, and paste your 'emerge --info' output in a comment.
Created attachment 260639: output of 'emerge --info'
Output of 'emerge --info' as requested
Created attachment 260640: kernel .config as requested
kernel .config for 2.6.37
Did you previously have CONFIG_EARLY_PRINTK=y too? I am asking because I doubt that enabling MCE would /cause/ messages to flood your consoles. Maybe a diff between your old and new .config will tell what else got enabled by default.
Created attachment 260696: diff of .config for 2.6.33 & 2.6.37
This is the diff requested. CONFIG_EARLY_PRINTK=y in both versions.
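For anyone repeating the .config comparison, here is a minimal Python sketch that lists only the options whose values differ between two kernel configs, which can be easier to scan than a raw diff. The function names `parse_config` and `changed_options` are made up for illustration, and treating "# CONFIG_X is not set" as the value "n" is an assumption of this sketch.

```python
def parse_config(text):
    """Map each option in a kernel .config to its value.

    Lines of the form '# CONFIG_X is not set' are recorded as 'n'
    (an assumption of this sketch, so that newly enabled options show up).
    """
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
    return options


def changed_options(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ.

    An option missing from one side is reported with None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Reading the two .config files into strings and passing them to `changed_options` gives a dictionary that can be printed one option per line.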
Searching through your logs, do you see any machine check messages along these lines:

CPU 0: Machine Check Exception: 0 Bank 0: b200004000000800
TSC 0
PROCESSOR 0:6fb TIME 1288829692 SOCKET 0 APIC 0

If so, then go ahead and try feeding them to 'mcelog --ascii' for decoding. If there are no such messages, then this bug goes to the kernel team, who might ask for more info and can give advice on reporting it upstream.
The contents of my /var/log/ are:

auth.log daemon.log emerge-fetch.log faillog lastlog mail.err mail.warn ntpd.log syslog syslog.2.gz syslog.5.gz uucp.log Xorg.0.log.old ConsoleKit debug emerge.log imapd.log lpr.log mail.info messages ppp.log syslog.0 syslog.3.gz syslog.6.gz wtmp cups dmesg emerge-logs kern.log lvm2-setup.log mail.log news sandbox syslog.1.gz syslog.4.gz user.log Xorg.0.log

I looked at kern.log, messages & syslog* using '(z)less' & none of them contains the string 'Machine Check Exception'. Would you like me to do any other searches?

Something which changed in 2.6.36 (others have reported the problem there) caused the messages to be broadcast to all running terminals. I do not have the expertise to know whether that could be directly due to the kernel or whether it would involve several steps, some of which might be outside the kernel & therefore correctable by some change, e.g. in a system file.

Thanks for your prompt responses.
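The manual '(z)less' search could also be scripted. The following is only a sketch in Python: the function names `open_log` and `find_mce_lines` are hypothetical, and matching case-insensitively on "machine check" is an assumption made here so that both 'Machine Check Exception' and 'Machine check events logged' are caught.

```python
import gzip
from pathlib import Path


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if Path(path).suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_mce_lines(paths, needle="machine check"):
    """Return (path, line_number, line) for each line containing the needle.

    The comparison is case-insensitive; undecodable bytes are replaced
    rather than raising, since old logs may contain stray binary data.
    """
    needle = needle.lower()
    hits = []
    for path in paths:
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

Passing it something like `sorted(Path("/var/log").glob("syslog*"))` would cover the rotated syslog files in one go; reading them usually needs root.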
It looks like nowadays the actual MCE messages go to a separate device, so they wouldn't necessarily end up in syslog:

"When you see the "Machine check errors logged" message in the system log then mcelog should run to collect and decode machine check entries from /dev/mcelog. Normally mcelog should be run regularly from a cronjob."
(linux/Documentation/x86/x86_64)

So I'm still concerned that your hardware is really generating machine check errors rather than being a false alarm. Please install app-admin/mcelog and see if you get MCE details in /var/log/mcelog. If the errors are legitimate, then the fact that they are getting prominently displayed to terminals seems like a feature rather than a bug...
(In reply to comment #8)
> It looks like nowadays the actual MCE messages go to a separate device,
> so they wouldn't necessarily end up in syslog:
> "When you see the "Machine check errors logged" message in the system log
> then mcelog should run to collect and decode machine check entries
> from /dev/mcelog. Normally mcelog should be run regularly from a cronjob."
> (linux/Documentation/x86/x86_64)
> I'm still concerned your hardware is really generating machine check errors
> rather than being a false alarm.

Yes, there's no doubt it's generating real error messages, the issue is where they should be delivered (smile).

> Please install app-admin/mcelog
> and see if you get MCE details in /var/log/mcelog.

I installed it, rebooted without the 'nomce' option & waited for the messages to appear in my user terminals, which they have, but the file/dir 'mcelog' hasn't been created. I ran 'mcelog', which gives the following output:

root:506 log> mcelog
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 BANK 3
TIME 1296475963 Mon Jan 31 07:12:43 2011
MCG status:
MCi status:
Error enabled
Threshold based error status: green
MCA: corrected filtering (some unreported errors in same region)
Level-2 Generic memory hierarchy error
STATUS 902000420320100e MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 1
CPU 1 BANK 3 ADDR 1023480
TIME 1296475963 Mon Jan 31 07:12:43 2011
MCG status:
MCi status:
Error enabled
MCi_ADDR register valid
Threshold based error status: green
MCA: Generic CACHE Level-2 Generic Error
STATUS 942000c20501010a MCGSTATUS 0
MCGCAP 806 APICID 1 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15

I have not encountered any problems in everyday use of my machine.

> If the errors are legitimate,
> the fact they are getting prominently displayed to terminals
> seems like a feature rather than a bug...
No (another smile)! -- such messages should never be displayed in a user's terminal, as they are appearing here. Nor should they really be displayed in root's terminal unless s/he has made some move to have them displayed there. They should be logged, eg in /var/log/mcelog . Please remember that this phenomenon began only with kernel 2.6.37 for me, though others have reported it starting with 2.6.36. It was a revision of the MCE code in the kernel which caused it, not some new defect in my hardware.
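As a rough cross-check of the two STATUS values in the mcelog output, the sketch below pulls out the architectural flag bits of the IA32_MCi_STATUS register as laid out in the Intel Software Developer's Manual. It is not how mcelog itself is implemented and covers only a handful of fields; the name `decode_status` and the dictionary layout are made up for illustration.

```python
# Architectural flag bits of IA32_MCi_STATUS (bit positions per the Intel SDM).
STATUS_FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was uncorrected
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC register is valid
    "ADDRV": 58,  # MCi_ADDR register is valid
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    fields["mca_error_code"] = status & 0xFFFF                # bits 15:0
    fields["model_specific_code"] = (status >> 16) & 0xFFFF   # bits 31:16
    return fields


if __name__ == "__main__":
    # The two STATUS values reported by mcelog in this thread.
    for raw in (0x902000420320100E, 0x942000C20501010A):
        print(f"{raw:016x}", decode_status(raw))
```

For both values UC is clear, which agrees with mcelog treating these as corrected errors, and ADDRV is set only in the second, matching its "MCi_ADDR register valid" line.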
Anything different with 2.6.38 kernels?
Sorry for the delay: an infected tooth needed urgent repair. There is no change with kernel 2.6.38: the 'nomce' flag is still needed, otherwise the messages are spread over all running virtual terminals. See the screenshots.
Created attachment 268079: screenshot of unwanted messages
screenshot of effect on Mutt
Created attachment 268081: screenshot of unwanted messages
screenshot of effect on terminal
Please take this upstream at http://bugzilla.kernel.org and post the URL back here.
This has already been reported as kernel bug 30662. I have added my own experience as a comment there.
We'll follow the upstream bug and backport fixes as identified.
Upstream closed the bug as fixed in 3.0. Closing.