Summary: | Strange panics. | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | pakar <pakar> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED CANTFIX | ||
Severity: | major | CC: | duaneg, pakar |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | dmesg with cool'n'quiet enabled; dmesg with cool'n'quiet disabled; kernelconf |
Description
pakar
2008-04-27 14:28:31 UTC
Are the panics consistent at all?

This sounds like broken hardware causing random memory corruption, in which case there isn't much we can do, sorry. You could try checking for BIOS upgrades, cool'n'quiet requires BIOS support and bugs there could be causing trouble.

If you want to continue investigation here please post your kernel config and full dmesg output from shortly after boot. Which motherboard you have might be useful to know, too.

Created attachment 153127
dmesg with cool'n'quiet enabled

Created attachment 153129
dmesg with cool'n'quiet disabled

(In reply to comment #1)
> Are the panics consistent at all?

Yes. With cool'n'quiet enabled it panics ONLY when it goes above 2.2GHz, or when I manually set it to 2.4 or 2.5GHz, and it does so within 1-2 seconds at 100% load. Running both cores at 100% at or below 2.2GHz (limited via scaling_max_freq) works perfectly.

> This sounds like broken hardware causing random memory corruption, in which
> case there isn't much we can do, sorry.

I disagree there to a degree. If it were broken hardware it should show the same issues when running at 2.5GHz without cool'n'quiet, and/or show some strange behaviour when running with cool'n'quiet enabled but below 2.2GHz. Sure, it could be some strange obscure issue with the multiplier setting, but then it should probably behave the same way when those values are set in the BIOS with CNQ disabled (I have tried all the problematic multiplier settings via the BIOS).

> You could try checking for BIOS upgrades, cool'n'quiet requires BIOS support
> and bugs there could be causing trouble.

Yep, and that's what i suspect. Most it might be that it's reading out the wrong vid it should set and causing problem.

> If you want to continue investigation here please post your kernel config and
> full dmesg output from shortly after boot. Which motherboard you have might be
> useful to know, too.

System setup:
Asus M3N - AMD 770 / SB600 chipset
AMD64 X2 4800+ 65W AM2+ ( http://en.wikipedia.org/wiki/Athlon_64_X2#Brisbane_.2865_nm_SOI.29 )

Created attachment 153131
kernelconf
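
For anyone wanting to reproduce the scaling_max_freq workaround mentioned in this comment, here is a minimal sketch. It assumes the standard cpufreq sysfs layout under /sys/devices/system/cpu, a driver that exposes scaling_available_frequencies, and root privileges; the helper names pick_cap and cap_cpu are made up for illustration and are not from the reporter's setup.

```python
from pathlib import Path

# Assumed location of the cpufreq sysfs tree on a typical Linux system.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def pick_cap(available_khz, limit_khz):
    """Return the highest available frequency (kHz) not exceeding limit_khz."""
    candidates = [f for f in available_khz if f <= limit_khz]
    if not candidates:
        raise ValueError("no available frequency at or below the limit")
    return max(candidates)


def cap_cpu(cpu, limit_khz, root=CPUFREQ_ROOT):
    """Cap one CPU's maximum frequency and return the value that was written.

    Hypothetical helper: reads the driver's frequency table and writes the
    chosen cap to scaling_max_freq. Writing requires root on a real system.
    """
    policy = Path(root) / f"cpu{cpu}" / "cpufreq"
    table = (policy / "scaling_available_frequencies").read_text().split()
    cap = pick_cap([int(f) for f in table], limit_khz)
    (policy / "scaling_max_freq").write_text(f"{cap}\n")
    return cap
```

Calling cap_cpu(0, 2_200_000) and cap_cpu(1, 2_200_000) as root would hold both cores at or below 2.2GHz, which is the configuration reported as stable here.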

Spelling-correction:
Yep, and that's what i suspect. Might be that it's reading out the wrong vid it should set and that might be causing problems.

Sorry, by consistent panics I meant panicking in a consistent place in the kernel, i.e. something that would point to a kernel issue instead of a BIOS/hardware issue. As it is there probably isn't much that the kernel can do about it. I suppose it could blacklist affected chips/states, but I'm not sure whether that would be feasible.

BTW, if you turn on CPU_FREQ_DEBUG the kernel will print details of state transitions, so you should be able to tell if it happens consistently on entry to a certain state.

Anyway, googling around there seem to be lots of similar reports of instability with AM2+ chips using CnQ, on a variety of motherboards, especially at high clock speeds. Some are overclockers but there also seem to be plenty running at rated speeds.

(In reply to comment #7)
> Sorry, by consistent panics I meant panicking in a consistent place in the
> kernel. I.e. something that would point to a kernel issue instead of a
> BIOS/hardware issue. As it is there probably isn't much that the kernel can do
> about it. I suppose it could blacklist affected chips/states, but I'm not sure
> whether that would be feasible.

Mm, I agree that the kernel probably doesn't have much to do with the actual panic, but if it does set some strange mode for the CPU then it would at least be the source of the problem.

> BTW, if you turn on CPU_FREQ_DEBUG the kernel will print details of state
> transitions, so you should be able to tell if it happens consistently on entry
> to a certain state.

Ah, that was something I forgot... I'll try that out as soon as I can. Not sure whether it will be able to log anything via netconsole, but it's a good thing to try. I'll post a log as soon as I can.

Also, do you know of any tool to collect the current power settings (vid/fid etc.) from the system while in cnq and non-cnq mode? Might be good to see what values the BIOS sets and what values we get while in cnq mode.

> Anyway, googling around there seem to be lots of similar reports of instability
> with AM2+ chips using CnQ, on a variety of motherboards, especially at high
> clock speeds. Some are overclockers but there also seem to be plenty running at
> rated speeds.

Oh, I googled some myself but did not find anything specific about it, at least not with this behaviour. (I'm also running at stock speed, just to confirm.)

Any updates here?

Yep, but I dropped the issue. The latest Ubuntu kernel does not show the same behaviour, and the slow updates in certain areas on Gentoo got me to reconsider running an Ubuntu derivative (I know, it's also slow, but you don't have to wait for half a day to do a big upgrade :). After the switch the system has been running quietly with cool'n'quiet enabled.

I did try to do some logging before the switch, but it seems like what happens is that the chipset or something goes out of sync, and that causes memory corruption, which then causes the machine to panic, which in turn causes the kernel to be corrupt. Might be that it misreads the voltages in the DSDT table, but I did not see any big changes in ACPI when this started to appear.

I did notice the same issues (but much more seldom, and always under full load) on a similar system (M2N-e mainboard with an AM2 X2 4200+ CPU), and that system is also happily running Ubuntu with cnq enabled.

But for my part you could close this case, if nobody else has experienced the same issues.

ok, thanks for responding
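
On the question of collecting the current fid/vid: nobody in the thread names a tool, but the values can probably be read directly from the CPU. The following is a rough sketch only, built on several assumptions that should be checked against AMD's documentation for the exact chip: that the K8 FIDVID status MSR lives at 0xC0010042, that the current FID is in bits 5:0 and the current VID in bits 37:32, and that the core frequency is 800 + 100 * fid MHz (the mapping the powernow-k8 driver appears to use). It also assumes the msr kernel module is loaded and the script runs as root. The function names are invented for illustration.

```python
import os
import struct

# Assumption: FIDVID_STATUS MSR address for K8-family CPUs.
MSR_FIDVID_STATUS = 0xC0010042


def decode_fidvid_status(value):
    """Split a 64-bit FIDVID_STATUS value into current fid, vid and frequency.

    Assumed layout: CurrFID in bits 5:0, CurrVID in bits 37:32.
    Assumed frequency mapping: 800 MHz + 100 MHz per fid step.
    """
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    fid = lo & 0x3F
    vid = hi & 0x3F
    return {"fid": fid, "vid": vid, "freq_mhz": 800 + 100 * fid}


def read_msr(cpu, address):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (needs root and msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)
```

Running decode_fidvid_status(read_msr(0, MSR_FIDVID_STATUS)) once with CnQ enabled and once with it disabled at the same clock would show whether the driver ends up at a different vid than the BIOS sets, which is the comparison asked about above.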