Summary: | gentoo-s-2.6.25-r1: very bad system performance because of CONFIG_SPARSEMEM | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Christian Hoffmann (RETIRED) <hoffie> |
Component: | New packages | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | VERIFIED NEEDINFO | ||
Severity: | normal | CC: | duaneg |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://home.hoffie.info/linux-2.6.25-boot-failure.jpg | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Attachments: | .config for gentoo-sources-2.6.25-r1; .config for gentoo-sources-2.6.24-r4; dmesg output after successful boot w/ 2.6.24; dmesg output after successful boot w/ 2.6.25 | ||
Description
Christian Hoffmann (RETIRED)
2008-04-21 12:56:31 UTC
Created attachment 150495
.config for gentoo-sources-2.6.25-r1
Created attachment 150497
.config for gentoo-sources-2.6.24-r4
Created attachment 150498
dmesg output after successful boot w/ 2.6.24
Created attachment 150499
dmesg output after successful boot w/ 2.6.25
Some more testing revealed:
* The probability of the hang seems to be a bit higher than I guesstimated; I just had 11 failed attempts.
* I left the desk for a few minutes (~8), and when I came back it still showed the same screen (as in $URL). After I pressed the Num Lock key, the boot process continued, albeit much more slowly than normal (and all I/O-related stuff continued to be rather slow).

Ok, somehow I ended up with a kernel config that no longer exhibits the boot failure. I am not really keen on tracking down which option it was, but I'm attaching my current .config anyway. The slowdown I've been experiencing seems to be a different problem. I was able to track it down to a mixture of a config problem and a regression. In 2.6.24* I was using CONFIG_DISCONTIGMEM=y and CONFIG_DISCONTIGMEM_MANUAL=y, but in 2.6.25 this option has been disabled for x86_64, so CONFIG_SPARSEMEM=y and CONFIG_SPARSEMEM_MANUAL=y get set by default instead. That's probably why my attempt at git-bisecting this problem failed. Using CONFIG_SPARSEMEM in 2.6.24 shows exactly the same slowdown.

So, to summarize: CONFIG_SPARSEMEM has been really slow on my system since at least 2.6.24-gentoo-r4 (I verified this). It has not been a problem for me so far because I was able to choose a different memory model (CONFIG_DISCONTIGMEM), which is no longer possible in 2.6.25*. What should I do now? File an upstream bug?

Hi Christian, to recap, you had two problems: first, a hang on boot; second, a slowdown when using SPARSEMEM. The hang on boot showed up after the messages you mentioned. Pressing a key seemed to get the process going again, but the system was slow performing I/O thereafter. This problem has since gone away with a new kernel configuration. So, a couple of things. First, could you post your new config (the one that doesn't exhibit the hang-on-boot behaviour)? I wouldn't mind having a quick look to see whether any differences jump out as possible culprits.

Second, I think it would be useful to file an upstream bug at http://bugzilla.kernel.org/ regarding the slowdown. However, before doing that, there are a couple of boot parameters worth trying. Since it seemed to be hanging in the PCI code, try booting with pci=nommconf and with pci=nomsi (one at a time) and see whether either makes a difference. If you don't mind doing a bit more work, it would also be really helpful to find out whether this is a regression or whether it has always been slow with SPARSEMEM; just checking 2.6.23 would help. If you file a bug upstream, please post its URL here afterwards. Thanks.

Please feel free to reopen once you've done the steps requested in comment #7.

Thanks for all your comments! I've done lots of tests over time, without any results. Today I tried 2.6.26, and it showed the same problems. Then I did a BIOS update (and loaded the default values) and, to my surprise, everything works as expected. I'm really not sure what the problem was. Apparently the BIOS somehow presented wrong information to the kernel, and kernels before 2.6.25 were able to cope with this while newer versions were not. Anyway, it now works as expected. It probably was not a kernel bug, but it was at least a behavior change that made the new kernels really problematic for me to use :)
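The key finding in this report was that the memory model selected by `.config` changed silently between kernel versions: the 2.6.24 config used CONFIG_DISCONTIGMEM, while the 2.6.25 config fell back to CONFIG_SPARSEMEM by default. One way to catch that kind of change is to compare the memory-model options of two config files. The Python sketch below is a hypothetical helper. The function names are made up for illustration, and the config excerpts in it are illustrative rather than taken from the attached files. It assumes that an option which is absent or marked `# CONFIG_... is not set` should be treated as `n`.

```python
import re

# Memory-model options relevant to this report. CONFIG_FLATMEM* is the third
# memory model offered on x86 and is included only for completeness.
MEMORY_MODEL_OPTIONS = (
    "CONFIG_FLATMEM",
    "CONFIG_FLATMEM_MANUAL",
    "CONFIG_DISCONTIGMEM",
    "CONFIG_DISCONTIGMEM_MANUAL",
    "CONFIG_SPARSEMEM",
    "CONFIG_SPARSEMEM_MANUAL",
)

# Matches "CONFIG_FOO=value" lines; "# CONFIG_FOO is not set" lines do not match,
# so such options keep the default value "n".
_ASSIGNMENT = re.compile(r"^(CONFIG_\w+)=(.*)$")


def memory_model(config_text: str) -> dict:
    """Map each memory-model option to its value, defaulting to 'n' when unset or absent."""
    values = {name: "n" for name in MEMORY_MODEL_OPTIONS}
    for line in config_text.splitlines():
        match = _ASSIGNMENT.match(line.strip())
        if match and match.group(1) in values:
            values[match.group(1)] = match.group(2)
    return values


def diff_memory_model(old_text: str, new_text: str) -> dict:
    """Return {option: (old_value, new_value)} for every memory-model option that changed."""
    old, new = memory_model(old_text), memory_model(new_text)
    return {
        name: (old[name], new[name])
        for name in MEMORY_MODEL_OPTIONS
        if old[name] != new[name]
    }


if __name__ == "__main__":
    # Illustrative excerpts only, not the attached .config files.
    old_config = "CONFIG_DISCONTIGMEM_MANUAL=y\nCONFIG_DISCONTIGMEM=y\n"
    new_config = (
        "CONFIG_SPARSEMEM_MANUAL=y\n"
        "CONFIG_SPARSEMEM=y\n"
        "# CONFIG_DISCONTIGMEM is not set\n"
    )
    for name, (before, after) in diff_memory_model(old_config, new_config).items():
        print(f"{name}: {before} -> {after}")
```

With these excerpts, the script prints four changes: both DISCONTIGMEM options go from `y` to `n`, and both SPARSEMEM options go from `n` to `y`. Running it against the real old and new `.config` files after `make oldconfig` would reveal such a switch before the first boot.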