Summary: | 64G highmem option in gentoo-sources is unstable | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Curtis Wood <curtis> |
Component: | [OLD] Core system | Assignee: | Michal Januszewski (RETIRED) <spock> |
Status: | RESOLVED FIXED | | |
Severity: | normal | CC: | cm, fuzzyray, iyosifov, kernel, Matthias.Gerstner, tom, wolf31o2 |
Priority: | High | | |
Version: | unspecified | | |
Hardware: | x86 | | |
OS: | Linux | | |
Whiteboard: | linux-2.6.13 | | |
Package list: | | Runtime testing required: | --- |
Description
Curtis Wood
2005-08-04 10:34:12 UTC
Strange. Could you please test the latest development kernel (currently vanilla-sources-2.6.13_rc5) and see if the issue also exists there? (gentoo-sources includes a fair amount of stuff from 2.6.13...)

I tried the latest and greatest from the stock kernel tree like you suggested (2.6.13-rc5), and it did not have the problems that gentoo-sources does have :( With regards to the highmem 64G option being set... I'm using the configuration from genkernel, and only modifying a few things - if that helps at all... Any other testing you would like, just give me a yell ;-)

Ok, thanks for testing that. The 2.6.12 patchset is big, so I think it will be best to wait for 2.6.13 (much smaller gentoo patchset) before debugging this further. 2.6.13 should be out within a week or so.

Out of curiosity, what vendor recommended the HIGHMEM_64G and why? I have almost identical hardware to you, and I don't have any stability issues.

*** Bug 103501 has been marked as a duplicate of this bug. ***

Ok, gentoo-sources-2.6.13 is now in portage so I'd be very grateful if someone could help find the problem. First thing to do is install gentoo-sources-2.6.13 and confirm the problem still exists. Then unpack /usr/portage/distfiles/genpatches-2.6.13-1.base.tar.bz2 and /usr/portage/distfiles/genpatches-2.6.13-1.extras.tar.bz2 into a directory.

We now need to find the offending patch. Start at the high end (patch 4905) and revert the patches in descending order. To revert a patch you use "patch -p1 -R" from the kernel source directory. Revert a few at a time (say 3?) and then confirm the problem still exists on the "unpatched" kernel before reverting more. There are 20 patches in total.

For example:

1. build gentoo-sources-2.6.13, confirm problem still exists
2. revert 3 patches
# cd /usr/src/linux-2.6.13-gentoo
# patch -p1 -R -i /path/to/4905_alpha-sysctl-uac.patch
# patch -p1 -R -i /path/to/4900_speakup-20050825.patch
# patch -p1 -R -i /path/to/4705_squashfs-2.2.patch
3.
Take a note of the time, and rebuild the kernel the normal way, and copy the new image over to /boot, etc etc.
4. Reboot into the new kernel, and run "uname -v" to get the time+date that the running kernel was compiled. This should approximately match the time you noted in step 3. If it doesn't, you made a mistake when copying over the new kernel image.
5. See if the problem still exists, and assuming it does, revert out the next few patches:
# cd /usr/src/linux-2.6.13-gentoo
# patch -p1 -R -i /path/to/4505_vesafb-tng-0.9-rc7-r1.patch
# patch -p1 -R -i /path/to/4500_fbsplash-0.9.2-r4.patch
# patch -p1 -R -i /path/to/4351_megaraid-compatibility.patch
6. Repeat from step 2 onwards until you are unable to reproduce the problem, then report back here which was the last group of patches whose reversion made the problem go away.

Thanks!

Just out of curiosity, about how many patches are we talking here? My laptop has 1GB RAM and I could probably do this sort of testing while I'm at work, but I wouldn't be able to start until tomorrow. If nobody else steps up, then I can help out by volunteering my machine.

I have tomorrow off from work and can also test on my machine.

Thanks people. There are 20 patches. They are listed here: http://dev.gentoo.org/~dsd/genpatches/patches-2.6.13-1.htm

Some are very unlikely to be related to this bug (e.g. 1300, 4101) so feel free to use your own intuition to give some patches less attention.

Note that the original report is from a genkernel user (so pretty much all of the feature patches will be compiled into the kernel) but I've had reports from manual compiles too. To be thorough, it's probably a good idea to compile the extra features (squashfs, vesafb-tng, fbsplash) into your test kernels (until you revert those patches!).

Also, the original report says it is very easy to reproduce (i.e. he can't even _boot_ cleanly) whereas others have reported it being far more scarce, i.e.
it doesn't appear until halfway through a glibc compile. I'm wondering if one easy way to reproduce this problem might be to run memtest86 while under Linux (under a suspected kernel). Just an idea.

One more thing, once you get into the revert-retest-revert-retest routine it's very likely that the problem will go away early on. All of the big intrusive feature patches have high numbers (>=3000) so will end up being reverted first.

Here is what I have so far:
* emerged genkernel
* Ran genkernel --menuconfig --bootsplash all
* Changed the High Memory option to 64G and saved.
* Rebooted and ran into errors
* genkernel --menuconfig --bootsplash all and changed option back to 4G
* rebooted successfully
* Ran patch -p1 -R -i 4905_alpha-sysctl-uac.patch
* genkernel --menuconfig --bootsplash all and changed option back to 64G
* Rebooted unsuccessfully
* patch -p1 -R -i 4900_speakup-20050825.patch
* genkernel --menuconfig --bootsplash all
* Changed Framebuffer from vesafb-tng to vesafb (due to having an invalid vga line and not wanting to look up the new syntax to pass to the kernel)
* rebooted successfully
* genkernel --menuconfig --bootsplash all
* Changed back to vesafb-tng
* rebooted unsuccessfully

So in my case, it actually looks like something in the interaction between vesafb-tng and the 64G High Mem option. I'm going to try on a vanilla kernel to double check, before I continue removing patches from gentoo-sources-2.6.13. Unfortunately, it takes about an hour to rebuild each kernel with genkernel so the testing is time intensive.

I just realized that vesafb-tng is not in the vanilla kernel, so the suspect patch is 4505_vesafb-tng-0.9-rc7-r1.patch. I am removing that patch and recompiling to test.

For my system it is the 4505_vesafb-tng-0.9-rc7-r1.patch that causes the problems. Removing the patch or, as stated above, changing from vesafb-tng to vesafb resolves the problems.

Thanks Paul.
Could you perhaps test applying the vesafb-tng patch against vanilla, so that you effectively have clean 2.6.13 + vesafb-tng + highmem64, and see if it is reproducible on a setup like that? I ask this as tomaw also reproduced this bug and does not use vesafb-tng, but hasn't been successful in tracking down which Gentoo patch is the cause just yet. We may be dealing with multiple problems, or a problem which only appears with a certain patch combination, or something ugly like that.

I now have access to a PC with 1.5GB RAM but I haven't been able to reproduce the problem yet. I'll keep playing...

I am able to reproduce the problem by applying the vesafb-tng patch against the vanilla kernel. I have also ditched genkernel and have gone back to my much slimmer monolithic kernel config and it exhibits the problem as well.

Michal is working on this.

Just to keep everyone up-to-date: I was able to reproduce the problem on my machine and have already fixed it. I'm currently polishing the new code a little bit and will hopefully soon release a fixed version of the patch.

It took a little longer than I expected, but here it is: http://dev.gentoo.org/~spock/projects/vesafb-tng/testing/vesafb-tng-testing-2005091603.patch
Please test with vanilla 2.6.13* or 2.6.14-rc1.

I am not seeing any issues with 2.6.14-rc1.

New patch available at: http://dev.gentoo.org/~spock/projects/vesafb-tng/testing/vesafb-tng-testing-2005092001.patch (in case someone wants to do more testing, please use this one)

Fixed in gentoo-sources-2.6.14
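
For anyone repeating the revert-and-retest procedure described earlier in this report, the batching step could be sketched roughly as below. This is only a sketch under assumptions: the function names list_patches_desc and print_revert_batch are hypothetical, and it assumes the unpacked genpatches sit together in one directory with names of the form NNNN_name.patch. It only prints the "patch -p1 -R" commands for one batch, highest-numbered patches first; it does not run them, rebuild the kernel, or reboot.

```shell
#!/bin/sh
# Hypothetical helpers (not part of genpatches or portage).
# Assumption: patch files are named NNNN_name.patch and live in one directory.

# Print the patch file names in the given directory, highest number first.
list_patches_desc() {
    for lp_file in "$1"/[0-9]*_*.patch; do
        # Skip the literal pattern when nothing matches.
        if [ -e "$lp_file" ]; then
            basename "$lp_file"
        fi
    done | sort -t_ -k1,1nr
}

# Print the revert commands for one batch.
#   $1: directory with the unpacked patches
#   $2: batch size (the thread suggests about 3)
#   $3: batch number, starting at 1
print_revert_batch() {
    rb_dir=$1
    rb_size=$2
    rb_batch=$3
    rb_start=$(( (rb_batch - 1) * rb_size + 1 ))
    rb_end=$(( rb_batch * rb_size ))
    list_patches_desc "$rb_dir" | sed -n "${rb_start},${rb_end}p" |
    while read -r rb_patch; do
        printf 'patch -p1 -R -i %s/%s\n' "$rb_dir" "$rb_patch"
    done
}
```

After sourcing these functions, something like "print_revert_batch /path/to/genpatches 3 1" would list the commands for the first batch, to be run from the kernel source directory; batch 2 gives the next group once the problem has been confirmed to persist. Printing rather than executing keeps each step under manual control, which matters here because every batch needs a rebuild, a reboot and a "uname -v" check in between.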