Using the default config from genkernel and setting the 64G highmem option makes the system extremely unstable (it won't boot or compile, everything segfaults). I've tested the stock 2.6.12.3 kernel with the same configuration and it is stable. The only changes to the default config are: changing the processor family (P4), Allocate 3rd-level pagetables from highmem (on), Use register arguments (on), and disabling all IDE drivers except the one for the appropriate chip (tested on 2 different systems). After changing the highmem option back to 4G, the system is stable again. Results are the same for gcc 3.4.4 and stable 3.3.5.

Reproducible: Always

Steps to Reproduce:
1. configure 2.6.12-gentoo-r6
2. enable 64G (highmem)
3. compile/install/reboot
4. lather, rinse, repeat with stock 2.6.12.3 kernel

Actual Results: udevstart gets a ton of "sed" errors, the reiserfs progs segfault (fsck, mkreiserfs), and the compiler segfaults when compiling anything.

Expected Results: system boots normally, compiling works (doesn't result in segfaults).

Bug reproduced on 2 different systems. My main system specs are below. A commercial vendor recommended setting the 64G option to get around an annoying bug.
Intel(R) Pentium(R) 4 CPU 3.40GHz
2G RAM
AS8 ABIT main board

drak ~ # lspci
0000:00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
0000:00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV40 [GeForce 6800] (rev a1)
0000:02:01.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
0000:02:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:02:05.0 USB Controller: NEC Corporation USB (rev 43)
0000:02:05.1 USB Controller: NEC Corporation USB (rev 43)
0000:02:05.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
0000:02:06.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
0000:02:06.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
0000:02:06.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
Strange. Could you please test the latest development kernel (currently vanilla-sources-2.6.13_rc5) and see if the issue also exists there? (gentoo-sources includes a fair amount of stuff from 2.6.13...)
I tried the latest and greatest from the stock kernel tree like you suggested (2.6.13-rc5), and it did not have the problems that gentoo-sources has :( With regard to the highmem 64G option being set: I'm using the configuration from genkernel and only modifying a few things, if that helps at all. If you would like any other testing, just give me a yell ;-)
Ok, thanks for testing that. The 2.6.12 patchset is big, so I think it will be best to wait for 2.6.13 (much smaller gentoo patchset) before debugging this further. 2.6.13 should be out within a week or so.
Out of curiosity, what vendor recommended the HIGHMEM_64G option, and why? I have almost identical hardware to you, and I don't have any stability issues.
*** Bug 103501 has been marked as a duplicate of this bug. ***
Ok, gentoo-sources-2.6.13 is now in portage so I'd be very grateful if someone could help find the problem.

First thing to do is install gentoo-sources-2.6.13 and confirm the problem still exists. Then unpack /usr/portage/distfiles/genpatches-2.6.13-1.base.tar.bz2 and /usr/portage/distfiles/genpatches-2.6.13-1.extras.tar.bz2 into a directory.

We now need to find the offending patch. Start at the high end (patch 4905) and revert the patches in descending order. To revert a patch you use "patch -p1 -R" from the kernel source directory. Revert a few at a time (say 3?) and then confirm the problem still exists on the "unpatched" kernel before reverting more. There are 20 patches in total.

For example:

1. Build gentoo-sources-2.6.13, confirm problem still exists.

2. Revert 3 patches:
   # cd /usr/src/linux-2.6.13-gentoo
   # patch -p1 -R -i /path/to/4905_alpha-sysctl-uac.patch
   # patch -p1 -R -i /path/to/4900_speakup-20050825.patch
   # patch -p1 -R -i /path/to/4705_squashfs-2.2.patch

3. Take a note of the time, and rebuild the kernel the normal way, and copy the new image over to /boot, etc etc.

4. Reboot into the new kernel, and run "uname -v" to get the time+date that the running kernel was compiled. This should approximately match the time you noted in step 3. If it doesn't, you made a mistake when copying over the new kernel image.

5. See if the problem still exists, and assuming it does, revert out the next few patches:
   # cd /usr/src/linux-2.6.13-gentoo
   # patch -p1 -R -i /path/to/4505_vesafb-tng-0.9-rc7-r1.patch
   # patch -p1 -R -i /path/to/4500_fbsplash-0.9.2-r4.patch
   # patch -p1 -R -i /path/to/4351_megaraid-compatibility.patch

6. Repeat from step 2 onwards until you are unable to reproduce the problem, then report back here which were the last group of patches that you reverted which caused the problem to go away.

Thanks!
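The revert-a-few-then-retest loop described above can be scripted roughly as follows. This is only a sketch under assumptions: the function names list_patches_desc and revert_batch are made up for illustration, the patch directory is assumed to hold the unpacked NNNN_name.patch files and must be given as an absolute path, and --dry-run assumes GNU patch. Reverted patches are moved into a "reverted" subdirectory so the next call picks up the next-highest numbers.

```shell
#!/bin/sh
# Hypothetical helpers for reverting genpatches in descending order.

# List NNNN_name.patch files in a directory, highest number first.
list_patches_desc() {
    # $1 = directory containing the unpacked patches
    ls "$1" | grep -E '^[0-9]+_.*\.patch$' | sort -t_ -k1,1nr
}

# Revert the N highest-numbered patches that have not been reverted yet.
revert_batch() {
    # $1 = kernel source dir, $2 = patch dir (absolute path), $3 = batch size
    kdir=$1
    pdir=$2
    count=$3
    mkdir -p "$pdir/reverted"
    list_patches_desc "$pdir" | head -n "$count" | while read -r p; do
        # Dry run first, so a patch that does not reverse cleanly
        # leaves the tree untouched and stops the batch.
        (cd "$kdir" && patch -p1 -R --dry-run -i "$pdir/$p" >/dev/null) || {
            echo "cannot cleanly revert $p" >&2
            exit 1
        }
        (cd "$kdir" && patch -p1 -R -i "$pdir/$p" >/dev/null) || exit 1
        # Remember that this patch is gone so the next batch skips it.
        mv "$pdir/$p" "$pdir/reverted/$p"
        echo "reverted $p"
    done
}
```

Something like "revert_batch /usr/src/linux-2.6.13-gentoo /path/to/patches 3" would then stand in for step 2, followed by the usual rebuild, reboot and "uname -v" check before the next batch.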
Just out of curiosity, about how many patches are we talking here? My laptop has 1GB RAM and I could probably do this sort of testing while I'm at work, but I wouldn't be able to start until tomorrow. If nobody else steps up, then I can help out by volunteering my machine.
I have tomorrow off from work and can also test on my machine.
Thanks people. There are 20 patches. They are listed here: http://dev.gentoo.org/~dsd/genpatches/patches-2.6.13-1.htm Some are very unlikely to be related to this bug (e.g. 1300, 4101) so feel free to use your own intuition to give some patches less attention. Note that the original report is from a genkernel user (so pretty much all of the feature patches will be compiled into the kernel) but I've had reports from manual compiles too. To be thorough, it's probably a good idea to compile the extra features (squashfs, vesafb-tng, fbsplash) into your test kernels (until you revert those patches!). Also, the original report says it is very easy to reproduce (i.e. he can't even _boot_ cleanly) whereas others have reported it being far more scarce, i.e. it doesn't appear until halfway through a glibc compile. I'm wondering if one easy way to reproduce this problem might be to run memtest86 while under Linux (under a suspected kernel). Just an idea.
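To confirm that a test kernel really has the relevant options built in before rebooting, a check against the .config along these lines may help. It is a sketch only: the function names are invented for illustration, CONFIG_HIGHMEM64G is the i386 symbol behind the 64G option, and the feature symbol names used in the usage note (SQUASHFS, FB_VESA_TNG, FB_SPLASH) are assumptions that should be checked against your own .config.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a kernel .config.

# Succeeds if the given symbol is built in (=y) in the .config.
config_enabled() {
    # $1 = path to .config, $2 = symbol name without the CONFIG_ prefix
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Print the state of each symbol passed after the .config path.
report_config() {
    cfg=$1
    shift
    for sym in "$@"; do
        if config_enabled "$cfg" "$sym"; then
            echo "$sym=y"
        else
            # Covers both "not set" and built as a module (=m).
            echo "$sym not built in"
        fi
    done
}
```

For example, "report_config /usr/src/linux/.config HIGHMEM64G SQUASHFS FB_VESA_TNG FB_SPLASH" (the path is an assumption about where your kernel tree lives) would show at a glance whether the test kernel matches the configuration you meant to test.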
One more thing: once you get into the revert-retest-revert-retest routine, it's very likely that the problem will go away early on. All of the big intrusive feature patches have high numbers (>=3000) so will end up being reverted first.
Here is what I have so far:
* emerged genkernel
* Ran genkernel --menuconfig --bootsplash all
* Changed the High Memory option to 64G and saved.
* Rebooted and ran into errors
* genkernel --menuconfig --bootsplash all and changed option back to 4G
* rebooted successfully
* Ran patch -p1 -R -i 4905_alpha-sysctl-uac.patch
* genkernel --menuconfig --bootsplash all and changed option back to 64G
* Rebooted unsuccessfully
* patch -p1 -R -i 4900_speakup-20050825.patch
* genkernel --menuconfig --bootsplash all
* Changed Framebuffer from vesafb-tng to vesafb (due to having invalid vga line and not wanting to lookup the new syntax to pass to kernel)
* rebooted successfully
* genkernel --menuconfig --bootsplash all
* Changed back to vesafb-tng
* rebooted unsuccessfully

So in my case, it actually looks like something in the interaction between vesafb-tng and the 64G High Mem option. I'm going to try on a vanilla kernel to double check, before I continue removing patches from gentoo-sources-2.6.13. Unfortunately, it takes about an hour to rebuild each kernel with genkernel so the testing is time intensive.
I just realized the vesafb-tng is not in the vanilla kernel, so the suspect patch is 4505_vesafb-tng-0.9-rc7-r1.patch. I am removing that patch and recompiling to test.
For my system it is the 4505_vesafb-tng-0.9-rc7-r1.patch that causes the problems. Removing the patch or as stated above changing from vesafb-tng to vesafb resolves the problems.
Thanks Paul. Could you perhaps test applying the vesafb-tng patch against vanilla, so that you effectively have clean 2.6.13 + vesafbtng + highmem64, and see if it is reproducible on a setup like that? I ask this as tomaw also reproduced this bug and does not use vesafb-tng, but hasn't been successful in tracking down which Gentoo patch is the cause just yet. We may be dealing with multiple problems, or a problem which only appears with a certain patch combination, or something ugly like that. I now have access to a pc with 1.5gb RAM but I haven't been able to reproduce the problem yet. I'll keep playing...
I am able to reproduce the problem by applying the vesafb-tng patch against the vanilla kernel. I have also ditched genkernel and have gone back to my much slimmer monolithic kernel config and it exhibits the problem as well.
Michal is working on this
Just to keep everyone up-to-date: I was able to reproduce the problem on my machine and have already fixed it. I'm currently polishing the new code a little bit and will hopefully soon release a fixed version of the patch.
It took a little longer than I expected, but here it is: http://dev.gentoo.org/~spock/projects/vesafb-tng/testing/vesafb-tng-testing-2005091603.patch Please test with vanilla 2.6.13* or 2.6.14-rc1.
I am not seeing any issues with 2.6.14-rc1
New patch available at: http://dev.gentoo.org/~spock/projects/vesafb-tng/testing/vesafb-tng-testing-2005092001.patch (in case someone wants to do more testing, please use this one)
Fixed in gentoo-sources-2.6.14