Bug 101359

Summary: 64G highmem option in gentoo-sources is unstable
Product: Gentoo Linux
Component: [OLD] Core system
Status: RESOLVED FIXED
Severity: normal
Priority: High
Version: unspecified
Hardware: x86
OS: Linux
Whiteboard: linux-2.6.13
Reporter: Curtis Wood <curtis>
Assignee: Michal Januszewski (RETIRED) <spock>
CC: cm, fuzzyray, iyosifov, kernel, Matthias.Gerstner, tom, wolf31o2
Package list:
Runtime testing required: ---

Description Curtis Wood 2005-08-04 10:34:12 UTC
Using the default config from genkernel and setting the 64G highmem option
makes the system extremely unstable (it won't boot or compile, everything
segfaults). I've tested the stock 2.6.12.3 kernel with the same configuration
and it is stable. The only changes to the default config are changing the
processor family (P4), turning on "Allocate 3rd-level pagetables from highmem",
turning on "Use register arguments", and disabling all IDE drivers except the
one for the appropriate chip (tested on 2 different systems). With the highmem
option changed back to 4G, the system is stable again. Results are the same for
gcc 3.4.4 and stable 3.3.5.

Reproducible: Always
Steps to Reproduce:
1. configure 2.6.12-gentoo-r6
2. enable 64g (highmem)
3. compile/install/reboot
4. lather, rinse, repeat with stock 2.6.12.3 kernel...

Actual Results:  
First, udevstart produces a ton of "sed" errors; the reiserfs tools (fsck,
mkreiserfs) segfault; and the compiler segfaults when compiling anything.

Expected Results:  
The system boots normally and compiling works (no segfaults).

The bug was reproduced on 2 different systems. My main system specs are below.
A commercial vendor recommended setting the 64G option to get around an
annoying bug.

Intel(R) Pentium(R) 4 CPU 3.40GHz
2G RAM
AS8 ABIT main board

drak ~ # lspci
0000:00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
0000:00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV40 [GeForce 6800] (rev a1)
0000:02:01.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
0000:02:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:02:05.0 USB Controller: NEC Corporation USB (rev 43)
0000:02:05.1 USB Controller: NEC Corporation USB (rev 43)
0000:02:05.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
0000:02:06.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
0000:02:06.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
0000:02:06.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
Comment 1 Daniel Drake (RETIRED) gentoo-dev 2005-08-05 15:53:25 UTC
Strange.

Could you please test the latest development kernel (currently
vanilla-sources-2.6.13_rc5) and see if the issue also exists there?
(gentoo-sources includes a fair amount of stuff from 2.6.13...)
Comment 2 Curtis Wood 2005-08-05 20:41:59 UTC
I tried the latest and greatest from the stock kernel tree as you suggested
(2.6.13-rc5), and with the highmem 64G option set it did not have the problems
that gentoo-sources has :(

I'm using the configuration from genkernel, and only modifying a few things - if
that helps at all... Any other testing you would like, just give me a yell ;-)
Comment 3 Daniel Drake (RETIRED) gentoo-dev 2005-08-06 03:01:12 UTC
Ok, thanks for testing that.

The 2.6.12 patchset is big, so I think it will be best to wait for 2.6.13 (much
smaller gentoo patchset) before debugging this further. 2.6.13 should be out
within a week or so.

Comment 4 Mike Doty (RETIRED) gentoo-dev 2005-08-18 12:08:17 UTC
Out of curiosity, what vendor recommended HIGHMEM_64G, and why?  I have
almost identical hardware to yours, and I don't have any stability issues.
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2005-08-23 15:23:52 UTC
*** Bug 103501 has been marked as a duplicate of this bug. ***
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2005-08-29 09:06:59 UTC
Ok, gentoo-sources-2.6.13 is now in portage so I'd be very grateful if someone
could help find the problem.

First thing to do is install gentoo-sources-2.6.13 and confirm the problem still
exists.

Then unpack /usr/portage/distfiles/genpatches-2.6.13-1.base.tar.bz2 and
/usr/portage/distfiles/genpatches-2.6.13-1.extras.tar.bz2 into a directory. We
now need to find the offending patch.

Start at the high end (patch 4905) and revert the patches in descending order.
To revert a patch you use "patch -p1 -R" from the kernel source directory.
Revert a few at a time (say 3?) and then confirm the problem still exists on the
"unpatched" kernel before reverting more. There are 20 patches in total.

For example:
 1. build gentoo-sources-2.6.13, confirm problem still exists

 2. revert 3 patches
   # cd /usr/src/linux-2.6.13-gentoo
   # patch -p1 -R -i /path/to/4905_alpha-sysctl-uac.patch
   # patch -p1 -R -i /path/to/4900_speakup-20050825.patch
   # patch -p1 -R -i /path/to/4705_squashfs-2.2.patch

 3. Take a note of the time, and rebuild the kernel the normal way, and copy the
new image over to /boot, etc etc.

 4. Reboot into the new kernel, and run "uname -v" to get the time+date that the
running kernel was compiled. This should approximately match the time you noted
in step 3. If it doesn't, you made a mistake when copying over the new kernel image.

 5. See if the problem still exists, and assuming it does, revert out the next
few patches:
   # cd /usr/src/linux-2.6.13-gentoo
   # patch -p1 -R -i /path/to/4505_vesafb-tng-0.9-rc7-r1.patch
   # patch -p1 -R -i /path/to/4500_fbsplash-0.9.2-r4.patch
   # patch -p1 -R -i /path/to/4351_megaraid-compatibility.patch

 6. Repeat from step 2 onwards until you are unable to reproduce the problem,
then report back here which group of patches you reverted last, i.e. the group
whose removal made the problem go away.
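
The revert-and-retest loop above could be scripted roughly as follows. This is
only a sketch under a few assumptions: the patch directory is given as an
absolute path, patch filenames contain no spaces, and reverted.log is a
bookkeeping file invented for this sketch. The names next_batch and
revert_batch are hypothetical helpers, not part of genpatches or portage.

```shell
#!/bin/sh
# Print the COUNT highest-numbered *.patch files in DIR that are not yet
# listed in DIR/reverted.log (one filename per line, highest number first).
next_batch() {
    dir=$1; count=$2
    ls "$dir" | grep '\.patch$' | sort -r | while read -r p; do
        # Skip patches already recorded as reverted.
        grep -q -x -F "$p" "$dir/reverted.log" 2>/dev/null || echo "$p"
    done | head -n "$count"
}

# Revert the next COUNT patches from DIR inside the kernel tree SRC.
# Each revert is tried with --dry-run first so a failing patch leaves
# the tree untouched.
revert_batch() {
    dir=$1; count=$2; src=$3
    for p in $(next_batch "$dir" "$count"); do
        (cd "$src" && patch -p1 -R --dry-run -i "$dir/$p" >/dev/null) || {
            echo "cannot revert $p cleanly" >&2
            return 1
        }
        (cd "$src" && patch -p1 -R -i "$dir/$p") || return 1
        echo "$p" >> "$dir/reverted.log"
    done
    # Note the time, to compare with "uname -v" after rebooting.
    date
}
```

For example, "revert_batch /path/to/patches 3 /usr/src/linux-2.6.13-gentoo"
would revert the three highest-numbered patches not yet reverted. After the
rebuild and reboot, the printed date should roughly match the build date shown
by "uname -v".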

Thanks!
Comment 7 Chris Gianelloni (RETIRED) gentoo-dev 2005-09-01 12:57:58 UTC
Just out of curiosity, about how many patches are we talking here?  My laptop
has 1GB RAM and I could probably do this sort of testing while I'm at work, but
I wouldn't be able to start until tomorrow.  If nobody else steps up, then I can
help out by volunteering my machine.
Comment 8 Paul Varner (RETIRED) gentoo-dev 2005-09-01 13:03:04 UTC
I have tomorrow off from work and can also test on my machine.
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2005-09-01 13:26:35 UTC
Thanks people. There are 20 patches.

They are listed here:
http://dev.gentoo.org/~dsd/genpatches/patches-2.6.13-1.htm

Some are very unlikely to be related to this bug (e.g. 1300, 4101) so feel free
to use your own intuition to give some patches less attention.

Note that the original report is from a genkernel user (so pretty much all of
the feature patches will be compiled into the kernel) but I've had reports from
manual compiles too. To be thorough, it's probably a good idea to compile the
extra features (squashfs, vesafb-tng, fbsplash) into your test kernels (until
you revert those patches!).

Also, the original report says it is very easy to reproduce (i.e. he can't even
_boot_ cleanly) whereas others have reported it being far rarer, e.g. it
doesn't appear until halfway through a glibc compile.

I'm wondering if one easy way to reproduce this problem might be to run
memtest86 while under Linux (under a suspected kernel). Just an idea.
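
For the rarer variant of the failure, another rough option is to repeat some
workload under the suspected kernel and count how many runs die from a
segfault. The sketch below relies on the shell convention that a process
killed by signal N is reported with exit status 128+N (139 for SIGSEGV).
count_segv is a hypothetical helper name, and the workload is whatever command
you choose to pass in.

```shell
#!/bin/sh
# Run a command REPEAT times and print how many runs were killed by SIGSEGV.
# Usage: count_segv REPEAT command [args...]
count_segv() {
    repeat=$1; shift
    segv=0
    i=0
    while [ "$i" -lt "$repeat" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        # 139 = 128 + 11 (SIGSEGV)
        if [ "$status" -eq 139 ]; then
            segv=$((segv + 1))
        fi
        i=$((i + 1))
    done
    echo "$segv"
}
```

A count above zero for a workload that is clean on a known-good kernel would
point at the kernel under test rather than at the program itself.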
Comment 10 Daniel Drake (RETIRED) gentoo-dev 2005-09-01 13:31:52 UTC
One more thing: once you get into the revert-retest-revert-retest routine, it's
very likely that the problem will go away early on. All of the big intrusive
feature patches have high numbers (>=3000) so will end up being reverted first.
Comment 11 Paul Varner (RETIRED) gentoo-dev 2005-09-02 14:33:53 UTC
Here is what I have so far:

* emerged genkernel
* Ran genkernel --menuconfig --bootsplash all
* Changed the High Memory option to 64G and saved.
* Rebooted and ran into errors
* genkernel --menuconfig --bootsplash all and changed option back to 4G
* rebooted successfully
* Ran patch -p1 -R -i 4905_alpha-sysctl-uac.patch
* genkernel --menuconfig --bootsplash all and changed option back to 64G
* Rebooted unsuccessfully
* patch -p1 -R -i 4900_speakup-20050825.patch
* genkernel --menuconfig --bootsplash all
* Changed Framebuffer from vesafb-tng to vesafb (due to having an invalid vga
line and not wanting to look up the new syntax to pass to the kernel)
* rebooted successfully
* genkernel --menuconfig --bootsplash all
* Changed back to vesafb-tng
* rebooted unsuccessfully

So in my case, it actually looks like something in the interaction between
vesafb-tng and the 64G High Mem option.  I'm going to try on a vanilla kernel to
double check, before I continue removing patches from gentoo-sources-2.6.13.
Unfortunately, it takes about an hour to rebuild each kernel with genkernel so
the testing is time intensive.
Comment 12 Paul Varner (RETIRED) gentoo-dev 2005-09-02 14:40:03 UTC
I just realized the vesafb-tng is not in the vanilla kernel, so the suspect
patch is 4505_vesafb-tng-0.9-rc7-r1.patch.  I am removing that patch and
recompiling to test.
Comment 13 Paul Varner (RETIRED) gentoo-dev 2005-09-02 17:42:41 UTC
For my system it is the 4505_vesafb-tng-0.9-rc7-r1.patch that causes the
problems.  Removing the patch or, as stated above, changing from vesafb-tng to
vesafb resolves the problems.
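
A quick way to see whether a given kernel config has the combination reported
here is to grep it for both options. This is a sketch only: CONFIG_HIGHMEM64G
is the kernel's symbol for the 64G highmem option, while CONFIG_FB_VESA_TNG is
assumed to be the symbol added by the vesafb-tng patch, and risky_combo is a
hypothetical helper name.

```shell
#!/bin/sh
# Print "risky" if the given kernel config enables both 64G highmem and
# vesafb-tng, otherwise print "ok".
# CONFIG_FB_VESA_TNG is an assumption about the vesafb-tng patch's symbol.
risky_combo() {
    cfg=$1
    if grep -q '^CONFIG_HIGHMEM64G=y' "$cfg" &&
       grep -q '^CONFIG_FB_VESA_TNG=y' "$cfg"; then
        echo "risky"
    else
        echo "ok"
    fi
}
```

For example, "risky_combo /usr/src/linux/.config" checks the config of the
currently selected kernel tree.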
Comment 14 Daniel Drake (RETIRED) gentoo-dev 2005-09-02 18:11:01 UTC
Thanks Paul. Could you perhaps test applying the vesafb-tng patch against
vanilla, so that you effectively have clean 2.6.13 + vesafbtng + highmem64, and
see if it is reproducible on a setup like that?

I ask this as tomaw also reproduced this bug and does not use vesafb-tng, but
hasn't been successful in tracking down which Gentoo patch is the cause just
yet. We may be dealing with multiple problems, or a problem which only appears
with a certain patch combination, or something ugly like that.

I now have access to a pc with 1.5gb RAM but I haven't been able to reproduce
the problem yet. I'll keep playing...
Comment 15 Paul Varner (RETIRED) gentoo-dev 2005-09-02 19:34:25 UTC
I am able to reproduce the problem by applying the vesafb-tng patch against the
vanilla kernel.  I have also ditched genkernel and have gone back to my much
slimmer monolithic kernel config and it exhibits the problem as well.
Comment 16 Daniel Drake (RETIRED) gentoo-dev 2005-09-11 12:44:48 UTC
Michal is working on this
Comment 17 Michal Januszewski (RETIRED) gentoo-dev 2005-09-11 13:59:19 UTC
Just to keep everyone up-to-date: I was able to reproduce the problem on my
machine and have already fixed it. I'm currently polishing the new code a little
bit and will hopefully soon release a fixed version of the patch.
Comment 18 Michal Januszewski (RETIRED) gentoo-dev 2005-09-16 07:27:47 UTC
It took a little longer than I expected, but here it is:

http://dev.gentoo.org/~spock/projects/vesafb-tng/testing/vesafb-tng-testing-2005091603.patch

Please test with vanilla 2.6.13* or 2.6.14-rc1.
Comment 19 Paul Varner (RETIRED) gentoo-dev 2005-09-18 08:35:06 UTC
I am not seeing any issues with 2.6.14-rc1
Comment 20 Michal Januszewski (RETIRED) gentoo-dev 2005-09-19 17:27:57 UTC
New patch available at:

http://dev.gentoo.org/~spock/projects/vesafb-tng/testing/vesafb-tng-testing-2005092001.patch

(in case someone wants to do more testing, please use this one)
Comment 21 Daniel Drake (RETIRED) gentoo-dev 2005-10-28 12:42:10 UTC
Fixed in gentoo-sources-2.6.14