Bug 205344 - sys-kernel/hardened-sources-2.6.23-r4 - Frequent kernel panics
Summary: sys-kernel/hardened-sources-2.6.23-r4 - Frequent kernel panics
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: The Gentoo Linux Hardened Kernel Team (OBSOLETE)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-11 17:30 UTC by Travis Morgan
Modified: 2009-07-21 22:30 UTC
CC List: 8 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Kernel Panic (panic.txt, 1.17 KB, text/plain), 2008-01-11 17:32 UTC, Travis Morgan
emerge --info (emergeinfo.txt, 3.46 KB, text/plain), 2008-01-11 17:32 UTC, Travis Morgan
kernel config (.config, 42.54 KB, text/plain), 2008-01-11 17:33 UTC, Travis Morgan
System.map (System.map, 885.71 KB, text/plain), 2008-01-11 17:33 UTC, Travis Morgan
pci=nomsi (panic_nomsi.txt, 1.17 KB, text/plain), 2008-01-11 21:58 UTC, Travis Morgan
kern/sched.o - SMP non-working kernel (sched.o, 54.04 KB, application/octet-stream), 2008-01-11 23:53 UTC, Travis Morgan

Description Travis Morgan 2008-01-11 17:30:52 UTC
After updating several systems to 2.6.23-hardened-r4 yesterday, two of them are now experiencing kernel panics very frequently. A simple 'find /' while in an ssh session will crash the box.

Reproducible: Always

Steps to Reproduce:
1. ssh to box
2. su - root
3. find /

Actual Results:  
Kernel panic

Expected Results:  
No panic!

Both machines that are panicking are i686. Two other machines that were updated run x86_64 and have had no problems yet. The kernel config was copied from the first machine to the other three, so with very few exceptions all four machines are running the same config. Possibly an i686-only issue.
Comment 1 Travis Morgan 2008-01-11 17:32:31 UTC
Created attachment 140730 [details]
Kernel Panic
Comment 2 Travis Morgan 2008-01-11 17:32:53 UTC
Created attachment 140731 [details]
emerge --info
Comment 3 Travis Morgan 2008-01-11 17:33:08 UTC
Created attachment 140732 [details]
kernel config
Comment 4 Travis Morgan 2008-01-11 17:33:24 UTC
Created attachment 140733 [details]
System.map
Comment 5 Travis Morgan 2008-01-11 18:04:26 UTC
I should also mention that all four machines use reiserfs for their filesystems. 3 of the 4 machines use 3ware raid cards, and of the two that are crashing one is on a 3ware and one is not.
Comment 6 PaX Team 2008-01-11 20:27:01 UTC
(In reply to comment #0)
> After updating several systems to 2.6.23-hardened-r4 yesterday

which version did you upgrade from? (in particular, was it another .23?)

> two of them are now experiencing kernel panics very frequently.

the crash looks very much like some of those reported in bug #197521 except that you're already on gcc-4 so it apparently/probably isn't some miscompiled code. you said it was easily reproducible, would you also have the time/motivation for some test kernels? if so, the first thing to try is a non-SMP kernel and see if you can reproduce the crash with it.
Comment 7 Travis Morgan 2008-01-11 21:42:21 UTC
I actually have a couple more boxes I'll give details of as well. One doesn't follow the pattern, or at least I haven't seen it crash or been able to make it crash yet.

All are now 2.6.23-hardened-r4.
All run reiserfs.
All are SMP (dual core or hyperthreaded).

NAME   OLD KERNEL           ARCH    STATUS        GCC
c.t    2.6.16-hardened-r11  i686    crashing      4.1.2
s.s    2.6.16-hardened-r11  i686    crashing      4.1.2
m.ah   2.6.18-hardened      i686    not crashing  4.1.2
r.s    2.6.16-hardened-r11  x86_64  not crashing  4.1.2
m.as   2.6.20-hardened-r5   x86_64  not crashing  4.1.2
m.s    2.6.16-hardened-r11  x86_64  not crashing  4.1.2
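
Editorial note: the table above can be tallied by architecture and status to make the pattern explicit. The following is only a minimal sketch; the rows are copied from the table, while the names `MACHINES` and `tally` are made up for illustration.

```python
from collections import Counter

# Rows copied from the table above: (name, old kernel, arch, status).
MACHINES = [
    ("c.t", "2.6.16-hardened-r11", "i686", "crashing"),
    ("s.s", "2.6.16-hardened-r11", "i686", "crashing"),
    ("m.ah", "2.6.18-hardened", "i686", "not crashing"),
    ("r.s", "2.6.16-hardened-r11", "x86_64", "not crashing"),
    ("m.as", "2.6.20-hardened-r5", "x86_64", "not crashing"),
    ("m.s", "2.6.16-hardened-r11", "x86_64", "not crashing"),
]


def tally(machines):
    """Count machines per (arch, status) pair."""
    return Counter((arch, status) for _name, _old, arch, status in machines)


counts = tally(MACHINES)
for (arch, status), n in sorted(counts.items()):
    print(f"{arch:7} {status:13} {n}")
```

Both crashing hosts are i686, but one i686 host (m.ah) is not crashing, so architecture alone does not separate the two groups.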

I can try putting a non-SMP kernel on one of the crashing machines. I'll post the result when I have a chance to do that.

The original crash info was from the box s.s

Comment 8 Travis Morgan 2008-01-11 21:58:21 UTC
Created attachment 140739 [details]
pci=nomsi

no effect - still crashes
Comment 9 Travis Morgan 2008-01-11 22:14:55 UTC
I am unable to make the s.s system crash when the kernel is compiled without SMP. Normally "find /" would take about 10 seconds. It now goes through the entire filesystem repeatedly without a problem.
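
Editorial note: a repeated full-tree walk like the one described here could be scripted roughly as follows. This is a sketch only, assuming that a walk which stats every entry is a reasonable stand-in for `find /`; the function names `walk_tree` and `stress` are hypothetical and the load it generates is not guaranteed to match what `find` produces.

```python
import os


def walk_tree(root):
    """Visit every entry below root once, roughly like `find root`.

    Returns the number of entries that could be stat'ed.
    """
    count = 0
    # onerror swallows unreadable directories so the walk keeps going.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in dirnames + filenames:
            try:
                # lstat forces the kernel to look up the entry's inode.
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable
            count += 1
    return count


def stress(root, passes):
    """Walk the tree `passes` times and return the entry count of each pass."""
    return [walk_tree(root) for _ in range(passes)]


# Example (run as root on the test box): stress("/", passes=10)
```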
Comment 10 PaX Team 2008-01-11 22:38:46 UTC
(In reply to comment #9)
> I am unable to make the s.s system crash when the kernel is compiled without
> SMP. Normally "find /" would take about 10 seconds. It now goes through the
> entire filesystem repeatedly without a problem.

ok, so it's SMP related somehow. can you upload or send me your kernel/sched.o (from the non-working kernel) please?
Comment 11 Travis Morgan 2008-01-11 23:53:20 UTC
Created attachment 140745 [details]
kern/sched.o - SMP non-working kernel

I had to recompile this as I didn't keep the old version of the kernel around when I went to single processor. The config is exactly the same, and I'm 99.9% sure the resulting binary will be too as I recompiled many times before trying without SMP and always ended up with a buggy SMP kernel.
Comment 12 PaX Team 2008-01-18 21:05:50 UTC
(In reply to comment #7)
> c.t    2.6.16-hardened-r11  i686    crashing      4.1.2
> s.s    2.6.16-hardened-r11  i686    crashing      4.1.2
> m.ah   2.6.18-hardened      i686    not crashing  4.1.2

can you post /proc/cpuinfo for these machines (or just the flags lines)? is the pax part of the configuration the same on them?
Comment 13 Travis Morgan 2008-02-13 17:27:21 UTC
PAX is exactly the same between c.t and s.s. It differs slightly for m.ah.
--- c.t
+++ m.ah
-# CONFIG_PAX_PAGEEXEC is not set
+CONFIG_PAX_PAGEEXEC=y
+# CONFIG_PAX_EMUTRAMP is not set
+CONFIG_PAX_MPROTECT=y
+# CONFIG_PAX_NOELFRELOCS is not set
-# CONFIG_PAX_MEMORY_SANITIZE is not set
+CONFIG_PAX_MEMORY_SANITIZE=y
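
Editorial note: the diff above can be reproduced mechanically from the `grep -i pax` output that appears further down in this comment. The sketch below is illustrative only; the config text is copied from that output, and the helper names `parse_config` and `diff_config` are hypothetical.

```python
import re

# PaX option lines copied from the grep output for c.t and m.ah in this comment.
CT_PAX = """\
CONFIG_PAX=y
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_UDEREF is not set
"""

MAH_PAX = """\
CONFIG_PAX=y
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
CONFIG_PAX_PAGEEXEC=y
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_EMUTRAMP is not set
CONFIG_PAX_MPROTECT=y
# CONFIG_PAX_NOELFRELOCS is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
CONFIG_PAX_MEMORY_SANITIZE=y
# CONFIG_PAX_MEMORY_UDEREF is not set
"""

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name to its value; explicitly unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_config(a, b):
    """Return {option: (value_in_a, value_in_b)} where they differ; None means absent."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


changes = diff_config(parse_config(CT_PAX), parse_config(MAH_PAX))
for name, (ct, mah) in changes.items():
    print(f"{name}: c.t={ct} m.ah={mah}")
```

Options reported as `None` on c.t are simply absent from its grep output, which matches the `+`-only lines in the diff.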


Anyways, here's the full info.

c.t:
> cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 1
cpu MHz         : 2800.340
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5604.82
clflush size    : 64

> grep -i pax /usr/src/linux-`uname -r`/.config
# PaX
CONFIG_PAX=y
# PaX Control
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_UDEREF is not set



s.s:
> cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 6
model name      : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping        : 2
cpu MHz         : 3412.280
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
bogomips        : 6829.85
clflush size    : 64


> grep -i pax /usr/src/linux-`uname -r`/.config
# PaX
CONFIG_PAX=y
# PaX Control
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_UDEREF is not set


m.ah:
> cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 1
cpu MHz         : 2800.371
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5605.21
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 1
cpu MHz         : 2800.371
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5600.27
clflush size    : 64


> grep -i pax /usr/src/linux-`uname -r`/.config
# PaX
CONFIG_PAX=y
# PaX Control
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
CONFIG_PAX_PAGEEXEC=y
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_EMUTRAMP is not set
CONFIG_PAX_MPROTECT=y
# CONFIG_PAX_NOELFRELOCS is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
CONFIG_PAX_MEMORY_SANITIZE=y
# CONFIG_PAX_MEMORY_UDEREF is not set
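
Editorial note: the three `flags` lines above are long enough that differences are easy to miss by eye. A minimal sketch for comparing them as sets follows; the flag strings are copied from the cpuinfo output in this comment, and the names `FLAGS` and `flag_set` are hypothetical. A difference in reported flags does not by itself show what causes the panics.

```python
# flags lines copied from the /proc/cpuinfo output in this comment.
FLAGS = {
    "c.t": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
           "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm "
           "constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr",
    "s.s": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
           "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm "
           "constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr "
           "lahf_lm",
    "m.ah": "fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat "
            "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm "
            "constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr",
}


def flag_set(line):
    """Split a cpuinfo flags line into a set of flag names."""
    return set(line.split())


crashing = [flag_set(FLAGS["c.t"]), flag_set(FLAGS["s.s"])]
stable = flag_set(FLAGS["m.ah"])

# Flags that both crashing machines report but m.ah does not.
only_crashing = set.intersection(*crashing) - stable
# Flags that m.ah reports but neither crashing machine does.
only_stable = stable - set.union(*crashing)

print("on both crashing hosts, not on m.ah:", sorted(only_crashing))
print("on m.ah, on neither crashing host:", sorted(only_stable))
for name in ("c.t", "s.s"):
    host = flag_set(FLAGS[name])
    print(f"{name} vs m.ah: extra={sorted(host - stable)} missing={sorted(stable - host)}")
```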

Comment 14 PaX Team 2008-02-15 23:29:44 UTC
can you try out the latest pax or grsec patches? i think i managed to reproduce and find out your problem. as confirmation, you can also just disable HIGHPTE and see if you still get the issue.
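
Editorial note: before and after rebuilding, it may help to confirm how HIGHPTE is actually set in the kernel's `.config`. The sketch below assumes the usual `.config` line formats; `option_state` is a hypothetical helper and `SAMPLE` is an invented fragment for illustration, not taken from any of the machines in this bug.

```python
import re


def option_state(config_text, name):
    """Return the option's value ('y', 'm', ...), 'n' if explicitly unset, None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "n"
        m = re.fullmatch(rf"{re.escape(name)}=(.*)", line)
        if m:
            return m.group(1)
    return None


# Hypothetical fragment, for illustration only.
SAMPLE = """\
CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
CONFIG_HIGHPTE=y
"""

print(option_state(SAMPLE, "CONFIG_HIGHPTE"))
```

With a real tree, the text would come from something like `open("/usr/src/linux/.config").read()`; that path is an assumption based on the usual Gentoo layout.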
Comment 15 Travis Morgan 2008-02-15 23:47:37 UTC
Are you saying you want me to grab the same kernel, or latest kernel, and patch it with the latest PAX myself, or that you want me to try the latest hardened-sources kernel in portage?
Comment 16 Travis Morgan 2008-02-15 23:49:26 UTC
Nm.. I think I know what you mean. I'll see if I can do some testing to make the lockups happen again and then test again without HIGHPTE to see if they still happen.
Comment 17 PaX Team 2008-02-16 00:03:15 UTC
for now you have to grab the patches directly from the grsec or pax test dirs (grsecurity.net/~spender or ~paxguy1), eventually they will be added to gentoo but right now you can't just emerge them.
Comment 18 kfm 2008-02-27 18:55:51 UTC
Based on a comment made by the PaX Team in bug 198051, this issue /may/ be resolved by a patch included in the 2.6.23-r8 release. As such, I would be grateful if the reporter were to test that.

In the event that the situation does not improve, note that a 2.6.24 release should be committed fairly soon.
Comment 19 Gordon Malm (RETIRED) gentoo-dev 2008-04-12 22:36:35 UTC
(In reply to comment #16)
> Nm.. I think I know what you mean. I'll see if I can do some testing to make
> the lockups happen again and then test again without HIGHPTE to see if they
> still happen.
> 

Did you try this by any chance and if so could you share the result?  Also, hardened-sources-2.6.24 is in the tree if you wouldn't mind seeing if your problem disappears with it since PaX Team thinks it may be fixed.  Thanks.
Comment 20 PaX Team 2008-04-14 19:26:34 UTC
(In reply to comment #19)
> Did you try this by any chance and if so could you share the result?  Also,
> hardened-sources-2.6.24 is in the tree if you wouldn't mind seeing if your
> problem disappears with it since PaX Team thinks it may be fixed.  Thanks.

actually the bug manifested on .24 as well so it's definitely not fixed. the only pattern i saw so far was that P4/HT boxes were affected (which points at some very subtle race somewhere in the scheduler, not clear how PaX is causing it though) but apparently not all of them in Travis' case, so it would be interesting to know how the .config of m.ah differs from the rest (not only the PaX bits but everything else as well, as i think it's not a configurable PaX feature that causes this, if it's PaX at all).
Comment 21 Travis Morgan 2008-08-11 17:53:43 UTC
I no longer have access to the systems this problem presented on so I won't be able to offer further assistance in tracking it down.
Comment 22 kfm 2009-07-21 22:30:32 UTC
This bug appears to have nowhere to go. Closing as NEEDINFO. Please re-open if any new information comes to light and/or it becomes apparent that it's still an issue in recent versions. Thanks.