After updating several systems to 2.6.23-hardened-r4 yesterday, two of them are now experiencing kernel panics very frequently. A simple 'find /' while in an ssh session will crash the box.

Reproducible: Always

Steps to Reproduce:
1. ssh to box
2. su - root
3. find /

Actual Results: Kernel panic

Expected Results: No panic!

Both machines that are panicking are i686. Two other machines were updated that run x86_64 and have had no problems yet. The kernel config was copied from the first machine to the other three so with very few exceptions all four machines are running the same config. Possibly an i686 only issue.
Created attachment 140730 [details] Kernel Panic
Created attachment 140731 [details] emerge --info
Created attachment 140732 [details] kernel config
Created attachment 140733 [details] System.map
I should also mention that all four machines use reiserfs for their filesystems. 3 of the 4 machines use 3ware raid cards, and of the two that are crashing one is on a 3ware and one is not.
(In reply to comment #0)
> After updating several systems to 2.6.23-hardened-r4 yesterday

which version did you upgrade from? (in particular, was it another .23?)

> two of them are now experiencing kernel panics very frequently.

the crash looks very much like some of those reported in bug #197521 except that you're already on gcc-4 so it apparently/probably isn't some miscompiled code. you said it was easily reproducible, would you also have the time/motivation for some test kernels? if so, the first thing to try is a non-SMP kernel and see if you can reproduce the crash with it.
I actually have a couple more boxes I'll give details of as well. One doesn't follow the pattern, or at the least I haven't SEEN or been able to make it crash yet. All are now 2.6.23-hardened-r4. All run reiserfs. All are SMP (dual core or hyperthreaded).

NAME   OLD KERNEL            ARCH     STATUS         GCC
c.t    2.6.16-hardened-r11   i686     crashing       4.1.2
s.s    2.6.16-hardened-r11   i686     crashing       4.1.2
m.ah   2.6.18-hardened       i686     not crashing   4.1.2
r.s    2.6.16-hardened-r11   x86_64   not crashing   4.1.2
m.as   2.6.20-hardened-r5    x86_64   not crashing   4.1.2
m.s    2.6.16-hardened-r11   x86_64   not crashing   4.1.2

I can try putting a non-SMP kernel on one of the crashing machines. I'll post the result when I have a chance to do that. The original crash info was from the box s.s
Created attachment 140739 [details] pci=nomsi no effect - still crashes
I am unable to make the s.s system crash when the kernel is compiled without SMP. Normally "find /" would take about 10 seconds. It now goes through the entire filesystem repeatedly without a problem.
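The repeated "find /" test described above can be scripted so that the same number of passes is run against each kernel build. The following is only a minimal sketch: the helper name run_repeatedly and the pass count are illustrative assumptions, not something the reporter used.

```python
import subprocess


def run_repeatedly(cmd, iterations):
    """Run `cmd` `iterations` times and return the list of exit codes.

    Output is discarded; only the filesystem traversal itself matters here.
    On an affected kernel the machine may panic before this returns.
    """
    codes = []
    for _ in range(iterations):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


# Hypothetical usage on a test box (run as root, as in the original report):
#     run_repeatedly(["find", "/"], iterations=20)
```

Note that "find /" can exit non-zero for harmless reasons (for example, entries vanishing under /proc), so the exit codes are informational only; the signal of interest is whether the box survives all passes.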
(In reply to comment #9)
> I am unable to make the s.s system crash when the kernel is compiled without
> SMP. Normally "find /" would take about 10 seconds. It now goes through the
> entire filesystem repeatedly without a problem.

ok, so it's SMP related somehow. can you upload or send me your kernel/sched.o (from the non-working kernel) please?
Created attachment 140745 [details] kern/sched.o - SMP non-working kernel

I had to recompile this as I didn't keep the old version of the kernel around when I went to single processor. The config is exactly the same, and I'm 99.9% sure the resulting binary will be too as I recompiled many times before trying without SMP and always ended up with a buggy SMP kernel.
(In reply to comment #7)
> c.t 2.6.16-hardened-r11 i686 crashing 4.1.2
> s.s 2.6.16-hardened-r11 i686 crashing 4.1.2
> m.ah 2.6.18-hardened i686 not crashing 4.1.2

can you post /proc/cpuinfo for these machines (or just the flags lines)? is the pax part of the configuration the same on them?
PAX is exactly the same between c.t and s.s. It differs slightly for m.ah.

--- c.t
+++ m.ah
-# CONFIG_PAX_PAGEEXEC is not set
+CONFIG_PAX_PAGEEXEC=y
+# CONFIG_PAX_EMUTRAMP is not set
+CONFIG_PAX_MPROTECT=y
+# CONFIG_PAX_NOELFRELOCS is not set
-# CONFIG_PAX_MEMORY_SANITIZE is not set
+CONFIG_PAX_MEMORY_SANITIZE=y

Anyways, here's the full info.

c.t:

> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 1
cpu MHz : 2800.340
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr
bogomips : 5604.82
clflush size : 64

> grep -i pax /usr/src/linux-`uname -r`/.config
# PaX
CONFIG_PAX=y
# PaX Control
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_UDEREF is not set

s.s:

> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping : 2
cpu MHz : 3412.280
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
bogomips : 6829.85
clflush size : 64

> grep -i pax /usr/src/linux-`uname -r`/.config
# PaX
CONFIG_PAX=y
# PaX Control
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_UDEREF is not set

m.ah:

> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 1
cpu MHz : 2800.371
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr
bogomips : 5605.21
clflush size : 64

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 1
cpu MHz : 2800.371
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr
bogomips : 5600.27
clflush size : 64

> grep -i pax /usr/src/linux-`uname -r`/.config
# PaX
CONFIG_PAX=y
# PaX Control
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
CONFIG_PAX_PAGEEXEC=y
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_EMUTRAMP is not set
CONFIG_PAX_MPROTECT=y
# CONFIG_PAX_NOELFRELOCS is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
CONFIG_PAX_MEMORY_SANITIZE=y
# CONFIG_PAX_MEMORY_UDEREF is not set
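Since the flags lines were requested specifically, a set difference makes them easier to compare than reading them by eye. The sketch below uses the three flags values posted in this comment; the helper names flag_set and compare are illustrative assumptions. It shows only how the lists differ and says nothing about whether any flag is related to the crash.

```python
def flag_set(flags_line):
    """Split the value of a /proc/cpuinfo 'flags' line into a set of names."""
    return set(flags_line.split())


def compare(a, b):
    """Return (flags only in a, flags only in b), each as a sorted list."""
    return sorted(a - b), sorted(b - a)


# Flags values copied from the cpuinfo output posted for each machine.
C_T = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
       "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm "
       "constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr")
S_S = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
       "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm "
       "constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr "
       "lahf_lm")
M_AH = ("fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat "
        "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm "
        "constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl cid cx16 xtpr")

print("c.t vs s.s :", compare(flag_set(C_T), flag_set(S_S)))
print("c.t vs m.ah:", compare(flag_set(C_T), flag_set(M_AH)))
```

On the posted data, s.s lacks nx and adds lahf_lm relative to c.t, and m.ah (the i686 box that is not crashing) lacks sep relative to c.t.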
can you try out the latest pax or grsec patches? i think i managed to reproduce and find out your problem. as confirmation, you can also just disable HIGHPTE and see if you still get the issue.
Are you saying you want me to grab the same kernel, or latest kernel, and patch it with the latest PAX myself, or that you want me to try the latest hardened-sources kernel in portage?
Nm.. I think I know what you mean. I'll see if I can do some testing to make the lockups happen again and then test again without HIGHPTE to see if they still happen.
for now you have to grab the patches directly from the grsec or pax test dirs (grsecurity.net/~spender or ~paxguy1), eventually they will be added to gentoo but right now you can't just emerge them.
Based on a comment made by the PaX Team in bug 198051, this issue /may/ be resolved by a patch included in the 2.6.23-r8 release. As such, I would be grateful if the reporter were to test that. In the event that the situation does not improve, note that a 2.6.24 release should be committed fairly soon.
(In reply to comment #16)
> Nm.. I think I know what you mean. I'll see if I can do some testing to make
> the lockups happen again and then test again without HIGHPTE to see if they
> still happen.

Did you try this by any chance and if so could you share the result? Also, hardened-sources-2.6.24 is in the tree if you wouldn't mind seeing if your problem disappears with it since PaX Team thinks it may be fixed. Thanks.
(In reply to comment #19)
> Did you try this by any chance and if so could you share the result? Also,
> hardened-sources-2.6.24 is in the tree if you wouldn't mind seeing if your
> problem disappears with it since PaX Team thinks it may be fixed. Thanks.

actually the bug manifested on .24 as well so it's definitely not fixed. the only pattern i saw so far was that P4/HT boxes were affected (which points at some very subtle race somewhere in the scheduler, not clear how PaX is causing it though) but apparently not all of them in Travis' case, so it would be interesting to know how the .config of m.ah differs from the rest (not only the PaX bits but everything else as well, as i think it's not a configurable PaX feature that causes this, if it's PaX at all).
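A whole-.config comparison of the kind asked for here can be done by parsing each file into a dictionary and listing the options whose values differ. This is a minimal sketch: parse_config and diff_configs are hypothetical helper names, and the two sample strings are short excerpts of the PaX options already posted for c.t and m.ah, not full configs.

```python
import re

# "CONFIG_FOO=y" style lines and "# CONFIG_FOO is not set" style lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option in kernel .config text to its value ('n' if not set)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for every differing option.

    None means the option does not appear in that config at all.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Excerpts of the PaX options posted earlier in this report.
C_T = """CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
# CONFIG_PAX_MEMORY_SANITIZE is not set
"""
M_AH = """CONFIG_PAX_NOEXEC=y
CONFIG_PAX_PAGEEXEC=y
# CONFIG_PAX_SEGMEXEC is not set
# CONFIG_PAX_EMUTRAMP is not set
CONFIG_PAX_MPROTECT=y
# CONFIG_PAX_NOELFRELOCS is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_MEMORY_SANITIZE=y
"""

for option, values in diff_configs(parse_config(C_T), parse_config(M_AH)).items():
    print(option, values)
```

To compare real machines, the two strings would be replaced by the contents of each box's .config file; options that exist on only one side show up with None for the other.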
I no longer have access to the systems this problem presented on so I won't be able to offer further assistance in tracking it down.
This bug appears to have nowhere to go. Closing as NEEDINFO. Please re-open if any new information comes to light and/or it becomes apparent that it's still an issue in recent versions. Thanks.