I turned one core of a dual-core CPU off using "echo 0 > /sys/devices/system/cpu/cpu1/online" and later turned it back on again with "echo 1 > /sys/devices/system/cpu/cpu1/online". This resulted in a black screen followed by POST.
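For anyone who wants to repeat the experiment, the offline/online cycle can be sketched as a small helper. This is only an illustration: the function name set_cpu_online and the SYSFS_CPU_ROOT override are hypothetical, added so the helper can be tried against a scratch directory before touching the real sysfs tree.

```shell
#!/bin/sh
# Sketch: take a CPU offline and bring it back through sysfs.
# SYSFS_CPU_ROOT is a hypothetical override (not a kernel interface);
# it defaults to the real sysfs path used in the report.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# set_cpu_online CPU STATE   (STATE: 0 = offline, 1 = online)
set_cpu_online() {
    cpu="$1"
    state="$2"
    case "$state" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac
    ctl="$SYSFS_CPU_ROOT/cpu$cpu/online"
    if [ ! -e "$ctl" ]; then
        echo "no hotplug control file: $ctl" >&2
        return 1
    fi
    echo "$state" > "$ctl"
}

# Usage as root (on an affected kernel the second call resets the box):
#   sync; set_cpu_online 1 0; sleep 1; set_cpu_online 1 1
```

Running sync first is a precaution, since the machine may reset before anything is written back to disk.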
You could set up netconsole support and then repeat the experiment in order to get some context information about what actually happened. Here is a HOWTO: /usr/src/linux/Documentation/networking/netconsole.txt
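As a rough sketch of what the HOWTO describes, the netconsole module takes a single parameter of the form src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac. The helper name netconsole_param and all addresses, ports and the MAC below are placeholders, not values from this report.

```shell
#!/bin/sh
# Sketch: assemble the netconsole= module parameter.
# netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage as root on the machine under test (placeholder values):
#   dmesg -n 8    # raise the console loglevel so all messages are sent
#   modprobe netconsole \
#       "$(netconsole_param 6665 192.168.0.10 eth0 6666 192.168.0.20 00:11:22:33:44:55)"
#
# On the receiving machine, listen for the UDP stream, for example:
#   nc -u -l -p 6666      # some netcat flavours want: nc -u -l 6666
```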
Also check if you have reboot on panic enabled.
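One way to check this is to read the kernel.panic sysctl: 0 means the kernel does not reboot after a panic, a positive value is the delay in seconds before it does. The sketch below assumes that interpretation; the helper name panic_reboot_status and the PANIC_SYSCTL override are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether reboot-on-panic is enabled.
# PANIC_SYSCTL is a hypothetical override for trying the helper on a
# scratch file; the default is the real sysctl file.
PANIC_SYSCTL="${PANIC_SYSCTL:-/proc/sys/kernel/panic}"

panic_reboot_status() {
    timeout=$(cat "$PANIC_SYSCTL") || return 1
    if [ "$timeout" -eq 0 ]; then
        echo "panic: no automatic reboot"
    elif [ "$timeout" -gt 0 ]; then
        echo "panic: reboot after $timeout s"
    else
        echo "panic: other setting ($timeout)"
    fi
}

# Usage:
#   panic_reboot_status
# To disable reboot on panic for the test (as root):
#   echo 0 > /proc/sys/kernel/panic
```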
Can reproduce this using 2.6.34-grsec on a Core2 Duo 64 bit system. The reboot comes so fast that nothing reaches the logs after these two lines:
[ 3784.210039] CPU 1 is now offline
[ 3784.210045] SMP alternatives: switching to UP code
@reporter. hardened-sources-2.6.34 had known issues and was masked for a while. It has been replaced by hardened-sources-2.6.34-r1. If it's not too much trouble, can you test with -r1? If the problem repeats, please post your emerge --info and the kernel config.
2.6.34-hardened-r1 crashes in exactly the same way, while 2.6.34-gentoo-r2 works correctly. Unfortunately, netconsole did not show anything after boot, and instructing syslog-ng to log to a remote host also shows nothing after reactivating cpu1. A crash kernel loaded via kexec does not gain control, even though the crash kernel actually comes up when manually triggered via "echo c > /proc/sysrq-trigger". Hence:
- the POST is not a consequence of a kernel panic (else the crash kernel would gain control)
- the crash is _probably_ related to hardened/grsec only (2.6.34-gentoo-r1 not tested though)
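The crash-kernel check above can be sketched roughly as follows. It assumes the kernel exposes /sys/kernel/kexec_crash_loaded (1 when a crash kernel is loaded); the helper name crash_kernel_loaded and the KEXEC_CRASH_LOADED override are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm a kexec crash kernel is loaded before testing it.
# KEXEC_CRASH_LOADED is a hypothetical override for dry runs.
KEXEC_CRASH_LOADED="${KEXEC_CRASH_LOADED:-/sys/kernel/kexec_crash_loaded}"

# Succeeds only if the status file is readable and contains 1.
crash_kernel_loaded() {
    [ -r "$KEXEC_CRASH_LOADED" ] && [ "$(cat "$KEXEC_CRASH_LOADED")" = "1" ]
}

# Usage as root. The second command deliberately panics the machine, so
# the crash kernel should take over if it is set up correctly:
#   crash_kernel_loaded && echo c > /proc/sysrq-trigger
```

If the crash kernel comes up after the manual trigger but not after re-onlining cpu1, the reset is unlikely to go through the normal panic path, which is the conclusion drawn above.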
Okay, this has to be looked at more carefully. Can I have your emerge --info, the kernel config file and hardware info?
Created attachment 240781 output of emerge --info =hardened-sources-2.6.34-r1 (In reply to comment #6)
Created attachment 240785 copy of /proc/config.gz for linux 2.6.34-hardened-r1 #7 SMP x86_64 Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz GenuineIntel GNU/Linux
Created attachment 240787 Output of lshw on Dell Vostro 1510
Using kgdb via kgdboe, I managed to get a backtrace from near to the point where the system finally dies.

(gdb) bt f
#0  native_apic_mem_write (reg=784, v=16777216) at /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h:102
No locals.
#1  0xffffffff8101a8e0 in apic_write (low=1552, id=<value optimized out>) at /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h:383
No locals.
#2  native_apic_icr_write (low=1552, id=<value optimized out>) at arch/x86/kernel/apic/apic.c:271
No locals.
#3  0xffffffff814730ff in apic_icr_write (apicid=1, cpu=1) at /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h:393
No locals.
#4  wakeup_secondary_cpu_via_init (apicid=1, cpu=1) at arch/x86/kernel/smpboot.c:636
        send_status = 1552
        accept_status = 18446744073709551615
        maxlvt = 5
        j = 1
#5  do_boot_cpu (apicid=1, cpu=1) at arch/x86/kernel/smpboot.c:806
        boot_error = <value optimized out>
        start_ip = 1552
        timeout = <value optimized out>
        c_idle = {work = {data = {counter = 0}, entry = {next = 0xffff88011abfbd30, prev = 0xffff88011abfbd30}, func = 0xffffffff814734b0 <do_fork_idle>}, idle = 0xffff88013fa723a0, done = {done = 0, wait = {lock = {{rlock = {raw_lock = {slock = 0}}}}, task_list = {next = 0xffff88011abfbd60, prev = 0xffff88011abfbd60}}}, cpu = 1}
#6  0xffffffff814733d0 in native_cpu_up (cpu=1) at arch/x86/kernel/smpboot.c:919
        apicid = 1
        flags = <value optimized out>
        err = <value optimized out>
        __func__ = "native_cpu_up"
#7  0xffffffff81474f54 in __cpu_up (cpu=1) at /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/smp.h:96
No locals.
#8  _cpu_up (cpu=1) at kernel/cpu.c:317
        ret = -2124205856
        nr_calls = 38
        hcpu = 0x1
#9  cpu_up (cpu=1) at kernel/cpu.c:356
        err = <value optimized out>
#10 0xffffffff81467978 in store_online (dev=0xffff880001d04428, attr=0x1000000, buf=<value optimized out>, count=2) at drivers/base/cpu.c:50
        cpu = 0x0
        ret = -2124205856
#11 0xffffffff812a444b in sysdev_store (kobj=<value optimized out>, attr=0x1000000, buffer=0x1ff726 <Address 0x1ff726 out of bounds>, count=0) at drivers/base/sys.c:52
No locals.
#12 0xffffffff81120fa5 in flush_write_buffer (file=<value optimized out>, buf=<value optimized out>, count=<value optimized out>, ppos=<value optimized out>) at fs/sysfs/file.c:209
        attr_sd = 0xffff88013fa5dca8
        kobj = 0xffff880001d04438
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) step
103 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
102 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
105 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
108 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
native_apic_mem_write (reg=768, v=1552) at /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h:102
102 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
103 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
102 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
105 in /usr/src/linux-2.6.34-hardened-r1/arch/x86/include/asm/apic.h
(gdb)
Ignoring packet error, continuing...
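The decimal arguments in the gdb output are easier to read in hex. The sketch below decodes them using the register offsets and delivery-mode value from the kernel's arch/x86/include/asm/apicdef.h (APIC_ICR = 0x300, APIC_ICR2 = 0x310, APIC_DM_STARTUP = 0x600); the helper names are made up for this illustration, and the field layout assumes the memory-mapped (xAPIC) register format.

```shell
#!/bin/sh
# Sketch: decode the reg/v arguments seen in native_apic_mem_write().
APIC_ICR=$((0x300))     # interrupt command register, low word
APIC_ICR2=$((0x310))    # interrupt command register, high word

# hex VALUE: print a decimal value in hexadecimal
hex() { printf '0x%x\n' "$1"; }

# apic_reg_name OFFSET: name the two registers that occur in the trace
apic_reg_name() {
    case "$1" in
        "$APIC_ICR")  echo APIC_ICR ;;
        "$APIC_ICR2") echo APIC_ICR2 ;;
        *)            echo unknown ;;
    esac
}

# icr2_dest HIGH: destination APIC ID, bits 24-31 of the high word
icr2_dest() { echo $(( ($1 >> 24) & 0xff )); }

# icr_delivery_mode LOW: delivery mode, bits 8-10 of the low word
icr_delivery_mode() { hex $(( $1 & 0x700 )); }

# icr_vector LOW: vector field, bits 0-7 of the low word
icr_vector() { hex $(( $1 & 0xff )); }

# Usage with the values from the trace:
#   apic_reg_name 784; icr2_dest 16777216
#   apic_reg_name 768; icr_delivery_mode 1552; icr_vector 1552
```

Read this way, the first write (reg=784, v=16777216) selects APIC ID 1 as the destination in APIC_ICR2, and the last write that completes before the debugger loses contact (reg=768, v=1552) puts 0x610 into APIC_ICR, i.e. delivery mode 0x600 (APIC_DM_STARTUP) with vector 0x10. That would be consistent with the machine dying right after the startup IPI is sent to the second core, though the trace alone does not prove it.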
Non-hardened 2.6.34-gentoo-r1, apart from missing grsec features identically configured, is unaffected:
$ uname -a
Linux localhost 2.6.34-gentoo-r1 #1 SMP Thu Aug 5 16:33:26 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz GenuineIntel GNU/Linux
[ 209.070052] CPU 1 is now offline
[ 209.070058] SMP alternatives: switching to UP code
[ 255.720638] SMP alternatives: switching to SMP code
[ 255.731461] Booting Node 0 Processor 1 APIC 0x1
[ 255.720301] CPU1: Thermal monitoring handled by SMI
(In reply to comment #11)
> Non-hardened 2.6.34-gentoo-r1, apart from missing grsec features identically
> configured, is unaffected:

I have confirmed your findings above. Upstream suspects that this is due to some new PaX code in SMP which was introduced in 2.6.34.
I'm looking at this again and found that the problem persists in hardened-sources-2.6.34-r2. The bug is clearly introduced by PaX. CONFIG_PAX_KERNEXEC=y triggers the problem. But what I haven't been able to figure out is the connection between kernel page protection and your backtrace showing where the problem is hit.
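To see whether a running kernel has the option set, the config can be read from /proc/config.gz (which requires the kernel to have been built with that feature, as the reporter's was). A minimal sketch, with the helper name kernel_config_value being hypothetical:

```shell
#!/bin/sh
# Sketch: print the value of a kernel config option, or nothing if the
# option is not set.
# kernel_config_value OPTION [CONFIG_GZ]
kernel_config_value() {
    opt="$1"
    cfg="${2:-/proc/config.gz}"
    # Unset options appear as "# CONFIG_X is not set" and do not match.
    gzip -dc "$cfg" | sed -n "s/^$opt=//p"
}

# Usage:
#   kernel_config_value CONFIG_PAX_KERNEXEC
#   kernel_config_value CONFIG_PAX_KERNEXEC /path/to/saved/config.gz
```

Rebuilding with only this option toggled and repeating the offline/online cycle is the kind of comparison that isolates it as the trigger.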
I tested the latest hardened-sources-2.6.35 and this problem persists. I'm going to pass this one by upstream again.
can someone tell me if i386 is affected as well (seems to work here) or only amd64?
(In reply to comment #15)
> can someone tell me if i386 is affected as well (seems to work here) or only
> amd64?

I tested on i386 and amd64 running on identical hardware. This problem *only* affects the amd64 system.
Bug is still present using sys-kernel/hardened-sources-2.6.35-r2 on amd64.

> The bug is clearly introduced by PaX. CONFIG_PAX_KERNEXEC=y
> triggers the problem. But what I haven't been able to figure
> out is the connection between kernel page protection and
> your backtrace showing where the problem is hit.

As I understand it, the problem is hit after an APIC I/O operation has completed, presumably the one which finally restarts CPU1. Hence, without an ICE/debugger available, one probably can't get much closer.
(In reply to comment #17)
> As I understand it, the problem is hit after an APIC I/O operation completed,
> presumably one which is finally restarting CPU1. Hence, without having an
> ICE/debugger available one probably can't come much closer.

actually if you try this under qemu, you'll see that the problem is with the initial page tables that somehow get broken by the time the CPU is woken up (even though the very same page tables worked fine during init and nothing should have modified them since). i'm still trying to understand the root cause, i'll let you know when i figured it out.
(In reply to comment #18)
> i'm still trying to understand the root cause, i'll let you know when i figured it out.

the latest patches (both PaX and grsec) should fix this problem.
(In reply to comment #19)
> (In reply to comment #18)
> > i'm still trying to understand the root cause, i'll let you know when i figured it out.
>
> the latest patches (both PaX and grsec) should fix this problem.

Confirmed. The fix will be in hardened-sources-2.6.32-r26 and hardened-sources-2.6.36-r1, which will hit the tree this afternoon. When one of these (or later) is stabilized, I'll close this bug. Thanks pipacs!
Just stabilized hardened-sources-2.6.32-r31.ebuild and hardened-sources-2.6.36-r6.ebuild which include the fix. Closing.