Bug 431048 - sys-kernel/hardened-sources-3.4.7 panic on UP->SMP transition when putting CPU core online - BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8105a31d>] __setup_vector_irq+0xed/0x130
Status: RESOLVED OBSOLETE
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Anthony Basile
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-12 08:15 UTC by Jaak Ristioja
Modified: 2016-02-22 08:08 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
kernel-config-3.4.7.gz (config.gz, 19.64 KB, application/x-gzip)
2012-08-12 08:17 UTC, Jaak Ristioja

Description Jaak Ristioja 2012-08-12 08:15:51 UTC
I'm using the following two scripts (POWERSAVE and ONDEMAND) to save more power on my laptop:

#!/bin/sh
cpufreq-set -c 0 -g powersave
cpufreq-set -c 1 -g powersave
echo 0 > /sys/devices/system/cpu/cpu1/online

#!/bin/sh
echo 1 > /sys/devices/system/cpu/cpu1/online
cpufreq-set -c 0 -g ondemand
cpufreq-set -c 1 -g ondemand

I've never had problems with this before, but today, running the ONDEMAND (second) script made the system switch from X to the text console and hang for a while (Caps Lock LED blinking). After this I got back to X and was able to save the following kernel messages:

[226577.512703] SMP alternatives: switching to SMP code
[226577.520787] Booting Node 0 Processor 1 APIC 0x1
[226577.537881] microcode: CPU1 updated to revision 0xa0b, date = 2010-09-28
[226956.640150] CPU 1 is now offline
[226956.640158] SMP alternatives: switching to UP code
[227100.239522] SMP alternatives: switching to SMP code
[227100.250889] Booting Node 0 Processor 1 APIC 0x1
[227100.267774] microcode: CPU1 updated to revision 0xa0b, date = 2010-09-28
[227159.850914] CPU 1 is now offline
[227159.850921] SMP alternatives: switching to UP code
[227423.979296] e1000e: eth0 NIC Link is Down
[227425.457808] SMP alternatives: switching to SMP code
[227425.469829] Booting Node 0 Processor 1 APIC 0x1
[227159.850420] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[227159.850420] IP: [<ffffffff8105a31d>] __setup_vector_irq+0xed/0x130
[227159.850420] PGD 0 
[227159.850420] Oops: 0000 [#1] PREEMPT SMP 
[227159.850420] CPU 1 
[227159.850420] Modules linked in: snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd snd_page_alloc psmouse
[227159.850420] 
[227159.850420] Pid: 0, comm: swapper/1 Not tainted 3.4.7-hardened-arm #1 FUJITSU SIEMENS ESPRIMO Mobile U9210/S118DB
[227159.850420] RIP: 0010:[<ffffffff8105a31d>]  [<ffffffff8105a31d>] __setup_vector_irq+0xed/0x130
[227159.850420] RSP: 0000:ffff8802344b5eb8  EFLAGS: 00010046
[227159.850420] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[227159.850420] RDX: 0000000000000000 RSI: 000000000000002d RDI: 0000000000000001
[227159.850420] RBP: ffff8802344b5ef8 R08: 0000000000000000 R09: ffff880236400180
[227159.850420] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[227159.850420] R13: 00000000000000c1 R14: 000000000000a544 R15: 000000000000a240
[227159.850420] FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[227159.850420] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[227159.850420] CR2: 0000000000000008 CR3: 00000000017b0000 CR4: 00000000000406b0
[227159.850420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[227159.850420] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[227159.850420] Process swapper/1 (pid: 0, threadinfo ffff88023687e8d0, task ffff88023687e4a0)
[227159.850420] Stack:
[227159.850420]  ffff8802344b5ed8 ffffffff0000002f 0000cde067e39aa3 0000000000000001
[227159.850420]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[227159.850420]  ffff8802344b5f08 ffffffff8103dc59 ffff8802344b5f28 ffffffff8177d0fd
[227159.850420] Call Trace:
[227159.850420]  [<ffffffff8103dc59>] setup_vector_irq+0x9/0x10
[227159.850420]  [<ffffffff8177d0fd>] smp_callin+0xee/0x13e
[227159.850420]  [<ffffffff8177d171>] start_secondary+0x24/0xcb
[227159.850420]  [<ffffffff8177d14d>] ? smp_callin+0x13e/0x13e
[227159.850420] Code: 6a 60 a7 00 7f 88 45 31 ed 0f 1f 44 00 00 49 63 c5 4d 8d 34 87 4c 89 f0 48 03 04 dd 70 10 a5 81 8b 38 85 ff 78 1f e8 a3 eb ff ff <44> 0f a3 60 08 19 c0 85 c0 75 0f 4c 03 34 dd 70 10 a5 81 41 c7 
[227159.850420] RIP  [<ffffffff8105a31d>] __setup_vector_irq+0xed/0x130
[227159.850420]  RSP <ffff8802344b5eb8>
[227159.850420] CR2: 0000000000000008
[227159.850420] ---[ end trace ca44ece634ce04d7 ]---
[227159.850420] Kernel panic - not syncing: Attempted to kill the idle task!
[227159.850420] panic occurred, switching back to text console
[227430.942004] CPU1: Stuck ??

The Caps Lock LED continued blinking, and after a few minutes the system hung again and I had to cold boot.
Comment 1 Jaak Ristioja 2012-08-12 08:17:01 UTC
Created attachment 321078
kernel-config-3.4.7.gz

The /proc/config.gz file.
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2012-08-12 15:28:48 UTC
3.4.7-hardened-arm

I assume that name doesn't reflect the actual processor architecture in use?
Comment 3 Anthony Basile gentoo-dev 2012-08-12 22:06:47 UTC
(In reply to comment #2)
> 3.4.7-hardened-arm
> 
> I assume that name doesn't reflect the actual processor architecture in use?

It's amd64, judging from the panic.

@Jaak.  A few things before passing this to hardened upstream.  Can you upload

1) Your bzImage

2) Your vmlinux.  It should be in the root of /usr/src/linux

3) Your System.map


If you can, check to see if vanilla 3.4.7 gives the same panic.
Comment 4 PaX Team 2013-01-18 20:53:51 UTC
is this still a problem with 3.7?
Comment 5 Anthony Basile gentoo-dev 2013-04-13 22:45:50 UTC
Please reopen if this is still a problem.
Comment 6 Jaak Ristioja 2013-04-14 12:25:08 UTC
(In reply to comment #2)
> I assume that name doesn't reflect the actual processor architecture in use?

Yes, I'm on amd64 (Intel Core2 Duo P8700). "Arm" is Estonian for "grace". I'm just grateful to God for providing me the hardware. :)


(In reply to comment #3)
> @Jaak.  A few things before passing this to hardened upstream.  Can you
> upload
> 
> 1) Your bzImage
> 
> 2) Your vmlinux.  It should be in the root of /usr/src/linux
> 
> 3) Your System.map
> 
> 
> If you can, check to see if vanilla 3.4.7 gives the same panic.

Hmm... For some reason, I didn't receive any notification emails about any comments to this bug. The first and only notification email was for comment #5. I even checked my spam folders and email server logs, nothing else for this bug ever hit my mailbox.

So... since a long time has passed, I don't have these files any more. Sorry.


(In reply to comment #4)
> is this still a problem with 3.7?

The time I reported the bug was the only time I have experienced it. I'm currently unable to reproduce this with hardened-sources-3.8.6.
Comment 7 Anthony Basile gentoo-dev 2013-04-14 12:30:36 UTC
(In reply to comment #6)
> (In reply to comment #2)
> > I assume that name doesn't reflect the actual processor architecture in use?
> 
> Yes, I'm on amd64 (Intel Core2 Duo P8700). "Arm" is Estonian for "grace".
> I'm just grateful to God for providing me the hardware. :)

Haha okay :)

> 
> (In reply to comment #4)
> > is this still a problem with 3.7?
> 
> The time I reported the bug was the only time I have experienced it. I'm
> currently unable to reproduce this with hardened-sources-3.8.6.

Okay thank you.  This bug has become obsolete.  There is no reason to return to the earlier version and figure out what was wrong.
Comment 8 PaX Team 2013-04-14 14:34:41 UTC
(In reply to comment #7)
> Okay thank you.  This bug has become obsolete.  There is no reason to return
> to the earlier version and figure out what was wrong.

actually, there is ;). i took a look at the code and here's where it died:

arch/x86/kernel/apic/io_apic.c:__setup_vector_irq

	/* Mark the free vectors */
	for (vector = 0; vector < NR_VECTORS; ++vector) {
		irq = per_cpu(vector_irq, cpu)[vector];
		if (irq < 0)
			continue;

		cfg = irq_cfg(irq); // returned NULL
		if (!cpumask_test_cpu(cpu, cfg->domain))
			per_cpu(vector_irq, cpu)[vector] = -1;
	}

irq_cfg is a wrapper around irq_get_chip_data, which can and seemingly does return NULL. its callers check for it elsewhere, but in this file it's not done consistently, even in 3.8.7. so i suggest that you tell the kernel devs about it as i think it's a problem in vanilla itself. i can add the obvious NULL checks but i don't know what the correct reaction to them is in each case.
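
For illustration only, here is a minimal userspace sketch of what a NULL check in that loop might look like. It is not kernel code and not a proposed patch: every name with a mock_ or MOCK_ prefix is hypothetical, the struct layout is invented, and the reaction to a NULL cfg (treating the vector as free) is just one assumed policy, since the right reaction is exactly what is left open above. The fault address in the report (CR2 = 0x8 with RAX = 0) would be consistent with reading a member at offset 8 through a NULL struct pointer, which the mock struct loosely mirrors on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock for illustration; none of these names are kernel APIs. */
#define MOCK_NR_VECTORS 8
#define MOCK_NR_IRQS    4

struct mock_irq_cfg {
	void *irq_2_pin;      /* placeholder pointer-sized first member */
	unsigned long domain; /* bitmask of CPUs that may service the IRQ */
};

/* Vector table of one CPU: vector -> irq, -1 means the vector is free. */
static int mock_vector_irq[MOCK_NR_VECTORS];

/* irq -> cfg table; a NULL entry models irq_cfg() returning NULL. */
static struct mock_irq_cfg *mock_cfg_table[MOCK_NR_IRQS];

static struct mock_irq_cfg *mock_irq_cfg(int irq)
{
	if (irq < 0 || irq >= MOCK_NR_IRQS)
		return NULL;
	return mock_cfg_table[irq];
}

/* Marks free vectors for the given CPU and returns how many were freed. */
static int mock_mark_free_vectors(unsigned int cpu)
{
	int vector, freed = 0;

	/* The mock domain is a single word, so cpu must fit in it. */
	assert(cpu < 8 * sizeof(unsigned long));

	for (vector = 0; vector < MOCK_NR_VECTORS; ++vector) {
		int irq = mock_vector_irq[vector];
		struct mock_irq_cfg *cfg;

		if (irq < 0)
			continue;

		cfg = mock_irq_cfg(irq);
		/* Assumed policy: a missing cfg is handled like "cpu not in
		 * domain", so the vector is freed instead of dereferencing NULL. */
		if (!cfg || !(cfg->domain & (1UL << cpu))) {
			mock_vector_irq[vector] = -1;
			freed++;
		}
	}
	return freed;
}
```

Whether freeing the vector, skipping it, or warning is appropriate in the real __setup_vector_irq depends on why the chip data is missing at that point, which this sketch does not try to answer.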
Comment 9 Anthony Basile gentoo-dev 2013-04-14 17:45:13 UTC
Thanks pipacs!
Comment 10 Jaak Ristioja 2016-02-22 08:08:07 UTC
(In reply to PaX Team from comment #8)
> so i suggest that you
> tell the kernel devs about it as i think it's a problem in vanilla itself.

So did anyone tell the kernel devs or is this also already irrelevant at this point?