434976 – sys-kernel/gentoo-sources-3.4.9 - arch/x86/xen/enlighten.c:860 xen_apic_write+0x15/0x17() : warn_slowpath_common+0x80/0x98


Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers

URL:	https://patchwork.kernel.org/patch/16...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-09-14 07:19 UTC by Konstantin Agouros
Modified:	2013-02-09 12:15 UTC
CC List:	0 users

See Also:	435546
Package list:
Runtime testing required:	---

Attachments
Kernel-Config (.config, 80.86 KB, text/plain) 2012-09-14 07:19 UTC, Konstantin Agouros

Description Konstantin Agouros 2012-09-14 07:19:27 UTC

Created attachment 323732
Kernel-Config

During boot of the Dom0 on 3.4.9, the following warning appears 6 times (I guess once for each CPU core):

[    2.358574] ------------[ cut here ]------------
[    2.358613] WARNING: at arch/x86/xen/enlighten.c:860 xen_apic_write+0x15/0x17()
[    2.358642] Hardware name: To Be Filled By O.E.M.
[    2.358667] Modules linked in:
[    2.358713] Pid: 0, comm: swapper/5 Tainted: G        W    3.4.9-gentoo-64bit #2
[    2.358743] Call Trace:
[    2.358768]  <IRQ>  [<ffffffff8104071e>] warn_slowpath_common+0x80/0x98
[    2.358818]  [<ffffffff8104074b>] warn_slowpath_null+0x15/0x17
[    2.358845]  [<ffffffff81003411>] xen_apic_write+0x15/0x17
[    2.358873]  [<ffffffff8101fd6e>] perf_events_lapic_init+0x2e/0x30
[    2.358900]  [<ffffffff8101ff39>] x86_pmu_enable+0x1c9/0x243
[    2.358927]  [<ffffffff810b3851>] perf_pmu_enable+0x21/0x23
[    2.358953]  [<ffffffff8101e9c9>] x86_pmu_commit_txn+0x84/0x9a
[    2.358980]  [<ffffffff81032725>] ? pvclock_clocksource_read+0x48/0xb8
[    2.359007]  [<ffffffff81032725>] ? pvclock_clocksource_read+0x48/0xb8
[    2.359034]  [<ffffffff81032725>] ? pvclock_clocksource_read+0x48/0xb8
[    2.359061]  [<ffffffff810b47c8>] ? event_sched_in+0x7c/0x10e
[    2.359088]  [<ffffffff810b48e2>] group_sched_in+0x88/0x127
[    2.359115]  [<ffffffff810b4e4c>] __perf_event_enable+0xcf/0x123
[    2.359141]  [<ffffffff810b217d>] remote_function+0x3c/0x43
[    2.359169]  [<ffffffff81366209>] ? _raw_spin_lock_irq+0xb/0x24
[    2.360503]  [<ffffffff81081db7>] generic_smp_call_function_single_interrupt+0xc7/0xea
[    2.360533]  [<ffffffff8100ee64>] xen_call_function_single_interrupt+0xe/0x22
[    2.360560]  [<ffffffff81095b4e>] handle_irq_event_percpu+0x5a/0x196
[    2.360587]  [<ffffffff81098299>] handle_percpu_irq+0x39/0x4d
[    2.360614]  [<ffffffff8124ebd0>] __xen_evtchn_do_upcall+0x147/0x1e3
[    2.360641]  [<ffffffff8125078f>] xen_evtchn_do_upcall+0x2a/0x3c
[    2.360668]  [<ffffffff8136810e>] xen_do_hypervisor_callback+0x1e/0x30
[    2.360694]  <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[    2.360743]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[    2.360770]  [<ffffffff810081d0>] ? xen_safe_halt+0x10/0x18
[    2.360796]  [<ffffffff81018f58>] ? default_idle+0xb1/0x14c
[    2.360823]  [<ffffffff81019856>] ? cpu_idle+0xb3/0xd2
[    2.360850]  [<ffffffff81008819>] ? xen_irq_enable_direct_reloc+0x4/0x4
[    2.360877]  [<ffffffff81357857>] ? cpu_bringup_and_idle+0xe/0x10
[    2.360904] ---[ end trace 41ef0ee79c2c0f37 ]---
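The guess that the warning fires once per CPU core could be checked by tallying the WARNING blocks in a saved `dmesg` capture. The following is a minimal Python sketch, assuming the log layout shown in the trace above; the function name `count_warnings_by_task` is illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Patterns for the "WARNING: at <file>:<line> <symbol>()" header line and the
# "comm: <task>" field that follows it, as they appear in the trace above.
WARN_RE = re.compile(r"WARNING: at (\S+) (\S+)\(\)")
COMM_RE = re.compile(r"comm: (\S+)")


def count_warnings_by_task(log_text):
    """Count WARNING blocks per (source location, task name) pair."""
    counts = Counter()
    location = None
    for line in log_text.splitlines():
        warn = WARN_RE.search(line)
        if warn:
            # Remember where the warning came from until we see the task name.
            location = warn.group(1)
            continue
        comm = COMM_RE.search(line)
        if comm and location is not None:
            counts[(location, comm.group(1))] += 1
            location = None
    return counts
```

If the guess is right, a six-core box should yield six entries for `arch/x86/xen/enlighten.c:860`, one each for `swapper/0` through `swapper/5`.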

Comment 1 Konstantin Agouros 2012-09-14 07:20:16 UTC

I do not know if this is related, but the box crashes at least once a week.
It seems that if the free memory (without subtracting buffers/cached) reaches 0, something goes horribly wrong.

Comment 2 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-20 17:47:55 UTC

According to bug #435546 this might have been fixed already.

Can you try more recent kernels like stable gentoo-sources 3.6.11 and development git-sources-3.8_rc3?

Comment 3 Konstantin Agouros 2013-01-20 17:51:02 UTC

3.6.11 just bombed on me again today.

I currently think the problem is Xen and not really the kernel.

Comment 4 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-20 18:03:25 UTC

https://patchwork.kernel.org/patch/1636911/ landed in linux 3.7-rc3 and happens at the same place in the code arch/x86/xen/enlighten.c:860 xen_apic_write+0x15/0x17()

Please try a more recent unstable 3.7 or development 3.8 kernel, take your pick:

Unstable
--------

    echo "sys-kernel/gentoo-sources" >> /etc/portage/package.accept_keywords
    emerge -uDN gentoo-sources
    eselect kernel set linux-3.7.2-gentoo

Development
-----------

    echo "sys-kernel/git-sources" >> /etc/portage/package.accept_keywords
    emerge git-sources
    eselect kernel set linux-3.8-rc3

Then follow the kernel upgrade guide like usual.

Comment 5 Konstantin Agouros 2013-01-20 18:07:57 UTC

Ah, there is a bit of information I forgot:

My statement that the free memory reaches 0 is not correct. It crashes shortly after passing 512 MB of swapped-out memory.
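To catch the moment the box passes that mark, swap usage could be derived from the `SwapTotal` and `SwapFree` fields of `/proc/meminfo`. The following is a rough Python sketch; the 512 MB threshold comes from the observation above, while the function names are illustrative assumptions.

```python
# Threshold taken from the observation above: 512 MB of swapped-out memory.
THRESHOLD_KIB = 512 * 1024


def swap_used_kib(meminfo_text):
    """Return swap in use (KiB), computed as SwapTotal - SwapFree."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # /proc/meminfo lines look like "SwapTotal:  2097148 kB".
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def past_threshold(meminfo_text, threshold_kib=THRESHOLD_KIB):
    """True once more than threshold_kib of swap is in use."""
    return swap_used_kib(meminfo_text) > threshold_kib
```

Feeding it the contents of `/proc/meminfo` at intervals, and logging a timestamp once `past_threshold` turns true, would show how close each crash falls to the 512 MB mark.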

Comment 6 Konstantin Agouros 2013-01-20 18:10:19 UTC

Also for 3.6.11 the messages look a bit different. Now I get:

(XEN) physdev.c:155: dom0: wrong map_pirq type 3
(XEN) Xen WARN at msi.c:659
(XEN) ----[ Xen-4.1.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48016d95c>] pci_enable_msi+0x6fc/0x910
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: 00000000ffffffff   rbx: 0000000000002003   rcx: 00000000000fe3dc
(XEN) rdx: 0000000000000029   rsi: ffff82c4802380d6   rdi: ffff83021ec34c9c
(XEN) rbp: ffff83012846d0e0   rsp: ffff82c480297d28   r8:  0000000000000001
(XEN) r9:  0000000000000000   r10: 0000000000000008   r11: 0000000000000000
(XEN) r12: ffff82c480297e20   r13: 000000000000000a   r14: ffff83012846ddb0
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 0000000215556000   cr2: 00007f633e50e000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d28:
(XEN)    00000000ffffffff 0000000000000001 0000000000000149 00000000fe3dc000
(XEN)    0000000000000000 0000000000000001 0000006000000004 00000000000fe3dc
(XEN)    0000006200000111 00000000fe3dc000 ffff830200000003 0000000000000111
(XEN)    0000000000000111 00000000000fe3dc ffff83012846d178 000982c4801828ae
(XEN)    0000000000000cfc 0000000000000111 000000000000001e ffff830219172000
(XEN)    00000000ffffffed 0000000000000111 000000000000001e ffff82c480170955

Comment 7 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-20 18:16:50 UTC

I found no commits related to that message; it still makes me wonder whether that other fix addresses this as well...

Comment 8 Konstantin Agouros 2013-01-20 18:23:13 UTC

I have a serial console attached to the Xen console, so when the crash happens, what I see is a general protection fault from Xen. Can this really be caused by a Dom0 kernel?

Comment 9 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-20 19:04:40 UTC

Your error message contains

> dom0: wrong map_pirq type 3

so I suppose this is the hypervisor reporting that there is something going wrong with the Dom0 kernel.

Comment 10 Konstantin Agouros 2013-01-20 19:16:17 UTC

Well any hint on _what_ is going wrong would be great.

Also, gentoo-sources 3.5.7 was crashing much faster than 3.4.9 or 3.6.11.

Comment 11 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-20 19:51:19 UTC

(In reply to comment #10)
> Well any hint on _what_ is going wrong would be great.

(Comment #4)
> https://patchwork.kernel.org/patch/1636911/ landed in linux 3.7-rc3 and
> happens at the same place in the code arch/x86/xen/enlighten.c:860
> xen_apic_write+0x15/0x17()

Well, you haven't tried a kernel that incorporates this fix yet, unless the fix has been backported (check whether the changes from that patch are present in your sources). As long as that bug is still around, we can't assume that you are experiencing an independent new bug with Xen...

Given that the new error you posted hasn't been patched (unless the patch above happens to cover it), you might want to report it upstream at https://bugzilla.kernel.org/ so they can take a look at it. That assumes you have tried the development kernel; otherwise there is a chance they will ask you to do that.

Can you please leave a link to the upstream bug here if you do that?

Good luck!

Comment 12 Konstantin Agouros 2013-01-21 22:45:24 UTC

Would 3.7.3-gentoo do as well for the test?

Comment 13 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-22 02:16:38 UTC

(In reply to comment #12)
> Would 3.7.3-gentoo do as well for the test?

Yes, that should suffice as well; any 3.7 version includes that patch.
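As a rough illustration of that rule, a kernel release string (as printed by `uname -r`) can be compared against 3.7-rc3, the version named in comment #4. This Python sketch checks the version number only and cannot detect a backport into an older kernel; the function names are illustrative assumptions.

```python
import re


def kernel_version(release):
    """Parse the leading 'X.Y[.Z]' of a release string into a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing patch level (e.g. "3.8-rc3") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def has_fix_by_version(release):
    """True if the release is 3.7-rc3 or later, judged by version alone."""
    version = kernel_version(release)
    if version != (3, 7, 0):
        return version > (3, 7, 0)
    # For 3.7.0 itself, release candidates before rc3 predate the patch.
    rc = re.search(r"rc(\d+)", release)
    return rc is None or int(rc.group(1)) >= 3
```

Under these assumptions, `3.4.9-gentoo-64bit` and `3.6.11-gentoo` are reported as lacking the fix, while `3.7.3-gentoo` is reported as having it.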

Comment 14 Konstantin Agouros 2013-01-23 22:18:00 UTC

gentoo-sources-3.7.3 standing by for next reboot

Comment 15 Konstantin Agouros 2013-02-05 12:26:14 UTC

OK

I booted 3.7.3 with Xen 4.2.

Swapped out 1 GB, no panic.
It seems this solves the issue.

xl dmesg does not show any errors from the boot, so I hope all is well, case closed.