Created attachment 323732: Kernel-Config

During boot of the Dom0 on 3.4.9, I see the following 6 times (I guess once per CPU core):

[ 2.358574] ------------[ cut here ]------------
[ 2.358613] WARNING: at arch/x86/xen/enlighten.c:860 xen_apic_write+0x15/0x17()
[ 2.358642] Hardware name: To Be Filled By O.E.M.
[ 2.358667] Modules linked in:
[ 2.358713] Pid: 0, comm: swapper/5 Tainted: G W 3.4.9-gentoo-64bit #2
[ 2.358743] Call Trace:
[ 2.358768] <IRQ> [<ffffffff8104071e>] warn_slowpath_common+0x80/0x98
[ 2.358818] [<ffffffff8104074b>] warn_slowpath_null+0x15/0x17
[ 2.358845] [<ffffffff81003411>] xen_apic_write+0x15/0x17
[ 2.358873] [<ffffffff8101fd6e>] perf_events_lapic_init+0x2e/0x30
[ 2.358900] [<ffffffff8101ff39>] x86_pmu_enable+0x1c9/0x243
[ 2.358927] [<ffffffff810b3851>] perf_pmu_enable+0x21/0x23
[ 2.358953] [<ffffffff8101e9c9>] x86_pmu_commit_txn+0x84/0x9a
[ 2.358980] [<ffffffff81032725>] ? pvclock_clocksource_read+0x48/0xb8
[ 2.359007] [<ffffffff81032725>] ? pvclock_clocksource_read+0x48/0xb8
[ 2.359034] [<ffffffff81032725>] ? pvclock_clocksource_read+0x48/0xb8
[ 2.359061] [<ffffffff810b47c8>] ? event_sched_in+0x7c/0x10e
[ 2.359088] [<ffffffff810b48e2>] group_sched_in+0x88/0x127
[ 2.359115] [<ffffffff810b4e4c>] __perf_event_enable+0xcf/0x123
[ 2.359141] [<ffffffff810b217d>] remote_function+0x3c/0x43
[ 2.359169] [<ffffffff81366209>] ? _raw_spin_lock_irq+0xb/0x24
[ 2.360503] [<ffffffff81081db7>] generic_smp_call_function_single_interrupt+0xc7/0xea
[ 2.360533] [<ffffffff8100ee64>] xen_call_function_single_interrupt+0xe/0x22
[ 2.360560] [<ffffffff81095b4e>] handle_irq_event_percpu+0x5a/0x196
[ 2.360587] [<ffffffff81098299>] handle_percpu_irq+0x39/0x4d
[ 2.360614] [<ffffffff8124ebd0>] __xen_evtchn_do_upcall+0x147/0x1e3
[ 2.360641] [<ffffffff8125078f>] xen_evtchn_do_upcall+0x2a/0x3c
[ 2.360668] [<ffffffff8136810e>] xen_do_hypervisor_callback+0x1e/0x30
[ 2.360694] <EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 2.360743] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 2.360770] [<ffffffff810081d0>] ? xen_safe_halt+0x10/0x18
[ 2.360796] [<ffffffff81018f58>] ? default_idle+0xb1/0x14c
[ 2.360823] [<ffffffff81019856>] ? cpu_idle+0xb3/0xd2
[ 2.360850] [<ffffffff81008819>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 2.360877] [<ffffffff81357857>] ? cpu_bringup_and_idle+0xe/0x10
[ 2.360904] ---[ end trace 41ef0ee79c2c0f37 ]---
I do not know whether this is related, but the box crashes at least once a week. It seems that when free memory (without subtracting buffers/cache) reaches 0, something goes horribly wrong.
According to bug #435546, this might already have been fixed. Can you try more recent kernels, like stable gentoo-sources-3.6.11 and development git-sources-3.8_rc3?
3.6.11 just bombed on me again today. I currently suspect the problem lies with Xen rather than the kernel.
https://patchwork.kernel.org/patch/1636911/ landed in Linux 3.7-rc3 and addresses a warning at the same place in the code: arch/x86/xen/enlighten.c:860 xen_apic_write+0x15/0x17()

Please try a more recent unstable 3.7 or development 3.8 kernel; take your pick:

Unstable
--------
echo "sys-kernel/gentoo-sources" >> /etc/portage/package.accept_keywords
emerge -uDN gentoo-sources
eselect kernel set linux-3.7.2-gentoo

Development
-----------
echo "sys-kernel/git-sources" >> /etc/portage/package.accept_keywords
emerge git-sources
eselect kernel set linux-3.8-rc3

Then follow the kernel upgrade guide as usual.
Ah, there is one piece of information I forgot: the claim that free memory reaches 0 is not correct. The box crashes shortly after more than 512 MB of memory has been swapped out.
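To see exactly when the box crosses that 512 MB mark, a small watcher in Dom0 could help. The following is a minimal POSIX sh sketch, assuming the standard /proc/meminfo fields SwapTotal and SwapFree (both reported in kB); the function names swap_used_mb and check_swap are hypothetical and not part of any Gentoo or Xen tool.

```shell
#!/bin/sh
# Hypothetical helper: print swapped-out memory in MiB, computed as
# SwapTotal - SwapFree from a meminfo-format file (default: /proc/meminfo).
# Both fields are reported in kB; awk's %d truncates to a whole MiB.
swap_used_mb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { printf "%d\n", (total - free) / 1024 }' "$meminfo"
}

# Hypothetical helper: compare swap usage against a threshold in MiB
# (default 512, the point at which the crashes were observed here).
#   $1: meminfo-format file (optional, default /proc/meminfo)
#   $2: threshold in MiB (optional, default 512)
check_swap() {
    threshold_mb="${2:-512}"
    used=$(swap_used_mb "$1")
    if [ "$used" -ge "$threshold_mb" ]; then
        echo "WARNING: ${used} MiB swapped out (>= ${threshold_mb} MiB)"
    else
        echo "ok: ${used} MiB swapped out"
    fi
}
```

For example, running `while sleep 60; do check_swap; done` and keeping its output would record how much swap was in use shortly before a crash.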
Also, with 3.6.11 the messages look a bit different. Now I get:

(XEN) physdev.c:155: dom0: wrong map_pirq type 3
(XEN) Xen WARN at msi.c:659
(XEN) ----[ Xen-4.1.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48016d95c>] pci_enable_msi+0x6fc/0x910
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 00000000ffffffff rbx: 0000000000002003 rcx: 00000000000fe3dc
(XEN) rdx: 0000000000000029 rsi: ffff82c4802380d6 rdi: ffff83021ec34c9c
(XEN) rbp: ffff83012846d0e0 rsp: ffff82c480297d28 r8: 0000000000000001
(XEN) r9: 0000000000000000 r10: 0000000000000008 r11: 0000000000000000
(XEN) r12: ffff82c480297e20 r13: 000000000000000a r14: ffff83012846ddb0
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 0000000215556000 cr2: 00007f633e50e000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d28:
(XEN) 00000000ffffffff 0000000000000001 0000000000000149 00000000fe3dc000
(XEN) 0000000000000000 0000000000000001 0000006000000004 00000000000fe3dc
(XEN) 0000006200000111 00000000fe3dc000 ffff830200000003 0000000000000111
(XEN) 0000000000000111 00000000000fe3dc ffff83012846d178 000982c4801828ae
(XEN) 0000000000000cfc 0000000000000111 000000000000001e ffff830219172000
(XEN) 00000000ffffffed 0000000000000111 000000000000001e ffff82c480170955
I found no commits related to that message; it still makes me wonder whether that other fix resolves this as well...
I have a serial console attached to the Xen console, so when the crash happens, what I see is a general protection fault from Xen. Can this really be caused by a Dom0 kernel?
Your error message contains

> dom0: wrong map_pirq type 3

so I suppose this is the hypervisor reporting that something is going wrong with the Dom0 kernel.
Well, any hint on _what_ is going wrong would be great. Also, gentoo-sources-3.5.7 was crashing much faster than 3.4.9 or 3.6.11.
(In reply to comment #10)
> Well any hint on _what_ is going wrong would be great.

(Comment #4)
> https://patchwork.kernel.org/patch/1636911/ landed in linux 3.7-rc3 and
> happens at the same place in the code arch/x86/xen/enlighten.c:860
> xen_apic_write+0x15/0x17()

Well, you haven't tried a kernel that incorporates this fix yet, unless the fix has been backported (check whether the changes from that patch were applied). As long as that bug is still around, we can't assume that you are experiencing an independent new bug with Xen...

Since the new error you posted hasn't been patched (unless the patch above happens to cover it), you might want to report it upstream at https://bugzilla.kernel.org/ so they can take a look. That assumes you have tried the development kernel, though; otherwise there is a good chance they will ask you to do so first. Can you please leave a link to the upstream bug here if you do? Good luck!
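One common way to check whether a patch's changes are already in a source tree is to try reversing the patch in a dry run: if it reverses cleanly, the changes are present. The following is a minimal POSIX sh sketch, assuming GNU patch and that the patch from the patchwork page has been saved to a local file; the function name patch_is_applied and the /usr/src/linux path are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a patch is already present in a
# source tree. If the patch can be *reversed* cleanly in a dry run, its
# changes are already applied (e.g. because they were backported).
#   $1: source tree, e.g. /usr/src/linux (an assumption)
#   $2: the patch saved locally as a file (name is up to you)
patch_is_applied() {
    src_dir="$1"
    patch_file="$2"
    # Make the patch path absolute before changing directory.
    case "$patch_file" in
        /*) ;;
        *) patch_file="$PWD/$patch_file" ;;
    esac
    # -p1: strip the a/ and b/ prefixes used by git-style patches.
    # -R --dry-run: try to reverse the patch without modifying any file.
    # -f: never prompt, and keep the -R direction even if the patch
    #     looks like it has not been applied yet.
    (cd "$src_dir" && patch -p1 -R --dry-run -s -f < "$patch_file") >/dev/null 2>&1
}
```

For example, `patch_is_applied /usr/src/linux xen-apic.patch && echo applied` (with a hypothetical file name) would print "applied" only if the selected kernel tree already contains the fix.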
Would 3.7.3-gentoo work for the test as well?
(In reply to comment #12)
> Would 3.7.3-gentoo do as well for the test?

Yes, that should suffice; any 3.7 version includes that patch.
gentoo-sources-3.7.3 is standing by for the next reboot.
OK, I booted 3.7.3 with Xen 4.2 and swapped out 1 GB: no panic. It seems this solved the issue. xl dmesg does not show any errors from the boot, so I hope all is well. Case closed.