Summary: | [2.6.27.6 regression] vmware guest panics on boot with CONFIG_VMI=Y | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Norman Back <gentoo3> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | vmware+disabled, wolf31o2 |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
URL: | http://bugzilla.kernel.org/show_bug.cgi?id=12167 | ||
Whiteboard: | linux-2.6.27.6-regression | ||
Package list: | Runtime testing required: | --- | |
Attachments: |
config-2.6.27-gentoo-r4-3
Extract of vmware debug log
/proc/cpuinfo
lspci from the vmware guest |
Description
Norman Back
2008-12-04 00:01:26 UTC
Created attachment 174206 [details]
config-2.6.27-gentoo-r4-3
Created attachment 174207 [details]
Extract of vmware debug log
Interesting lines in this log are the panic:
Dec 03 22:58:25.136: vcpu-0| Unknown int 10h func 0x0000
Dec 03 22:58:25.315: vcpu-0| Entering paravirt mode on vcpu 0
Dec 03 22:58:25.925: vcpu-0| Exiting on CLI;HLT at 0x60:0xc0100359
Dec 03 22:58:25.945: vmx| Stopping VCPU threads...
and the screen shot:
Dec 03 22:58:26.005: vmx| Decompressing Linux... Parsing ELF... done.
Dec 03 22:58:26.006: vmx| Booting the kernel.
Dec 03 22:58:26.006: vmx|
Dec 03 22:58:26.007: vmx|
Dec 03 22:58:26.007: vmx|
Dec 03 22:58:26.008: vmx|
Dec 03 22:58:26.008: vmx|
Dec 03 22:58:26.009: vmx|
Dec 03 22:58:26.010: vmx|
Dec 03 22:58:26.011: vmx| BUG: Int 14: CR2 fbe00000
Dec 03 22:58:26.011: vmx| EDI c05b1f98 ESI fbe00000 EBP 00a6e003 ESP c05b1f7c
Dec 03 22:58:26.012: vmx| EBX c05b1f98 EDX 0000000e ECX 00000003 EAX fbe00000
Dec 03 22:58:26.012: vmx| err 00000000 EIP c05db95c CS 00000062 flg 00010092
Dec 03 22:58:26.013: vmx| Stack: c00cc618 c00cc625 00000003 00000000 00000000 00000563 c05b1ff8 fbe00000
Dec 03 22:58:26.013: vmx| fbe10000 fbe00000 c05dba7e c05b1ff8 c05b1ff8 00646513 00609000 c05bac50
Dec 03 22:58:26.014: vmx| 00000800 00099d00 c059a000 00a6e003 00000800 00099d00 c059a000 c05b66d2
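For reading the panic above: on x86, interrupt 14 is the page-fault exception, CR2 holds the faulting linear address, and `err` is the page-fault error code. A minimal Python sketch of decoding that error code follows. The function name is made up for illustration, and only the low three bits of the error code are decoded.

```python
def decode_page_fault_error(err: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0 (P): 0 = page not present, 1 = protection violation
        "cause": "protection violation" if err & 0x1 else "page not present",
        # bit 1 (W/R): 0 = read access, 1 = write access
        "access": "write" if err & 0x2 else "read",
        # bit 2 (U/S): 0 = supervisor (kernel) mode, 1 = user mode
        "mode": "user" if err & 0x4 else "supervisor",
    }


if __name__ == "__main__":
    # Values taken from the log: "Int 14: CR2 fbe00000" and "err 00000000".
    cr2 = 0xFBE00000
    err = 0x00000000
    info = decode_page_fault_error(err)
    print(f"fault at {cr2:#010x}: {info['cause']}, "
          f"{info['access']} access, {info['mode']} mode")
```

With `err 00000000` this reads as a kernel-mode read of an address that has no page mapped.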
Created attachment 174213 [details]
/proc/cpuinfo
Created attachment 174214 [details]
lspci from the vmware guest
lspci from the vmware guest after booting with CONFIG_VMI=N
Please try with CONFIG_COMPAT_VDSO disabled and see if that changes anything.
Is there a kernel version where this option has worked for you?
Is the host system running Gentoo? What kernel is it running?

Vmware Host uname -a
Linux diamond 2.6.27-gentoo-r4-1 #1 SMP PREEMPT Sat Nov 22 07:50:50 GMT 2008 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ AuthenticAMD GNU/Linux
Mother board is Asrock ALiveNF5SLI-1394 with 8GB ram
Will try CONFIG_COMPAT_VDSO=N on guest as suggested.

Tried sys-kernel/gentoo-sources-2.6.24-r8 and it boots OK as vmware guest with CONFIG_VMI=Y

After a bit more testing with CONFIG_VMI=Y:
sys-kernel/gentoo-sources-2.6.27-r2 boots OK
sys-kernel/gentoo-sources-2.6.27-r3 panics with Int 14: CR2

Can you clarify several things please?
Where are you trying to enable/disable CONFIG_VMI option? On hosts kernel or on guest?
The first attachment is from the host machine or the guest one?
When you are trying different kernels are you using the *exact* same configuration?

"Where are you trying to enable/disable CONFIG_VMI option? On hosts kernel or on guest?"
I was trying to enable the CONFIG_VMI option on the guest.

"The first attachment is from the host machine or the guest one?"
From the guest. The host is running sys-kernel/gentoo-sources-2.6.27-r4

"When you are trying different kernels are you using the *exact* same configuration?"
I used "make oldconfig" to upgrade .config from sys-kernel/gentoo-sources-2.6.27-r2 to sys-kernel/gentoo-sources-2.6.27-r3 accepting the default replies.

(In reply to comment #10)
> I used "make oldconfig" to upgrade .config from
> sys-kernel/gentoo-sources-2.6.27-r2 to sys-kernel/gentoo-sources-2.6.27-r3
> accepting the default replies.
So this comes down to the difference between

K_GENPATCHES_VER=4 (2.6.27-r2)
http://sources.gentoo.org/viewcvs.py/linux-patches/genpatches-2.6/tags/2.6.27-4/

and

K_GENPATCHES_VER=5 (2.6.27-r3)
http://sources.gentoo.org/viewcvs.py/linux-patches/genpatches-2.6/tags/2.6.27-5/

or in terms of "vanilla" 2.6.27.4 <--> 2.6.27.6

Could you try "vanilla" sources? This would narrow it down further.

Cheers
Axel

2.6.27.4 and 2.6.27.5 boot OK.
2.6.27.6 panics with Int 14: CR2

great, thanks for the fast diagnosis

Please try disabling CONFIG_X86_RESERVE_LOW_64K on 2.6.27.6

(In reply to comment #14)
> Please try disabling CONFIG_X86_RESERVE_LOW_64K on 2.6.27.6

Tried this but still panics Int 14: CR2

(In reply to comment #15)
> (In reply to comment #14)
> > Please try disabling CONFIG_X86_RESERVE_LOW_64K on 2.6.27.6
>
> Tried this but still panics Int 14: CR2

Hmm, not all changes in "setup_arch" are disabled when setting CONFIG_X86_RESERVE_LOW_64K=n. It's just the "bad_bios_dmi_table" that is made empty.

You could try to insert various "printk()" statements into "setup_arch()" (arch/x86/kernel/setup.c) from 2.6.27.6 and maybe figure out at which point the kernel crashes. Just an idea. This is what I would try next ...

There are other changes in 2.6.27.6 and our guess that it was related to the 64k reservation was probably wrong.

As a next step I would suggest doing a bisection to find (for sure) the exact commit that introduced the bug.
The process is described here:
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

but you want to use the following git tree, not the one described there:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git

use v2.6.27.5 as good and v2.6.27.6 as bad

FYI, the above process will require you to test about 7 kernels before telling you which commit is bad

(In reply to comment #18)
> FYI, the above process will require you to test about 7 kernels before telling
> you which commit is bad

I have used bisection once before (successfully). I'll give it a try later.

My current guess is that VMI which has been fixed by
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=3a6ddd5f18405ca92e004416af8ed44b9c9783d7
might conflict with the call to "dmi_scan_machine" changed by
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=5c371b31be32033b0a4a993431484da8a2305369

See also: http://lkml.org/lkml/2008/8/7/298

(In reply to comment #18)
> FYI, the above process will require you to test about 7 kernels before telling
> you which commit is bad
>

Done!

5c371b31be32033b0a4a993431484da8a2305369 is first bad commit
commit 5c371b31be32033b0a4a993431484da8a2305369
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Mon Sep 22 02:52:26 2008 -0700

    x86: fix CONFIG_X86_RESERVE_LOW_64K=y

    commit 2216d199b1430d1c0affb1498a9ebdbd9c0de439 upstream

    The bad_bios_dmi_table() quirk never triggered because we do DMI setup too late. Move it a bit earlier.

    Also change the CONFIG_X86_RESERVE_LOW_64K quirk to operate on the e820 table directly instead of messing with early reservations - this handles overlaps (which do occur in this low range of RAM) more gracefully.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 b7b81ffb62eddf60c2d8545a61566f0d34c1b2a9 858d983687c53db5304015a245ee0c23f10c266d M arch

(In reply to comment #21)
> (In reply to comment #18)
> > FYI, the above process will require you to test about 7 kernels before telling
> > you which commit is bad
> >
>
> Done!
>
> 5c371b31be32033b0a4a993431484da8a2305369 is first bad commit

Exactly what I guessed. dmi_scan_machine must not be called at this stage. There is a comment in "setup.c" (* NOTE: ...) that indicates the first point of time at which early_ioremap may be called on x86-32.

We are currently working on a patch that reverts the entire CONFIG_X86_RESERVE_LOW_64K stuff, i.e. the last five commits on arch/x86/kernel/setup.c

Could you open an upstream bug on http://bugzilla.kernel.org/ and if done so, post the link here?

(In reply to comment #23)
> Could you open an upstream bug on http://bugzilla.kernel.org/
> and if done so, post the link here?
>

Done

http://bugzilla.kernel.org/show_bug.cgi?id=12167

(In reply to comment #24)
> (In reply to comment #23)
> > Could you open an upstream bug on http://bugzilla.kernel.org/
> > and if done so, post the link here?
> >
>
> Done
>
> http://bugzilla.kernel.org/show_bug.cgi?id=12167

Thanks!

Now that we know what caused the problem, I think we can skip the patch that reverts the CONFIG_X86_RESERVE_LOW_64K stuff in favor of helping upstream to actually SOLVE the problem.

Zach has posted his fix to LKML: http://lkml.org/lkml/2008/12/13/149

(In reply to comment #26)
> Zach has posted his fix to LKML: http://lkml.org/lkml/2008/12/13/149

Yeah, and he attached it to http://bugzilla.kernel.org/show_bug.cgi?id=12167#c21

But it didn't make it into 2.6.27.9 and so far I haven't spotted it in Linus' tree ...

Zach sent a fix upstream...

...which is included in gentoo-sources-2.6.27-r6. Thanks for your help working on this one.
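
On the "about 7 kernels" estimate for the bisection earlier in this thread: each tested kernel roughly halves the range of suspect commits, so the number of builds grows with the base-2 logarithm of the range size. A small Python sketch of that arithmetic follows. The function name and the sample range sizes are illustrative assumptions, since the thread does not state how many commits separate v2.6.27.5 and v2.6.27.6, and the count git itself reports may differ slightly.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Approximate number of kernels to build and test for a bisection.

    Each test roughly halves the remaining range, so the result is
    ceil(log2(candidate_commits)), computed here with integer arithmetic.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)).
    return (candidate_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical range sizes; the real commit count between the two tags
    # is not given in this thread.
    for n in (64, 100, 128):
        print(n, "commits ->", bisect_steps(n), "tests")
```

Under this estimate, seven tests cover any range of 65 to 128 candidate commits.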
(In reply to comment #29)
> ...which is included in gentoo-sources-2.6.27-r6. Thanks for your help working
> on this one.

.. and tested OK.

# uname -r
2.6.27-gentoo-r6-1
# dmesg | grep -i vmi
VMI: Found VMware, Inc. Hypervisor OPROM, API version 3.0, ROM version 1.0
vmi: registering clock event vmi-timer. mult=12582912 shift=22
vmi: registering clock event vmi-timer. mult=12582912 shift=22
Booting paravirtualized kernel on vmi
vmi: registering clock source khz=3000000
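
The `mult`/`shift` pair in the dmesg output above is a fixed-point scale factor. Assuming the usual Linux clock event convention, where a nanosecond interval is converted to device ticks as `(ns * mult) >> shift`, the logged values can be checked against the reported clock source rate. The Python sketch below does that arithmetic; the function names are made up for illustration.

```python
def ns_to_ticks(ns: int, mult: int, shift: int) -> int:
    """Fixed-point conversion: ticks = (ns * mult) >> shift."""
    return (ns * mult) >> shift


def implied_khz(mult: int, shift: int) -> int:
    """Tick rate in kHz implied by a nanosecond-to-tick mult/shift pair."""
    # One millisecond is 1_000_000 ns, and ticks per millisecond equals kHz.
    return ns_to_ticks(1_000_000, mult, shift)


if __name__ == "__main__":
    # Values from the dmesg output: mult=12582912 shift=22
    mult, shift = 12582912, 22
    print("ticks per ns:", mult / (1 << shift))
    print("implied kHz:", implied_khz(mult, shift))
```

Since 12582912 is exactly 3 * 2^22, this works out to 3 ticks per nanosecond, or 3000000 kHz, which agrees with the `khz=3000000` line for the clock source.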