Bug 470792 - sys-kernel/vanilla-sources-3.9.3: CONFIG_X86_X2APIC+CONFIG_MAXSMP causing severe CPU-related issues
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard: watch-linux-bugzilla
Keywords: UPSTREAM
Depends on:
Blocks:
 
Reported: 2013-05-20 16:19 UTC by Roman Žilka
Modified: 2013-05-24 18:02 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
.config (with X2APIC) (config-3.9.3, 69.94 KB, text/plain), 2013-05-20 16:22 UTC, Roman Žilka
dmesg (with X2APIC) (dmesg, 48.62 KB, text/plain), 2013-05-20 16:23 UTC, Roman Žilka
lspci -vvv (lspci-vvv, 9.65 KB, text/plain), 2013-05-20 16:25 UTC, Roman Žilka
emerge --info hwids (emerge--info, 6.39 KB, text/plain), 2013-05-20 16:32 UTC, Roman Žilka
/proc/cpuinfo (without X2APIC) (cpuinfo, 3.62 KB, text/plain), 2013-05-20 16:33 UTC, Roman Žilka

Description Roman Žilka 2013-05-20 16:19:48 UTC
I have a laptop with Core i5 and x2apic (as per /proc/cpuinfo). Enabling CONFIG_X86_X2APIC in vanilla 3.9.3 causes these symptoms:
* sluggish kernel boot
* a couple times during boot: "BUG: unable to handle kernel", Oops, kernel panic, NULL pointer dereference, "CPU.*Stuck"
* ~twice slower HDD I/O
* very high CPU temperature despite no CPU load (reported by physical fan speed and /usr/bin/acpi)
* only 1 CPU available in the booted system (should be 4: 2 cores, each with 2 HT threads)
* "ACPI Warning.*SystemIO conflicts with Region \GPIO 1"
* driver iwlwifi cannot load firmware (could be due to the prolonged boot, but it wasn't *that* long...)
* "i8042: Can't write CTR while closing KBD port", "i8042: Can't reactivate KBD port"
* when running, every now and then the kernel pauses for a fraction of a second

It has nothing to do with power management: disabling everything under "Power management and ACPI options" (except for ACPI_BUTTON and ACPI_VIDEO, which are enforced) doesn't help.

Reproducible: Always
Comment 1 Roman Žilka 2013-05-20 16:22:39 UTC
Created attachment 348748
.config (with X2APIC)

When I remove X2APIC from this, the problems are gone.
Comment 2 Roman Žilka 2013-05-20 16:23:34 UTC
Created attachment 348750
dmesg (with X2APIC)
Comment 3 Roman Žilka 2013-05-20 16:25:03 UTC
Created attachment 348752
lspci -vvv
Comment 4 Roman Žilka 2013-05-20 16:32:24 UTC
Created attachment 348758
emerge --info hwids
Comment 5 Roman Žilka 2013-05-20 16:33:26 UTC
Created attachment 348760
/proc/cpuinfo (without X2APIC)
Comment 6 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-05-20 20:26:36 UTC
That's quite unfortunate. I will do some more research on this; a quick search yields nothing of interest upstream, so I will look at other bug trackers and the commit history later.
Comment 7 Mike Pagano gentoo-dev 2013-05-21 00:53:34 UTC
Have you run any earlier kernel versions with this setting enabled?
Comment 8 Roman Žilka 2013-05-21 13:46:03 UTC
(In reply to comment #7)
> Have you run any earlier kernel versions with this setting enabled?

No. Shall I try? I think it depends on EXPERIMENTAL or something in 3.8.
Comment 9 Mike Pagano gentoo-dev 2013-05-23 00:40:06 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Have you run any earlier kernel versions with this setting enabled?
> 
> No. Shall I try? I think it depends on EXPERIMENTAL or something in 3.8.

You can if you like. It would be great if it worked in an earlier version, then we could pinpoint the patch that causes the error.
Comment 10 Roman Žilka 2013-05-23 10:00:51 UTC
Tried 3.8.12-hardened now. It works just fine with X2APIC. Here's the diff from my regular 3.8.12-hardened config.

# diff config-3.8.12-hardened.old config-3.8.12-hardened|grep '^[><][^#]*$'
> CONFIG_HAVE_INTEL_TXT=y
> CONFIG_EXPERIMENTAL=y
> CONFIG_X86_X2APIC=y
> CONFIG_NET_VENDOR_I825XX=y
> CONFIG_NET_VENDOR_SEEQ=y
> CONFIG_NET_VENDOR_SILAN=y
> CONFIG_IRQ_REMAP=y

What strikes me is INTEL_TXT, which somehow got in automagically. The laptop does have that feature, although I make no use of it, and I had it off in the 3.9.3 config. I will try 3.9.3 with +X2APIC+TXT and 3.8.12 with +X2APIC-TXT soon.
Comment 11 Roman Žilka 2013-05-23 10:27:31 UTC
Sorry, CONFIG_HAVE_INTEL_TXT=y has always been there even in 3.9.3 - it's an automatic option. I mistook it for CONFIG_INTEL_TXT which is actually selectable and has never been on in any of the configs.

I ended up trying 3.9.3 with CONFIG_INTEL_TXT=y (and the other one =y too) - no change.
Comment 12 Roman Žilka 2013-05-23 10:41:24 UTC
Tried vanilla-3.8.13 and it runs without a flaw. Here, "config-3.8.12-hardened.old" is the same one as in comment 10 (my regular, non-X2APIC hardened 3.8).

$ diff config-3.8.12-hardened.old config-3.8.13|egrep -v '(CONFIG_GRKERNSEC|CONFIG_PAX)'|grep '^[><] CONFIG'
> CONFIG_HAVE_INTEL_TXT=y
> CONFIG_EXPERIMENTAL=y
> CONFIG_X86_X2APIC=y
< CONFIG_X86_ALIGNMENT_16=y
> CONFIG_IRQ_REMAP=y
> CONFIG_DEBUG_RODATA=y
> CONFIG_DEBUG_SET_MODULE_RONX=y
> CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
< CONFIG_TASK_SIZE_MAX_SHIFT=42
Comment 13 Roman Žilka 2013-05-23 10:49:59 UTC
(Re the previous comment: so that's an X2APIC-enabled 3.8.13 that runs fine here. The IRQ_REMAP option is under Drivers->IOMMU and is a dependency of X2APIC.)

This is the diff between an early X2APIC-enabled 3.9.3 failing config that I tested and the current 3.8.13 operational X2APIC-enabled one.

# diff .config /tmp/config-3.9.3-fail1 |grep '^[><] CONFIG'
< CONFIG_HAVE_IRQ_WORK=y
< CONFIG_EXPERIMENTAL=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_ARCH_USE_BUILTIN_BSWAP=y
> CONFIG_HAVE_KPROBES_ON_FTRACE=y
< CONFIG_GENERIC_SIGALTSTACK=y
> CONFIG_OLD_SIGSUSPEND3=y
> CONFIG_COMPAT_OLD_SIGACTION=y
> CONFIG_PADATA=y
< CONFIG_NR_CPUS=4
> CONFIG_MAXSMP=y
> CONFIG_NR_CPUS=4096
> CONFIG_ACPI_PROCESSOR_AGGREGATOR=y
> CONFIG_NETWORK_PHY_TIMESTAMPING=y
> CONFIG_SCSI_TGT=m
> CONFIG_TTY=y
> CONFIG_PPS=y
> CONFIG_PTP_1588_CLOCK=y
> CONFIG_GPIO_DEVRES=y
< CONFIG_STEP_WISE=y
> CONFIG_THERMAL_GOV_STEP_WISE=y
> CONFIG_DRM_TTM=m
> CONFIG_DRM_GMA500=m
> CONFIG_HDMI=y
> CONFIG_MMC_CLKGATE=y
> CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_CRYPTO_AEAD=y
> CONFIG_CRYPTO_PCRYPT=y
< CONFIG_PERCPU_RWSEM=y
> CONFIG_CPUMASK_OFFSTACK=y
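
The config comparisons in this thread rely on diff piped through grep, which depends on line positions and treats "# CONFIG_FOO is not set" differently from an absent option. As a rough alternative, here is a minimal Python sketch that compares two kernel configs by option name. The function names parse_config and diff_configs are hypothetical, and treating a missing option the same as "is not set" is an assumption of this sketch, not kernel tooling behaviour.

```python
import re

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; options marked 'is not set' map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old_text, new_text):
    """Return sorted (option, old, new) tuples for options whose values differ.

    Assumption: an option missing from one file is treated as 'n'.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return [
        (name, old.get(name, "n"), new.get(name, "n"))
        for name in sorted(set(old) | set(new))
        if old.get(name, "n") != new.get(name, "n")
    ]
```

To use it, read both files (for example with open(path).read()) and pass the contents to diff_configs; each returned tuple names one option that differs between the two configs.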
Comment 14 Roman Žilka 2013-05-23 11:11:31 UTC
Finally some good news! I got 3.9.3 working.

# diff config-3.9.3.old config-3.9.3
313,314c313,314
< CONFIG_MAXSMP=y
< CONFIG_NR_CPUS=4096
---
> # CONFIG_MAXSMP is not set
> CONFIG_NR_CPUS=4
2744d2743
< CONFIG_CPUMASK_OFFSTACK=y

This happened when I turned off MAXSMP.

MAXSMP is available in vanilla 3.8.13 too, but I didn't enable it during the tests here.

Conclusion (for vanilla 3.9.3, at least): either MAXSMP, or X86_X2APIC, but not both. I didn't test with both disabled, but I expect no problems there. Please let me know if you're going to work on a patch; otherwise I think this should reach upstream. If you're not going to report it, let me know.
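
For anyone who wants to check an existing config for the combination described above, here is a minimal Python sketch. It only reports whether both CONFIG_MAXSMP and CONFIG_X86_X2APIC are set to y in the given config text; the function names are hypothetical, and it says nothing about whether a particular kernel version is actually affected.

```python
def enabled_options(config_text):
    """Return the set of CONFIG_ options that are built in (=y)."""
    enabled = set()
    for line in config_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.startswith("CONFIG_") and value == "y":
            enabled.add(name)
    return enabled


def has_maxsmp_x2apic_combination(config_text):
    """True if both options from this bug report are enabled together."""
    wanted = {"CONFIG_MAXSMP", "CONFIG_X86_X2APIC"}
    return wanted <= enabled_options(config_text)
```

Calling has_maxsmp_x2apic_combination(open(".config").read()) in a kernel source directory would flag a config like the failing 3.9.3 one in this report.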
Comment 15 Roman Žilka 2013-05-23 11:31:31 UTC
Tested -MAXSMP, +X2APIC, +ACPI, +Intel P state driver - no problem.
Comment 16 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-05-24 11:40:27 UTC
(In reply to comment #14)
> Please let me know if you're going to work on a patch; else I think this should reach upstream.

No idea what specifically is going wrong here, so, since you know the problem best, feel free to report this upstream at http://bugzilla.kernel.org.
Comment 17 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-05-24 11:49:33 UTC
Also, you will probably need to enable debugging info (CONFIG_EXPERT=y and CONFIG_DEBUG_KERNEL=y if I recall correctly; maybe you can enable CONFIG_DEBUG_INFO=y as well) to get a readable stack trace in the dmesg when you are reporting; in its current form it is not possible to deduce where exactly the issue occurred.
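
As an illustration of the suggestion above, here is a minimal Python sketch that switches the three options named in this comment to y in a config's text. The option list is taken from the comment, which itself is hedged ("if I recall correctly"), and the function name enable_options is hypothetical. The sketch does not resolve Kconfig dependencies, so the result would still need to go through the kernel's own configuration step (such as make oldconfig) before building.

```python
import re

# Options named in the comment above; whether all three are required is not confirmed.
DEBUG_OPTIONS = ("CONFIG_EXPERT", "CONFIG_DEBUG_KERNEL", "CONFIG_DEBUG_INFO")


def enable_options(config_text, options=DEBUG_OPTIONS):
    """Return config text with each option set to =y.

    An existing 'OPTION=...' or '# OPTION is not set' line is rewritten in
    place; an option that does not appear at all is appended at the end.
    """
    lines = config_text.splitlines()
    for opt in options:
        pattern = re.compile(
            r"^(# %s is not set|%s=.*)$" % (re.escape(opt), re.escape(opt))
        )
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = opt + "=y"
                break
        else:
            # Option not mentioned anywhere in the file: append it.
            lines.append(opt + "=y")
    return "\n".join(lines) + "\n"
```

Writing the returned text back to .config and letting the kernel build system validate it keeps dependency handling where it belongs.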