Bug 646418 - sys-kernel/ck-sources-4.14.14 produce BUG: using smp_processor_id() in preemptible when using KVM and iommu
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: kuzetsa CatSwarm (kuza for short)
Reported: 2018-02-02 16:10 UTC by Anton Gubarkov
Modified: 2018-03-11 04:25 UTC
CC List: 3 users

Attachments
.config (.config,121.45 KB, text/plain)
2018-02-07 14:32 UTC, Anton Gubarkov

Description Anton Gubarkov 2018-02-02 16:10:53 UTC
My log is full of:
Feb 02 19:06:15 home64 kernel: BUG: using smp_processor_id() in preemptible [00000000] code: CPU 1/KVM/12349
Feb 02 19:06:15 home64 kernel: caller is single_task_running+0x5/0x20
Feb 02 19:06:15 home64 kernel: CPU: 2 PID: 12349 Comm: CPU 1/KVM Tainted: P     U  W  O    4.14.14-ck #4
Feb 02 19:06:15 home64 kernel: Hardware name: Gigabyte Technology Co., Ltd. Z370 AORUS Ultra Gaming/Z370 AORUS Ultra Gaming-CF, BIOS F6 10/31/2017
Feb 02 19:06:15 home64 kernel: Call Trace:
Feb 02 19:06:15 home64 kernel:  dump_stack+0x46/0x65
Feb 02 19:06:15 home64 kernel:  check_preemption_disabled+0xd3/0xe0
Feb 02 19:06:15 home64 kernel:  single_task_running+0x5/0x20
Feb 02 19:06:15 home64 kernel:  kvm_vcpu_block+0x278/0x310
Feb 02 19:06:15 home64 kernel:  kvm_arch_vcpu_ioctl_run+0x12d/0x1680
Feb 02 19:06:15 home64 kernel:  ? kvm_arch_vcpu_load+0x64/0x230
Feb 02 19:06:15 home64 kernel:  ? kvm_arch_vcpu_load+0x7f/0x230
Feb 02 19:06:15 home64 kernel:  ? kvm_vcpu_ioctl+0x27b/0x5e0
Feb 02 19:06:15 home64 kernel:  kvm_vcpu_ioctl+0x27b/0x5e0
Feb 02 19:06:15 home64 kernel:  ? skiplist_insert+0x57/0xf0
Feb 02 19:06:15 home64 kernel:  ? timerqueue_del+0x1e/0x40
Feb 02 19:06:15 home64 kernel:  ? timerqueue_add+0x52/0x80
Feb 02 19:06:15 home64 kernel:  ? enqueue_hrtimer+0x37/0x90
Feb 02 19:06:15 home64 kernel:  ? _raw_spin_unlock_irqrestore+0xf/0x30
Feb 02 19:06:15 home64 kernel:  ? hrtimer_start_range_ns+0x1ad/0x330
Feb 02 19:06:15 home64 kernel:  do_vfs_ioctl+0x88/0x5d0
Feb 02 19:06:15 home64 kernel:  ? security_file_ioctl+0x39/0x50
Feb 02 19:06:15 home64 kernel:  SyS_ioctl+0x6f/0x80
Feb 02 19:06:15 home64 kernel:  ? exit_to_usermode_loop+0x83/0x90
Feb 02 19:06:15 home64 kernel:  entry_SYSCALL_64_fastpath+0x1d/0x76
Feb 02 19:06:15 home64 kernel: RIP: 0033:0x7f0e728643e7
Feb 02 19:06:15 home64 kernel: RSP: 002b:00007f0e665ef918 EFLAGS: 00000246
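
For triage: this message comes from the kernel's preemption debugging check (check_preemption_disabled in the trace), which fires when smp_processor_id() is called while the task can still be preempted. The "caller is" line names the function that made the call. Below is a minimal Python sketch, not part of the original report, for tallying those callers in saved journal text. The names CALLER_RE, count_callers and sample are made up for illustration, and sample is a shortened copy of the first two log lines with the timestamp dropped.

```python
import re
from collections import Counter

# Matches the "caller is <function>+0x<offset>/0x<size>" line that follows the BUG banner.
CALLER_RE = re.compile(r"caller is (?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def count_callers(log_text: str) -> Counter:
    """Count how often each function is reported as the offending caller."""
    return Counter(m.group("func") for m in CALLER_RE.finditer(log_text))


# Shortened sample taken from the report (illustrative only).
sample = (
    "home64 kernel: BUG: using smp_processor_id() in preemptible [00000000] "
    "code: CPU 1/KVM/12349\n"
    "home64 kernel: caller is single_task_running+0x5/0x20\n"
)

callers = count_callers(sample)
for func, hits in callers.most_common():
    print(f"{func}: {hits}")
```

If every hit names the same caller, as the excerpt above suggests, the problem is probably confined to one code path rather than spread across the kernel.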
Comment 1 kuzetsa CatSwarm (kuza for short) 2018-02-06 14:19:34 UTC
Please provide your /usr/src/linux/.config for this kernel.

The KVM reference has me wondering:

Are you getting this bug on bare-metal hardware, or inside a KVM guest?

Either way, this bug report will likely need to go upstream after testing.
Comment 2 Anton Gubarkov 2018-02-07 14:32:18 UTC
Created attachment 518360 [details]
.config
Comment 3 Anton Gubarkov 2018-02-07 14:34:03 UTC
The bug happens on the host - the machine running the KVM hypervisor.
I have another laptop running the same kernel and using KVM for VMs, but it has no IOMMU. The bug doesn't happen there.
Comment 4 kuzetsa CatSwarm (kuza for short) 2018-02-07 18:28:17 UTC
(In reply to Anton Gubarkov from comment #3)
> The bug happens on the host - the machine running the KVM hypervisor.
> I have another laptop running the same kernel and using KVM for VMs, but it
> has no iommu. The bug doesn't happen there.

Can't say for certain, but it's possible that this might be related to compatibility between KVM and using a non-CFS scheduler. Also, based on the config it looks like you have cgroups support enabled.

Because you've mentioned other hardware not having these issues (different KVM host) I'd like to specifically rule out this possibility please:

Please build a KVM guest kernel which uses the CFS scheduler instead of MuQSS.

If CFS makes this issue go away, please also try a rebuild with the BFS scheduler (the other choice for ck-sources) instead of the MuQSS or CFS scheduler.

This is probably not a gentoo-specific issue, but once the specific details are better understood, I'll assist with relaying the information to upstream (and you can follow up there, if desired.)

Also please note: There are some unrelated issues preventing a release of ">sys-kernel/ck-sources-4.14.14" (gentoo-specific packaging) so limited time will likely mean this edge case doesn't see a timely resolution. Sorry about that.

---

Reference #1

http://madeforcloud.com/post/kvm-vcpu-scheduling/

Within KVM, each vcpu is mapped to a Linux process which in turn utilises hardware assistance to create the necessary ‘smoke and mirrors’ for virtualisation. As such, a vcpu is just another process to the CFS and also importantly to cgroups which, as a resource manager, allows Linux to manage allocation of resources - typically proportionally in order to set constraint allocations. cgroups also apply to Memory, network and I/O. 

---

Reference #2

/usr/src/linux-4.14.14-ck/Documentation/scheduler/sched-MuQSS.txt

What MuQSS does _not_ now feature is support for CGROUPS. The average user should neither need to know what these are, nor should they need to be using them to have good desktop behaviour. However since some applications refuse to work without cgroups, one can enable them with MuQSS as a stub and the filesystem will be created which will allow the applications to work.

---
Comment 5 Anton Gubarkov 2018-02-07 20:10:58 UTC
I've rebuilt the kernel on the affected host with MuQSS deselected. The BUG went away.
I can't find the option to enable BFS.
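
To confirm which scheduler a given build selected, the kernel .config can be inspected directly. A minimal Python sketch follows, with two stated assumptions: the MuQSS option is taken to be CONFIG_SCHED_MUQSS (verify the symbol name against your own .config), and the two sample strings are invented for illustration rather than copied from the attached file. The helper name config_value is likewise hypothetical.

```python
import re


def config_value(config_text: str, option: str):
    """Return the value assigned to `option` in kernel .config text, or None if it is not set."""
    pattern = re.compile(rf"^{re.escape(option)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Invented samples: one build with the option enabled, one with it disabled.
with_muqss = "CONFIG_PREEMPT=y\nCONFIG_SCHED_MUQSS=y\n"
without_muqss = "CONFIG_PREEMPT=y\n# CONFIG_SCHED_MUQSS is not set\n"

# CONFIG_SCHED_MUQSS is an assumed symbol name; check it against your tree.
print(config_value(with_muqss, "CONFIG_SCHED_MUQSS"))
print(config_value(without_muqss, "CONFIG_SCHED_MUQSS"))
```

On a real system the text would come from something like pathlib.Path("/usr/src/linux/.config").read_text().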
Comment 6 kuzetsa CatSwarm (kuza for short) 2018-02-09 16:52:14 UTC
Some of the remarks about the host left me confused. I think you're referring to the hardware, not the kernel. (see below)

A) MuQSS enabled on host, KVM guest uses kernel with MuQSS disabled
B) MuQSS disabled on host, KVM guest uses kernel without MuQSS
C) MuQSS enabled on host, as well as KVM guest (both using MuQSS)
D) Not using MuQSS at all (neither host, nor on the KVM guest)

Please clarify which test case(s) show the bug and which work normally.

The output of: $ cat /proc/version (on the host, as well as the KVM guest) should specify which compiler (incl. version) was used to build the kernel, and/or other metadata which may be relevant to document the issue.

Also, what is the output of: $ cat /proc/cpuinfo
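
For anyone collecting the same details from several machines, here is a minimal Python sketch that pulls the kernel release and compiler version out of a /proc/version string. It assumes the 4.14-era layout ("Linux version <release> (...) (gcc version <x.y.z> ..."); other kernels may format the compiler part differently, in which case the function returns None. The names VERSION_RE, parse_proc_version and sample are made up for illustration.

```python
import re

# Assumes the "Linux version <release> (<builder>) (gcc version <x.y.z> ..." layout.
VERSION_RE = re.compile(
    r"Linux version (?P<kernel>\S+) .*?\(gcc version (?P<gcc>[\d.]+)"
)


def parse_proc_version(text: str):
    """Return (kernel_release, gcc_version), or None if the layout is not recognised."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return match.group("kernel"), match.group("gcc")


# Sample string in the format reported for the affected host.
sample = (
    "Linux version 4.14.14-ck (root@home64) "
    "(gcc version 7.3.0 (Gentoo 7.3.0 p1.0)) "
    "#5 SMP PREEMPT Wed Feb 7 21:56:38 MSK 2018"
)

print(parse_proc_version(sample))
```

On a live system the input would be the contents of /proc/version, read once on the host and once inside the guest.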

This bug is probably not a gentoo-specific failure.

I'll relay the info upstream.
Comment 7 Anton Gubarkov 2018-02-09 18:59:06 UTC
I have different cases. My guests are running Windows.

So I have:

A) Host home64 with MuQSS enabled exhibits the BUG when a KVM guest is running
B) Host home64 without MuQSS enabled shows no BUG
C) Host r9-008cln never exhibited the BUG (but has no iommu hardware)
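
Since the presence of IOMMU hardware is the main difference between the two hosts, it can be worth confirming it on each machine. A minimal Python sketch, assuming the usual sysfs location /sys/kernel/iommu_groups, which holds one numbered directory per group when an IOMMU is active. The function name iommu_groups and the sysfs_root parameter are made up for illustration; the parameter only exists so the function can be pointed at a test directory.

```python
from pathlib import Path


def iommu_groups(sysfs_root: str = "/sys"):
    """Return the IOMMU group numbers found under sysfs, or [] if there are none."""
    groups_dir = Path(sysfs_root) / "kernel" / "iommu_groups"
    if not groups_dir.is_dir():
        return []
    # Group directories are named by number; ignore anything else.
    names = [p.name for p in groups_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(names, key=int)


groups = iommu_groups()
if groups:
    print(f"IOMMU active, {len(groups)} group(s)")
else:
    print("No IOMMU groups found")
```

An empty result on a machine that does have the hardware usually means the IOMMU is disabled in firmware or on the kernel command line.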


My info from home64:

home64 ~ # cat /proc/version
Linux version 4.14.14-ck (root@home64) (gcc version 7.3.0 (Gentoo 7.3.0 p1.0)) #5 SMP PREEMPT Wed Feb 7 21:56:38 MSK 2018

home64 ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
stepping        : 10
microcode       : 0x80
cpu MHz         : 4196.557
cache size      : 9216 KB
physical id     : 0
siblings        : 6
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
stepping        : 10
microcode       : 0x80
cpu MHz         : 4193.025
cache size      : 9216 KB
physical id     : 0
siblings        : 6
core id         : 1
cpu cores       : 6
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
stepping        : 10
microcode       : 0x80
cpu MHz         : 4183.938
cache size      : 9216 KB
physical id     : 0
siblings        : 6
core id         : 2
cpu cores       : 6
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
stepping        : 10
microcode       : 0x80
cpu MHz         : 4169.763
cache size      : 9216 KB
physical id     : 0
siblings        : 6
core id         : 3
cpu cores       : 6
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
stepping        : 10
microcode       : 0x80
cpu MHz         : 4166.248
cache size      : 9216 KB
physical id     : 0
siblings        : 6
core id         : 4
cpu cores       : 6
apicid          : 8
initial apicid  : 8
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
stepping        : 10
microcode       : 0x80
cpu MHz         : 4178.815
cache size      : 9216 KB
physical id     : 0
siblings        : 6
core id         : 5
cpu cores       : 6
apicid          : 10
initial apicid  : 10
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
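
When comparing hosts it is usually enough to check a handful of flags rather than read the full dump. A minimal Python sketch for extracting the per-processor flag sets from /proc/cpuinfo text; it assumes each "flags" entry sits on a single line, as the kernel emits it. The function name cpu_flags and the shortened sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text: str) -> dict:
    """Map each logical processor number to its set of feature flags."""
    flags = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            flags[current] = set(value.split())
    return flags


# Shortened, illustrative sample with two logical processors.
sample = (
    "processor       : 0\n"
    "flags           : fpu vme vmx ept vpid\n"
    "\n"
    "processor       : 1\n"
    "flags           : fpu vme vmx ept vpid\n"
)

per_cpu = cpu_flags(sample)
# vmx indicates Intel hardware virtualisation support, which KVM relies on.
print(all("vmx" in f for f in per_cpu.values()))
```

On a live system the input would be the contents of /proc/cpuinfo.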
Comment 8 kuzetsa CatSwarm (kuza for short) 2018-02-16 17:58:23 UTC
(In reply to Anton Gubarkov from comment #7)
> I have different cases. My guests are running Windows.
> 
> so I have 
> 
> A) Host home64 with MuQSS enabled exhibits the BUG when a KVM guest is
> running
> B) Host home64 without MuQSS enabled shows no BUG
> C) Host r9-008cln never exhibited the BUG (but has no iommu hardware)

[...]

I believe the issue might be caused by KVM (on the host) not supporting MuQSS.

Please state specifically whether home64 is using:

A) MuQSS enabled on host, KVM guest uses kernel with MuQSS disabled
B) MuQSS disabled on host, KVM guest uses kernel without MuQSS
C) MuQSS enabled on host, as well as KVM guest (both using MuQSS)

I need to know whether MuQSS is on the host, the guest, or both.

I also see that you've built the kernel using a compiler which isn't yet keyword-stabilized. Please try to reproduce this issue with a kernel built using gcc version 6.4.0-r1.

gcc-config will let you activate a different slot, so rebuilding gcc 7.3.x again won't be required if you switch back to that one, fortunately.
Comment 9 kuzetsa CatSwarm (kuza for short) 2018-03-11 04:25:50 UTC
MuQSS on the KVM host seems to have performance regressions:

http://ck-hack.blogspot.com/2017/05/linux-411-ck2-muqss-version-0156.html?showComment=1496771913408#c2549435917673112633

^ Upstream thread; it should still be relevant when using MuQSS on the host rather than in KVM guests.

I'd recommend against using MuQSS on the KVM host, as CFS (the default Linux scheduler) is known to be a stable configuration with lower overhead for KVM.

TL;DR - your performance will only go down if you attempt to use MuQSS on the kvm host. Further, MuQSS <--> kvm interactions are upstream bugs which won't (can't) be fixed by making gentoo-specific changes.