Hi,

I upgraded my KVM guest from sys-kernel/gentoo-sources-3.10.48 to sys-kernel/gentoo-sources-3.14.12, and booting into the new kernel fails with:

[ 0.930047] Call Trace:
[ 0.930047] [<ffffffff81af1d36>] rapl_pmu_init+0xae/0x1b4
[ 0.930047] [<ffffffff81af1c88>] ? uncore_cpu_setup+0x13/0x13
[ 0.930047] [<ffffffff81000332>] do_one_initcall+0x112/0x160
[ 0.930047] [<ffffffff810df618>] ? parse_args+0x1e8/0x320
[ 0.930047] [<ffffffff81aea02c>] kernel_init_freeable+0x173/0x1fe
[ 0.930047] [<ffffffff81ae9842>] ? do_early_param+0x88/0x88
[ 0.930047] [<ffffffff815c4d20>] ? rest_init+0x80/0x80
[ 0.930047] [<ffffffff815c4d2e>] kernel_init+0xe/0xf0
[ 0.930047] [<ffffffff815da56c>] ret_from_fork+0x7c/0xb0
[ 0.930047] [<ffffffff815c4d20>] ? rest_init+0x80/0x80
[ 0.930047] Code: 8b 14 10 e8 b0 47 1b 00 48 85 c0 49 89 c6 0f 84 8f 00 00 00 31 c0 b9 06 06 00 00 66 41 89 06 49 8d 46 10 49 89 46 10 49 89 46 18 <0f> 32 48 c1 e8 08 66 b9 1f 00 49 c7 46 20 c0 c7 a1 81 83 e0 1f
[ 0.930047] RIP [<ffffffff8101f4b3>] rapl_cpu_prepare+0x83/0x110
[ 0.930047] RSP <ffff88007c777dc0>
[ 0.953901] ---[ end trace 1a5a32cf5298005d ]---
[ 0.954374] Kernel panic - not syncing: Fatal exception
[ 0.954947] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

I ran a kernel bisect, and the first bad commit is:

commit 4788e5b4b2338f85fa42a712a182d8afd65d7c58
Author: Stephane Eranian <eranian@google.com>
Date: Tue Nov 12 17:58:50 2013 +0100

    perf/x86: Add Intel RAPL PMU support

    This patch adds a new uncore PMU to expose the Intel RAPL energy consumption counters. Up to 3 counters, each counting a particular RAPL event are exposed.

    The RAPL counters are available on Intel SandyBridge, IvyBridge, Haswell. The server skus add a 3rd counter.
    The following events are available and exposed in sysfs:
    - power/energy-cores: power consumption of all cores on socket
    - power/energy-pkg: power consumption of all cores + LLc cache
    - power/energy-dram: power consumption of DRAM (servers only)

    For each event both the unit (Joules) and scale (2^-32 J) is exposed in sysfs for use by perf stat and other tools. The files are:

    /sys/devices/power/events/energy-*.unit
    /sys/devices/power/events/energy-*.scale

    The RAPL PMU is uncore by nature and is implemented such that it only works in system-wide mode. Measuring only one CPU per socket is sufficient. The /sys/devices/power/cpumask file can be used by tools to figure out which CPUs to monitor by default. For instance, on a 2-socket system, 2 CPUs (one on each socket) will be shown.

    All the counters measure in the same unit (exposed via sysfs). The perf_events API exposes all RAPL counters as 64-bit integers counting in unit of 1/2^32 Joules (about 0.23 nJ). User level tools must convert the counts by multiplying them by 2^-32 to obtain Joules. The reason for this is that the kernel avoids doing floating point math whenever possible because it is expensive (user floating-point state must be saved). The method used avoids kernel floating-point usage. There is no loss of precision. Thanks to PeterZ for suggesting this approach.

    To convert the raw count in Watt: W = C * 2.3 / (1e10 * time) or ldexp(C, -32).

    RAPL PMU is a new standalone PMU which registers with the perf_event core subsystem. The PMU type (attr->type) is dynamically allocated and is available from /sys/device/power/type.

    Sampling is not supported by the RAPL PMU. There is no privilege level filtering either.
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Cc: acme@redhat.com
    Cc: jolsa@redhat.com
    Cc: zheng.z.yan@intel.com
    Cc: bp@alien8.de
    Link: http://lkml.kernel.org/r/1384275531-10892-4-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

This bug is already fixed in v3.15 by:

commit 24223657806a0ebd0ae5c9caaf7b021091889cf2
Author: Venkatesh Srinivas <venkateshs@google.com>
Date: Thu Mar 13 12:36:26 2014 -0700

    perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU

    CPUs which should support the RAPL counters according to Family/Model/Stepping may still issue #GP when attempting to access the RAPL MSRs. This may happen when Linux is running under KVM and we are passing-through host F/M/S data, for example. Use rdmsrl_safe to first access the RAPL_POWER_UNIT MSR; if this fails, do not attempt to use this PMU.

    Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
    Cc: zheng.z.yan@intel.com
    Cc: eranian@google.com
    Cc: ak@linux.intel.com
    Cc: linux-kernel@vger.kernel.org
    [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

Now the question is: how do we get this patch into 3.14?

Reproducible: Always
My backport finally landed in linux-3.14.20, commit 2968314094d8db0af32de20ad51349a30ea54a01.