Bug 954333 - sys-kernel/gentoo-kernel Random kernel panics: invalid opcode when allocating new memory
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-24 13:11 UTC by Daniel Raniz Raneland
Modified: 2025-04-24 19:05 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
Log of more captured panics (panics.log,41.04 KB, text/plain)
2025-04-24 13:12 UTC, Daniel Raniz Raneland

Description Daniel Raniz Raneland 2025-04-24 13:11:05 UTC
I have intermittent issues with kernel panics, usually when my system is under heavy load.

Sometimes it panics shortly after booting up; sometimes it can go for days without a panic.

It seems to happen more often on more recent kernel versions. I never encountered it when installing Gentoo from the Admin CD (which uses an LTS kernel, afaik). It happens rarely on 6.9.11 and occasionally on 6.13.11, which I'm on now. I can't test with 6.14.2 because ZFS doesn't support 6.14 yet. I also can't use kernels older than 6.8 because those don't support the integrated graphics card in my processor (Iris Xe) - well, I can, but I'm limited to a TTY.

I've run the full memtest86+ suite overnight without any errors reported.

I've captured the kernel messages via netconsole on a second laptop, and they all follow the same pattern. The call trace differs between panics, but the Oops and RIP lines are always "invalid opcode: 0000 [#1] SMP NOPTI" and "0010:__page_table_check_zero". The hex offsets at the end of the RIP line differ between kernel versions, but they are always the same for a given kernel version, so I assume they depend only on the kernel build.
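
To check that every captured panic really shares the same Oops and RIP, the netconsole log can be grouped by signature. The following is only a rough sketch: the function name panic_signatures is made up for illustration, and the regular expressions assume the log lines look like the ones quoted in this report.

```python
import re
from collections import Counter

# Matches lines such as "[ 1242.746757] Oops: invalid opcode: 0000 [#1] SMP NOPTI"
OOPS_RE = re.compile(r"\]\s+Oops: (?P<oops>.+)$")
# Matches kernel-mode RIP lines such as
# "[ 1242.746772] RIP: 0010:__page_table_check_zero+0x90/0xb0"
RIP_RE = re.compile(
    r"\]\s+RIP: 0010:(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-f]+)/0x[0-9a-f]+"
)


def panic_signatures(log_text):
    """Count (oops, symbol, offset) tuples, one per Oops found in log_text.

    Hypothetical helper, not part of any existing tool.
    """
    signatures = Counter()
    current_oops = None
    for line in log_text.splitlines():
        match = OOPS_RE.search(line)
        if match:
            current_oops = match.group("oops").strip()
            continue
        match = RIP_RE.search(line)
        if match and current_oops is not None:
            key = (current_oops, match.group("symbol"), match.group("offset"))
            signatures[key] += 1
            # The RIP line is printed again after "end trace"; count it once.
            current_oops = None
    return signatures
```

Running it over panics.log (for example, panic_signatures(open("panics.log").read())) should give one entry per distinct kernel build if the pattern described above holds.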


Most recent panic below:

[ 1242.746736] ------------[ cut here ]------------
[ 1242.746750] kernel BUG at mm/page_table_check.c:156!
[ 1242.746757] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 1242.746762] CPU: 19 UID: 250 PID: 82304 Comm: cc1 Tainted: P           O       6.13.11-gentoo-dist-hardened #1
[ 1242.746767] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[ 1242.746769] Hardware name: Notebook V5xTNC_TND_TNE/V5xTNC_TND_TNE, BIOS Dasharo (coreboot+UEFI) v0.9.1 11/07/2024
[ 1242.746772] RIP: 0010:__page_table_check_zero+0x90/0xb0
[ 1242.746780] Code: 48 89 f8 f7 c7 ff 0f 00 00 75 a7 48 8b 17 83 e2 40 74 9f 48 8b 57 48 48 8d 42 ff 83 e2 01 48 0f 44 c7 80 78 33 f5 75 90 0f 0b <0f> 0b 0f 0b 5b c3 cc cc cc cc 48 83 e8 01 e9 75 ff ff ff 48 89 c7
[ 1242.746783] RSP: 0000:ffffadfc915e3bd0 EFLAGS: 00010206
[ 1242.746786] RAX: ffff8ad44274b000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1242.746788] RDX: ffff8ad44274b000 RSI: 000000005725eae8 RDI: 0000000000000001
[ 1242.746790] RBP: 0000000000000981 R08: 0000000000000000 R09: 0000000000000008
[ 1242.746792] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1242.746794] R13: 0000000000140dca R14: ffffffffb7894680 R15: 0000000000000981
[ 1242.746797] FS:  00007f35aefd0f00(0000) GS:ffff8aebbfb80000(0000) knlGS:0000000000000000
[ 1242.746800] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1242.746802] CR2: 00007f35ab8b8000 CR3: 0000000402dec004 CR4: 0000000000f72ef0
[ 1242.746804] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1242.746806] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 1242.746808] PKRU: 55555554
[ 1242.746809] Call Trace:
[ 1242.746812]  <TASK>
[ 1242.746815]  prep_new_page+0x16/0x110
[ 1242.746819]  get_page_from_freelist+0x2da/0x1820
[ 1242.746824]  __alloc_pages_noprof+0x150/0x290
[ 1242.746828]  __folio_alloc_noprof+0xf/0x30
[ 1242.746830]  do_anonymous_page+0x4ea/0x970
[ 1242.746834]  __handle_mm_fault+0x818/0x840
[ 1242.746837]  handle_mm_fault+0xda/0x2b0
[ 1242.746839]  do_user_addr_fault+0x1e6/0x540
[ 1242.746844]  exc_page_fault+0x5f/0x80
[ 1242.746849]  asm_exc_page_fault+0x26/0x30
[ 1242.746854] RIP: 0033:0x7f35af135c09
[ 1242.746858] Code: 20 c5 fe 7f 07 c5 fe 7f 44 17 e0 c5 f8 77 c3 66 90 c5 fe 7f 47 c0 c5 fe 7f 47 e0 c5 f8 77 c3 66 90 48 3b 15 09 f6 07 00 77 77 <c5> fe 7f 07 c5 fe 7f 47 20 48 01 d7 48 81 fa 80 00 00 00 76 d2 c5
[ 1242.746862] RSP: 002b:00007ffeefab6de8 EFLAGS: 00010283
[ 1242.746864] RAX: 00007f35ab8b8000 RBX: 00000000000000e0 RCX: 000000000000000c
[ 1242.746866] RDX: 00000000000000e0 RSI: 0000000000000000 RDI: 00007f35ab8b8000
[ 1242.746869] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007f35af1b5ad0
[ 1242.746871] R10: 0000000000000007 R11: 0000000000000012 R12: 00007f35ab881c78
[ 1242.746872] R13: 00007f35ab8b6240 R14: 00007f35ae115a80 R15: 00007f35ab8afe88
[ 1242.746875]  </TASK>
[ 1242.746877] Modules linked in: netconsole xt_comment ip6_tables ip_tables x_tables rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace sunrpc netfs rfcomm nvidia_uvm(O) hid_kensington joydev r8153_ecm cdc_ether usbnet r8152 mii cdc_acm bnep uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common mc btusb btrtl btintel btbcm btmtk mei_gsc_proxy pmt_telemetry intel_rapl_msr pmt_class intel_uncore_frequency intel_uncore_frequency_common xe x86_pkg_temp_thermal intel_powerclamp gpu_sched drm_suballoc_helper drm_gpuvm drm_exec nvidia_drm(O) nvidia_modeset(O) nvidia(O) i915 iwlmvm processor_thermal_device_pci spi_intel_pci spi_intel iwlwifi processor_thermal_device mei_me i2c_algo_bit processor_thermal_wt_hint intel_vpu mei drm_buddy drm_ttm_helper processor_thermal_rfim intel_gtt ttm thunderbolt processor_thermal_rapl intel_rapl_common drm_display_helper processor_thermal_wt_req drm_shmem_helper drm_client_lib processor_thermal_power_floor igen6_edac processor_thermal_mbox drm_kms_helper
[ 1242.746931]  int340x_thermal_zone intel_vsec edac_core video wmi intel_hid sparse_keymap crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sdhci_pci sdhci_uhs2 sdhci cqhci mmc_core pinctrl_meteorlake pinctrl_intel i2c_hid_acpi i2c_hid pwm_lpss zfs(PO) spl(O) i2c_dev [last unloaded: netconsole]
[ 1242.746958] ---[ end trace 0000000000000000 ]---
[ 1242.746960] RIP: 0010:__page_table_check_zero+0x90/0xb0
[ 1242.746964] Code: 48 89 f8 f7 c7 ff 0f 00 00 75 a7 48 8b 17 83 e2 40 74 9f 48 8b 57 48 48 8d 42 ff 83 e2 01 48 0f 44 c7 80 78 33 f5 75 90 0f 0b <0f> 0b 0f 0b 5b c3 cc cc cc cc 48 83 e8 01 e9 75 ff ff ff 48 89 c7
[ 1242.746966] RSP: 0000:ffffadfc915e3bd0 EFLAGS: 00010206
[ 1242.746968] RAX: ffff8ad44274b000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1242.746971] RDX: ffff8ad44274b000 RSI: 000000005725eae8 RDI: 0000000000000001
[ 1242.746973] RBP: 0000000000000981 R08: 0000000000000000 R09: 0000000000000008
[ 1242.746974] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1242.746976] R13: 0000000000140dca R14: ffffffffb7894680 R15: 0000000000000981
[ 1242.746978] FS:  00007f35aefd0f00(0000) GS:ffff8aebbfb80000(0000) knlGS:0000000000000000
[ 1242.746980] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1242.746982] CR2: 00007f35ab8b8000 CR3: 0000000402dec004 CR4: 0000000000f72ef0
[ 1242.746984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1242.746986] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 1242.746988] PKRU: 55555554
[ 1242.746990] Kernel panic - not syncing: Fatal exception
[ 1242.747166] Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1242.747172] Rebooting in 10 seconds..
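
The "invalid opcode" in the Oops does not by itself indicate corrupted code: on x86 the kernel's BUG() traps with the ud2 instruction (bytes 0f 0b), and the byte marked with <..> in the "Code:" line is where the CPU faulted. A small sketch for checking this on a captured "Code:" line follows; the helper names faulting_bytes and is_bug_trap are made up for illustration.

```python
# Encoding of the x86 ud2 instruction, which BUG() uses to trap.
UD2 = bytes([0x0F, 0x0B])


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <xx> in a 'Code:' line.

    Returns None if the line has no 'Code:' prefix or no marked byte.
    """
    _, sep, rest = code_line.partition("Code:")
    if not sep:
        return None
    tokens = rest.split()
    for i, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            marked = [token.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in marked)
    return None


def is_bug_trap(code_line):
    """True if the faulting instruction in the 'Code:' line is ud2."""
    return faulting_bytes(code_line) == UD2
```

Applied to the kernel-mode "Code:" line in the panic above, the marked byte is 0f followed by 0b, which matches ud2 and is consistent with the "kernel BUG at mm/page_table_check.c:156!" message.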

Reproducible: Sometimes

Steps to Reproduce:
1. Reboot system
2. Put system under heavy load, like compiling a kernel or running some benchmark
Actual Results:  
Kernel panics with

Oops: invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:__page_table_check_zero


Expected Results:  
No kernel panics

Computer: NovaCustom NV54
Processor: Intel(R) Core(TM) Ultra 7 155H
Memory: 2x48 GiB DDR5
Graphics: Integrated Iris Xe + GeForce RTX 4070 Max-Q / Mobile
Firmware: Dasharo (coreboot+UEFI) v0.9.1

Linux raniz-nv54.lan 6.13.11-gentoo-dist-hardened #1 SMP PREEMPT_DYNAMIC Thu Apr 24 07:28:10 CEST 2025 x86_64 Intel(R) Core(TM) Ultra 7 155H GenuineIntel GNU/Linux

Current uptime is slightly over 6 hours, with some heavy compilation at times.
Comment 1 Daniel Raniz Raneland 2025-04-24 13:12:51 UTC
Created attachment 925983
Log of more captured panics

Some more panics captured via netconsole. Not all panics that have occurred have been captured since the capturing laptop tends to get in the way on the desk.
Comment 2 Mike Gilbert gentoo-dev 2025-04-24 14:30:18 UTC
Please provide emerge --info.

What kernel package(s) are you using? Have you customized the config at all?
Comment 3 Holger Hoffstätte 2025-04-24 14:42:59 UTC
Thanks for doing the necessary things like running memtest first.

Basically it looks to be somewhere here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/page_table_check.c?h=linux-6.13.y#n156

and the specific line:

  BUG_ON(atomic_read(&ptc->anon_map_count));

means it crashes because a page of anonymous memory is still mapped while it should be free/unmapped.
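
To illustrate what that check guards against, here is a toy userspace model, not kernel code. It keeps a per-page anon_map_count like the one in the BUG_ON above; the class and method names (PageTracker, map_anon, unmap_anon, check_zero) are invented for this sketch, and the real kernel tracks more state than this.

```python
class PageTableCheckError(AssertionError):
    """Stands in for the kernel's BUG_ON firing."""


class PageTracker:
    """Toy model: one anonymous-mapping counter per page frame."""

    def __init__(self, num_pages):
        self.anon_map_count = [0] * num_pages

    def map_anon(self, pfn):
        # A page table entry now points at this page.
        self.anon_map_count[pfn] += 1

    def unmap_anon(self, pfn):
        # The page table entry was removed.
        self.anon_map_count[pfn] -= 1

    def check_zero(self, pfn):
        # Called when the page is handed out by (or returned to) the
        # allocator: at that point nothing may still map it.
        if self.anon_map_count[pfn] != 0:
            raise PageTableCheckError(
                f"page {pfn} still has {self.anon_map_count[pfn]} anonymous mapping(s)"
            )
```

In this model, allocating a page whose counter was never decremented back to zero (or whose counter was overwritten by some other code) fails the check, which is the situation the panic reports.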

Your attached stack trace contains "spl_kmem_cache_destroy" and AFAIK spl is ZFS's Solaris Portability Layer. Since you are running with ZFS that's where I would focus my further search.
Comment 4 Daniel Raniz Raneland 2025-04-24 15:59:52 UTC
It's sys-kernel/gentoo-kernel, not using savedconfig.

I'll see if I can boot something without ZFS and see how the system behaves under load.