Bug 801481 - page_poison=1 on kernel 5.13.1 (vs 5.12.13) spams dmesg with page dumps due to "pagealloc: memory corruption"
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-10 14:20 UTC by bowsingbetee
Modified: 2021-08-17 17:53 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
kernel .config aka .config.tmp_debugallocandowner_5_13_1 (.config.tmp_debugallocandowner_5_13_1,152.11 KB, text/plain)
2021-07-10 19:38 UTC, bowsingbetee
reverted commit 51cba1ebc60df9c4ce034a9f5441169c0d0956c0 in patch form (reverted_51cba1ebc60df9c4ce034a9f5441169c0d0956c0.patch,2.90 KB, patch)
2021-07-11 11:06 UTC, bowsingbetee
0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch (0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch,3.43 KB, patch)
2021-07-12 22:00 UTC, Sergei Trofimovich (RETIRED)

Description bowsingbetee 2021-07-10 14:20:15 UTC
This seems to be a regression: it didn't happen on 5.12.13 (the latest kernel I had tested), and on 5.13.1, without changing anything else, I began getting large page dumps in dmesg that appear to be zero-filled pages:

[ 3548.030011] page dumped because: pagealloc: corrupted page details
[ 3553.029103] check_poison_mem: 2287988 callbacks suppressed
[ 3553.029105] pagealloc: memory corruption
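
To get a feel for how many reports are being dropped by rate limiting, a capture of dmesg can be summarized. The following is a minimal sketch, assuming the capture is available as text; the function name `summarize_poison_reports` is made up for illustration, and the two message formats are taken from the excerpt above.

```python
import re

# Message formats copied from the dmesg excerpt in this report.
SUPPRESSED_RE = re.compile(r"check_poison_mem: (\d+) callbacks suppressed")
REPORT_MARKER = "pagealloc: memory corruption"


def summarize_poison_reports(dmesg_text):
    """Return (printed_reports, suppressed_reports) for a dmesg capture."""
    printed = 0
    suppressed = 0
    for line in dmesg_text.splitlines():
        match = SUPPRESSED_RE.search(line)
        if match:
            # Rate limiting reports how many messages it dropped.
            suppressed += int(match.group(1))
        elif REPORT_MARKER in line:
            printed += 1
    return printed, suppressed


sample = """\
[ 3548.030011] page dumped because: pagealloc: corrupted page details
[ 3553.029103] check_poison_mem: 2287988 callbacks suppressed
[ 3553.029105] pagealloc: memory corruption
"""
print(summarize_poison_reports(sample))
```

On a live system the text could come from the output of `dmesg` saved to a file; the sketch only counts lines and does not try to untangle interleaved output.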


Reproducible: Always

Steps to Reproduce:
I'm not sure of the exact trigger, but:

1. Have page_poison=1 on the kernel cmdline, along with init_on_free=0 init_on_alloc=0.

Also these in .config:
CONFIG_PAGE_POISONING=y
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_INIT_ON_FREE_DEFAULT_ON=y
CONFIG_GENERIC_ALLOCATOR=y

2. Boot into the 5.13.1 kernel.
3. Run dmesg -w.
Actual Results:  
[ 3548.029860] 0000000010a1d90a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029861] 00000000745cca7f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029862] 00000000946ea4d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029863] 0000000024e3f4cb: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029864] 0000000096a93df3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029865] 0000000038c9f642: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029866] 00000000d84c6ccb: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029867] 000000006a16913a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.... more lines like this here.......
[ 3548.029950] 000000003bcec204: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029951] 000000005b7fc85f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029952] 00000000a004f1db: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029953] 00000000e86574f6: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029954] 0000000069289984: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029954] 00000000a11f75c4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029956] 000000004939488c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029956] 0000000034627e1e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029957] 00000000965c64e4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029958] 000000000becd19b: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029959] 000000005e856756: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029960] 00000000aa4b9a7f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029963] 000000006c09343f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029964] 0000000041f98e55: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029967] 000000005c74c360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029968] 0000000088a401c4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029969] 00000000e89bc5a9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029970] 00000000d0860d35: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029970] 0000000075d70043: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029971] 000000008af1d76b: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029972] 00000000c0acc4e1: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029973] 00000000417cd107: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029974] 000000009c1f47e9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029975] 000000009378fafd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029976] 000000002b866202: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029977] 00000000d8fb1e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029977] 00000000917cd363: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029978] 00000000d1fe818a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029979] 00000000f7451d44: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029980] CPU: 11 PID: 68163 Comm: cc1 Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
[ 3548.029983] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
[ 3548.029983] Call Trace:
[ 3548.029984]  ? dump_stack+0x64/0x7c
[ 3548.029987]  ? __kernel_unpoison_pages.cold+0x48/0x84
[ 3548.029990]  ? get_page_from_freelist+0xd4b/0xe80
[ 3548.029992]  ? __alloc_pages+0x163/0x2b0
[ 3548.029993]  ? __handle_mm_fault+0x9fc/0x11a0
[ 3548.029996]  ? handle_mm_fault+0xc0/0x290
[ 3548.029998]  ? exc_page_fault+0x19c/0x5f0
[ 3548.030000]  ? asm_exc_page_fault+0x5/0x20
[ 3548.030002]  ? asm_exc_page_fault+0x1b/0x20
[ 3548.030005] page:00000000b4715c0c refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0xe238c1
[ 3548.030007] flags: 0x8000000000000000(zone=2)
[ 3548.030009] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000
[ 3548.030010] raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
[ 3548.030011] page dumped because: pagealloc: corrupted page details
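
The claim that the dumped pages are zero-filled can be checked mechanically rather than by eye. This is a rough sketch, assuming the hex-dump rows keep the format shown above (timestamp, 16-digit hashed address, 16 bytes, ASCII column); `dumped_bytes` is a hypothetical helper name.

```python
import re

# Matches hex-dump rows such as
# "[ 3548.029860] 0000000010a1d90a: 00 00 ... 00  ................"
# and captures only the byte column, not the trailing ASCII column.
HEXDUMP_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+[0-9a-f]{16}:\s+((?:[0-9a-f]{2} ?)+)"
)


def dumped_bytes(dmesg_text):
    """Collect the bytes from every hex-dump row in a dmesg capture."""
    data = bytearray()
    for line in dmesg_text.splitlines():
        match = HEXDUMP_RE.match(line)
        if match:
            data.extend(bytes.fromhex(match.group(1)))
    return bytes(data)


sample = """\
[ 3548.029860] 0000000010a1d90a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029980] CPU: 11 PID: 68163 Comm: cc1 Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
"""
data = dumped_bytes(sample)
print(len(data), "bytes dumped, all zero:", not any(data))
```

Rows that were interleaved with output from other CPUs still match as long as each row stays on its own line.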

This is one of the cleaner dumps; most of them are interleaved, as if several cores were printing at the same time, so the lines are jumbled like this:

[ 3548.029051]  ? asm_exc_page_fault+0x1b/0x20
[ 3548.029053] 0000000007abdad7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029054] 00000000abba84cc: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029054] page:000000005c08b43e refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0xf186f8
[ 3548.029054] CPU: 0 PID: 68248 Comm: cc1 Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
[ 3548.029055] 000000009b219e2a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029056] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
[ 3548.029056] flags: 0x8000000000000000(zone=2)
[ 3548.029056] 0000000038f6ddc1: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029057] Call Trace:
[ 3548.029058] 00000000fe4eefd3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029058] 00000000cb5e3067: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3548.029058]  ? dump_stack+0x64/0x7c



Here's part of another one:
[   57.759602] 00000000aaa04ddf: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   57.760366] 00000000f50e5f9c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   57.761129] CPU: 7 PID: 12120 Comm: findfs Tainted: G     U            5.13.1-gentoo-x86_64 #1
[   57.761901] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
[   57.762678] Call Trace:
[   57.763458]  ? dump_stack+0x64/0x7c
[   57.764242]  ? __kernel_unpoison_pages.cold+0x48/0x84
[   57.765029]  ? get_page_from_freelist+0xd4b/0xe80
[   57.765818]  ? finish_task_switch.isra.0+0x176/0x240
[   57.766611]  ? __alloc_pages+0x163/0x2b0
[   57.767404]  ? page_cache_ra_unbounded+0x111/0x1e0
[   57.768201]  ? filemap_get_pages+0x106/0x550
[   57.769000]  ? filemap_read+0x147/0x3a0
[   57.769802]  ? blkdev_read_iter+0x37/0x50
[   57.770605]  ? new_sync_read+0x175/0x200
[   57.771408]  ? vfs_read+0xed/0x180
[   57.772210]  ? ksys_read+0x62/0xe0
[   57.773015]  ? do_syscall_64+0x68/0x80
[   57.773821]  ? do_syscall_64+0x11/0x80
[   57.774634]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[   57.775447] page:000000000474d9a8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1e00d6
[   57.776269] flags: 0x8000000000000000(zone=2)
[   57.777095] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000
[   57.777929] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   57.778761] page dumped because: pagealloc: corrupted page details
[   57.779619] pagealloc: memory corruption


Expected Results:  
No spam, just like on 5.12.13. I searched the logs from that kernel and there isn't a single mention of "pagealloc: memory corruption", which presumably is a message that already existed in that kernel version, i.e.
mm/page_poison.c:68:            pr_err("pagealloc: memory corruption\n");

...therefore I conclude this must be a regression and that these 5.13.1 messages are printed in error, e.g. the check expected the page to be poisoned but found it zeroed instead, somehow.
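
The suspected mechanism can be illustrated with a small user-space model. This is only a sketch under assumptions, not the kernel code: it assumes the poison byte is 0xaa (the kernel's PAGE_POISON value) and that the allocation-time check reports any page that is no longer entirely poison. The function names are invented for the example.

```python
PAGE_SIZE = 4096
PAGE_POISON = 0xAA  # assumed poison byte, as defined by the kernel's PAGE_POISON


def poison_page(page):
    """Model of poisoning on free: fill the whole page with the poison byte."""
    page[:] = bytes([PAGE_POISON]) * len(page)


def check_on_alloc(page):
    """Model of the allocation-time check: is the page still fully poisoned?"""
    bad = [i for i, byte in enumerate(page) if byte != PAGE_POISON]
    if not bad:
        return "ok"
    start, end = bad[0], bad[-1]
    # A lone byte differing from the poison value by exactly one bit is
    # reported separately, since it hints at a hardware bit flip.
    if start == end and bin(page[start] ^ PAGE_POISON).count("1") == 1:
        return "pagealloc: single bit error"
    return "pagealloc: memory corruption"


page = bytearray(PAGE_SIZE)
poison_page(page)
print(check_on_alloc(page))   # page untouched since it was freed

page[:] = bytes(PAGE_SIZE)    # something zeroed the page after poisoning
print(check_on_alloc(page))
```

In this model a page that was zeroed after being poisoned is indistinguishable from real corruption, which matches the all-zero dumps in this report.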

What changed in .config from 5.12.13 to 5.13.1:

--- '.config.prev28_5_12_13'
+++ '.config.prev29_5_13_1'
-CONFIG_ABX500_CORE is not set
+CONFIG_ADV_SWBUTTON is not set
+CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y
+CONFIG_ARCH_USE_MEMTEST=y
+CONFIG_AS_IS_GNU=y
+CONFIG_AS_VERSION=23601
+CONFIG_BATTERY_GOLDFISH is not set
+CONFIG_BINARY_PRINTF=y
-CONFIG_BLK_DEV_UMEM is not set
-CONFIG_BOUNCE=y
+CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
-CONFIG_CC_VERSION_TEXT="gcc (Gentoo 11.1.0-r1 p2) 11.1.0"
+CONFIG_CC_VERSION_TEXT="gcc (Gentoo 11.1.0-r2 p3) 11.1.0"
+CONFIG_CGROUP_MISC is not set
+CONFIG_CMA_SYSFS is not set
+CONFIG_COMEDI is not set
+CONFIG_CRYPTO_ECC=y
+CONFIG_CRYPTO_ECDSA=y
-CONFIG_DEVKMEM is not set
+CONFIG_DRM_GUD is not set
+CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
+CONFIG_DW_XDATA_PCIE is not set
-CONFIG_GENTOO_KERNEL_SELF_PROTECTION is not set
+CONFIG_GIGABYTE_WMI is not set
+CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
-CONFIG_HAVE_NET_DSA=y
+CONFIG_HID_FT260 is not set
+CONFIG_HID_SEMITEK is not set
+CONFIG_I2C_CP2615 is not set
+CONFIG_INPUT_IQS626A is not set
+CONFIG_INTEL_TCC_COOLING is not set
-CONFIG_LEDS_BLINK=y
+CONFIG_MARVELL_88X2222_PHY is not set
+CONFIG_MFD_ATC260X_I2C is not set
+CONFIG_MODPROBE_PATH="/sbin/modprobe"
+CONFIG_MODULE_COMPRESS_GZIP is not set
-CONFIG_MODULE_COMPRESS is not set
+CONFIG_MODULE_COMPRESS_NONE=y
+CONFIG_MODULE_COMPRESS_XZ is not set
+CONFIG_MODULE_COMPRESS_ZSTD is not set
+CONFIG_NETFILTER_XTABLES_COMPAT=y
+CONFIG_NET_SELFTESTS=y
+CONFIG_NET_SOCK_MSG=y
+CONFIG_NET_VENDOR_MICROSOFT is not set
-CONFIG_NF_LOG_COMMON=y
-CONFIG_NF_LOG_NETDEV is not set
+CONFIG_NF_LOG_SYSLOG=y
+CONFIG_NXP_C45_TJA11XX_PHY is not set
+CONFIG_PCPU_DEV_REFCNT=y
+CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y
+CONFIG_RTC_DRV_GOLDFISH is not set
+CONFIG_SECURITY_LANDLOCK=y
-CONFIG_SECURITY_PATH is not set
+CONFIG_SECURITY_PATH=y
-CONFIG_SENSORS_AMD_ENERGY is not set
+CONFIG_SENSORS_NZXT_KRAKEN2 is not set
+CONFIG_SND_CTL_LED=y
+CONFIG_SND_HDA_PREALLOC_SIZE=0
-CONFIG_SND_HDA_PREALLOC_SIZE=4096
-CONFIG_SND_VERBOSE_PRINTK is not set
+CONFIG_SND_VERBOSE_PRINTK=y
+CONFIG_SYSTEM_REVOCATION_LIST is not set
+CONFIG_TEST_DIV64=y
-CONFIG_TRACE_SINK is not set
+CONFIG_VMLINUX_MAP is not set
+CONFIG_WWAN is not set
-CONFIG_XILINX_ZYNQMP_DPDMA is not set
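
A listing like the one above can be produced from two saved .config files. The following is a minimal sketch, assuming the usual .config syntax ("CONFIG_X=value" and "# CONFIG_X is not set"); `parse_config` and `diff_configs` are hypothetical names, and the ordering of the output may differ slightly from the listing above.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option name to a normalized one-line description."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = line
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = f"{match.group(1)} is not set"
    return options


def diff_configs(old_text, new_text):
    """Return '-'/'+' lines for options that were removed, added or changed."""
    old, new = parse_config(old_text), parse_config(new_text)
    lines = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) == new.get(name):
            continue
        if name in old:
            lines.append("-" + old[name])
        if name in new:
            lines.append("+" + new[name])
    return lines


old_cfg = "CONFIG_BOUNCE=y\n# CONFIG_SECURITY_PATH is not set\nCONFIG_PAGE_POISONING=y\n"
new_cfg = "CONFIG_SECURITY_PATH=y\nCONFIG_PAGE_POISONING=y\n# CONFIG_WWAN is not set\n"
print("\n".join(diff_configs(old_cfg, new_cfg)))
```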


A partial list of the "Comm:" values that trigger this (possibly any/all processes do):
openrc-run.sh
cc1
kworker/u24:10
stmpfiles-dev
swapper/5
syslog-ng
btrfs-transacti
kworker/u24:3
start-stop-daem
kworker/u24:9
crond
udevd
init
openrc
findfs
...
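
A list like this can be pulled out of a dmesg capture automatically. This is a small sketch, assuming the "CPU: ... PID: ... Comm: <name>" header format seen in the dumps; `unique_comms` is a made-up helper, and it assumes command names contain no spaces, which holds for the names listed here but is not guaranteed in general.

```python
import re

# Assumes the command name contains no whitespace.
COMM_RE = re.compile(r"\bComm: (\S+)")


def unique_comms(dmesg_text):
    """Return the distinct Comm: values in first-seen order."""
    seen = []
    for match in COMM_RE.finditer(dmesg_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


sample = """\
[ 3548.029980] CPU: 11 PID: 68163 Comm: cc1 Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
[ 3548.029054] CPU: 0 PID: 68248 Comm: cc1 Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
[   57.761129] CPU: 7 PID: 12120 Comm: findfs Tainted: G     U            5.13.1-gentoo-x86_64 #1
"""
print(unique_comms(sample))
```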

The workaround is to use page_poison=0 on the kernel cmdline.

Someone mentioned this issue before, on 29 June 2021, here: https://www.phoronix.com/forums/forum/software/general-linux-open-source/1264763-kernel-5-13-0-memory-corruption
but I have no idea why they say "I have now learnt, page_poison=1 is obsolete with 5.13", because that doesn't seem to make any sense. Even if it were obsolete, it shouldn't behave like this and spam dmesg.
Comment 1 Conrad Kostecki gentoo-dev 2021-07-10 14:56:26 UTC
Did you report that upstream?
Comment 2 bowsingbetee 2021-07-10 15:56:46 UTC
(In reply to Conrad Kostecki from comment #1)
> Did you report that upstream?

No, I haven't, because it says: "Please use your distribution's bug tracking tools. This bugzilla is for reporting bugs against upstream Linux kernels."

But I did search there, to no avail.


Here's some output with these added .config options:
--- '.config.prev29_5_13_1'
+++ '.config.tmp_debugallocandowner_5_13_1'
+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT is not set
-CONFIG_DEBUG_PAGEALLOC is not set
+CONFIG_DEBUG_PAGEALLOC=y
+CONFIG_HAVE_RELIABLE_STACKTRACE=y
-CONFIG_PAGE_EXTENSION is not set
+CONFIG_PAGE_EXTENSION=y
-CONFIG_PAGE_OWNER is not set
+CONFIG_PAGE_OWNER=y
+CONFIG_STACKDEPOT=y
+CONFIG_STACK_HASH_ORDER=20
-CONFIG_UNWINDER_GUESS=y
-CONFIG_UNWINDER_ORC is not set
+CONFIG_UNWINDER_ORC=y

and /proc/cmdline having "page_owner=on debug_pagealloc=on page_poison=1 init_on_free=0 init_on_alloc=0 slub_debug=P randomize_kstack_offset=on" among other things.
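
Since several of these boot options interact, it can help to print just the memory-debugging ones from the command line. The following is a minimal sketch, assuming simple space-separated tokens (quoted values containing spaces are not handled); the helper names and the `WATCHED` tuple are illustrative choices, not an exhaustive list.

```python
# Options from this report that affect page poisoning and page init.
WATCHED = ("page_poison", "init_on_alloc", "init_on_free", "debug_pagealloc")


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def memory_debug_settings(cmdline):
    """Return only the watched options that are present on the command line."""
    params = parse_cmdline(cmdline)
    return {name: params[name] for name in WATCHED if name in params}


sample = ("page_owner=on debug_pagealloc=on page_poison=1 init_on_free=0 "
          "init_on_alloc=0 slub_debug=P randomize_kstack_offset=on")
print(memory_debug_settings(sample))
```

On a running system the input string could be read from /proc/cmdline instead of the sample used here.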

[  648.414168] 0000000085629bdd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414169] 0000000022861832: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414169] 00000000c597f5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414170] CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
[  648.414171] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
[  648.414171] Call Trace:
[  648.414172]  dump_stack+0x64/0x7c
[  648.414173]  __kernel_unpoison_pages.cold+0x48/0x84
[  648.414174]  post_alloc_hook+0x60/0xa0
[  648.414175]  get_page_from_freelist+0xdb8/0x1000
[  648.414176]  __alloc_pages+0x163/0x2b0
[  648.414177]  __get_free_pages+0xc/0x30
[  648.414178]  pgd_alloc+0x2e/0x1a0
[  648.414179]  ? dup_mm+0x37/0x4f0
[  648.414181]  mm_init+0x185/0x270
[  648.414182]  dup_mm+0x6b/0x4f0
[  648.414183]  ? __lock_task_sighand+0x35/0x70
[  648.414184]  copy_process+0x190d/0x1b10
[  648.414186]  kernel_clone+0xba/0x3b0
[  648.414187]  __do_sys_clone+0x8f/0xb0
[  648.414189]  do_syscall_64+0x68/0x80
[  648.414191]  ? do_syscall_64+0x11/0x80
[  648.414192]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  648.414194] page:0000000072a7ea63 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24b45b
[  648.414195] flags: 0x8000000000000000(zone=2)
[  648.414196] raw: 8000000000000000 0000000000000000 ffffffff00000101 0000000000000000
[  648.414197] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  648.414197] page dumped because: pagealloc: corrupted page details
[  648.414198] page_owner tracks the page as freed
[  648.414198] page last allocated via order 1, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 16307, ts 647996700160, free_ts 648409217544
[  648.414199]  get_page_from_freelist+0xdb8/0x1000
[  648.414200]  __alloc_pages+0x163/0x2b0
[  648.414201]  __get_free_pages+0xc/0x30
[  648.414202]  pgd_alloc+0x2e/0x1a0
[  648.414203]  mm_init+0x185/0x270
[  648.414204]  alloc_bprm+0x80/0x250
[  648.414206]  do_execveat_common.isra.0+0x8a/0x1b0
[  648.414207]  __x64_sys_execve+0x2e/0x40
[  648.414208]  do_syscall_64+0x68/0x80
[  648.414210]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  648.414211] page last free stack trace:
[  648.414212]  __free_pages_ok+0x1a1/0x2a0
[  648.414212]  __mmdrop+0x4c/0x100
[  648.414214]  finish_task_switch.isra.0+0x176/0x240
[  648.414215]  __schedule+0x2ca/0x8a0
[  648.414217]  schedule+0x41/0xa0
[  648.414218]  schedule_hrtimeout_range_clock+0xf7/0x170
[  648.414220]  do_epoll_wait+0x60d/0x750
[  648.414221]  __x64_sys_epoll_wait+0x51/0x80
[  648.414222]  do_syscall_64+0x68/0x80
[  648.414225]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  648.414235] pagealloc: memory corruption
[  648.414235] 00000000816303a3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414236] 00000000612f6a1d: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414238] 000000008fc4c0cd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414239] 00000000c2faed8e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414240] 00000000cbcf40f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414241] 000000000da85ded: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414241] 00000000fdd822a1: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414242] 000000000397478a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414242] 000000009d2bf958: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414243] 000000003230ac53: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414243] 0000000059c12d76: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414244] 000000006d4b7fd5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414244] 00000000e4da12ad: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414245] 0000000057727747: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414247] 00000000bd7c04f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414248] 000000005de9b946: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414249] 000000008f1aed15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414250] 00000000b36569fb: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414251] 000000006a40258c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414251] 00000000fdff468c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414252] 000000003fc587d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414252] 00000000e9de0a11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414252] 000000003f0f17da: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414253] 00000000a1ecd3eb: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414253] 00000000b2eefa77: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414254] 00000000907ab495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
****lots more of these*****
[  648.414420] 000000003362efba: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414420] 000000009e26a725: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414421] 00000000c5329907: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  648.414422] CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
[  648.414423] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
[  648.414424] Call Trace:
[  648.414424]  dump_stack+0x64/0x7c
[  648.414425]  __kernel_unpoison_pages.cold+0x48/0x84
[  648.414426]  post_alloc_hook+0x60/0xa0
[  648.414428]  get_page_from_freelist+0xdb8/0x1000
[  648.414429]  ? vm_area_dup+0x21/0xa0
[  648.414431]  __alloc_pages+0x163/0x2b0
[  648.414432]  get_zeroed_page+0x14/0x40
[  648.414433]  __pud_alloc+0x23/0xb0
[  648.414436]  copy_page_range+0xeb5/0x1000
[  648.414437]  ? ___slab_alloc.constprop.0+0x39d/0x4c0
[  648.414440]  ? init_object+0x67/0x80
[  648.414441]  ? ___slab_alloc.constprop.0+0x39d/0x4c0
[  648.414443]  ? anon_vma_fork+0x97/0x160
[  648.414444]  ? anon_vma_clone+0x60/0x1e0
[  648.414445]  ? kmem_cache_alloc+0x174/0x2c0
[  648.414447]  ? anon_vma_fork+0x12d/0x160
[  648.414448]  dup_mm+0x347/0x4f0
[  648.414450]  copy_process+0x190d/0x1b10
[  648.414451]  kernel_clone+0xba/0x3b0
[  648.414453]  __do_sys_clone+0x8f/0xb0
[  648.414454]  do_syscall_64+0x68/0x80
[  648.414456]  ? do_syscall_64+0x11/0x80
[  648.414458]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  648.414460] page:0000000033679bc8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1a8082
[  648.414462] flags: 0x8000000000000000(zone=2)
[  648.414463] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000
[  648.414463] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  648.414464] page dumped because: pagealloc: corrupted page details
[  648.414464] page_owner tracks the page as freed
[  648.414464] page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 15828, ts 370690740164, free_ts 647995680491
[  648.414468]  get_page_from_freelist+0xdb8/0x1000
[  648.414469]  __alloc_pages+0x163/0x2b0
[  648.414469]  __pmd_alloc+0x2b/0x190
[  648.414471]  __handle_mm_fault+0x3fe/0x11a0
[  648.414472]  handle_mm_fault+0xc0/0x290
[  648.414474]  exc_page_fault+0x19c/0x5f0
[  648.414475]  asm_exc_page_fault+0x1b/0x20
[  648.414476] page last free stack trace:
[  648.414476]  free_pcp_prepare+0xe3/0x140
[  648.414478]  free_unref_page_list+0xbe/0x180
[  648.414478]  release_pages+0x193/0x3f0
[  648.414480]  tlb_finish_mmu+0x54/0x180
[  648.414481]  exit_mmap+0x166/0x1f0
[  648.414482]  mmput+0x37/0x100
[  648.414483]  do_exit+0x30b/0xa20
[  648.414484]  do_group_exit+0x2e/0x90
[  648.414485]  __x64_sys_exit_group+0xf/0x10
[  648.414486]  do_syscall_64+0x68/0x80
[  648.414487]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  653.483785] check_poison_mem: 7374 callbacks suppressed
[  653.483787] pagealloc: memory corruption
[  653.483790] 00000000e3a01f27: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  653.483791] 0000000048e2ff12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00


Let me know if you need more info. I'll be back in a few hours.
Comment 3 bowsingbetee 2021-07-10 19:38:13 UTC
Created attachment 723175 [details]
kernel .config aka .config.tmp_debugallocandowner_5_13_1

This is the kernel .config in which I additionally enabled the debug allocator (CONFIG_DEBUG_PAGEALLOC) and page owner, as per my previous comment.

I'm using OpenRC on ~amd64; here is my emerge --info as well:
$ emerge --info
Portage 3.0.20 (python 3.9.6-final-0, default/linux/amd64/17.1, gcc-11.1.0, glibc-2.33-r1, 5.13.1-gentoo-x86_64 x86_64)
=================================================================
System uname: Linux-5.13.1-gentoo-x86_64-x86_64-Intel-R-_Core-TM-_i7-8700K_CPU_@_3.70GHz-with-glibc2.33
KiB Mem:    63899348 total,  56057852 free
KiB Swap:  100663292 total, 100663292 free
Timestamp of repository gentoo: Sat, 10 Jul 2021 07:30:01 +0000
Head commit of repository gentoo: a5682095d312e494815a0aff6bf84983d3c271a5
sh bash 9999
ld GNU ld (Gentoo 2.36.1 p3) 2.36.1
ccache version 4.3 [enabled]
app-shells/bash:          9999::localrepo
dev-lang/perl:            5.34.0::gentoo
dev-lang/python:          3.9.6::gentoo, 3.10.0_beta3::gentoo
dev-lang/rust:            1.52.1::localrepo
dev-util/ccache:          4.3-r2::gentoo
dev-util/cmake:           3.20.5::gentoo
sys-apps/baselayout:      2.7-r3::gentoo
sys-apps/openrc:          0.43.3::gentoo
sys-apps/sandbox:         2.24::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r5::gentoo
sys-devel/automake:       1.16.3-r1::gentoo
sys-devel/binutils:       2.36.1-r1::gentoo
sys-devel/gcc:            11.1.0-r2::gentoo
sys-devel/gcc-config:     2.4::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.13::gentoo (virtual/os-headers)
sys-libs/glibc:           2.33-r1::gentoo
Repositories:                                                                   
                                                                                
gentoo                                                                          
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: 5000
    sync-rsync-verify-metamanifest: yes
    sync-rsync-extra-opts: 
    sync-rsync-verify-jobs: 0
    sync-rsync-vcs-ignore: false
    sync-rsync-verify-max-age: 2

localrepo
    location: /var/db/repos/localrepo
    masters: gentoo
    priority: 6000

Binary Repositories:

var-cache-binpkgs--local-binhost
    priority: 5000
    sync-uri: file:///var/cache/binpkgs

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.0/ext-active/ /etc/php/cgi-php8.0/ext-active/ /etc/php/cli-php8.0/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1"
DISTDIR="/var/cache/distfiles"
EMERGE_DEFAULT_OPTS=" --jobs=4 --load-average=4 --keep-going=n --usepkg=y --ask --ask-enter-invalid --binpkg-respect-use=y --binpkg-changed-deps=y --tree --deep --nospinner --backtrack=300 --with-bdeps=y --forceWKDupdate n --jobs=12 --load-average=12"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg buildsyspkg ccache cgroup collision-protect config-protect-if-modified distlocks downgrade-backup ebuild-locks fakeroot fixlafiles force-mirror getbinpkg installsources ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox prelink-checksums preserve-libs qa-unresolved-soname-deps sandbox sfperms skiprocheck split-elog split-log splitdebug strict suidctl unknown-features-warn unmerge-logs userpriv usersandbox"
FFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1"
GENTOO_MIRRORS="https://mirrors.evowise.com/gentoo/ https://mirror.dkm.cz/gentoo/ https://ftp.fau.de/gentoo https://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ https://gentoo.wheel.sk/ https://gentoo.osuosl.org/ https://mirror.ps.kz/gentoo/pub/ https://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/ https://mirror.yandex.ru/gentoo-distfiles/ https://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ https://ftp.halifax.rwth-aachen.de/gentoo/ https://ftp.halifax.rwth-aachen.de/gentoo/distfiles/ https://distfiles.gentoo.org"
INSTALL_MASK="/lib/systemd /lib32/systemd /lib64/systemd /usr/lib/systemd /usr/lib32/systemd /usr/lib64/systemd /etc/systemd"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro"
MAKEOPTS="--no-keep-going --output-sync=target -j18"
PKGDIR="/var/cache/binpkgs"
PORTAGE_BINHOST=""
PORTAGE_COMPRESS=""
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
RUSTFLAGS="-C target-cpu=skylake "
USE="X acl aes amd64 avx avx2 bindist btrfs bzip2 ccache cli cscope dbus dri elogind extensions f16c ffmpeg fma3 gdbm git gpg gpm gtk3 iconv jpeg libglvnd libtirpc lm_sensors lock mmx mmxext mosh-hardening multilib ncurses nptl ogg openmp opus pam pclmul pcre pie png policykit popcnt pulseaudio qt5 readline rsync-verify seccomp session smp source-highlight split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 startup-notification strong-security unicode verify-sig xcomposite zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="pc" INPUT_DEVICES="libinput evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres10 
postgres11" PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="intel" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, LC_ALL, LINGUAS, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS


Should I even attempt bisecting the kernel? Any info (links?) on how to do that on Gentoo?
Comment 4 Sergei Trofimovich (RETIRED) gentoo-dev 2021-07-10 19:45:05 UTC
Upstream has a few regressions and fixes in 5.12/5.13 around interaction between page_poison/debug_pagealloc/_init+_free. Probably another regression cropped up.

Looking at the backtrace freelist pages are memset(0) while they should be poisoned there.

Is it a sys-kernel/gentoo-sources?
Comment 5 bowsingbetee 2021-07-10 19:56:23 UTC
(In reply to Sergei Trofimovich from comment #4)
> Upstream has a few regressions and fixes in 5.12/5.13 around interaction
> between page_poison/debug_pagealloc/_init+_free. Probably another regression
> cropped up.
> 
> Looking at the backtrace freelist pages are memset(0) while they should be
> poisoned there.
> 
> Is it a sys-kernel/gentoo-sources?

yes it is:

sys-kernel/gentoo-sources-5.12.13::gentoo was built with the following:
USE="-build -experimental -symlink" ABI_X86="(64)"


sys-kernel/gentoo-sources-5.13.1::gentoo was built with the following:
USE="-build -experimental -symlink" ABI_X86="(64)"


Btw, I've tried bisecting with instructions from https://wiki.gentoo.org/wiki/Kernel_git-bisect
but I'm failing at this step:
root #git bisect bad v2.6.39.2 | tee -a /root/bisect.log 
The problem is that v5.13.1 doesn't exist as a tag, only v5.13 and its 7 rcs,
so I have no idea how to get v5.13.1 or v5.12.13 to show up in 'git tag', or how to get 'git bisect bad' to accept them...
Comment 6 Sergei Trofimovich (RETIRED) gentoo-dev 2021-07-10 20:03:49 UTC
Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.
Comment 7 bowsingbetee 2021-07-10 20:14:34 UTC
(In reply to Sergei Trofimovich from comment #6)
> Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.

hmm there's a tag v5.12-rc1-dontuse and if I do that bisect I'm worried I might hit some btrfs corruption bug(or similar) that would mess with my data hmm... still, I'll try to find a way... maybe copy whole system (minus my data) to another drive and test on it.
Comment 8 Sergei Trofimovich (RETIRED) gentoo-dev 2021-07-10 20:56:38 UTC
(In reply to bowsingbetee from comment #7)
> (In reply to Sergei Trofimovich from comment #6)
> > Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.
> 
> hmm there's a tag v5.12-rc1-dontuse and if I do that bisect I'm worried I
> might hit some btrfs corruption bug(or similar) that would mess with my data
> hmm... still, I'll try to find a way... maybe copy whole system (minus my
> data) to another drive and test on it.

I think it's only relevant if you have a swap device: https://lwn.net/Articles/848431/ . The regression was in incorrect block number calculation when writing to a swap partition on a block device.  If you disable any swap partitions temporarily it should be safe.
Comment 9 bowsingbetee 2021-07-10 21:03:51 UTC
(In reply to Sergei Trofimovich from comment #4)
> Upstream has a few regressions and fixes in 5.12/5.13 around interaction
> between page_poison/debug_pagealloc/_init+_free. Probably another regression
> cropped up.
> 
> Looking at the backtrace freelist pages are memset(0) while they should be
> poisoned there.
> 

Where did you find that memset(0) thing? Maybe it's easier for me to start from there than to bisect.

(In reply to Sergei Trofimovich from comment #8)
> (In reply to bowsingbetee from comment #7)
> > (In reply to Sergei Trofimovich from comment #6)
> > > Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.
> > 
> > hmm there's a tag v5.12-rc1-dontuse and if I do that bisect I'm worried I
> > might hit some btrfs corruption bug(or similar) that would mess with my data
> > hmm... still, I'll try to find a way... maybe copy whole system (minus my
> > data) to another drive and test on it.
> 
> I think it's only relevant if you have a swap device:
> https://lwn.net/Articles/848431/ . The regression was in incorrect block
> number calculation when writing to a swap partition on a block device.  If
> you disable any swap partitions temporarily it should be safe.

I didn't look into what that was about when I wrote that. Thanks! 
Generally speaking, I will boot that kernel from a test drive (as soon as I figure out how to set one up), to avoid corrupting my main drive (as long as it's not mounted rw) in case there are any corruption bugs; for example, the kernel might panic at the wrong time. There may not be any such bugs, I'm just trying to be safe, because I've already lost btrfs contents twice in my lifetime from OS crashes.
Comment 10 Sergei Trofimovich (RETIRED) gentoo-dev 2021-07-10 21:57:15 UTC
I see the same poisoning mis-reports on vanilla linux.git on the following setup:

- kernel command: page_poison=1 init_on_free=0 init_on_alloc=0
- kernel config:
  * CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
  * CONFIG_INIT_ON_FREE_DEFAULT_ON=y
  * CONFIG_PAGE_POISONING=y

v5.12 works ok, boots as:
  [    0.009691][    T0] mem auto-init: stack:off, heap alloc:off, heap free:off

v5.13 warns, boots as:
  [    0.009746][    T0] mem auto-init: stack:off, heap alloc:on, heap free:on

I think it's a bug and initial memory initialization adheres to CONFIG_INIT_ON_FREE_DEFAULT_ON=y instead of expected CONFIG_PAGE_POISONING=y

Easily reproducible in qemu. I'll bisect, but it's probably related to static key conversion.
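
The difference between the two boots above can be checked mechanically. A minimal sketch in Python, assuming the "mem auto-init" log format shown above; the function names are hypothetical and not part of any kernel tooling:

```python
import re

# Matches the "mem auto-init" boot line as it appears in the logs above.
_AUTO_INIT_RE = re.compile(
    r"mem auto-init: stack:(?P<stack>[^,]+), "
    r"heap alloc:(?P<alloc>on|off), heap free:(?P<free>on|off)"
)


def parse_mem_auto_init(line):
    """Return the parsed state of a 'mem auto-init' line, or None if absent."""
    m = _AUTO_INIT_RE.search(line)
    if m is None:
        return None
    return {
        "stack": m.group("stack"),
        "alloc": m.group("alloc") == "on",
        "free": m.group("free") == "on",
    }


def conflicts_with_page_poison(line, page_poison_enabled=True):
    """True if heap init is reported as on while page poisoning was requested."""
    state = parse_mem_auto_init(line)
    if state is None:
        return False
    return page_poison_enabled and (state["alloc"] or state["free"])
```

Feeding it the v5.13 line reports a conflict, while the v5.12 line does not.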
Comment 11 bowsingbetee 2021-07-11 11:06:48 UTC
Created attachment 723247 [details, diff]
reverted commit 51cba1ebc60df9c4ce034a9f5441169c0d0956c0 in patch form

(In reply to Sergei Trofimovich from comment #10)
> I see the same poisoning mis-reports on vanilla linux.git on the following
> setup:
> 
> - kernel command: page_poison=1 init_on_free=0 init_on_alloc=0
> - kernel config:
>   * CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
>   * CONFIG_INIT_ON_FREE_DEFAULT_ON=y
>   * CONFIG_PAGE_POISONING=y
> 
> v5.12 works ok, boots as:
>   [    0.009691][    T0] mem auto-init: stack:off, heap alloc:off, heap
> free:off
> 
> v5.13 warns, boots as:
>   [    0.009746][    T0] mem auto-init: stack:off, heap alloc:on, heap
> free:on
> 
> I think it's a bug and initial memory initialization adheres to
> CONFIG_INIT_ON_FREE_DEFAULT_ON=y instead of expected CONFIG_PAGE_POISONING=y
> 
> Easily reproducible in qemu. I'll bisect, but it's probably related to
> static key conversion.

I hadn't bisected yet, but based on what you just said I guessed that the problematic commit is likely 51cba1ebc60df9c4ce034a9f5441169c0d0956c0. I tested this by applying its revert on top of 5.13.1 gentoo-sources, then removing the revert (patch -R) to confirm the problem came back, without changing anything in .config or /proc/cmdline.

So, maybe your bisect will show it too, unless I've made some severe mistake that I'm not aware of, which is always possible.

If that is indeed the one, could you by any chance notify upstream? I'm not familiar with how they do things (email lists and such). Thanks in advance either way and many thanks for finding this! I'll be applying this revert patch locally on my system until further notice.
Comment 12 Sergei Trofimovich (RETIRED) gentoo-dev 2021-07-11 23:59:37 UTC
Bisecting the warning was a bit complicated because the static key commit broke boot for my VM. Sent the report with more details to linux-mm@ as https://lore.kernel.org/linux-mm/20210712005732.4f9bfa78@zn3/T/#u
Comment 13 Sergei Trofimovich (RETIRED) gentoo-dev 2021-07-12 22:00:32 UTC
Created attachment 723640 [details, diff]
0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch

Try the 0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch.

Also proposed the same patch upstream as https://lore.kernel.org/linux-mm/20210712215816.1512739-1-slyfox@gentoo.org/T/#u
Comment 14 bowsingbetee 2021-07-13 07:05:16 UTC
The patch works for me! I've applied it on top of sys-kernel/gentoo-sources-5.13.1::gentoo.

[   30.836197] mem auto-init: SLAB_POISON will take precedence over init_on_alloc/init_on_free
[   30.848943] mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_alloc
[   30.850757] mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_free
[   30.852642] mem auto-init: stack:byref_all(zero), heap alloc:off, heap free:off

tested with: page_poison=1 init_on_free=0 init_on_alloc=0 slub_debug=P
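
The log messages above state that page poisoning takes precedence over init_on_alloc and init_on_free. A toy model of that precedence in Python, not kernel code; the function name and parameters are hypothetical and only mirror what the messages say:

```python
def effective_auto_init(page_poison, init_on_alloc, init_on_free):
    """Model the precedence stated in the boot log, as a toy sketch.

    When page poisoning is requested, heap zeroing on alloc and on free is
    reported as off regardless of what was asked for; otherwise the
    requested values are kept.
    """
    if page_poison:
        return {"alloc": False, "free": False}
    return {"alloc": bool(init_on_alloc), "free": bool(init_on_free)}


def format_heap_state(state):
    """Render the heap part of the line in the 'heap alloc:..., heap free:...' style."""
    def on_off(flag):
        return "on" if flag else "off"
    return "heap alloc:%s, heap free:%s" % (on_off(state["alloc"]), on_off(state["free"]))
```

With page_poison set, the model always yields "heap alloc:off, heap free:off", which matches the last log line above.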

As an aside, if I fetch the raw[1] version from lore, it contains extra characters like "=3D" and "=20" compared to what's seen in [2] (the raw message is quoted-printable encoded), so it's good to know to avoid the raw version. E.g. "That caused page_poison=3D1 / init_on_free=3D1 conflict."

[1] https://lore.kernel.org/linux-mm/20210712215816.1512739-1-slyfox@gentoo.org/raw
[2] https://lore.kernel.org/linux-mm/20210712005732.4f9bfa78@zn3/t/#m9696b571d816104c1d38a07ff4689c3c25bc64ba
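
The "=3D" and "=20" sequences are quoted-printable escapes for "=" and a space. A minimal sketch of undoing them with Python's standard quopri module, assuming the text is UTF-8; the helper name is hypothetical, and a full raw message would be better handled by an email parser or by 'git am', which is not shown here:

```python
import quopri


def decode_quoted_printable(raw):
    """Decode quoted-printable escapes such as '=3D' and '=20' in a string."""
    return quopri.decodestring(raw.encode("utf-8")).decode("utf-8")


sample = "That caused page_poison=3D1 / init_on_free=3D1 conflict."
print(decode_quoted_printable(sample))
# prints: That caused page_poison=1 / init_on_free=1 conflict.
```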

Thank you for your work!
Comment 15 Mike Pagano gentoo-dev 2021-08-17 17:53:23 UTC
This is now in kernels >= 5.13.6. Thanks for reporting and thanks for the great work, slyfox!