Summary: | page_poison=1 on kernel 5.13.1 (vs 5.12.13) spams dmesg with page dumps due to "pagealloc: memory corruption" | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | bowsingbetee <bowsingbetee> |
Component: | Current packages | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bowsingbetee, slyfox |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=213697 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
kernel .config aka .config.tmp_debugallocandowner_5_13_1
reverted commit 51cba1ebc60df9c4ce034a9f5441169c0d0956c0 in patch form 0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch |
Description
bowsingbetee
2021-07-10 14:20:15 UTC
Did you report that to upstream?

(In reply to Conrad Kostecki from comment #1)
> Did you report that to upstream?

No, I haven't, because of this notice: "Please use your distribution's bug tracking tools. This bugzilla is for reporting bugs against upstream Linux kernels." But I did search for it, to no avail.

Here's some output with these added .config options:

--- '.config.prev29_5_13_1'
+++ '.config.tmp_debugallocandowner_5_13_1'
+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT is not set
-CONFIG_DEBUG_PAGEALLOC is not set
+CONFIG_DEBUG_PAGEALLOC=y
+CONFIG_HAVE_RELIABLE_STACKTRACE=y
-CONFIG_PAGE_EXTENSION is not set
+CONFIG_PAGE_EXTENSION=y
-CONFIG_PAGE_OWNER is not set
+CONFIG_PAGE_OWNER=y
+CONFIG_STACKDEPOT=y
+CONFIG_STACK_HASH_ORDER=20
-CONFIG_UNWINDER_GUESS=y
-CONFIG_UNWINDER_ORC is not set
+CONFIG_UNWINDER_ORC=y

and /proc/cmdline having "page_owner=on debug_pagealloc=on page_poison=1 init_on_free=0 init_on_alloc=0 slub_debug=P randomize_kstack_offset=on" among other things.

[ 648.414168] 0000000085629bdd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414169] 0000000022861832: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414169] 00000000c597f5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414170] CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G U O 5.13.1-gentoo-x86_64 #1 [ 648.414171] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021 [ 648.414171] Call Trace: [ 648.414172] dump_stack+0x64/0x7c [ 648.414173] __kernel_unpoison_pages.cold+0x48/0x84 [ 648.414174] post_alloc_hook+0x60/0xa0 [ 648.414175] get_page_from_freelist+0xdb8/0x1000 [ 648.414176] __alloc_pages+0x163/0x2b0 [ 648.414177] __get_free_pages+0xc/0x30 [ 648.414178] pgd_alloc+0x2e/0x1a0 [ 648.414179] ? dup_mm+0x37/0x4f0 [ 648.414181] mm_init+0x185/0x270 [ 648.414182] dup_mm+0x6b/0x4f0 [ 648.414183] ? 
__lock_task_sighand+0x35/0x70 [ 648.414184] copy_process+0x190d/0x1b10 [ 648.414186] kernel_clone+0xba/0x3b0 [ 648.414187] __do_sys_clone+0x8f/0xb0 [ 648.414189] do_syscall_64+0x68/0x80 [ 648.414191] ? do_syscall_64+0x11/0x80 [ 648.414192] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 648.414194] page:0000000072a7ea63 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24b45b [ 648.414195] flags: 0x8000000000000000(zone=2) [ 648.414196] raw: 8000000000000000 0000000000000000 ffffffff00000101 0000000000000000 [ 648.414197] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 648.414197] page dumped because: pagealloc: corrupted page details [ 648.414198] page_owner tracks the page as freed [ 648.414198] page last allocated via order 1, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 16307, ts 647996700160, free_ts 648409217544 [ 648.414199] get_page_from_freelist+0xdb8/0x1000 [ 648.414200] __alloc_pages+0x163/0x2b0 [ 648.414201] __get_free_pages+0xc/0x30 [ 648.414202] pgd_alloc+0x2e/0x1a0 [ 648.414203] mm_init+0x185/0x270 [ 648.414204] alloc_bprm+0x80/0x250 [ 648.414206] do_execveat_common.isra.0+0x8a/0x1b0 [ 648.414207] __x64_sys_execve+0x2e/0x40 [ 648.414208] do_syscall_64+0x68/0x80 [ 648.414210] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 648.414211] page last free stack trace: [ 648.414212] __free_pages_ok+0x1a1/0x2a0 [ 648.414212] __mmdrop+0x4c/0x100 [ 648.414214] finish_task_switch.isra.0+0x176/0x240 [ 648.414215] __schedule+0x2ca/0x8a0 [ 648.414217] schedule+0x41/0xa0 [ 648.414218] schedule_hrtimeout_range_clock+0xf7/0x170 [ 648.414220] do_epoll_wait+0x60d/0x750 [ 648.414221] __x64_sys_epoll_wait+0x51/0x80 [ 648.414222] do_syscall_64+0x68/0x80 [ 648.414225] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 648.414235] pagealloc: memory corruption [ 648.414235] 00000000816303a3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 648.414236] 00000000612f6a1d: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414238] 000000008fc4c0cd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414239] 00000000c2faed8e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414240] 00000000cbcf40f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414241] 000000000da85ded: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414241] 00000000fdd822a1: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414242] 000000000397478a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414242] 000000009d2bf958: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414243] 000000003230ac53: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414243] 0000000059c12d76: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414244] 000000006d4b7fd5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414244] 00000000e4da12ad: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414245] 0000000057727747: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414247] 00000000bd7c04f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414248] 000000005de9b946: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414249] 000000008f1aed15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414250] 00000000b36569fb: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414251] 000000006a40258c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414251] 00000000fdff468c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414252] 000000003fc587d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 648.414252] 00000000e9de0a11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414252] 000000003f0f17da: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414253] 00000000a1ecd3eb: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414253] 00000000b2eefa77: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414254] 00000000907ab495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ****lots more of these***** [ 648.414420] 000000003362efba: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414420] 000000009e26a725: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414421] 00000000c5329907: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 648.414422] CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G U O 5.13.1-gentoo-x86_64 #1 [ 648.414423] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021 [ 648.414424] Call Trace: [ 648.414424] dump_stack+0x64/0x7c [ 648.414425] __kernel_unpoison_pages.cold+0x48/0x84 [ 648.414426] post_alloc_hook+0x60/0xa0 [ 648.414428] get_page_from_freelist+0xdb8/0x1000 [ 648.414429] ? vm_area_dup+0x21/0xa0 [ 648.414431] __alloc_pages+0x163/0x2b0 [ 648.414432] get_zeroed_page+0x14/0x40 [ 648.414433] __pud_alloc+0x23/0xb0 [ 648.414436] copy_page_range+0xeb5/0x1000 [ 648.414437] ? ___slab_alloc.constprop.0+0x39d/0x4c0 [ 648.414440] ? init_object+0x67/0x80 [ 648.414441] ? ___slab_alloc.constprop.0+0x39d/0x4c0 [ 648.414443] ? anon_vma_fork+0x97/0x160 [ 648.414444] ? anon_vma_clone+0x60/0x1e0 [ 648.414445] ? kmem_cache_alloc+0x174/0x2c0 [ 648.414447] ? anon_vma_fork+0x12d/0x160 [ 648.414448] dup_mm+0x347/0x4f0 [ 648.414450] copy_process+0x190d/0x1b10 [ 648.414451] kernel_clone+0xba/0x3b0 [ 648.414453] __do_sys_clone+0x8f/0xb0 [ 648.414454] do_syscall_64+0x68/0x80 [ 648.414456] ? 
do_syscall_64+0x11/0x80 [ 648.414458] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 648.414460] page:0000000033679bc8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1a8082 [ 648.414462] flags: 0x8000000000000000(zone=2) [ 648.414463] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000 [ 648.414463] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 648.414464] page dumped because: pagealloc: corrupted page details [ 648.414464] page_owner tracks the page as freed [ 648.414464] page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 15828, ts 370690740164, free_ts 647995680491 [ 648.414468] get_page_from_freelist+0xdb8/0x1000 [ 648.414469] __alloc_pages+0x163/0x2b0 [ 648.414469] __pmd_alloc+0x2b/0x190 [ 648.414471] __handle_mm_fault+0x3fe/0x11a0 [ 648.414472] handle_mm_fault+0xc0/0x290 [ 648.414474] exc_page_fault+0x19c/0x5f0 [ 648.414475] asm_exc_page_fault+0x1b/0x20 [ 648.414476] page last free stack trace: [ 648.414476] free_pcp_prepare+0xe3/0x140 [ 648.414478] free_unref_page_list+0xbe/0x180 [ 648.414478] release_pages+0x193/0x3f0 [ 648.414480] tlb_finish_mmu+0x54/0x180 [ 648.414481] exit_mmap+0x166/0x1f0 [ 648.414482] mmput+0x37/0x100 [ 648.414483] do_exit+0x30b/0xa20 [ 648.414484] do_group_exit+0x2e/0x90 [ 648.414485] __x64_sys_exit_group+0xf/0x10 [ 648.414486] do_syscall_64+0x68/0x80 [ 648.414487] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 653.483785] check_poison_mem: 7374 callbacks suppressed [ 653.483787] pagealloc: memory corruption [ 653.483790] 00000000e3a01f27: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 653.483791] 0000000048e2ff12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 let me know if you need more info? 
brb in a few hours Created attachment 723175 [details] kernel .config aka .config.tmp_debugallocandowner_5_13_1 this is the kernel .config where I've, in addition, enabled debug allocator and page owner, as per my prev. comment. I'm using openrc on ~amd64 and this is my emerge --info too: $ emerge --info Portage 3.0.20 (python 3.9.6-final-0, default/linux/amd64/17.1, gcc-11.1.0, glibc-2.33-r1, 5.13.1-gentoo-x86_64 x86_64) ================================================================= System uname: Linux-5.13.1-gentoo-x86_64-x86_64-Intel-R-_Core-TM-_i7-8700K_CPU_@_3.70GHz-with-glibc2.33 KiB Mem: 63899348 total, 56057852 free KiB Swap: 100663292 total, 100663292 free Timestamp of repository gentoo: Sat, 10 Jul 2021 07:30:01 +0000 Head commit of repository gentoo: a5682095d312e494815a0aff6bf84983d3c271a5 sh bash 9999 ld GNU ld (Gentoo 2.36.1 p3) 2.36.1 ccache version 4.3 [enabled] app-shells/bash: 9999::localrepo dev-lang/perl: 5.34.0::gentoo dev-lang/python: 3.9.6::gentoo, 3.10.0_beta3::gentoo dev-lang/rust: 1.52.1::localrepo dev-util/ccache: 4.3-r2::gentoo dev-util/cmake: 3.20.5::gentoo sys-apps/baselayout: 2.7-r3::gentoo sys-apps/openrc: 0.43.3::gentoo sys-apps/sandbox: 2.24::gentoo sys-devel/autoconf: 2.13-r1::gentoo, 2.69-r5::gentoo sys-devel/automake: 1.16.3-r1::gentoo sys-devel/binutils: 2.36.1-r1::gentoo sys-devel/gcc: 11.1.0-r2::gentoo sys-devel/gcc-config: 2.4::gentoo sys-devel/libtool: 2.4.6-r6::gentoo sys-devel/make: 4.2.1-r4::gentoo sys-kernel/linux-headers: 5.13::gentoo (virtual/os-headers) sys-libs/glibc: 2.33-r1::gentoo Repositories: gentoo location: /var/db/repos/gentoo sync-type: rsync sync-uri: rsync://rsync.gentoo.org/gentoo-portage priority: 5000 sync-rsync-verify-metamanifest: yes sync-rsync-extra-opts: sync-rsync-verify-jobs: 0 sync-rsync-vcs-ignore: false sync-rsync-verify-max-age: 2 localrepo location: /var/db/repos/localrepo masters: gentoo priority: 6000 Binary Repositories: var-cache-binpkgs--local-binhost priority: 5000 sync-uri: 
file:///var/cache/binpkgs ACCEPT_KEYWORDS="amd64 ~amd64" ACCEPT_LICENSE="@FREE" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.0/ext-active/ /etc/php/cgi-php8.0/ext-active/ /etc/php/cli-php8.0/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1" DISTDIR="/var/cache/distfiles" EMERGE_DEFAULT_OPTS=" --jobs=4 --load-average=4 --keep-going=n --usepkg=y --ask --ask-enter-invalid --binpkg-respect-use=y --binpkg-changed-deps=y --tree --deep --nospinner --backtrack=300 --with-bdeps=y --forceWKDupdate n --jobs=12 --load-average=12" ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR" FCFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1" FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs 
binpkg-multi-instance buildpkg buildsyspkg ccache cgroup collision-protect config-protect-if-modified distlocks downgrade-backup ebuild-locks fakeroot fixlafiles force-mirror getbinpkg installsources ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox prelink-checksums preserve-libs qa-unresolved-soname-deps sandbox sfperms skiprocheck split-elog split-log splitdebug strict suidctl unknown-features-warn unmerge-logs userpriv usersandbox" FFLAGS="-march=skylake -mtune=skylake -mprefer-vector-width=128 -O2 -pipe -frecord-gcc-switches -ggdb -fvar-tracking-assignments -fno-omit-frame-pointer -ftrack-macro-expansion=2 -fstack-protector-all -Wno-trigraphs -fno-schedule-insns2 -fno-delete-null-pointer-checks -D_FORTIFY_SOURCE=2 -rdynamic -flifetime-dse=1" GENTOO_MIRRORS="https://mirrors.evowise.com/gentoo/ https://mirror.dkm.cz/gentoo/ https://ftp.fau.de/gentoo https://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ https://gentoo.wheel.sk/ https://gentoo.osuosl.org/ https://mirror.ps.kz/gentoo/pub/ https://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/ https://mirror.yandex.ru/gentoo-distfiles/ https://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ https://ftp.halifax.rwth-aachen.de/gentoo/ https://ftp.halifax.rwth-aachen.de/gentoo/distfiles/ https://distfiles.gentoo.org" INSTALL_MASK="/lib/systemd /lib32/systemd /lib64/systemd /usr/lib/systemd /usr/lib32/systemd /usr/lib64/systemd /etc/systemd" LANG="en_US.utf8" LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro" MAKEOPTS="--no-keep-going --output-sync=target -j18" PKGDIR="/var/cache/binpkgs" PORTAGE_BINHOST="" PORTAGE_COMPRESS="" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git" PORTAGE_TMPDIR="/var/tmp" RUSTFLAGS="-C target-cpu=skylake " USE="X acl aes amd64 
avx avx2 bindist btrfs bzip2 ccache cli cscope dbus dri elogind extensions f16c ffmpeg fma3 gdbm git gpg gpm gtk3 iconv jpeg libglvnd libtirpc lm_sensors lock mmx mmxext mosh-hardening multilib ncurses nptl ogg openmp opus pam pclmul pcre pie png policykit popcnt pulseaudio qt5 readline rsync-verify seccomp session smp source-highlight split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 startup-notification strong-security unicode verify-sig xcomposite zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="pc" INPUT_DEVICES="libinput evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres10 postgres11" 
PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="intel" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CC, CPPFLAGS, CTARGET, CXX, LC_ALL, LINGUAS, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Should I even attempt bisecting the kernel? Any info (links?) on how to do it on Gentoo, maybe?

Upstream has a few regressions and fixes in 5.12/5.13 around interaction between page_poison/debug_pagealloc/_init+_free. Probably another regression cropped up.

Looking at the backtrace, freelist pages are memset(0) while they should be poisoned there.

Is it a sys-kernel/gentoo-sources?

(In reply to Sergei Trofimovich from comment #4)
> Upstream has a few regressions and fixes in 5.12/5.13 around interaction
> between page_poison/debug_pagealloc/_init+_free. Probably another regression
> cropped up.
>
> Looking at the backtrace freelist pages are memset(0) while they should be
> poisoned there.
>
> Is it a sys-kernel/gentoo-sources?

Yes, it is:

sys-kernel/gentoo-sources-5.12.13::gentoo was built with the following:
USE="-build -experimental -symlink" ABI_X86="(64)"

sys-kernel/gentoo-sources-5.13.1::gentoo was built with the following:
USE="-build -experimental -symlink" ABI_X86="(64)"

Btw, I've tried bisecting with the instructions from https://wiki.gentoo.org/wiki/Kernel_git-bisect but I'm failing at this step:

root #git bisect bad v2.6.39.2 | tee -a /root/bisect.log

The problem is that v5.13.1 doesn't exist as a tag, only v5.13 and its 7 rc's, so I've no idea how to get v5.13.1 or v5.12.13 to show up in 'git tag' or for 'git bisect bad' to see it...

Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.

(In reply to Sergei Trofimovich from comment #6)
> Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.
Hmm, there's a tag v5.12-rc1-dontuse, and if I do that bisect I'm worried I might hit some btrfs corruption bug (or similar) that would mess with my data. Still, I'll try to find a way... maybe copy the whole system (minus my data) to another drive and test on it.

(In reply to bowsingbetee from comment #7)
> (In reply to Sergei Trofimovich from comment #6)
> > Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.
>
> hmm there's a tag v5.12-rc1-dontuse and if I do that bisect I'm worried I
> might hit some btrfs corruption bug(or similar) that would mess with my data
> hmm... still, I'll try to find a way... maybe copy whole system (minus my
> data) to another drive and test on it.

I think it's only relevant if you have a swap device: https://lwn.net/Articles/848431/ . The regression was in incorrect block number calculation when writing to a swap partition on a block device. If you disable any swap partitions temporarily it should be safe.

(In reply to Sergei Trofimovich from comment #4)
> Upstream has a few regressions and fixes in 5.12/5.13 around interaction
> between page_poison/debug_pagealloc/_init+_free. Probably another regression
> cropped up.
>
> Looking at the backtrace freelist pages are memset(0) while they should be
> poisoned there.
>

Where did you find that memset(0) thing? Maybe it's easier for me to start from there than to bisect.

(In reply to Sergei Trofimovich from comment #8)
> (In reply to bowsingbetee from comment #7)
> > (In reply to Sergei Trofimovich from comment #6)
> > > Try vanilla 5.13 and 5.12. Maybe those are easier to bisect around.
> >
> > hmm there's a tag v5.12-rc1-dontuse and if I do that bisect I'm worried I
> > might hit some btrfs corruption bug(or similar) that would mess with my data
> > hmm... still, I'll try to find a way... maybe copy whole system (minus my
> > data) to another drive and test on it.
>
> I think it's only relevant if you have a swap device:
> https://lwn.net/Articles/848431/ . 
The regression was in incorrect block
> number calculation when writing to a swap partition on a block device. If
> you disable any swap partitions temporarily it should be safe.

I didn't look into what that was about when I wrote that. Thanks!

Generally speaking, I will be using a test drive (as soon as I figure out how to do it) to boot that kernel, to avoid corruption on my main one (as long as it's not going to be mounted rw), just in case there are any corruption bugs, e.g. the kernel may panic at the wrong time. There may not be any such bugs; I'm just trying to be sure, because I've already lost btrfs contents twice thus far in my lifetime from OS crashes.

I see the same poisoning mis-reports on vanilla linux.git on the following setup:

- kernel command: page_poison=1 init_on_free=0 init_on_alloc=0
- kernel config:
  * CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
  * CONFIG_INIT_ON_FREE_DEFAULT_ON=y
  * CONFIG_PAGE_POISONING=y

v5.12 works ok, boots as:
[ 0.009691][ T0] mem auto-init: stack:off, heap alloc:off, heap free:off

v5.13 warns, boots as:
[ 0.009746][ T0] mem auto-init: stack:off, heap alloc:on, heap free:on

I think it's a bug and initial memory initialization adheres to CONFIG_INIT_ON_FREE_DEFAULT_ON=y instead of the expected CONFIG_PAGE_POISONING=y.

Easily reproducible in qemu. I'll bisect, but it's probably related to the static key conversion.
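The v5.12 versus v5.13 difference in the two boot lines quoted above can be checked mechanically. Below is a minimal Python sketch, assuming the `mem auto-init:` summary line keeps the format shown in this report. The function names (`parse_auto_init`, `cmdline_flags`, `mismatch`) are hypothetical helpers made up for illustration, not part of any kernel tooling.

```python
import re

# Matches the boot-time summary line as it is quoted in this report.
# The format is an assumption based on those quoted lines only.
AUTO_INIT_RE = re.compile(
    r"mem auto-init: stack:(?P<stack>\S+), "
    r"heap alloc:(?P<alloc>on|off), heap free:(?P<free>on|off)"
)


def parse_auto_init(dmesg_text):
    """Return the last 'mem auto-init' summary found in dmesg_text, or None."""
    matches = list(AUTO_INIT_RE.finditer(dmesg_text))
    if not matches:
        return None
    m = matches[-1]
    return {
        "stack": m["stack"],
        "alloc": m["alloc"] == "on",
        "free": m["free"] == "on",
    }


def cmdline_flags(cmdline):
    """Collect key=value parameters from a kernel command line string."""
    flags = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            flags[key] = value
    return flags


def mismatch(cmdline, dmesg_text):
    """List heap-init options reported 'on' although the command line set them to 0."""
    flags = cmdline_flags(cmdline)
    state = parse_auto_init(dmesg_text)
    if state is None:
        return []
    problems = []
    if flags.get("init_on_alloc") == "0" and state["alloc"]:
        problems.append("init_on_alloc")
    if flags.get("init_on_free") == "0" and state["free"]:
        problems.append("init_on_free")
    return problems
```

On a live system the two inputs would typically come from /proc/cmdline and the output of dmesg. With the command line from this comment, the v5.13 boot line is flagged for both options and the v5.12 line for neither.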
Created attachment 723247 [details, diff]
reverted commit 51cba1ebc60df9c4ce034a9f5441169c0d0956c0 in patch form

(In reply to Sergei Trofimovich from comment #10)
> I see the same poisoning mis-reports on vanilla linux.git on the following
> setup:
>
> - kernel command: page_poison=1 init_on_free=0 init_on_alloc=0
> - kernel config:
> * CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
> * CONFIG_INIT_ON_FREE_DEFAULT_ON=y
> * CONFIG_PAGE_POISONING=y
>
> v5.12 works ok, boots as:
> [ 0.009691][ T0] mem auto-init: stack:off, heap alloc:off, heap
> free:off
>
> v5.13 warns, boots as:
> [ 0.009746][ T0] mem auto-init: stack:off, heap alloc:on, heap
> free:on
>
> I think it's a bug and initial memory initialization adheres to
> CONFIG_INIT_ON_FREE_DEFAULT_ON=y instead of expected CONFIG_PAGE_POISONING=y
>
> Easily reproducible in qemu. I'll bisect, but it's probably related to
> static key conversion.

I haven't bisected yet, but based on what you just said I guessed that the problematic commit is likely 51cba1ebc60df9c4ce034a9f5441169c0d0956c0. I tested this by applying its revert on top of 5.13.1 gentoo-sources, then removing the revert (patch -R) to confirm the problem came back, without changing anything in .config or /proc/cmdline.

So, maybe your bisect will show it too, unless I've made some severe mistake that I'm not aware of, which is always possible.

If that is indeed the one, could you by any chance notify upstream? I'm not familiar with how they do things (email lists and such). Thanks in advance either way and many thanks for finding this! I'll be applying this revert patch locally on my system until further notice.

Bisecting the warning was a bit complicated because the static key commit broke boot for my VM.
Sent the report with more details to linux-mm@ as https://lore.kernel.org/linux-mm/20210712005732.4f9bfa78@zn3/T/#u

Created attachment 723640 [details, diff]
0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch

Try the 0001-mm-page_alloc-fix-page_poison-1-INIT_ON_ALLOC_DEFAUL.patch. Also proposed the same patch upstream as https://lore.kernel.org/linux-mm/20210712215816.1512739-1-slyfox@gentoo.org/T/#u

The patch works for me! I've applied it on top of sys-kernel/gentoo-sources-5.13.1::gentoo.

[ 30.836197] mem auto-init: SLAB_POISON will take precedence over init_on_alloc/init_on_free
[ 30.848943] mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_alloc
[ 30.850757] mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_free
[ 30.852642] mem auto-init: stack:byref_all(zero), heap alloc:off, heap free:off

Tested with: page_poison=1 init_on_free=0 init_on_alloc=0 slub_debug=P

As an aside, if I fetch the raw[1] version from lore, there are extra characters like "=3D" and "=20" inserted compared to what's seen in [2], so it's good to know to avoid that raw thing. E.g. "That caused page_poison=3D1 / init_on_free=3D1 conflict."

[1] https://lore.kernel.org/linux-mm/20210712215816.1512739-1-slyfox@gentoo.org/raw
[2] https://lore.kernel.org/linux-mm/20210712005732.4f9bfa78@zn3/t/#m9696b571d816104c1d38a07ff4689c3c25bc64ba

Thank you for your work!

This is now in kernels >= 5.13.6.

Thanks for reporting and thanks for the great work, slyfox! |
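Regarding the aside above about the raw lore view: the "=3D" and "=20" sequences look like quoted-printable transfer encoding, in which "=3D" stands for "=" and "=20" for a space. Below is a minimal Python sketch that decodes such text with the standard-library `quopri` module. It assumes the text really is quoted-printable and that the input is only the message body, not the mail headers. `decode_quoted_printable` is a hypothetical helper name.

```python
import quopri


def decode_quoted_printable(raw_body, charset="utf-8"):
    """Decode a quoted-printable message body into plain text.

    Assumption: raw_body is ASCII text as served by a raw mail view,
    with sequences such as '=3D' or '=20' standing in for single bytes.
    """
    decoded_bytes = quopri.decodestring(raw_body.encode("ascii"))
    return decoded_bytes.decode(charset)


sample = "That caused page_poison=3D1 / init_on_free=3D1 conflict."
print(decode_quoted_printable(sample))
```

If this reading is right, mail-aware tools such as `git am` should handle the encoding themselves when given the whole raw message, so decoding by hand would only be needed when copying text out of the raw view.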