When migrating from sys-kernel/gentoo-sources-6.7.0 to sys-kernel/gentoo-sources-6.7.3 the machine freezes when mounting an OCFS2 drive. I did a bisect but the result was suspect in my opinion. Because I didn't trust the result, I repeated the entire bisect, but the result was the same again.

These are the results:

dmesg from the machine that wants to mount the OCFS2 drive.

[Feb 4 14:36] ocfs2: Registered cluster interface o2cb
[Feb 4 14:37] rcu: INFO: rcu_preempt self-detected stall on CPU
[ +0.000006] rcu: 13-....: (12 GPs behind) idle=d54c/1/0x4000000000000000 softirq=725/725 fqs=7490
[ +0.000008] rcu: (t=15000 jiffies g=3701 q=1515 ncpus=24)
[ +0.000004] CPU: 13 PID: 2501 Comm: kworker/u48:2 Not tainted 6.7.1+ #1
[ +0.000003] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ +0.000002] Workqueue: o2net o2net_accept_many [ocfs2_nodemanager]
[ +0.000021] RIP: 0010:queued_spin_lock_slowpath+0x12/0x192
[ +0.000007] Code: 02 00 00 e9 da 31 66 ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 fa 66 90 b9 01 00 00 00 8b 02 85 c0 74 04 f3 90 <eb> f6 f0 0f b1 0a 85 c0 75 ee c3 cc cc cc cc 81 fe 00 01 00 00 75
[ +0.000003] RSP: 0018:ffffc90000d33d68 EFLAGS: 00000202
[ +0.000002] RAX: 0000000000000001 RBX: ffff888102d79180 RCX: 0000000000000001
[ +0.000002] RDX: ffff888102d79218 RSI: 0000000000000001 RDI: ffff888102d79218
[ +0.000001] RBP: ffff888102d79218 R08: 0000000000000000 R09: 0000000000000000
[ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888102d79180
[ +0.000002] R13: ffffc90000d33dec R14: ffffffffa0585f08 R15: ffff8881033d5900

dmesg from a machine that has mounted the OCFS2 drive.
[Feb 4 14:36] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071847] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071904] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071983] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071801] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071954] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +0.256005] o2net: No connection established with node 1 after 30.0 seconds, check network and cluster configuration.
[ +2.816065] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071867] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071909] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.071839] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[

Bisect result:

status: waiting for both good and bad commits
status: waiting for bad commit, 1 good commit known
Bisecting: 508 revisions left to test after this (roughly 9 steps)
[3437e35fe8a6a937b96da90716c0c538e15be0f7] serial: sc16is7xx: set safe default SPI clock frequency
Bisecting: 253 revisions left to test after this (roughly 8 steps)
[3e2680bd68fab7ba145393e4eb069d6c4d8a30fa] Bluetooth: btmtkuart: fix recv_buf() return value
Bisecting: 126 revisions left to test after this (roughly 7 steps)
[06a7919489f2e310f6ba1ccd8c62e5b19189b6e0] bpf, lpm: Fix check prefixlen before walking trie
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[4d08e6c01cf2d853971d68700785646045a5afd6] ACPI: LPSS: Fix the fractional clock divider flags
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[7fea58e8e647ffcc70dfa1ea53c325d7e52c893b] crypto: hisilicon/sec2 - save capability registers in probe process
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[0055ff3ee060002c97cbbaf77ceead8c11d0b984] crypto: ccp - fix memleak in ccp_init_dm_workarea
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[7f25cd51aa71711438a6c11fa638ba5aba61ed95] selinux: Fix error priority for bind with AF_UNSPEC on PF_INET6 socket
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[568a410e89f0ae73f2663bd1a48f981d3377aac9] kunit: debugfs: Handle errors from alloc_string_stream()
Bisecting: 1 revision left to test after this (roughly 1 step)
[02871710b93058eb1249d5847c0b2d1c2c3c98ae] thermal: core: Fix NULL pointer dereference in zone registration error path
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[f3bd89340eab3ff2740a499384977f6562b9a53f] ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error
02871710b93058eb1249d5847c0b2d1c2c3c98ae is the first bad commit
commit 02871710b93058eb1249d5847c0b2d1c2c3c98ae
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Dec 14 11:52:25 2023 +0100

    thermal: core: Fix NULL pointer dereference in zone registration error path

    [ Upstream commit 04e6ccfc93c5a1aa1d75a537cf27e418895e20ea ]

    If device_register() in thermal_zone_device_register_with_trips() returns an error, the tz variable is set to NULL and subsequently dereferenced in kfree(tz->tzp).

    Commit adc8749b150c ("thermal/drivers/core: Use put_device() if device_register() fails") added the tz = NULL assignment in question to avoid a possible double-free after dropping the reference to the zone device. However, after commit 4649620d9404 ("thermal: core: Make thermal_zone_device_unregister() return after freeing the zone"), that assignment has become redundant, because dropping the reference to the zone device does not cause the zone object to be freed any more.

    Drop it to address the NULL pointer dereference.

    Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/thermal/thermal_core.c | 1 -
 1 file changed, 1 deletion(-)

git bisect log

git bisect start
# status: waiting for both good and bad commits
# good: [0dd3ee31125508cd67f7e7172247f05b7fd1753a] Linux 6.7
git bisect good 0dd3ee31125508cd67f7e7172247f05b7fd1753a
# status: waiting for bad commit, 1 good commit known
# bad: [01e08e5d7656e660c8a4852191e1e133cbdb0a66] Linux 6.7.3
git bisect bad 01e08e5d7656e660c8a4852191e1e133cbdb0a66
# bad: [3437e35fe8a6a937b96da90716c0c538e15be0f7] serial: sc16is7xx: set safe default SPI clock frequency
git bisect bad 3437e35fe8a6a937b96da90716c0c538e15be0f7
# bad: [3e2680bd68fab7ba145393e4eb069d6c4d8a30fa] Bluetooth: btmtkuart: fix recv_buf() return value
git bisect bad 3e2680bd68fab7ba145393e4eb069d6c4d8a30fa
# bad: [06a7919489f2e310f6ba1ccd8c62e5b19189b6e0] bpf, lpm: Fix check prefixlen before walking trie
git bisect bad 06a7919489f2e310f6ba1ccd8c62e5b19189b6e0
# good: [4d08e6c01cf2d853971d68700785646045a5afd6] ACPI: LPSS: Fix the fractional clock divider flags
git bisect good 4d08e6c01cf2d853971d68700785646045a5afd6
# bad: [7fea58e8e647ffcc70dfa1ea53c325d7e52c893b] crypto: hisilicon/sec2 - save capability registers in probe process
git bisect bad 7fea58e8e647ffcc70dfa1ea53c325d7e52c893b
# bad: [0055ff3ee060002c97cbbaf77ceead8c11d0b984] crypto: ccp - fix memleak in ccp_init_dm_workarea
git bisect bad 0055ff3ee060002c97cbbaf77ceead8c11d0b984
# bad: [7f25cd51aa71711438a6c11fa638ba5aba61ed95] selinux: Fix error priority for bind with AF_UNSPEC on PF_INET6 socket
git bisect bad 7f25cd51aa71711438a6c11fa638ba5aba61ed95
# bad: [568a410e89f0ae73f2663bd1a48f981d3377aac9] kunit: debugfs: Handle errors from alloc_string_stream()
git bisect bad 568a410e89f0ae73f2663bd1a48f981d3377aac9
# bad: [02871710b93058eb1249d5847c0b2d1c2c3c98ae] thermal: core: Fix NULL pointer dereference in zone registration error path
git bisect bad 02871710b93058eb1249d5847c0b2d1c2c3c98ae
# good: [f3bd89340eab3ff2740a499384977f6562b9a53f] ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error
git bisect good f3bd89340eab3ff2740a499384977f6562b9a53f
# first bad commit: [02871710b93058eb1249d5847c0b2d1c2c3c98ae] thermal: core: Fix NULL pointer dereference in zone registration error path
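
Since the bisect was run twice because the first result looked suspect, it may help to compare the two `git bisect log` outputs mechanically rather than by eye. The following is only a minimal sketch: the function names are hypothetical, the sample text is an abridged excerpt of the log in this comment, and reading the last commit marked bad as the first bad commit assumes the bisect has converged on a linear history.

```python
import re

# Matches the verdict commands in `git bisect log` output, for example:
#   git bisect bad 02871710b93058eb1249d5847c0b2d1c2c3c98ae
VERDICT = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$")


def parse_verdicts(log_text):
    """Return the (verdict, commit) pairs of a bisect log, in order."""
    verdicts = []
    for line in log_text.splitlines():
        match = VERDICT.match(line.strip())
        if match:
            verdicts.append((match.group(1), match.group(2)))
    return verdicts


def last_bad(verdicts):
    """Return the most recently marked bad commit, or None if there is none."""
    bad = [commit for verdict, commit in verdicts if verdict == "bad"]
    return bad[-1] if bad else None


def differing_verdicts(run_a, run_b):
    """Return commits tested in both runs that received different verdicts."""
    seen_a = {commit: verdict for verdict, commit in run_a}
    return [commit for verdict, commit in run_b
            if commit in seen_a and seen_a[commit] != verdict]


# Abridged excerpt of the bisect log from this report (assumption: the
# intermediate steps are omitted here only to keep the example short).
SAMPLE_LOG = """\
git bisect start
# good: [0dd3ee31125508cd67f7e7172247f05b7fd1753a] Linux 6.7
git bisect good 0dd3ee31125508cd67f7e7172247f05b7fd1753a
# bad: [01e08e5d7656e660c8a4852191e1e133cbdb0a66] Linux 6.7.3
git bisect bad 01e08e5d7656e660c8a4852191e1e133cbdb0a66
# bad: [02871710b93058eb1249d5847c0b2d1c2c3c98ae] thermal: core: Fix NULL pointer dereference in zone registration error path
git bisect bad 02871710b93058eb1249d5847c0b2d1c2c3c98ae
# good: [f3bd89340eab3ff2740a499384977f6562b9a53f] ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error
git bisect good f3bd89340eab3ff2740a499384977f6562b9a53f
"""

if __name__ == "__main__":
    run = parse_verdicts(SAMPLE_LOG)
    print("candidate first bad commit:", last_bad(run))
    # Comparing a run with itself should report no disagreements.
    print("disagreements:", differing_verdicts(run, run))
```

Feeding both saved logs through `parse_verdicts` and then `differing_verdicts` would show whether any commit was judged good in one run and bad in the other, which is one way a flaky reproduction can skew a bisect.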
sys-kernel/gentoo-sources-6.7.6

Mounting an OCFS2 file system still causes the entire machine to crash.

dmesg from the machine that wants to mount the OCFS2 drive.

[ +0.046710] OCFS2 User DLM kernel interface loaded
[ +0.001107] FAT-fs (vda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ +0.400618] elogind-daemon[1557]: New seat seat0.
[ +0.000473] elogind-daemon[1557]: Watching system buttons on /dev/input/event3 (Power Button)
[ +0.000062] elogind-daemon[1557]: Watching system buttons on /dev/input/event0 (AT Translated Set 2 keyboard)
[Feb29 16:48] rcu: INFO: rcu_preempt self-detected stall on CPU
[ +0.000004] rcu: 3-....: (6315 ticks this GP) idle=3c24/1/0x4000000000000000 softirq=2798/2799 fqs=2982
[ +0.000004] rcu: (t=6000 jiffies g=309 q=57 ncpus=4)
[ +0.000002] Sending NMI from CPU 3 to CPUs 1:
[ +0.000009] NMI backtrace for cpu 1
[ +0.000002] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.6-gentoo #3
[ +0.000002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 4.2023.08-4 02/15/2024
[ +0.000001] RIP: 0010:__local_bh_enable_ip+0x1b/0x66
[ +0.000013] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 65 8b 05 a0 ed f9 7e a9 00 00 0f 00 74 02 0f 0b ff ce f7 de 65 01 35 8c ed f9 7e <65> 8b 05 85 ed f9 7e a9 00 ff ff 00 75 1e 48 c7 c7 99 5d f9 81 e8
[ +0.000001] RSP: 0018:ffffc900000e8aa0 EFLAGS: 00000207
[ +0.000002] RAX: 0000000000000303 RBX: ffff888008ee1200 RCX: 000000004d5e522e
[ +0.000001] RDX: 0000000000000103 RSI: 00000000fffffe00 RDI: ffffffffa05000eb
[ +0.000001] RBP: ffffffffa0500090 R08: 0000000000000430 R09: ffff8880080c1c4e
[ +0.000000] R10: 0000000041042000 R11: 00000000000001d9 R12: ffff888008ee1418
[ +0.000001] R13: 000000004d5e522e R14: 0000000000000020 R15: 0000000000000020
[ +0.000001] FS: 0000000000000000(0000) GS:ffff88807de80000(0000) knlGS:0000000000000000
[ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000001] CR2: 0000557c93ce65f0 CR3: 000000000d684000 CR4: 00000000000006f0
[ +0.000003] Call Trace:
[ +0.000004] <NMI>
[ +0.000001] ? nmi_cpu_backtrace+0xa5/0xd7
[ +0.000004] ? __local_bh_enable_ip+0x1b/0x66
[ +0.000002] ? nmi_cpu_backtrace_handler+0x8/0x10
[ +0.000004] ? nmi_handle+0x52/0x132
[ +0.000001] ? __local_bh_enable_ip+0x1b/0x66
[ +0.000002] ? default_do_nmi+0x66/0x279
[ +0.000003] ? exc_nmi+0xc4/0x13d
[ +0.000001] ? end_repeat_nmi+0xf/0x60
[ +0.000005] ? __pfx_o2net_listen_data_ready+0x10/0x10 [ocfs2_nodemanager]
[ +0.000007] ? o2net_listen_data_ready+0x5b/0x78 [ocfs2_nodemanager]
[ +0.000006] ? __local_bh_enable_ip+0x1b/0x66
[ +0.000002] ? __local_bh_enable_ip+0x1b/0x66
[ +0.000002] ? __local_bh_enable_ip+0x1b/0x66
[ +0.000002] </NMI>
[ +0.000000] <IRQ>
[ +0.000001] o2net_listen_data_ready+0x5b/0x78 [ocfs2_nodemanager]
[ +0.000006] tcp_data_queue+0x3fc/0x8e2
[ +0.000003] tcp_rcv_established+0x33e/0x43f
[ +0.000001] tcp_v4_do_rcv+0xb8/0x197
[ +0.000003] tcp_v4_rcv+0x757/0xa1a
[ +0.000002] ? raw_local_deliver+0x1a0/0x1cb
[ +0.000003] ip_protocol_deliver_rcu+0x97/0x15f
[ +0.000002] ip_local_deliver_finish+0x81/0x8f
[ +0.000002] ip_sublist_rcv_finish+0x28/0x38
[ +0.000002] ip_sublist_rcv+0x162/0x18a
[ +0.000003] ip_list_rcv+0xe7/0x10f
[ +0.000002] __netif_receive_skb_list_core+0xf1/0x119
[ +0.000002] netif_receive_skb_list_internal+0x1e7/0x21e
[ +0.000002] gro_normal_list+0x1d/0x3f
[ +0.000001] napi_complete_done+0x76/0x11b
[ +0.000003] virtnet_poll+0x272/0x38f [virtio_net]
[ +0.000005] __napi_poll.constprop.0+0x26/0x119
[ +0.000002] net_rx_action+0x110/0x20c
[ +0.000001] __do_softirq+0x122/0x290
[ +0.000004] common_interrupt+0x9b/0xc1
[ +0.000002] </IRQ>
[ +0.000000] <TASK>
[ +0.000001] asm_common_interrupt+0x22/0x40
[ +0.000002] RIP: 0010:pv_native_safe_halt+0x13/0x18
[ +0.000003] Code: 90 90 90 90 90 0f 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 8b 05 55 52 06 01 85 c0 7e 07 0f 00 2d 34 9c 26 00 fb f4 <c3> cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48
[ +0.000001] RSP: 0018:ffffc900000a3ee8 EFLAGS: 00000246
[ +0.000001] RAX: 0000000000000000 RBX: ffff888001a2e040 RCX: 0000000000000000
[ +0.000001] RDX: 0000000000000000 RSI: ffffffff81f87bee RDI: 000000000009ca64
[ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ +0.000000] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ +0.000001] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ +0.000002] default_idle+0x5/0x11
[ +0.000001] default_idle_call+0x30/0x4f
[ +0.000002] do_idle+0xb9/0x19d
[ +0.000003] cpu_startup_entry+0x25/0x27
[ +0.000002] start_secondary+0xf5/0xf5
[ +0.000002] secondary_startup_64_no_verify+0x178/0x17b
[ +0.000003] </TASK>
[ +0.000842] CPU: 3 PID: 319 Comm: kworker/u8:3 Not tainted 6.7.6-gentoo #3
[ +0.000008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 4.2023.08-4 02/15/2024
[ +0.000002] Workqueue: o2net o2net_accept_many [ocfs2_nodemanager]
[ +0.000009] RIP: 0010:queued_spin_lock_slowpath+0x3d/0x27c
[ +0.000004] Code: 48 89 fb 85 c0 7f 1b 81 fe 00 01 00 00 75 2d b8 01 02 00 00 eb 1c f0 0f b1 13 85 c0 0f 84 3a 02 00 00 8b 03 85 c0 74 ee f3 90 <eb> f6 ff c8 74 5b f3 90 8b 33 81 fe 00 01 00 00 74 f0 81 fe ff 00
[ +0.000001] RSP: 0018:ffffc9000037fd40 EFLAGS: 00000202
[ +0.000002] RAX: 0000000000000001 RBX: ffff888008ee1298 RCX: 0000000000000000
[ +0.000001] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff888008ee1298
[ +0.000001] RBP: ffff888008ee1298 R08: 0000000000000000 R09: 0000000000000000
[ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888004ef2270
[ +0.000001] R13: ffffc9000037fdec R14: ffffffffa0508f88 R15: ffff88800bf00dc0
[ +0.000001] FS: 0000000000000000(0000) GS:ffff88807df80000(0000) knlGS:0000000000000000
[ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000001] CR2: 00007fff5175af04 CR3: 00000000074a8000 CR4: 00000000000006f0
[ +0.000004] Call Trace:
[ +0.000001] <IRQ>
[ +0.000001] ? rcu_dump_cpu_stacks+0x8f/0xb3
[ +0.000003] ? rcu_sched_clock_irq+0x345/0x91b
[ +0.000003] ? update_load_avg+0x383/0x3ac
[ +0.000003] ? kthread_data+0x5/0xe
[ +0.000003] ? wq_worker_tick+0x9/0xa3
[ +0.000003] ? __pfx_tick_nohz_highres_handler+0x10/0x10
[ +0.000003] ? update_process_times+0x4d/0x6c
[ +0.000003] ? tick_nohz_highres_handler+0x90/0xbe
[ +0.000002] ? __hrtimer_run_queues+0xec/0x197
[ +0.000003] ? hrtimer_interrupt+0x97/0x169
[ +0.000001] ? __sysvec_apic_timer_interrupt+0xc6/0x136
[ +0.000002] ? sysvec_apic_timer_interrupt+0x80/0xa6
[ +0.000002] </IRQ>
[ +0.000001] <TASK>
[ +0.000001] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ +0.000003] ? queued_spin_lock_slowpath+0x3d/0x27c
[ +0.000003] lock_sock_nested+0x19/0x3e
[ +0.000003] inet_csk_accept+0x1fb/0x2ad
[ +0.000002] ? inode_init_always+0x174/0x1cc
[ +0.000003] ? preempt_latency_start+0x2b/0x46
[ +0.000002] inet_accept+0x41/0x91
[ +0.000003] o2net_accept_many+0xac/0x398 [ocfs2_nodemanager]
[ +0.000007] ? __schedule+0x6a8/0x6f3
[ +0.000002] process_scheduled_works+0x199/0x29d
[ +0.000003] worker_thread+0x1c1/0x21b
[ +0.000002] ? __pfx_worker_thread+0x10/0x10
[ +0.000003] kthread+0xf2/0xfa
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork+0x1f/0x31
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork_asm+0x1b/0x30
[ +0.000003] </TASK>

dmesg from a machine that has mounted the OCFS2 drive.

[Feb29 16:47] o2net: No connection established with node 1 after 30.0 seconds, check network and cluster configuration.
[ +3.072093] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.071958] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[Feb29 16:48] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.072012] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.072009] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.072063] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.072016] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.072068] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.072017] o2net: Connection to node Gentoo-OCFS (num 1) at 192.168.0.14:7777 shutdown, state 7
[ +3.071967] o2net: No connection established with node 1 after 30.0 seconds, check network and cluster configuration.
......

It would be greatly appreciated if someone would address the issue.
Please bring this upstream at https://bugzilla.kernel.org
Thanks, we'll watch the upstream bug and backport any fixes identified.