An OCFS2 cluster host can no longer mount its volumes after updating to sys-kernel/gentoo-sources-6.1.46.

The cluster consists of 8 nodes and works without problems using sys-kernel/gentoo-sources-6.1.41.
Installing sys-kernel/gentoo-sources-6.1.46 on a cluster host causes the host to hang on mount (reboot required).

The following information can be found in dmesg on another cluster host:

[Aug23 05:41] o2dlm: Node 1 leaves domain B67B32A59E50433898EF51028871D7CC
[ +0.000010] (
[ +0.000004] 2
[ +0.000003] 3
[ +0.000002] 4
[ +0.000002] 5
[ +0.000002] 6
[ +0.000002] 7
[ +0.000002] 8
[ +0.000002] ) 7 nodes
[ +3.728999] o2dlm: Node 1 leaves domain DD82513CF0704040B22E5AD55D9A22AA
[ +0.000011] (
[ +0.000003] 2
[ +0.000003] 3
[ +0.000002] 4
[ +0.000002] 5
[ +0.000003] 6
[ +0.000002] 7
[ +0.000002] 8
[ +0.000002] ) 7 nodes
[ +0.012942] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 8
[ +0.000273] o2net: No longer connected to node Buildhost (num 1) at 192.168.1.72:7777
[Aug23 05:42] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.120033] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.129934] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.110866] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +2.878667] o2net: No connection established with node 1 after 30.0 seconds, check network and cluster configuration.
[ +0.240094] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +4.970060] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[Aug23 05:43] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.119771] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.120053] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.119861] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.119951] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.119838] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.119927] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.039631] o2net: No connection established with node 1 after 30.0 seconds, check network and cluster configuration.
[ +0.080119] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +5.120018] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
[ +3.129939] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
......

Downgrading to sys-kernel/gentoo-sources-6.1.41 solves the problem.
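To summarise the repeating o2net messages in a log like the one above, a small script can count the "shutdown" lines per node and state. This is only a sketch: the function name count_shutdowns and the regular expression are illustrative assumptions based on the message format shown in this report, not part of any OCFS2 tooling.

```python
import re
from collections import Counter

# Assumed message format, taken from the dmesg excerpt in this report:
#   o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7
SHUTDOWN_RE = re.compile(
    r"o2net: Connection to node (?P<name>\S+) \(num (?P<num>\d+)\) "
    r"at (?P<addr>\S+) shutdown, state (?P<state>\d+)"
)


def count_shutdowns(dmesg_text):
    """Count o2net shutdown messages, keyed by (node number, state)."""
    counts = Counter()
    for match in SHUTDOWN_RE.finditer(dmesg_text):
        counts[(int(match["num"]), int(match["state"]))] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = (
        "[ +0.012942] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 8\n"
        "[Aug23 05:42] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7\n"
        "[ +3.120033] o2net: Connection to node Buildhost (num 1) at 192.168.1.72:7777 shutdown, state 7\n"
    )
    for (num, state), n in sorted(count_shutdowns(sample).items()):
        print(f"node {num}, state {state}: {n} shutdown message(s)")
```

Feeding it the full output of dmesg instead of the sample would show how often the other nodes retried the connection to node 1.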
I don't immediately see any upstream reports yet. Could you bisect (https://wiki.gentoo.org/wiki/Kernel_git-bisect) and report it upstream to the ML, and link it here?
(In reply to Sam James from comment #1)
> I don't immediately see any upstream reports yet.
>
> Could you bisect (https://wiki.gentoo.org/wiki/Kernel_git-bisect) and report
> it upstream to the ML, and link it here?

I have to study that first; I've never done it before.

What I do know now is that sys-kernel/gentoo-sources-6.1.45 shows the same error.
A comparison of the sources in /usr/src/linux-6.1.41-gentoo/fs/ocfs2/ with those in /usr/src/linux-6.1.46-gentoo/fs/ocfs2/ showed no differences.
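For reference, a directory comparison like the one described above can be sketched with Python's standard filecmp module. The helper name differing_files is hypothetical, and the two paths are passed on the command line. Note that filecmp.dircmp compares files shallowly by default (type, size and modification time first), so this is a quick check rather than a byte-exact proof.

```python
import filecmp
import os
import sys


def differing_files(left, right):
    """Return relative paths that differ or exist on only one side."""
    result = []

    def walk(cmp, prefix):
        # diff_files: present on both sides but different
        # left_only / right_only: present on one side only
        # funny_files: could not be compared (e.g. permission errors)
        for name in (cmp.diff_files + cmp.left_only
                     + cmp.right_only + cmp.funny_files):
            result.append(os.path.join(prefix, name))
        # Recurse into subdirectories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(result)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example invocation (paths as used in this report):
    #   python compare.py /usr/src/linux-6.1.41-gentoo/fs/ocfs2 \
    #                     /usr/src/linux-6.1.46-gentoo/fs/ocfs2
    changed = differing_files(sys.argv[1], sys.argv[2])
    print("\n".join(changed) if changed else "no differences found")
```

An empty result for fs/ocfs2 only says that the OCFS2 code itself is unchanged between the two trees; a regression elsewhere in the kernel can still affect it, which is why a bisect is the more reliable way to find the responsible commit.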
Created attachment 868522
git bisect log

commit ace0efeb56f4275148c37fc8b2699fddf29795dc
Author: Ruihong Luo <colorsu1922@gmail.com>
Date:   Thu Jul 13 08:42:36 2023 +0800

    serial: 8250_dw: Preserve original value of DLF register

    commit 748c5ea8b8796ae8ee80b8d3a3d940570b588d59 upstream.

    Preserve the original value of the Divisor Latch Fraction (DLF) register.
    When the DLF register is modified without preservation, it can disrupt
    the baudrate settings established by firmware or bootloader, leading to
    data corruption and the generation of unreadable or distorted characters.

    Fixes: 701c5e73b296 ("serial: 8250_dw: add fractional divisor support")
    Cc: stable <stable@kernel.org>
    Signed-off-by: Ruihong Luo <colorsu1922@gmail.com>
    Link: https://lore.kernel.org/stable/20230713004235.35904-1-colorsu1922%40gmail.com
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Link: https://lore.kernel.org/r/20230713004235.35904-1-colorsu1922@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/tty/serial/8250/8250_dwlib.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
There are a bunch of new patches staged for later inclusion, but I do not know whether they would help:

https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git/log/drivers/tty/serial/8250?h=tty-next

You can report this upstream by sending this bug's information, the symptoms, and the result of the bisect to the linux-serial mailing list:

8250/16?50 (AND CLONE UARTS) SERIAL DRIVER
M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
L: linux-serial@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
F: drivers/tty/serial/8250*
F: include/linux/serial_8250.h
Once you've reported this upstream, please comment back here with the relevant links.