After upgrading from iproute2-4.4.0 to iproute2-4.14.1 (both -r2 and -r4), my MariaDB Galera cluster started to act weird: it took a bit longer than usual to start. The kernel log had the following entries:

Feb 15 08:46:38 db3a kernel: [6668513.173873] ss[18846]: segfault at 0 ip 000055ece8827e48 sp 00007fffb69b9160 error 4 in ss[55ece881c000+1c000]
Feb 15 08:46:41 db3a kernel: [6668516.692326] show_signal_msg: 16 callbacks suppressed
Feb 15 08:46:41 db3a kernel: [6668516.692335] ss[19047]: segfault at 0 ip 00005636dda9fe48 sp 00007ffe065c6070 error 4 in ss[5636dda94000+1c000]
Feb 15 08:46:41 db3a kernel: [6668516.899094] ss[19061]: segfault at 0 ip 0000560019d09e48 sp 00007ffd6e6abd60 error 4 in ss[560019cfe000+1c000]
Feb 15 08:46:42 db3a kernel: [6668517.103536] ss[19070]: segfault at 0 ip 00005560c544be48 sp 00007ffec55c6690 error 4 in ss[5560c5440000+1c000]
Feb 15 08:46:42 db3a kernel: [6668517.308507] ss[19079]: segfault at 0 ip 000055ee89eb5e48 sp 00007ffce4549200 error 4 in ss[55ee89eaa000+1c000]
Feb 15 08:46:42 db3a kernel: [6668517.516238] ss[19085]: segfault at 0 ip 0000562d0efbae48 sp 00007ffdce303ca0 error 4 in ss[562d0efaf000+1c000]
Feb 15 08:46:42 db3a kernel: [6668517.721847] ss[19107]: segfault at 0 ip 0000560ce68a2e48 sp 00007ffcee6aa8c0 error 4 in ss[560ce6897000+1c000]
Feb 15 08:46:42 db3a kernel: [6668517.926715] ss[19110]: segfault at 0 ip 00005564af34ae48 sp 00007fff40df18f0 error 4 in ss[5564af33f000+1c000]
Feb 15 08:46:43 db3a kernel: [6668518.133456] ss[19122]: segfault at 0 ip 00005639d1354e48 sp 00007ffe1c568700 error 4 in ss[5639d1349000+1c000]
Feb 15 08:46:43 db3a kernel: [6668518.339973] ss[19137]: segfault at 0 ip 000055b9956f9e48 sp 00007ffc57cf6ee0 error 4 in ss[55b9956ee000+1c000]
Feb 15 08:46:43 db3a kernel: [6668518.548122] ss[19149]: segfault at 0 ip 000055eb059c7e48 sp 00007ffcc5172560 error 4 in ss[55eb059bc000+1c000]
Feb 15 08:46:47 db3a kernel: [6668522.057396] show_signal_msg: 16 callbacks suppressed
Feb 15 08:46:47 db3a kernel: [6668522.057404] ss[19396]: segfault at 0 ip 000055d88c8bbe48 sp 00007fff412fef50 error 4 in ss[55d88c8b0000+1c000]

Either reverting to version 4.4.0 or upgrading to 4.15.0 solved the problem. There are a few ss crashes mentioned in the 4.15.0 changelog (https://lkml.org/lkml/2018/1/29/357). I know this is not a MariaDB bug; I am just letting you know in case anybody runs into the same problem.
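For anyone reading the log lines above, here is a rough sketch in C of how one of them can be decoded. It assumes an x86-64 host, where "error 4" is the page-fault error code with only the user-mode bit set (a read of a non-present page), and "segfault at 0" is the faulting address, which together point to a NULL pointer read. The struct and function names (fault, parse_segfault, fault_offset, is_user_null_read) are made up for illustration and are not part of any tool.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits as printed after "error" (assumes x86). */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access,      1: write access */
#define PF_USER  0x4u /* 0: kernel mode,      1: user mode */

/* Hypothetical container for the fields of one "segfault at" line. */
struct fault {
    uint64_t addr; /* faulting address ("segfault at") */
    uint64_t ip;   /* instruction pointer */
    uint64_t base; /* start of the mapping, from "ss[base+size]" */
    unsigned err;  /* page-fault error code */
};

/* Parse one kernel log line; returns 0 on success, -1 otherwise. */
static int parse_segfault(const char *line, struct fault *f)
{
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return -1;
    int n = sscanf(p,
                   "segfault at %" SCNx64 " ip %" SCNx64 " sp %*" SCNx64
                   " error %x in %*[^[][%" SCNx64 "+",
                   &f->addr, &f->ip, &f->err, &f->base);
    return n == 4 ? 0 : -1;
}

/* Offset of the faulting instruction inside the mapping. */
static uint64_t fault_offset(const struct fault *f)
{
    return f->ip - f->base;
}

/* True for a user-mode read of address 0, i.e. a NULL pointer read. */
static bool is_user_null_read(const struct fault *f)
{
    return f->addr == 0 && (f->err & PF_USER) != 0 &&
           (f->err & (PF_WRITE | PF_PROT)) == 0;
}
```

For the first line in the log the offset works out to 0xbe48, and every other ss line ends in the same e48 relative to its mapping, so all crashes hit the same instruction. With a debug build, that offset may be usable with a symbolizer, provided the mapping starts at file offset 0, which the log alone does not confirm.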
Maybe you are able to compile with debug symbols and provide a core dump/backtrace?
With this upstream patch it does not segfault (it's in the 4.15.0 release):

commit ebbb219c924ccedbc59e209d40b77d5dbeecd7cd
Author: Antonio Quartulli <a@unstable.cc>
Date:   Sun Jan 7 02:31:50 2018 +0800

    ss: fix NULL pointer access when parsing unix sockets with oldformat

    When parsing and printing the unix sockets in unix_show(),
    if the oldformat is detected, the peer_name member of the sockstat
    object is left uninitialized (NULL).

    For this reason, if a filter has been specified on the command line,
    a strcmp() will crash when trying to access it.

    Avoid crash by checking that peer_name is not NULL before
    passing it to strcmp().

    Cc: Stefano Brivio <sbrivio@redhat.com>
    Cc: Stephen Hemminger <stephen@networkplumber.org>
    Signed-off-by: Antonio Quartulli <a@unstable.cc>
    Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
    Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

diff --git a/misc/ss.c b/misc/ss.c
index b35859dc..29a25070 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3711,7 +3711,10 @@ static int unix_show(struct filter *f)
 		};
 
 		memcpy(st.local.data, &u->name, sizeof(u->name));
-		if (strcmp(u->peer_name, "*"))
+		/* when parsing the old format rport is set to 0 and
+		 * therefore peer_name remains NULL
+		 */
+		if (u->peer_name && strcmp(u->peer_name, "*"))
 			memcpy(st.remote.data, &u->peer_name,
 			       sizeof(u->peer_name));
 		if (run_ssfilter(f->f, &st) == 0) {
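The guard added by the patch can be shown in isolation with a small sketch. The struct below is a hypothetical stand-in, not the real iproute2 type, and has_concrete_peer is a made-up helper name; only the pattern of testing the pointer before calling strcmp() is taken from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed unix socket entry (not the ss.c struct). */
struct unix_entry {
    const char *name;
    const char *peer_name; /* may stay NULL, as in the old-format case */
};

/*
 * True when a concrete peer name is present: neither missing (NULL)
 * nor the "*" wildcard. The NULL test must come first, because
 * passing NULL to strcmp() is undefined behaviour and typically
 * crashes with a read at address 0.
 */
static bool has_concrete_peer(const struct unix_entry *u)
{
    return u->peer_name != NULL && strcmp(u->peer_name, "*") != 0;
}
```

Because && short-circuits, strcmp() is never reached when peer_name is NULL, which matches the "segfault at 0" seen with the unpatched binary.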
Closing this one: Patch is in >=iproute2-4.15 and oldest version in repository is 4.19.