reference: https://bugs.gentoo.org/635172#c16
When sys-cluster/glusterfs-3.12.3 is compiled with libtirpc, glusterd (and glusterfsd) crash with a segmentation fault, possibly because libtirpc is incompatible with glusterfs.
Reproducible: Always
Steps to Reproduce:
1. USE="libtirpc" emerge -v1 '=sys-cluster/glusterfs-3.12.3'
2. /etc/init.d/glusterd restart
Actual Results:
glusterd crashes with a segmentation fault inside the XDR procedures.
Expected Results:
glusterd spawns and keeps running.
The libtirpc USE flag is there because glibc-2.26 no longer bundles the RPC code in Gentoo. glusterfs-3.12.2 with libtirpc doesn't work. glusterfs-3.12.2 without libtirpc works.
dilfridge, hope you don't mind me assigning this one to you.
I received an answer from Thorsten Kukuk (maintainer of libtirpc).
----
Hi,
your build environment is messed up. You compile against the header files of glibc, but link against libtirpc. This will never work. If you link against libtirpc, you have to compile against libtirpc header files, too.
Thorsten
On Fri, Dec 01, Erik Kai Alain Zscheile wrote:
> Hello Steve Dickson and Thorsten Kukuk, upstream maintainers of libtirpc,
> I've noticed a failure in GlusterFS on Gentoo Linux when it's compiled against libtirpc. This leads to a segmentation fault, but I can't figure out where exactly the bug comes from. The bug appears to be in the XDR part of libtirpc and probably produces a null pointer dereference inside (or in something called inside) xdr_u_int64_t(xdrs, ullp) while xdrs and ullp aren't null.
> references:
> https://bugzilla.redhat.com/show_bug.cgi?id=1519315
> https://bugs.gentoo.org/635172
> Erik Zscheile
--
Thorsten Kukuk kukuk@thkukuk.de
http://www.linux-nis.org|http://osm.thkukuk.de|http://www.thkukuk.de
--------------------------------------------------------------------
Key fingerprint = A368 676B 5E1B 3E46 CFCE 2D97 F8FD 4E23 56C6 FB4B
That might be my fault then as I wrote the patch, though it's also concerning as they've just merged it upstream. Please post your config.log and build log then.
backlink: https://bugzilla.redhat.com/show_bug.cgi?id=1521004
(In reply to Erik Zscheile from comment #4)
> backlink: https://bugzilla.redhat.com/show_bug.cgi?id=1521004
Err, please hold fire there, the patch might be fine. I want to see what's going on in your logs first.
workdir + logfiles: http://ezscheile.bplaced.net/glusterd-ptd-pack.tar.xz
I don't know why they think you're building against glibc's headers as I can clearly see -I/usr/include/tirpc in config.log and build.log. There are some redefinition warnings though so I'll take a closer look later. It's worth adding that even though I contributed the patch to use libtirpc upstream, it was already being used in master when some IPv6-related configure flag was enabled. I need to check whether they made other changes for this.
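The header/library consistency check described above can be sketched as a small shell helper. This is only an illustration: check_tirpc_flags is a hypothetical name, -I/usr/include/tirpc is the include path quoted in this thread, and -ltirpc is assumed to be the linker flag that shows up in the log.

```shell
#!/bin/sh
# Hypothetical helper: report whether a build log both compiles against the
# libtirpc headers and links against the libtirpc library.
check_tirpc_flags() {
    log=$1
    has_inc=no
    has_lib=no
    # Include path quoted in this thread.
    grep -qF -e '-I/usr/include/tirpc' "$log" && has_inc=yes
    # Assumed linker flag for libtirpc.
    grep -qF -e '-ltirpc' "$log" && has_lib=yes
    if [ "$has_inc" = "$has_lib" ]; then
        echo "consistent (headers=$has_inc, library=$has_lib)"
    else
        echo "MISMATCH (headers=$has_inc, library=$has_lib)"
        return 1
    fi
}
```

Running check_tirpc_flags build.log on the build log would flag the situation the libtirpc maintainer described (library linked, headers not used). It cannot detect a glibc RPC header being pulled in earlier through another include.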
I think the redefinitions are there because (in libglusterfs) compat.h is included before the RPC and XDR headers.
Created attachment 508446
glusterfs-3.12.3-libtirpc.patch
I wasn't able to reproduce your issue (does it segfault immediately?) but I have found the cause of the redefinitions and updated the patch accordingly. Please replace the existing patch, rebuild, and report back.
It doesn't segfault immediately, but if you execute:
strace -f /usr/sbin/glusterd
it segfaults after a while. Note that in my configuration I have some gluster peers set up, which are currently disconnected. I applied the new patch and the error is still there.
Okay, I was able to reproduce the issue after starting the other peer. Luckily I still had it installed even though I'm not using it for the foreseeable future.
I'm really struggling here and I'm concerned because my upstream patches have already been merged. I wonder if this is not a problem in the latest master but running that without actually installing it is proving tricky.
I made a 9999 ebuild so that I could install from master. I was able to reproduce the problem there but when I pass --with-ipv6-default, the problem goes away, which is obviously very interesting! My recent patch should have taken account of this but evidently I'm missing something.
Created attachment 509026
glusterfs-3.12.3-libtirpc.patch
Here's a new patch that seems to work. Trouble is I don't really understand why. If it gets past the initial crash for you then please test it more thoroughly.
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=22c2b5002a396b340e12df805ca051ac5743d70d
commit 22c2b5002a396b340e12df805ca051ac5743d70d
Author: James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2017-12-09 18:50:20 +0000
Commit: James Le Cuirot <chewi@gentoo.org>
CommitDate: 2017-12-09 18:50:48 +0000
sys-cluster/glusterfs: Bump to 3.13.0 and add 9999 live ebuild
I have forced --with-ipv6-default when the libtirpc flag is enabled to avoid bug #639838. This bug affects 3.12.3 too but the aforementioned flag is not available in that version. I'll try to get this issue resolved properly and then push for stabilisation.
Bug: https://bugs.gentoo.org/639838
Package-Manager: Portage-2.3.17, Repoman-2.3.6
sys-cluster/glusterfs/Manifest | 1 +
sys-cluster/glusterfs/glusterfs-3.13.0.ebuild | 224 ++++++++++++++++++++++++++
sys-cluster/glusterfs/glusterfs-9999.ebuild | 224 ++++++++++++++++++++++++++
3 files changed, 449 insertions(+)
OK, I tried the new patch. The segfault still happens, but with a different backtrace:
#0 __GI_xdr_uint64_t (xdrs=0x7fe3880ffc10, uip=0x7fe3880ffcb0) at xdr_intXX_t.c:71
#1 0x00007fe39111ba71 in xdr_gf_dump_rsp (xdrs=0x7fe3880ffc10, objp=0x7fe3880ffcb0) at rpc-common-xdr.c:203
#2 0x00007fe391344aa3 in xdr_sizeof () from /lib64/libtirpc.so.3
#3 0x00007fe39156270b in rpcsvc_dump (req=0x7fe37c002cb0) at rpcsvc.c:2088
#4 0x00007fe391562f72 in rpcsvc_handle_rpc_call (svc=0x1fc6100, trans=trans@entry=0x7fe37c0019f0, msg=msg@entry=0x7fe37c002b60) at rpcsvc.c:711
#5 0x00007fe3915631d6 in rpcsvc_notify (trans=0x7fe37c0019f0, mydata=<optimized out>, event=<optimized out>, data=0x7fe37c002b60) at rpcsvc.c:805
#6 0x00007fe391565143 in rpc_transport_notify (this=this@entry=0x7fe37c0019f0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=<optimized out>) at rpc-transport.c:538
#7 0x00007fe38858e382 in socket_event_poll_in (this=this@entry=0x7fe37c0019f0, notify_handled=_gf_true) at socket.c:2315
#8 0x00007fe38858e557 in socket_event_handler (fd=fd@entry=6, idx=idx@entry=2, gen=gen@entry=1, data=data@entry=0x7fe37c0019f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#9 0x00007fe3917f73da in event_dispatch_epoll_handler (event=0x7fe388101e7c, event_pool=0x1fb7770) at event-epoll.c:583
#10 event_dispatch_epoll_worker (data=0x2037300) at event-epoll.c:659
#11 0x00007fe390cec839 in start_thread (arg=0x7fe388102700) at pthread_create.c:456
#12 0x00007fe390a2aadf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
As I suspected, it may be necessary to enable more of the --with-ipv6-default code. I have pushed 3.13.0 to the tree, which happens to include that flag, so I have enabled it with the libtirpc USE flag. Please try this version and report back.
glusterfs version 3.13.0 from the tree works.
It looks like this will get fixed properly in the next release.
I have to put in my 5 cents against the libtirpc USE flag enabling --with-ipv6-default. The flag should be named ipv6 (or ipv6-default, depending on ipv6).
My system is IPv4-only (no IPv6 support in the kernel, USE="-ipv6" in make.conf). Today I upgraded gluster from 3.12.x to 3.13.0 and it stopped working with:
[2018-02-28 21:29:05.628887] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host1...]
[2018-02-28 21:29:05.629014] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host2...]
[2018-02-28 21:29:08.629373] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host1...]
[2018-02-28 21:29:08.629516] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host2...]
[2018-02-28 21:29:11.629874] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host1...]
[2018-02-28 21:29:11.630020] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host2...]
[2018-02-28 21:29:14.630366] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host1...]
[2018-02-28 21:29:14.630514] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host [...host2...]
and so on... glusterd didn't start the glusterfs and glusterfsd processes. After re-emerging without libtirpc everything is fine.
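For readers who want to check whether a host falls into the IPv4-only case reported above, here is a minimal sketch. It assumes a Linux system, where /proc/net/if_inet6 is normally present only when the kernel has IPv6 support available. kernel_has_ipv6 is a hypothetical helper name, and the optional path argument exists only to make the sketch testable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel exposes IPv6.
# Assumption: /proc/net/if_inet6 exists only when IPv6 is available (Linux).
# $1 optionally overrides the path that is checked.
kernel_has_ipv6() {
    [ -e "${1:-/proc/net/if_inet6}" ]
}
```

A typical use would be: kernel_has_ipv6 || echo "IPv4-only host, avoid --with-ipv6-default".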
(In reply to manwe from comment #20)
> I have to put my 5 cents against +libtirpc USE flag that enables
> --with-ipv6-default. It should be named ipv6 or (ipv6-default that depends
> on ipv6).
>
> My system has ipv4 only (no support in kernel for v6, USE="-ipv6" in
> make.conf). Today I've upgraded gluster from 3.12.x to 3.13.0 and it stopped
> working with:
>
> and so on... glusterd didn't start glusterfs and glusterfsd processes. After
> emerge without libtirpc everything is fine.
Fair enough, but at the same time, that would crash horribly on systems that do enable IPv6. I could try to patch things further but the real fix is just around the corner, so let's just sit tight for now. They release often so I expect the next one soon.
OK, so let's keep it turned on by default, but change the name from libtirpc, which says nothing, to something like ipv6-default. This way people like me, without IPv6, can spot it during -uDN @world and disable it.
4.0.0 has just been released and that should deal with all the libtirpc issues properly so I've pushed it up. I'll close this when it goes stable.
I suddenly remembered I needed to add an ipv6 USE flag to control --with-ipv6-default but when I tried it, I found that configure.ac has mixed the clauses up:
AC_ARG_WITH([ipv6-default],
            AC_HELP_STRING([--with-ipv6-default], [Set IPv6 as default.]),
            [with_ipv6_default=$with_libtirpc],
            [with_ipv6_default=no])
This should probably be:
AC_ARG_WITH([ipv6-default],
            AC_HELP_STRING([--with-ipv6-default], [Set IPv6 as default.]),
            [],
            [with_ipv6_default=$with_libtirpc])
I'll ask upstream tomorrow.
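To make the mix-up concrete: the third argument of AC_ARG_WITH runs when the option is given on the command line, the fourth when it is absent. The sketch below is a simplified shell model of that behaviour, not the real generated configure script. current_logic and proposed_logic are hypothetical names. The first argument stands for $with_libtirpc, and the second is the value configure would have stored for the option ("yes" for --with-ipv6-default, "no" for --without-ipv6-default, empty if the option was not passed).

```shell
#!/bin/sh
# Model of the existing configure.ac clause order.
current_logic() {
    with_libtirpc=$1
    if [ -n "$2" ]; then
        # ACTION-IF-GIVEN: overwrites whatever the user asked for.
        with_ipv6_default=$with_libtirpc
    else
        # ACTION-IF-NOT-GIVEN
        with_ipv6_default=no
    fi
    echo "$with_ipv6_default"
}

# Model of the proposed clause order.
proposed_logic() {
    with_libtirpc=$1
    if [ -n "$2" ]; then
        # Empty ACTION-IF-GIVEN: the user's value is kept.
        with_ipv6_default=$2
    else
        # Default follows libtirpc.
        with_ipv6_default=$with_libtirpc
    fi
    echo "$with_ipv6_default"
}
```

With libtirpc enabled, current_logic yes no prints yes, so --without-ipv6-default has no effect, while proposed_logic yes no prints no. With the option omitted, current_logic yes "" prints no and proposed_logic yes "" prints yes.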
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c4217a516c1e8dc5b04400198f9b0e20a37d0bd0
commit c4217a516c1e8dc5b04400198f9b0e20a37d0bd0
Author: James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2018-03-09 23:33:30 +0000
Commit: James Le Cuirot <chewi@gentoo.org>
CommitDate: 2018-03-09 23:34:49 +0000
sys-cluster/glusterfs: Add ipv6 USE flag to control ipv6-default
This is important because ipv6-default breaks Gluster for systems that have IPv6 disabled. A couple of patches were required because --without-ipv6-default is broken and the configure summary is sometimes misleading.
Bug: https://bugs.gentoo.org/639838
Package-Manager: Portage-2.3.24, Repoman-2.3.6
.../files/glusterfs-TIRPC-config-summary.patch | 48 ++++++++++++++++++++++
.../files/glusterfs-without-ipv6-default.patch | 38 +++++++++++++++++
...erfs-4.0.0.ebuild => glusterfs-4.0.0-r1.ebuild} | 9 ++--
sys-cluster/glusterfs/glusterfs-9999.ebuild | 9 ++--
sys-cluster/glusterfs/metadata.xml | 1 +
5 files changed, 99 insertions(+), 6 deletions(-)
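How an ipv6 USE flag might map onto the configure option can be sketched as follows. This is not the content of the actual ebuild. The use and use_with functions below are simplified stand-ins for the Portage helpers of the same name so that the sketch runs outside an ebuild, and ENABLED_USE and configure_args are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Stand-in for Portage's use: succeed if the flag is in ENABLED_USE
# (hypothetical variable holding the enabled USE flags).
use() {
    case " ${ENABLED_USE} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-in for Portage's use_with <flag> [<option-name>]:
# prints --with-<name> or --without-<name>.
use_with() {
    name=${2:-$1}
    if use "$1"; then
        echo "--with-${name}"
    else
        echo "--without-${name}"
    fi
}

# Arguments a src_configure phase might pass to configure (assumption).
configure_args() {
    echo "$(use_with libtirpc) $(use_with ipv6 ipv6-default)"
}
```

With ENABLED_USE="libtirpc", configure_args prints --with-libtirpc --without-ipv6-default, which is the combination an IPv4-only system needs. As the commit message notes, --without-ipv6-default only takes effect once the configure.ac clause order is patched.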
The bug has been closed via the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=726354281e331b53c25f336cf5b8b6a4b8e9beba
commit 726354281e331b53c25f336cf5b8b6a4b8e9beba
Author: James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2018-05-26 11:06:51 +0000
Commit: James Le Cuirot <chewi@gentoo.org>
CommitDate: 2018-05-26 11:07:30 +0000
sys-cluster/glusterfs: Drop old 3.12.3
Closes: https://bugs.gentoo.org/639838
Package-Manager: Portage-2.3.40, Repoman-2.3.9
sys-cluster/glusterfs/Manifest | 1 -
.../files/glusterfs-3.12.3-libtirpc.patch | 45 -----
sys-cluster/glusterfs/glusterfs-3.12.3.ebuild | 218 ---------------------
3 files changed, 264 deletions(-)