Bug 650030 - net-nds/rpcbind-0.2.4-r1 systemd unit dependencies cause deadlock at boot with glibc-2.26 and libnss-nis
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-09 16:13 UTC by Joe Harvell
Modified: 2018-03-12 17:29 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
output of emerge --info (einfo.txt,6.67 KB, text/plain)
2018-03-09 16:13 UTC, Joe Harvell
stack trace of systemd-networkd while problem is manifesting (gdb.networkd.txt,8.54 KB, text/plain)
2018-03-09 16:21 UTC, Joe Harvell

Description Joe Harvell 2018-03-09 16:13:10 UTC
systemd-networkd blocks forever at boot in initgroups(3) because rpcbind.socket is allowed to start before network.target, but rpcbind.service is not allowed to start until after network.target.  The result is a deadlock.  I was also able to reproduce this at boot time in a systemd debug_shell with a simple program that only calls initgroups().

Here is a graphical depiction of the relevant before/after dependencies.  (a -> b means that when both unit a and unit b are in a transaction, unit b is not activated until after unit a is activated.)

rpcbind.socket -> rpcbind.service
systemd-networkd.service -> network.target -> rpcbind.service

Here is what is happening in order at boot:

1. systemd activates rpcbind.socket.  This means systemd itself binds a socket to (among others) TCP port 111.  At this point systemd has performed a listen(2) on the socket.  I am not sure whether it also performs an accept(2), but even if it does, it does not read from the resulting connection socket.

2. systemd activates systemd-networkd.service, which starts the systemd-networkd process.  Note that systemd-networkd is a notify service.

3. The systemd-networkd process calls initgroups("systemd-network", 982).  This ends up trying to use NIS to get the group names, using code in libnss-nis.  That code successfully connects to the TCP server socket, sends a message, and waits for a response.  The response never comes, even though the socket is connected and will never get disconnected.
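
The socket behaviour in steps 1 to 3 can be imitated on loopback. The following is only a sketch under assumptions: it uses an ephemeral port instead of port 111, an ordinary Python process instead of systemd, and a hypothetical helper name (probe_unserviced_listener). It shows that a client can connect to and write to a socket that is listening but never serviced, and then waits for a reply that does not arrive.

```python
import socket


def probe_unserviced_listener(timeout=0.5):
    """Connect to a listening socket that nobody services and wait for a reply."""
    # Stand-in for the socket held by systemd: bound and listening,
    # but no process ever calls accept() or reads from it.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port, not 111
    listener.listen(1)
    port = listener.getsockname()[1]

    # Stand-in for the NIS client code: connect, send a request, wait.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(("127.0.0.1", port))  # the kernel completes the handshake
        client.sendall(b"request")           # data sits unread on the server side
        try:
            client.recv(1)                   # nobody is reading, so nobody replies
            return "reply"
        except socket.timeout:
            return "timeout"
    finally:
        client.close()
        listener.close()
```

Without the timeout set on the client socket, the recv() call would block indefinitely, which is the state systemd-networkd ends up in.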

The way this is supposed to work is that once systemd is allowed to activate rpcbind.service, rpcbind gets the socket passed to it and starts reading from the socket and responding to requests.  However, this will never happen because systemd-networkd is apparently blocked in initgroups() before it considers itself to be activated.  So it will never notify activation, systemd will never activate network.target, and therefore systemd will never activate rpcbind.service.
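
The circular wait can be made explicit by treating the ordering dependencies and the runtime wait as one "waits on" graph. This is a minimal sketch, not how systemd itself resolves transactions; the names find_cycle, ORDERING and RUNTIME are made up for illustration. The ordering edges alone contain no cycle, which is presumably why systemd accepts them; the cycle only appears once the blocked initgroups() call is added as an edge.

```python
def find_cycle(waits_on):
    """Return one cycle in a 'waits on' graph as a list of nodes, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in visiting:
            # Found a node that is already on the current path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for dep in waits_on.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_on:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Ordering taken from the unit files: unit -> units it must wait for.
ORDERING = {
    "rpcbind.service": ["rpcbind.socket", "network.target"],
    "network.target": ["systemd-networkd.service"],
}

# Runtime wait that systemd cannot see: networkd's initgroups() call
# blocks until rpcbind.service answers on the socket.
RUNTIME = {"systemd-networkd.service": ["rpcbind.service"]}
```

Calling find_cycle(ORDERING) finds nothing, while find_cycle({**ORDERING, **RUNTIME}) reports the loop through rpcbind.service, network.target and systemd-networkd.service. Dropping network.target from the rpcbind.service entry removes the loop.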

I know this is what's happening from using gdb to see the stack traces of systemd-networkd while in this state, and also by stepping through my initgroups program.  During this time I can also see with netstat that the receive queue of the connected socket on the server side is nonzero, indicating that systemd is not reading from the socket.
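
A rough way to check for the hang without root privileges is sketched below. It is not the initgroups program mentioned above. It assumes that os.getgrouplist() goes through the same NSS group lookup as initgroups(3), and the helper name lookup_groups is hypothetical. The lookup runs in a child process so that a blocked call can be detected by a timeout instead of hanging the caller.

```python
import subprocess
import sys


def lookup_groups(user, gid, timeout=5.0):
    """Run the NSS group lookup in a child process; return None if it hangs."""
    # The child prints the group IDs separated by spaces.
    code = (
        "import os, sys; "
        "print(*os.getgrouplist(sys.argv[1], int(sys.argv[2])))"
    )
    try:
        result = subprocess.run(
            [sys.executable, "-c", code, user, str(gid)],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return [int(g) for g in result.stdout.split()]
```

On an affected system, lookup_groups("systemd-network", 982) would be expected to return None while rpcbind.service has not been started, and a list of group IDs otherwise.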

I was able to work around this bug by removing the ( network.target -> rpcbind.service ) dependency as shown below:

harvell@wolfhound system$ diff -u /lib/systemd/system/rpcbind.service /etc/systemd/system/rpcbind.service 
--- /lib/systemd/system/rpcbind.service 2018-03-05 11:31:17.369211800 -0700
+++ /etc/systemd/system/rpcbind.service 2018-03-08 19:37:50.341803695 -0700
@@ -1,6 +1,6 @@
 [Unit]
 Description=RPC Bind
-After=network.target
+#After=network.target
 Wants=rpcbind.target
 Before=rpcbind.target

I think the correct solution to this problem is for net-nds/rpcbind to make the same change in /lib/systemd/system/rpcbind.service.
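
For anyone scripting the workaround, the same one-line edit can be expressed as below. This is only an illustrative sketch: drop_ordering is a made-up helper, and ORIGINAL contains just the [Unit] lines visible in the diff above, not the whole unit file.

```python
def drop_ordering(unit_text, directive="After=network.target"):
    """Comment out one exact directive line, leaving every other line untouched."""
    return "".join(
        "#" + line if line.strip() == directive else line
        for line in unit_text.splitlines(keepends=True)
    )


# Excerpt of the [Unit] section as it appears in the diff.
ORIGINAL = (
    "[Unit]\n"
    "Description=RPC Bind\n"
    "After=network.target\n"
    "Wants=rpcbind.target\n"
    "Before=rpcbind.target\n"
)

PATCHED = drop_ordering(ORIGINAL)
```

The patched text would be written to /etc/systemd/system/rpcbind.service, as in the diff, so that it takes precedence over the packaged file in /lib/systemd/system.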
Comment 1 Joe Harvell 2018-03-09 16:13:48 UTC
Created attachment 523098 [details]
output of emerge --info
Comment 2 Joe Harvell 2018-03-09 16:16:31 UTC
jharvell@wolfhound system$ eix net-nds/rpcbind
[I] net-nds/rpcbind
     Available versions:  0.2.4-r1 **9999 {debug selinux systemd tcpd warmstarts}
     Installed versions:  0.2.4-r1(11:31:18 05/03/2018)(systemd tcpd -debug -selinux -warmstarts)
     Homepage:            https://sourceforge.net/projects/rpcbind/
     Description:         portmap replacement which supports RPC over various protocols

jharvell@wolfhound system$ eix sys-libs/glibc
[I] sys-libs/glibc
     Available versions:  (2.2) [M]2.17^s[1] [M](~)2.18-r1^s [M](~)2.18-r1^s[1] [M]2.19-r1^s [M]2.19-r1^s[1] [M]**2.19-r2^s [M]2.20-r2^s [M]2.20-r2^s[1] [M]2.21-r2^s [M]2.21-r2^s[1] [M]2.22-r4^s [M]2.22-r4^s[1] [M]2.23-r3^s[1] [M]2.23-r4^s [M]2.23-r4^s[1] [M](~)2.24-r3^s[1] [M](~)2.24-r4^s [M]**2.25-r2^s[1] 2.25-r9^s 2.25-r10^s **2.25-r11^s (~)2.26-r5^s (~)2.26-r6^s **2.27-r1^s **9999^s **9999^s[1]
       {audit caps compile-locales crosscompile_opts_headers-only debug doc gd hardened headers-only multilib nscd profile +rpc selinux suid systemtap vanilla}
     Installed versions:  2.26-r6(2.2)^s(19:30:32 08/03/2018)(caps multilib suid -audit -debug -doc -gd -hardened -headers-only -nscd -profile -selinux -systemtap -vanilla)
     Homepage:            https://www.gnu.org/software/libc/libc.html
     Description:         GNU libc6 (also called glibc2) C library

[1] "wolfhound" /opt/portage

jharvell@wolfhound system$ eix sys-auth/libnss-nis
[I] sys-auth/libnss-nis
     Available versions:  (~)1.4 {ABI_MIPS="n32 n64 o32" ABI_PPC="32 64" ABI_S390="32 64" ABI_X86="32 64 x32"}
     Installed versions:  1.4(19:34:42 08/03/2018)(ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" ABI_S390="-32 -64" ABI_X86="64 -32 -x32")
     Homepage:            https://github.com/thkukuk/libnss_nis
     Description:         NSS module to provide NIS support
Comment 3 Joe Harvell 2018-03-09 16:18:00 UTC
Contents of nsswitch.conf.  Note the bug exists regardless of whether I use the uncommented or commented line for groups.

jharvell@wolfhound system$ cat /etc/nsswitch.conf
# /etc/nsswitch.conf:
# $Header: /var/cvsroot/gentoo/src/patchsets/glibc/extra/etc/nsswitch.conf,v 1.1 2006/09/29 23:52:23 vapier Exp $

#passwd:      compat
#shadow:      compat
#group:       compat

passwd:      files nis
shadow:      files nis
group:       files [success=merge] nis
#group:       files nis

hosts:       files dns
networks:    files dns

services:    db files
protocols:   db files
rpc:         db files
ethers:      db files
netmasks:    files
netgroup:    files nis
bootparams:  files

automount:   files nis
aliases:     files
Comment 4 Joe Harvell 2018-03-09 16:21:48 UTC
Created attachment 523100 [details]
stack trace of systemd-networkd while problem is manifesting
Comment 5 Larry the Git Cow gentoo-dev 2018-03-10 14:10:05 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fbaf911f4355d5c9992694288b586dcbc5f154cc

commit fbaf911f4355d5c9992694288b586dcbc5f154cc
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2018-03-10 14:09:43 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2018-03-10 14:09:43 +0000

    net-nds/rpcbind: use upstream rpcbind.service
    
    Closes: https://bugs.gentoo.org/650030
    Package-Manager: Portage-2.3.24, Repoman-2.3.6_p81

 net-nds/rpcbind/files/rpcbind.service                       | 13 -------------
 .../{rpcbind-0.2.4-r1.ebuild => rpcbind-0.2.4-r2.ebuild}    |  4 +---
 net-nds/rpcbind/rpcbind-9999.ebuild                         |  2 --
 3 files changed, 1 insertion(+), 18 deletions(-)
Comment 6 Mike Gilbert gentoo-dev 2018-03-10 14:10:43 UTC
Good analysis. It turns out we were installing the wrong file by accident here.
Comment 7 Timo Rothenpieler 2018-03-12 11:12:27 UTC
This fix is causing issues if you are using systemd and have built rpcbind without warmstarts.
The upstream systemd unit passes -w, but Gentoo by default builds rpcbind without support for warmstarts, so it just throws a usage error and never starts up due to not knowing the -w option.
Comment 8 Mike Gilbert gentoo-dev 2018-03-12 14:58:46 UTC
(In reply to Timo Rothenpieler from comment #7)

Ah. Do you see any problem with enabling warm starts unconditionally at build time? It looks like it is only enabled at runtime when the -w flag is passed anyway.
Comment 9 Larry the Git Cow gentoo-dev 2018-03-12 17:29:20 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=238eaeb1245f965ce01b4a9a7519bc135b7a410a

commit 238eaeb1245f965ce01b4a9a7519bc135b7a410a
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2018-03-12 17:27:46 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2018-03-12 17:28:58 +0000

    profiles: systemd: enable warmstarts by default for net-nds/rpcbind
    
    Bug: https://bugs.gentoo.org/650030#c7

 profiles/targets/systemd/package.use | 6 ++++++
 1 file changed, 6 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=0bffded2ba7ff5c3c5660c19c829a6ffeedea353

commit 0bffded2ba7ff5c3c5660c19c829a6ffeedea353
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2018-03-12 17:24:07 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2018-03-12 17:28:58 +0000

    net-nds/rpcbind: require warmstarts for systemd
    
    Bug: https://bugs.gentoo.org/650030#c7
    Package-Manager: Portage-2.3.24, Repoman-2.3.6_p81

 net-nds/rpcbind/{rpcbind-0.2.4-r2.ebuild => rpcbind-0.2.4-r3.ebuild} | 1 +
 net-nds/rpcbind/rpcbind-9999.ebuild                                  | 1 +
 2 files changed, 2 insertions(+)