Using NetworkManager to manage my eth0 interface, upower to suspend the system, and openrc, the revision bump to -r3 has found an interesting way of breaking suspend for me. Here's what happens:

- I trigger suspend, either using gnome-shell's command for this or "dbus-send --system --print-reply --dest=org.freedesktop.UPower /org/freedesktop/UPower org.freedesktop.UPower.Suspend"

- NetworkManager deactivates eth0. A few lines from /var/log/messages, I can include more if requested:
  NetworkManager[1242]: <info> sleep requested (sleeping: no enabled: yes)
  NetworkManager[1242]: <info> (eth0): deactivating device (reason 'sleeping') [37]
  NetworkManager[1242]: <info> (eth0): carrier now OFF (device state 10)

- NetworkManager runs nm_dispatcher *after* it took down eth0:
  dbus[1229]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)

- openrc tries to unmount my network filesystems (nfsv4). This times out because the nfs server is now unreachable.

- The kernel tries to suspend the system, but fails because a "umount.nfs4" process will not freeze:
  kernel: [ 1193.022110] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
  kernel: [ 1193.022138] umount.nfs4 D ffff880215a1c2b0 0 2352 2351 0x00800004

- NetworkManager brings the interface back up:
  NetworkManager[1242]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.

- The attempt to bring down the "net" service finally times out:
  /etc/init.d/nfsmount[2345]: ERROR: nfsmount failed to stop
  chronyd[1755]: chronyd exiting
  /etc/init.d/NetworkManager[2322]: ERROR: cannot stop NetworkManager as nfsmount is still up
  nm-dispatcher.action: Script '/etc/NetworkManager/dispatcher.d/10-openrc-status' exited with error status 1.

- Several network services are restarted, but I'm left without nfs mounts (this may be a bug in some nfs init script).
Although I should probably find a way to get NetworkManager to not take down eth0 on suspend, I do believe that shutting down services depending on "net" *after* my network interfaces have been disabled is a bug. If the network is going down by user request, NetworkManager really should shut down those services first, and *then* disable interfaces.

Let me know if you need more complete log output.
(assigned to tetromino as he committed this change, hoping that's not too specific.)
This is tricky to get right. I think the full solution needs to look something like the following:

First, before networkmanager goes into NM_STATE_DISCONNECTING, it needs to dispatch an event that would bring /etc/init.d/NetworkManager into inactive state. At the moment, it does not do anything of the sort (nm_utils_call_dispatcher is called only after an interface has been brought down), so this will require a patch.

Second, if networkmanager detects that it's running one of $BAD_SERVICES (is BAD_SERVICES="netmount nfsmount" reasonable?), before continuing to bring down an interface, it should wait for $N milliseconds (is N = 1000 reasonable?) to give the mounts time to go down. This will require another patch, one which will not be very easy to put into a form acceptable for upstream.

Third, openrc's /etc/init.d/netmount and nfs-utils' /etc/init.d/nfsmount must call "umount -f" if there are no openrc services providing net. This is because (a) the $N millisecond timeout might not be enough under pathological conditions, and (b) networkmanager can bring down the interface with no warning due to hardware events, e.g. if the ethernet cable is unplugged or if the user flips the airplane-mode radio switch.
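To make the third point concrete, here is a rough sketch of the "force the unmount when net is gone" logic. It is not taken from the real netmount or nfsmount scripts: the function names, the NET_FS_TYPES list, the DRY_RUN switch and the "up"/"down" argument are all made up for illustration, and how an init script would actually decide that no openrc service provides net is deliberately left to the caller.

```shell
#!/bin/sh
# Illustrative list of network filesystem types (an assumption, not the
# list used by netmount or nfsmount).
NET_FS_TYPES="nfs nfs4 cifs"

# Print the mount points of network filesystems found in a mounts table.
# $1: file in /proc/mounts format (defaults to /proc/mounts).
network_mounts() {
    awk -v types="$NET_FS_TYPES" '
        BEGIN { n = split(types, t, " "); for (i = 1; i <= n; i++) want[t[i]] = 1 }
        ($3 in want) { print $2 }
    ' "${1:-/proc/mounts}"
}

# Unmount every network filesystem, using "umount -f" when the caller
# reports that the network is already down.
# $1: "up" or "down"; $2: optional mounts table, as for network_mounts.
# With DRY_RUN=1 the commands are printed instead of executed.
unmount_network_fs() {
    if [ "$1" = "down" ]; then
        cmd="umount -f"
    else
        cmd="umount"
    fi
    network_mounts "$2" | while read -r mountpoint; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd $mountpoint"
        else
            $cmd "$mountpoint"
        fi
    done
}
```

Running "DRY_RUN=1 unmount_network_fs down" on a machine with nfs mounts prints the "umount -f" commands that would be issued, without touching anything.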
I have not looked into how event dispatch actually works, but ideally networkmanager would wait for the dispatcher process to exit before continuing. That might still need a timeout (continue anyway if that process gets stuck), but that timeout could sensibly be a few seconds, without blocking things for the common case of services shutting down quickly.

> (b) networkmanager can bring down the interface with no warning
> due to hardware events, e.g. if the ethernet cable is unplugged or if the user
> flips the airplane-mode radio switch.

I assume (b) here is why networkmanager does not already dispatch an event before interfaces go down (because services have to deal with the network going away unannounced anyway). I still think it'd be nice to provide a cleaner shutdown process when the shutdown is user-requested, if it's at all possible to implement. Upstream might disagree, though, in which case this is probably not worth deviating from them over.

For now I think I'll go poke around for a way to get networkmanager to just not touch my interfaces on suspend, as I'm pretty sure that when openrc managed them they were just left untouched, and things pretty much worked.
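The "wait for the dispatcher, but only up to a deadline" idea can be illustrated in shell. This is only a sketch of the behaviour, not of how networkmanager dispatches events internally: run_with_deadline and pre_down are hypothetical names, and the sketch leans on timeout(1) from GNU coreutils, which exits with status 124 when the deadline expires.

```shell
#!/bin/sh
# Run a command, but stop waiting for it once the deadline has passed.
# $1: deadline in seconds; remaining arguments: the command to run.
# timeout(1) exits with 124 on expiry, otherwise with the command's status.
run_with_deadline() {
    deadline=$1
    shift
    timeout "$deadline" "$@"
}

# Hypothetical pre-down step: give the hook a chance to finish, report the
# outcome, and always return 0 so that a stuck or failing hook never blocks
# bringing the interface down.
# $1: deadline in seconds; remaining arguments: the hook command.
pre_down() {
    deadline=$1
    shift
    if run_with_deadline "$deadline" "$@"; then
        echo "hook finished"
    else
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "hook timed out"
        else
            echo "hook failed with status $status"
        fi
    fi
    return 0
}
```

In the common case the hook exits quickly and nothing waits for the full deadline; only a stuck hook costs the whole timeout.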
There is an upstream bug open about adding this feature, and it has patches attached, but upstream can't seem to come to an agreement about timeouts on pre-up and pre-down events :/
https://bugzilla.gnome.org/show_bug.cgi?id=387832

My take on this is that pre-up/pre-down events are useful enough to carry an out-of-tree patch, especially since Christian Becke has already done most of the hard work. However, such events must have a hard timeout (e.g. to ensure that upower doesn't decide that the system has failed to suspend because networkmanager is taking too long to go to sleep), and therefore /etc/init.d/netmount and nfs-utils' /etc/init.d/nfsmount still need to switch to "umount -f".
networkmanager-0.9.2.0-r4 should solve the most egregious part of the problem (the suspend issue) by implementing a "pre-sleep" event. After receiving a suspend message from upower, networkmanager-0.9.2.0-r4 will spend up to 5 seconds marking its openrc service as inactive, and stopping anything else depending on it (such as nfs), before starting to bring down interfaces.

Unfortunately, I cannot come up with a sane, safe, and race-free way of marking the networkmanager openrc service as inactive based on individual pre-down events (e.g. when the user turns off wifi via the networkmanager applet) when nm is managing multiple interfaces. Therefore, openrc's /etc/init.d/netmount and nfs-utils' /etc/init.d/nfsmount still need to call "umount -f" if there are no openrc services providing net.

>*networkmanager-0.9.2.0-r4 (20 Feb 2012)
>
>  20 Feb 2012; Alexandre Rostovtsev <tetromino@gentoo.org>
>  +files/10-openrc-status-r1, +networkmanager-0.9.2.0-r4.ebuild,
>  +files/networkmanager-0.9.2.0-ifnet-password-truncated.patch,
>  +files/networkmanager-0.9.2.0-init-provide-net-r1.patch,
>  +files/networkmanager-0.9.2.0-pre-sleep.patch:
>  Fix openrc service going inactive while active connections are present (bug
>  #402613, thanks to Thomas Witt). Try to be more user-friendly by waiting a
>  few seconds before marking the service as inactive. Dispatch a pre-sleep
>  event to unmount network filesystems before suspending (bug #402085, thanks
>  to Marien Zwart). Do not truncate WPA passwords at '#' character (bug
>  #402133, thanks to John Hardin).
I've written down some information on network connectivity dependency problems. I would appreciate any comments and help.
https://fedoraproject.org/wiki/Networking/Dependencies

I would like to set up a long-term strategy for open source projects to cope with these dependency issues.
@tetromino, what else can we do for this?
This should be solved by tetromino's latest change.