Bug 402085 - net-misc/networkmanager-0.9.2.0-r3 stops openrc services *after* shutting down the network interface (on suspend)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Linux Gnome Desktop Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-03 17:52 UTC by Marien Zwart (RETIRED)
Modified: 2013-09-29 09:11 UTC

See Also:
Package list:
Runtime testing required: ---


Description Marien Zwart (RETIRED) gentoo-dev 2012-02-03 17:52:08 UTC
Using NetworkManager to manage my eth0 interface, upower to suspend the system, and openrc, the revision bump to -r3 has found an interesting way of breaking suspend for me. Here's what happens:

- I trigger suspend, either using gnome-shell's command for this or "dbus-send --system --print-reply --dest=org.freedesktop.UPower /org/freedesktop/UPower org.freedesktop.UPower.Suspend"

- NetworkManager deactivates eth0. A few lines from /var/log/messages (I can include more if requested):

NetworkManager[1242]: <info> sleep requested (sleeping: no  enabled: yes)
NetworkManager[1242]: <info> (eth0): deactivating device (reason 'sleeping') [37]
NetworkManager[1242]: <info> (eth0): carrier now OFF (device state 10)

- NetworkManager runs nm_dispatcher *after* it took down eth0:

dbus[1229]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)

- openrc tries to unmount my network filesystems (nfsv4). This times out because the nfs server is now unreachable.

- The kernel tries to suspend the system, but fails because a "umount.nfs4" process will not freeze:

kernel: [ 1193.022110] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
kernel: [ 1193.022138] umount.nfs4     D ffff880215a1c2b0     0  2352   2351 0x00800004

- NetworkManager brings the interface back up:

NetworkManager[1242]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.

- The attempt to bring down the "net" service finally times out:

/etc/init.d/nfsmount[2345]: ERROR: nfsmount failed to stop
chronyd[1755]: chronyd exiting
/etc/init.d/NetworkManager[2322]: ERROR: cannot stop NetworkManager as nfsmount is still up
nm-dispatcher.action: Script '/etc/NetworkManager/dispatcher.d/10-openrc-status' exited with error status 1.

- Several network services are restarted, but I'm left without nfs mounts (this may be a bug in some nfs init script).

Although I should probably find a way to get NetworkManager to not take down eth0 on suspend, I do believe shutting down services depending on "net" *after* my network interfaces have been disabled is a bug. If the network is going down by user request NetworkManager really should shut down those services first, and *then* disable interfaces.
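
The ordering asked for here can be sketched as a small dependency walk. This is only an illustration in Python, not how openrc or NetworkManager implement it: the `needs` table and the `stop_order` helper are hypothetical, and the assumption that chronyd needs "net" is inferred from the log above rather than confirmed.

```python
def stop_order(needs, target):
    """Return `target` and everything that needs it, dependents first."""
    order = []
    seen = set()

    def visit(service):
        if service in seen:
            return
        seen.add(service)
        # Stop every service that needs this one before stopping it.
        for other, deps in needs.items():
            if service in deps:
                visit(other)
        order.append(service)

    visit(target)
    return order


# Hypothetical dependency table: service -> services it needs.
needs = {
    "nfsmount": ["net"],
    "chronyd": ["net"],
    "net": [],
}

# The interface would only be deactivated once the last entry has stopped.
print(stop_order(needs, "net"))
```

With this ordering, nfsmount gets to unmount while the NFS server is still reachable, which is the step that timed out in the log above.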

Let me know if you need more complete log output.
Comment 1 Marien Zwart (RETIRED) gentoo-dev 2012-02-03 17:53:41 UTC
(assigned to tetromino as he committed this change, hoping that's not too specific.)
Comment 2 Alexandre Rostovtsev (RETIRED) gentoo-dev 2012-02-03 19:34:49 UTC
This is tricky to get right. I think the full solution needs to look something like the following:

First, before networkmanager goes into NM_STATE_DISCONNECTING, it needs to dispatch an event that would bring /etc/init.d/NetworkManager into inactive state. At the moment, it does not do anything of the sort (nm_utils_call_dispatcher is called only after an interface has been brought down), so this will require a patch.

Second, if networkmanager detects that it's running one of $BAD_SERVICES (is BAD_SERVICES="netmount nfsmount" reasonable?), before continuing to bring down an interface, it should wait for $N milliseconds (is N = 1000 reasonable?) to give the mounts time to go down. This will require another patch, one which will not be very easy to put into a form acceptable for upstream.

Third, openrc's /etc/init.d/netmount and nfs-utils' /etc/init.d/nfsmount must call "umount -f" if there are no openrc services providing net. This is because (a) the $N millisecond timeout might not be enough under pathological conditions, and (b) networkmanager can bring down the interface with no warning due to hardware events, e.g. if the ethernet cable is unplugged or if the user flips the airplane-mode radio switch.
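
The third point can be sketched as a single decision. The real init scripts are shell, so this Python is only a model of the logic: `umount_command`, the `net_providers` list, and the /mnt/nfs mount point are hypothetical names chosen for illustration. Only the "umount -f" flag comes from the discussion.

```python
def umount_command(mountpoint, net_providers):
    """Build the umount argument list for one network mount."""
    if net_providers:
        # Something still provides "net": try a normal unmount.
        return ["umount", mountpoint]
    # Nothing provides "net": the server is presumed unreachable, so force it.
    return ["umount", "-f", mountpoint]


print(umount_command("/mnt/nfs", ["NetworkManager"]))
print(umount_command("/mnt/nfs", []))
```

How the script would learn which services currently provide net is left open here, since the thread does not say.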
Comment 3 Marien Zwart (RETIRED) gentoo-dev 2012-02-03 20:53:00 UTC
I have not looked into how event dispatch actually works, but ideally networkmanager would wait for the dispatcher process to exit before continuing. That might still need a timeout (continue anyway if that process gets stuck), but that timeout could sensibly be a few seconds, without blocking things for the common case of services shutting down quickly.
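
A minimal sketch of "wait for the dispatcher to exit, but continue anyway after a timeout", using Python's subprocess module rather than NetworkManager's actual dispatcher code. The `run_with_deadline` helper and the stand-in commands are hypothetical. A quickly exiting process costs no extra delay, and a stuck one is abandoned once the deadline passes.

```python
import subprocess
import sys


def run_with_deadline(command, timeout_seconds):
    """Run command; return True if it exited before the deadline."""
    try:
        subprocess.run(command, timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child once the timeout expires.
        return False
    return True


# Stand-ins for a dispatcher script that exits quickly and one that hangs.
quick = [sys.executable, "-c", "pass"]
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_deadline(quick, 5.0))
print(run_with_deadline(stuck, 0.5))
```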

> (b) networkmanager can bring down the interface with no warning
> due to hardware events, e.g. if the ethernet cable is unplugged or if the user
> flips the airplane-mode radio switch.

I assume (b) here is why networkmanager does not already dispatch an event before interfaces go down (because services have to deal with the network going away unannounced anyway). I still think it'd be nice to provide a more clean shutdown process when user-requested, if it's at all possible to implement. Upstream might disagree, though. In which case this is probably not worth deviating from them over.

For now I think I'll go poke around for a way to get networkmanager to just not touch my interfaces on suspend, as I'm pretty sure when openrc managed them they were just left untouched, and things pretty much worked.
Comment 4 Alexandre Rostovtsev (RETIRED) gentoo-dev 2012-02-03 23:51:07 UTC
There is an upstream bug open about adding this feature, and it has patches attached, but upstream can't seem to come to an agreement about timeouts on pre-up and pre-down events :/

https://bugzilla.gnome.org/show_bug.cgi?id=387832

My take on this is that pre-up/pre-down events are useful enough to carry an out-of-tree patch, especially since Christian Becke has already done most of the hard work, but such events must have a hard timeout (e.g. to ensure that upower doesn't decide that the system has failed to suspend because networkmanager is taking too long to go to sleep), and therefore /etc/init.d/netmount and nfs-utils' /etc/init.d/nfsmount still need to switch to "umount -f".
Comment 5 Alexandre Rostovtsev (RETIRED) gentoo-dev 2012-02-20 09:48:01 UTC
networkmanager-0.9.2.0-r4 should solve the most egregious part of the problem (the suspend issue) by implementing a "pre-sleep" event. After receiving a suspend message from upower, networkmanager-0.9.2.0-r4 will spend up to 5 seconds marking its openrc service as inactive, and stopping anything else depending on it (such as nfs), before starting to bring down interfaces.
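
The "up to 5 seconds" budget can be modelled as one shared deadline across all pre-sleep steps. This is a sketch under assumptions, not the contents of the pre-sleep patch: `run_within_budget`, the step names, and the injectable `clock` parameter are hypothetical. Each step is told how much time remains and is expected to honor it, since nothing here interrupts a step that overruns.

```python
import time


def run_within_budget(steps, budget_seconds, clock=time.monotonic):
    """Run (name, step) pairs in order until the shared budget is spent.

    Each step is called with the seconds remaining. Returns the names
    of the steps that were started.
    """
    deadline = clock() + budget_seconds
    started = []
    for name, step in steps:
        remaining = deadline - clock()
        if remaining <= 0:
            # Budget spent: stop here so the suspend can proceed.
            break
        step(remaining)
        started.append(name)
    return started


# Hypothetical pre-sleep steps; real ones would call into openrc.
steps = [
    ("mark service inactive", lambda remaining: None),
    ("stop nfsmount", lambda remaining: None),
]
print(run_within_budget(steps, 5.0))
```

A single deadline, rather than a separate timeout per service, keeps the total delay bounded no matter how many services depend on net.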

Unfortunately, I cannot come up with a sane, safe, and race-free way of marking the networkmanager openrc service as inactive based on individual pre-down events (e.g. when the user turns off wifi via the networkmanager applet) when nm is managing multiple interfaces. Therefore, openrc's /etc/init.d/netmount and nfs-utils' /etc/init.d/nfsmount still need to call "umount -f" if there are no openrc services providing net.

>*networkmanager-0.9.2.0-r4 (20 Feb 2012)
>
>  20 Feb 2012; Alexandre Rostovtsev <tetromino@gentoo.org>
>  +files/10-openrc-status-r1, +networkmanager-0.9.2.0-r4.ebuild,
>  +files/networkmanager-0.9.2.0-ifnet-password-truncated.patch,
>  +files/networkmanager-0.9.2.0-init-provide-net-r1.patch,
>  +files/networkmanager-0.9.2.0-pre-sleep.patch:
>  Fix openrc service going inactive while active connections are present (bug
>  #402613, thanks to Thomas Witt). Try to be more user-friendly by waiting a
>  few seconds before marking the service as inactive. Dispatch a pre-sleep
>  event to unmount network filesystems before suspending (bug #402085, thanks
>  to Marien Zwart). Do not truncate WPA passwords at '#' character (bug
>  #402133, thanks to John Hardin).
Comment 6 Pavel Šimerda 2012-11-23 09:46:25 UTC
I've written down some information on network connectivity dependency problems. I would appreciate any comments and help.

https://fedoraproject.org/wiki/Networking/Dependencies

I would like to set up a long-term strategy for open source projects to cope with these dependency issues.
Comment 7 Pacho Ramos gentoo-dev 2013-06-16 11:56:42 UTC
@tetromino, what else can we do for this?
Comment 8 Pacho Ramos gentoo-dev 2013-09-29 09:11:30 UTC
Should be solved by tetromino's latest change.