Bug 297153 - net-misc/openvpn: init script finishes before openvpn is up, causing dependent services to fail
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High major with 1 vote
Assignee: William Hubbs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-16 10:42 UTC by Navid Zamani
Modified: 2024-03-03 21:39 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
/etc/conf.d/net (net, 1.76 KB, text/plain)
2009-12-16 10:44 UTC, Navid Zamani
/etc/conf.d/openvpn (openvpn, 920 bytes, text/plain)
2009-12-16 10:44 UTC, Navid Zamani
Resulting /etc/resolv.conf after successful manual re-start. (resolv.conf, 146 bytes, text/plain)
2009-12-16 10:45 UTC, Navid Zamani
/etc/openvpn/vpn.conf (vpn.conf, 627 bytes, text/plain)
2009-12-16 10:48 UTC, Navid Zamani
unsuccessful-attempt.log (unsuccessful-attempt.log, 3.60 KB, text/plain)
2009-12-16 11:12 UTC, Navid Zamani
successful-manual-restart.log (successful-manual-restart.log, 6.54 KB, text/plain)
2009-12-16 11:13 UTC, Navid Zamani
Obligatory “emerge --info”. (emerge --info, 5.00 KB, text/plain)
2009-12-16 11:17 UTC, Navid Zamani

Description Navid Zamani 2009-12-16 10:42:51 UTC
I have a configuration where a bridge depends on an openvpn tunnel as one of its parts. The dependencies are all properly set. Openvpn gets started before the bridge. Then the bridge gets started and waits for the openvpn init script to finish. But that one finishes too early, before its initialization is complete. As a result, the bridge can’t send out the dhcp request through the tunnel, which causes it to time out. In the end, no connection through the tunnel is possible.

I will attach the network setup, and the log file parts of the unsuccessful attempt.

Ask me if anything about the network structure is unclear. :)

Reproducible: Sometimes

Steps to Reproduce:
This one will be pretty hard to reproduce unless you have access to a vpn server and are ready to twist your network setup until it’s hard to twist it back. :) (Two virtual machines would help in simulating this.)
One important thing to note is that this is timing-dependent: it happens whenever openvpn, after its init script has already exited, takes longer to finish initializing than the bridge takes to reach the point where dhcp is run.
1. Set up an openvpn server in tap mode that accepts connections from this client.
2. Set up your openvpn-2.1_rc21 client with the supplied configuration (or one that produces equivalent behavior to fit the behavior seen in the supplied log).
3. Set rc_parallel="YES" and rc_depend_strict="YES" in rc.conf (baselayout 2)
4. Set up networking with the supplied configuration files.
5. Create init script links: for i in br0 vpn vbox0; do ln -s net.lo net.$i; done; ln -s openvpn openvpn.vpn;
6. Add net.br0 to the default runlevel (the other ones are started as “needed” dependencies).
7. Reboot, and hope the timing fits.
8. Ping through the tunnel.
Actual Results:  
See the attached unsuccessful-attempt.log.

Expected Results:  
Something like successful-manual-restart.log, if it were started at init time.

I don’t think it’s required to go through all this to reproduce the bug. With a bit of luck, it’s enough to set up a simple openvpn and compare the time the init script exits successfully with the time the initialization is actually complete.

Please wait for me to upload the log attachments.
Comment 1 Navid Zamani 2009-12-16 10:44:10 UTC
Created attachment 213174 [details]
/etc/conf.d/net
Comment 2 Navid Zamani 2009-12-16 10:44:49 UTC
Created attachment 213176 [details]
/etc/conf.d/openvpn
Comment 3 Navid Zamani 2009-12-16 10:45:56 UTC
Created attachment 213177 [details]
Resulting /etc/resolv.conf after successful manual re-start.
Comment 4 Navid Zamani 2009-12-16 10:48:44 UTC
Created attachment 213179 [details]
/etc/openvpn/vpn.conf
Comment 5 Navid Zamani 2009-12-16 11:12:58 UTC
Created attachment 213180 [details]
unsuccessful-attempt.log
Comment 6 Navid Zamani 2009-12-16 11:13:16 UTC
Created attachment 213181 [details]
successful-manual-restart.log
Comment 7 Navid Zamani 2009-12-16 11:17:13 UTC
Created attachment 213187 [details]
Obligatory “emerge --info”.

OK, that’s all, folks! :)

As I said, most likely you don’t need to go through it all to understand this. I recommend starting by setting up openvpn, checking the startup timing, and comparing it to “unsuccessful-attempt.log”.
Comment 8 Navid Zamani 2010-03-10 02:34:45 UTC
Hello. This bug still exists. Right now I have to restart net.br0 about every second time I boot. Veeery annoying. Nobody interested? :(
Comment 9 Navid Zamani 2010-07-15 13:23:07 UTC
Hey, I’m just wondering if there is a reason nobody reacts to this bug?
I doubt it’s a hard one. I bet whoever actually wrote the openvpn init script contraption would take about 5 minutes to solve it.
Comment 10 Navid Zamani 2010-07-15 13:28:04 UTC
OK, please disregard the "This one will be pretty hard to reproduce," and "it’s hard to twist it back."

It’s not that hard to reproduce at all. All you need is to set up openvpn on a server, so that it does not do its own dhcp but leaves it to the bridging script later.
Then openvpn’s init script exits as if it were done, before openvpn is actually ready for the then-starting br0 to do dhcp over it.

Which means, *actually* you don’t have to reproduce the whole thing at all. Just make the openvpn init script not signal being ready before the connection is actually established. (Hint: I learned that this is not when openvpn exits, or calls that up script, but a bit later.)
Comment 11 Navid Zamani 2010-07-15 14:17:09 UTC
OK, I also tried with RC_PARALLEL="NO". No difference.

I even tried moving the re-entrant code so that it is called on route-up. That worked, but did not help.

The core problem is still that br0 is started (and tries to do dhcp) BEFORE openvpn says "Initialization Sequence Completed".
I don’t know what’s still missing at route-up, but it’s not enough for the dhcp of the bridge to work.
Comment 12 Navid Zamani 2010-12-19 22:42:18 UTC
This is still a major problem here. Anybody??
Comment 13 Navid Zamani 2010-12-19 22:43:21 UTC
Raising it to major, because having to restart net.br0 on *every* reboot for months is getting extremely annoying.
Comment 14 Navid Zamani 2011-01-03 05:08:16 UTC
Hmm, even --up-delay doesn’t help…
Comment 15 Dirkjan Ochtman (RETIRED) gentoo-dev 2012-02-17 09:46:40 UTC
Does this still happen with more recent versions? I'm not sure there's much I can do, perhaps you can consult upstream about how best to fix this.
Comment 16 Navid Zamani 2012-02-17 18:24:45 UTC
Yes, this would definitely need a code change to allow detection of when OpenVPN is really finished.
As a temporary hack, my only solution would be to look at its log output and watch for a certain message that only appears when it is done… But of course that is unsatisfying, to say the least. ;)
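
For illustration only, the log-watching hack could be sketched roughly as below. This is not part of any existing init script. It assumes that OpenVPN writes its log to a plain file (the path passed in is hypothetical) and that the message to wait for is the "Initialization Sequence Completed" line mentioned in comment 11. The function name `wait_for_marker` and all of its parameters are made up for this sketch. Log rotation and truncation are not handled.

```python
import time

# Message quoted in comment 11; assumed to appear once OpenVPN is really up.
READY_MARKER = "Initialization Sequence Completed"


def wait_for_marker(log_path, marker=READY_MARKER, timeout_s=60.0,
                    poll_s=0.5, start_offset=0):
    """Poll a log file until `marker` shows up or `timeout_s` elapses.

    `start_offset` is a byte offset; pass the file size recorded just before
    starting OpenVPN to ignore markers left over from a previous run.
    Returns True if the marker was seen, False on timeout.
    """
    marker_b = marker.encode()
    keep = max(len(marker_b) - 1, 0)
    deadline = time.monotonic() + timeout_s
    offset = start_offset
    tail = b""  # end of the previous read, in case the marker is split
    while True:
        try:
            with open(log_path, "rb") as log:
                log.seek(offset)
                data = log.read()
                offset += len(data)
        except FileNotFoundError:
            data = b""  # log not created yet; keep waiting
        buf = tail + data
        if marker_b in buf:
            return True
        tail = buf[-keep:] if keep else b""
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A caller would start OpenVPN, call `wait_for_marker(...)`, and only report the service as started when it returns True. Whether blocking the init script like this is acceptable as a default is exactly what the later comments argue about.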

I don’t use my VPN anymore, since I’m now always directly connected to the network. So I can’t say if it still happens. I’d bet money on it though.
(But be aware that it’s very much dependent on race conditions, and may well work for a long time, and suddenly stop working because one unrelated service loads one second quicker.)

I don’t know what to do with this bug… It’s still a bug, but it’s not a problem I come in touch with anymore. So… what do we do? I feel bad about both options: letting it just float around, or closing it. ;)
Comment 17 Marcel Pennewiß 2012-02-19 09:30:57 UTC
OpenVPN provides a management interface (e.g. via Unix sockets). You can query the connection state using the "state" command. [1]

But waiting until the connection is established should not be the default, because it could stall startup (e.g. when resuming from suspend-to-RAM) while the user is waiting to log in.

[1] http://www.openvpn.net/index.php/open-source/documentation/miscellaneous/79-management-interface.html
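
A minimal sketch of what such a query could look like, under several assumptions: OpenVPN is configured with a management socket on a Unix socket (the path is hypothetical), no management password is set, and the reply to "state" is a comma-separated line whose second field is the state name, followed by a line reading END. The function names below are invented for this sketch and are not part of OpenVPN or the Gentoo init script.

```python
import socket
import time
from typing import Optional


def parse_state(reply: str) -> Optional[str]:
    """Extract the state name (e.g. "CONNECTED") from a "state" reply."""
    for line in reply.splitlines():
        line = line.strip()
        # Skip blanks, the END terminator and ">..." notification lines.
        if not line or line == "END" or line.startswith(">"):
            continue
        fields = line.split(",")
        if len(fields) >= 2 and fields[0].isdigit():
            return fields[1]
    return None


def query_state(sock_path: str, timeout_s: float = 5.0) -> Optional[str]:
    """Send "state" over the management socket and return the state name."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout_s)
        s.connect(sock_path)
        s.sendall(b"state\n")
        buf = b""
        # Read until the END line arrives or the peer closes the socket.
        while not any(l.strip() == b"END" for l in buf.splitlines()):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_state(buf.decode("ascii", "replace"))


def wait_until_connected(sock_path: str, deadline_s: float = 60.0,
                         poll_s: float = 1.0) -> bool:
    """Poll the management socket until CONNECTED or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            if query_state(sock_path) == "CONNECTED":
                return True
        except OSError:
            pass  # socket missing or OpenVPN not answering yet; retry
        time.sleep(poll_s)
    return False
```

An up script or init script wrapper could call `wait_until_connected("/run/openvpn/vpn.sock")` (hypothetical path) and start the bridge only if it returns True. The deadline keeps boot from hanging forever when the tunnel never comes up.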
Comment 18 Marcel Pennewiß 2012-02-19 09:35:29 UTC
(In reply to comment #0)
> I have a configuration, where a bridge depends on an openvpn tunnel as one of
> its parts. The dependencies are all properly set. Openvpn gets started before
> the bridge. Then the bridge gets started and waits for the openvpn init script
> to finish. But that one finishes too early, before its initialization is
> complete. As a result, the bridge can’t send out the dhcp request through the
> tunnel, which causes it to time out. In the end, no connection through the
> tunnel is possible.

Building or extending the bridge setup via an up/down script could solve all your problems. Since your bridge setup (dhcp) depends on openvpn being "up", it would be better to use the scripts in whatever way fits your needs.
Comment 19 Navid Zamani 2012-02-19 19:44:48 UTC
(In reply to comment #18)
> Building or extending the bridge setup via an up/down script could solve all
> your problems. Since your bridge setup (dhcp) depends on openvpn being "up",
> it would be better to use the scripts in whatever way fits your needs.

*double extended Picard facepalm*

If you had checked what I tried, you’d know that I opened this bug, because the up script is executed long before OpenVPN actually is up, causing dhcpcd to time out because it can’t connect to the dhcp server, causing everything depending on the network to fail too.

How about you read the bug report first, next time? ;)
Comment 20 Navid Zamani 2012-02-19 19:52:24 UTC
(In reply to comment #17)
> OpenVPN provides a management interface (e.g. via Unix sockets). You can
> query the connection state using the "state" command. [1]

Hey, nice find! That could actually solve it! If I still used OpenVPN, I’d change my up script to wait until the management interface tells me it’s really up. :)

> But waiting until the connection is established should not be the default,
> because it could stall startup (e.g. when resuming from suspend-to-RAM) while
> the user is waiting to log in.

Well, since the VPN is the only connection allowed by the firewall, and the box is (/was) useless without that connection (it mounts the sshfs that contains all the actual user data), that’s exactly what it should do. :)
If it can’t connect, there is no point in logging in, or even booting up the other services. If they have to wait forever for a connection, they should wait forever. Otherwise they will fail anyway. (Which they did before I added the forced dependencies. And it always caused a big mess that made it easier to just reboot the box than to fix it manually.)
Comment 21 Marcel Pennewiß 2012-02-19 20:09:01 UTC
(In reply to comment #19)
> If you had checked what I tried, you’d know that I opened this bug, because the
> up script is executed long before OpenVPN actually is up, causing dhcpcd to
> time out because it can’t connect to the dhcp server, causing everything
> depending on the network to fail too.

You're right. 

What about "--up-delay" or "--route-up"? Did you try this?
Comment 22 Marcel Pennewiß 2012-02-19 20:11:16 UTC
(In reply to comment #20)
> Well, since the VPN is the only connection allowed by the firewall, and the box
> is (/was) useless without that connection (it mounts the sshfs that contains
> all the actual user data), that’s exactly what it should do. :)
> If it can’t connect, there is no point in logging in, or even booting up the
> other services. If they have to wait forever for a connection, they should wait
> forever. Otherwise they will fail anyway. (Which they did before I added the
> forced dependencies. And it always caused a big mess that made it easier to
> just reboot the box than to fix it manually.)

This is your use case, but it is not the default one. So such functionality should not be the default.
Comment 23 Navid Zamani 2012-02-19 20:25:01 UTC
(In reply to comment #21)
> (In reply to comment #19)
> > If you had checked what I tried, you’d know that I opened this bug, because the
> > up script is executed long before OpenVPN actually is up, causing dhcpcd to
> > time out because it can’t connect to the dhcp server, causing everything
> > depending on the network to fail too.
> What about "--up-delay" or "--route-up"? Did you try this?

Believe me, I tried every possible trigger I could find. Even hackish stuff with adding a delay before the dhcpcd run, running multiple dhcpcds and killing all others when the first one succeeds, etc. It all either doesn’t work, or creates more of a mess than it solves. :(
That’s why I made this bug.
Comment 24 Navid Zamani 2012-02-19 20:39:01 UTC
(In reply to comment #22)
> This is your use case, but it is not the default one. So such functionality
> should not be the default.

Hmm… why would anyone start OpenVPN, if he doesn’t plan on using it when the init script says it’s started?
The problem is that the init script says “I’m done, OpenVPN is up. Go ahead, use it.” But when you do, it’s not actually up for another couple of seconds, and all the stuff that’s trying to use it falls over and fails.

What would be the argument for having it like it is now? Just so it won’t stall forever? I don’t think that’s a valid argument.
Because all services that don’t depend on it, would start anyway. And all services that depend on it, can’t work without it being fully up.

But I’m of course open to hear reasons I missed.

I’m for having all services that need it to run, waiting for it to actually work before they try to use it… so they won’t fail. (With a timeout, of course.)
I’m sure everyone undoubtedly agrees with that.
Comment 25 Manuel Rüger (RETIRED) gentoo-dev 2016-09-01 12:40:06 UTC
This bug has gotten really old, can you please retry with openvpn-2.3.12 and see if the issue still exists?
Comment 26 Navid Zamani 2016-09-01 16:19:48 UTC
I am not surprised to say that it does.
Have you tried reproducing it as per my description?
If you can show that it doesn’t happen for you, then I’m curious about what’s different in your case.

I don’t understand why people believe that age would magically make bugs go away. It is distinctly not a fine wine. ^^
Comment 27 Manuel Rüger (RETIRED) gentoo-dev 2016-09-01 19:11:00 UTC
Thanks for your response! I took over maintenance recently and wanted to check which bugs are still valid and which not.