Summary: | supervisor plugin design and a runit example | ||
---|---|---|---|
Product: | Gentoo Hosted Projects | Reporter: | Benda Xu <heroxbd> |
Component: | OpenRC | Assignee: | OpenRC Team <openrc> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | CasperVector, ccx, dlan, eivind, jakub, lu_zero, sorin.panca, tokiclover, xaionaro |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
runit.patch
runit.patch
runit.patch |
Description
Benda Xu
2014-02-15 09:00:36 UTC
Created attachment 370460
runit.patch
Created attachment 370462
runit.patch
sorry, the patch comment should be (without 5.):

1. separate background and foreground execution modes
2. split the start-stop-daemon template into an addon, and add runit alongside it
3. an init script can just set command{,_args} and arg_{foreground,background} and use the default start and stop defined in the addons
4. command_foreground does not need to be set when $command runs in the foreground by default. The same applies to command_background.

How to test it on the rsync daemon:

1. install runit
2. runsvchdir /var/runit
3. runsvdir-start (manually or via inittab)
4. rc-service rsyncd stop
5. add these two lines to /etc/conf.d/rsyncd:
   VIA=runit
   arg_foreground="--no-detach"
6. rc-service rsyncd start

Confirm that rsyncd is started by runit.

This changes the semantics of "started" to "scheduled to start"; scripts relying on this, including dependencies, will break.

Sleeping for a set timeout is silly. It's better to create the whole directory and move/symlink it into the watched directory atomically; otherwise you have race conditions.

I don't think that having scripts and logs in the same place is the best idea. Ideally the watched directory would be in /run, which is tmpfs, so we don't get junk left over from a crashed boot.

You create a run script that executes a single command. That might or might not be sufficient. We quite often have cleanup / prepare steps in start(), e.g. postgresql notoriously leaves its unix socket behind when killed and refuses to start if it is present, so you need to remove it. You could run a specific exported function (e.g. start_foreground) from the actual initscript, so you could reuse the code for cleanup / prepare, but that would require modifying runscript so you don't get a superfluous fork() obscuring the way, e.g. like I did in http://bpaste.net/show/37565/.

Curiously, you force a full shutdown on reload instead of sending some kind of reload signal (most frequently SIGHUP). Also, you don't regenerate the run file in that case.
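The "build the whole directory first, then publish it atomically" advice above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name publish_service and the STAGE/WATCH layout are hypothetical, WATCH stands for the directory runsvdir scans (ideally under /run), and publishing relies on a single symlink(2) call, which either happens completely or not at all.

```sh
#!/bin/sh
# Sketch: fully build a runit service directory, then make it visible to
# runsvdir in a single atomic step, so runsvdir never picks up a
# half-written service.
# Hypothetical layout: STAGE holds the real service directories, WATCH is
# the directory runsvdir scans (ideally under /run, which is tmpfs).

# publish_service NAME COMMAND STAGE WATCH
publish_service() {
    name=$1 cmd=$2 stage=$3 watch=$4
    dir="$stage/$name"

    mkdir -p "$dir" || return 1
    # Write ./run under a temporary name, then rename(2) it into place.
    # COMMAND is pasted in unquoted, so arguments containing whitespace are
    # not supported (the same limitation as $command/$command_args).
    printf '#!/bin/sh\nexec %s\n' "$cmd" > "$dir/run.tmp" &&
        chmod 755 "$dir/run.tmp" &&
        mv -f "$dir/run.tmp" "$dir/run" || return 1

    # Never clobber: if NAME exists as a symlink to a directory, `ln -s`
    # would create the new link inside that directory instead.
    if [ -e "$watch/$name" ] || [ -L "$watch/$name" ]; then
        return 1
    fi
    # Publish: a single symlink(2) call, which either happens or doesn't.
    ln -s "$dir" "$watch/$name"
}
```

With hypothetical paths, publishing rsyncd would be `publish_service rsyncd "rsync --daemon --no-detach" /etc/sv /run/runit/service`. Because runsvdir only ever sees either no entry or a finished directory, no fixed sleep is needed to paper over a partially written service.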
Sleeping for 5s and hoping it will be enough really isn't the way. Runit provides a mechanism for waiting for a daemon to start, including a custom check script that you can use to ascertain whether the daemon is running. It's slightly deficient, though: you really have to wait for the supervise/ok fifo to appear (inotifywait can help where we have it; otherwise a sleep loop with, say, a 0.5s interval will be far nicer than a plain 5s sleep), and then issue "sv check foobar" to run whatever test you have for the service, waiting up to a configurable time until it succeeds.

This would require writing check scripts for the services we want supervised. That isn't as hard as it may sound: for a lot of daemons we can consider them up when some socket is open. We can extract this either from the config file / conf.d, where we already have this information, or (non-POSIX, though!) just check /proc/pid/fd for any open unix or inet sockets.

HTH

PS: sv start also waits for ./check, so if we pre-create all service directories at once with ./down in place, runsv will already be running for all but the earliest scripts, and we can just use "sv start" in place of "sv check".

OK, what I wrote above about requiring a patch to runscript itself is slightly incorrect; I just find it to be a much cleaner approach. To make start_pre (aka prepare in the post above) work, you might need to extract current values out of the configuration file. If you do so, you also need the current $command and $command_args. If parts of ./run are hardcoded and parts call into the initscript, you will get into trouble whenever you change anything significant in the configuration file and the daemon gets auto-restarted. There are about three different ways you can call the initscript to avoid this problem and reuse the code.
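The polling wait and the /proc/pid/fd socket check described above might be sketched as follows. Assumptions: the helper names wait_for_path and has_socket are hypothetical; `sleep 0.5` needs a sleep(1) that accepts fractional seconds (GNU coreutils and busybox do, POSIX only requires integers); the /proc check is Linux-only and detects sockets of any kind, without telling unix sockets from inet ones.

```sh
#!/bin/sh
# Sketch: poll for a runit supervise/ok fifo instead of a fixed `sleep 5`.

# wait_for_path PATH TIMEOUT_SECONDS
# Returns 0 as soon as PATH exists, 1 if TIMEOUT_SECONDS elapse first.
wait_for_path() {
    path=$1
    tries=$(( $2 * 2 ))          # two polls per second at a 0.5s interval
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 0.5                # assumes fractional sleep support
        tries=$(( tries - 1 ))
    done
    [ -e "$path" ]               # one final check after the last sleep
}

# has_socket PID
# Linux-only (non-POSIX): succeeds if PID holds at least one open socket;
# readlink on /proc/PID/fd/N prints "socket:[INODE]" for any socket.
has_socket() {
    for fd in /proc/"$1"/fd/*; do
        case $(readlink "$fd" 2>/dev/null) in
            socket:*) return 0 ;;
        esac
    done
    return 1
}

# Example with a hypothetical service directory, waiting up to 10s:
#   wait_for_path /var/runit/rsyncd/supervise/ok 10 && sv check /var/runit/rsyncd
```

A ./check script for a daemon could then be as small as a has_socket call on the daemon's PID. Polling at 0.5s bounds the extra wait to half a second instead of always paying the full 5s.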
The ./run script can be either:

    rc-service foo start_pre && exec $(rc-service foo print_command)

where the exported commands set up the daemon for running and print the command that is supposed to be executed, respectively. You can merge them into one function, but you have to be sure not to print any status or errors to stdout, which is what the einfo/ebegin family of commands does. Or you can do what I wrote a patch for: have your ./run be just

    exec rc-service foo start_foreground

so you can have a custom start_foreground command, which actually handles some corner cases better, e.g. arguments containing whitespace, which is not really possible with the way $command and $command_args are normally handled.

Instead of sleeping 5 seconds, if you are waiting for a file to appear, you can use waitfile (see the openrc-run manpage if you are looking at openrc-git). Also, I highly discourage using the addon code and making these addons. I would put the start-stop-daemon.sh and runit.sh files in the sh directory. You import them into runscript.sh.in using some variant of the sourcex call.

Hey Jan,

Long time no see! "exec rc-service foo start_foreground" feels like a cool solution. If we can derive foo from an environment variable or the directory name, there will be a universal runit directory! (The cat >> run in my patch is really ugly.) For the other parts, I am sure I have done something suboptimal (sleep 5s, state, etc.), but I could not fully follow you, as I am not that experienced with process supervision, and specifically runit. Could you please paste some code to support your argument (a patch or git repo, ideally based on my patch)? Thanks a lot.

@William, thanks for the input. I'll replace the addons with sourcex in my next patch.

Benda

Created attachment 372786
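The "universal runit directory" idea above, deriving foo from the directory name, could be sketched like this. It relies on runsv starting ./run with the service directory as its working directory. Assumptions: start_foreground is the exported initscript command proposed in this thread, not a stock OpenRC command; the helper run_command is hypothetical and only prints what ./run would exec, so the logic can be inspected without starting anything.

```sh
#!/bin/sh
# Sketch of one ./run file shared by every service directory.
# ASSUMPTION: each initscript exports a "start_foreground" command (as
# proposed in this thread) that runs the daemon without detaching.

# run_command DIR: print the command ./run would exec for service dir DIR.
run_command() {
    printf 'rc-service %s start_foreground\n' "${1##*/}"
}

# Only exec when invoked as ./run itself, so the file can also be sourced.
if [ "${0##*/}" = run ]; then
    # runsv's working directory is the service directory, whose name is
    # taken to be the OpenRC service name.
    exec rc-service "$(basename "$(pwd)")" start_foreground
fi
```

Since the script reads everything through the initscript at each (re)start, changes to /etc/conf.d are picked up on auto-restart instead of being frozen into a generated run file.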
runit.patch
All,
here is a slightly updated version of this patch.
This includes updates to the openrc-run man page for the new variables
as well.
I had to remove the command_env variable, because it could only hold one
environment variable. Also I removed start_wait because that is specific
to start-stop-daemon.
Benda, your name will be listed as the primary author; I just
made some modifications.
I need input on this version, in particular, I'm not following how to
make sure a service was successfully started.
Thanks,
William
For the record, it seems we can signal runsvdir that we want it to reload the service directory by sending it the CONT signal: http://comments.gmane.org/gmane.comp.sysutils.supervision.general/99. We will of course still need to waitfile on the supervise directory's contents, but this should get rid of the unpleasant delay when adding a service.

I have some questions about this that are still not clear to me.

1. How do we start runsvdir to begin with?
2. How do we make sure runsvdir is restarted if it dies?

I see only two ways we can do this. We can figure out a way to add this capability to the patch, or we can just write a document telling users how to set it up.

(In reply to William Hubbs from comment #11)
> I have some questions about this that are still not clear to me.
>
> 1. How do we start runsvdir to begin with?
> 2. How do we make sure runsvdir is restarted if it dies?

The present thinking is to start runsvdir from inittab, so that it persists. The drawback is that the user has to add it manually; we should document it.

> I see only two ways we can do this. We can figure out a way to add this
> capability to the patch, or we can just write a document telling users how
> to set it up.

Can we really achieve this inside OpenRC? I prefer documenting for users how to start runsvdir from inittab.

(In reply to William Hubbs from comment #9)
> Created attachment 372786
> runit.patch
>
> All,
>
> here is a slightly updated version of this patch.
>
> This includes updates to the openrc-run man page for the new variables
> as well.
>
> I had to remove the command_env variable, because it could only hold one
> environment variable. Also I removed start_wait because that is specific
> to start-stop-daemon.

Well done, William. Thank you. I will base the next patch on this.

> I need input on this version, in particular, I'm not following how to
> make sure a service was successfully started.

More on this later.
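Documenting the inittab approach could include a small idempotent helper like the one below. The entry follows the shape of the example on runit's upstream "use with sysvinit" page (id SV, runlevels 1 to 6, respawn); the path /sbin/runsvdir-start and the function name add_runsvdir_entry are assumptions, so adjust them to wherever runsvdir-start is installed.

```sh
#!/bin/sh
# Sketch: have sysvinit start runsvdir and respawn it if it dies.
# ASSUMPTION: runsvdir-start is installed as /sbin/runsvdir-start.
SV_LINE='SV:123456:respawn:/sbin/runsvdir-start'

# add_runsvdir_entry INITTAB: append the entry once; repeated runs are no-ops.
add_runsvdir_entry() {
    # inittab ids must be unique, so any existing "SV:" entry means we stop.
    grep -q '^SV:' "$1" && return 0
    printf '%s\n' "$SV_LINE" >> "$1"
}

# As root:  add_runsvdir_entry /etc/inittab && telinit q
```

`telinit q` makes init re-read /etc/inittab, and the respawn action answers question 2 above: init itself restarts runsvdir-start whenever it exits.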
One problem here: we don't know the PID of runsvdir to send the CONT signal to.

(In reply to Benda Xu from comment #13)
> One problem here: we don't know the pid of runsvdir to send the CONT signal.

One possibility is to start runsvdir under a separate runsv and use the runsv interface, such as supervise/pid, to send signals to runsvdir reliably. A bonus is that inittab does not need to be modified to respawn runsvdir.

All,

I just added the most recent version of runit to the tree, so now we can talk more about this bug.

Upstream runit documents how runit should be used with sysvinit [1]. This suggests we should put runsvdir-start in inittab and actually write runit scripts for the services we want runit to handle, instead of having OpenRC generate them dynamically.

I would like some comments. What do people think?

[1] http://www.smarden.org/runit/useinit.html

(In reply to William Hubbs from comment #15)
> This sounds like we should put runsvdir-start in inittab and actually
> write runit scripts for the services we would want runit to handle
> instead of having openrc generate them dynamically.

That means we use OpenRC and runit separately, doesn't it? Then should we focus on a general interface for OpenRC to interact with supervisors instead?

After looking at runit and how it works, it seems to me that the domains it and OpenRC work in are not exactly orthogonal, and it would be more work than it's worth to get them to play nicely together.

That said, I'm working on a small init replacement (inspired by https://felipec.wordpress.com/2013/11/04/init/) and have it to the point where it can boot and shut down the system reliably, using OpenRC to do all the heavy lifting. It should be fairly easy to add functionality to it, like process supervision, in a way that interfaces cleanly with OpenRC.

Just my 2¢.

(In reply to James L. Hammons from comment #17)
> After looking at runit and how it works, it seems to me that the domain that
> it and openrc work in are not exactly orthogonal, and it would be more work
> than it's worth to get them to play nicely together.
>
> That said, I'm working on a small init replacement (inspired by
> https://felipec.wordpress.com/2013/11/04/init/) and have it where it can
> boot and shut down the system reliably using OpenRC to do all the heavy
> lifting. It should be fairly easy to add functionality to it, like process
> supervision and the like, but in a way that interfaces cleanly with OpenRC.
>
> Just my 2¢.

Great! Thanks for the link to a very interesting article.

The ball is now rolling! Check out http://forums.gentoo.org/viewtopic-p-7664146.html for more information. Testing is needed and appreciated; advice & code are always welcome.

Hello James,

(In reply to James L. Hammons from comment #19)
> The ball is now rolling! Check out
> http://forums.gentoo.org/viewtopic-p-7664146.html for more information.
>
> Testing is needed and appreciated; advice & code are always welcome.

Very nice post! I am impressed by how short the Ruby code is.

Benda

I've just discovered this bug... when trying to file a bug against the OpenRC repository. (I had already forgotten about this bug ;-)

I filed bug #533418 after AntP.'s bug #521918 (shutdown is broken), bug #522204 (login shells are broken; patch obsoleted by the above fix) and bug #522786 (2.1.2 version bump), in order to have an out-of-the-box "Just Works(TM)" replacement of SysVinit `init' by `runit-init'.

[ An aside on the above: I've run into an issue described in topic #998478 (http://forums.gentoo.org/viewtopic-t-998478-start-25.html) when using `runit-init' as PID 1. In short: a process or daemon (run in the foreground) hangs in stage 1. What happens afterwards? `runit-init' waits forever, with C-ALT-DEL inactive. ]
To make a _short_ story short: that topic, and other topics related to PID 1 and SystemD, have interesting discussions on service supervision. I think a sane approach is: DO NOT SUPERVISE EVERY DAEMON/SERVICE, because this can be very dangerous, be it on a desktop, a server or anything else. Nobody wants every dead service/daemon restarted. Supervision can be beneficial when used for a particular daemon/service. So the choice of supervision _should_ be available in the init script, but not set globally in `rc.conf', because that is potentially dangerous.

Another issue is with starting/stopping the process/daemon itself. Taking the runit case, the `runsvdir' supervisor scans the root service directory every 5 seconds or so. So making a symlink and waiting for `runsvdir' to pick up the service later is not practical during boot/shutdown, because a service _should_ be started/stopped ASAP, not after 5 seconds or whatever the delay is. runit provides the `sv' binary to do this. So the current patch should implement a less flawed start/stop mechanism.

--
...
# Replace `sv_dir="${RC_SVCDIR}/sv/${RC_SVCNAME}"' with the following,
# because keeping the runit service directories apart from OpenRC's
# makes more sense than stuffing them into `init.d'.
# RC_SERVICE is the full path of the init script (e.g. /etc/init.d/foo),
# so strip the script name first: this gives /etc/init.d/../sv/foo.
sv_dir="${RC_SERVICE%/*}/../sv/${RC_SVCNAME}"
# Handy variable for the supervised (runtime) link, to avoid repetition
sv_rundir="${RC_SVCDIR}/runit/${RC_SVCNAME}"

# reload is not a default command; it must be declared to be callable
extra_started_commands="reload"

start() {
	do_service
	ebegin "Starting runit supervised ${RC_SVCNAME}"
	ln -s "${sv_dir}" "${sv_rundir}"
	sv start "${sv_rundir}"
	eend $?
}

stop() {
	ebegin "Stopping runit supervised ${RC_SVCNAME}"
	sv stop "${sv_rundir}"
	eend $?
	# Is this really needed? (tmpfs)
	#rm -f "${sv_rundir}"
}

status() {
	sv status "${sv_rundir}"
}

reload() {
	ebegin "Reloading runit supervised ${RC_SVCNAME}"
	sv reload "${sv_rundir}"
	eend $?
}
--

Notice I renamed `make_sv_dir' to `do_service'; `make_sv_dir' sounds a little... hard to grasp. This is just a cosmetic change. Another cosmetic change is the start/stop message.

Thanks.
Well, I played a little with a modified variant of William's patch, and it is indeed impossible to get runit & OpenRC to play nicely together because of... solely the `start()' function. The other functions are no problem with `sv' around. It's just that `sv' is completely useless for starting a new service that wasn't running before its invocation; it simply fails in this case. [Note: I am actually using runit-init as PID 1, but other than having getty supervised, I don't use it for anything else at the moment.]

So, for testing, I added a line to the start function that starts a new instance of runsvdir. Using `runsv' to start a service in a subshell (because it runs in the foreground) introduces a great deal of race conditions, which can end up launching dozens of daemons. And stopping them with `sv' is not enough without a kill-all command. Indeed, the log directory is more of a bother than anything else, and I had to remove it quickly to avoid useless hassles.

The s6 developer released a new 2.0 of the s6 suite, and he is going to simplify and clean up some of the code this coming year. I took a look at it... and ran into one issue after another trying to merge the package cleanly. 2.0 introduced a more standard build/installation (configure/Makefile), but it's not quite neat yet. So I will wait a little before experimenting with this patch with s6.

I agree that not every service should be supervised; that way madness lies. But I can also see the use cases for having services supervised, like agetty and sshd (if you are in a remote session and accidentally kill the root sshd process, for example). The funny thing is that Sys V init already does process supervision via the appropriate lines in /etc/inittab (otherwise your login shells wouldn't respawn after you logged out of them!), so the mechanism is already there; the problem is that it doesn't integrate *at all* with OpenRC.
So, basically, what I'm doing with my small init replacement is adding the ability to monitor processes and relaunch them using OpenRC if they die (only if the user wants the service monitored). This requires a small patch to start-stop-daemon, but the impact on OpenRC overall is very minor, and it requires no changes to existing scripts. I have proof-of-concept code working right now, but need to refine it. The nice thing is that even with my small patches to start-stop-daemon, you can still run regular Sys V init without any problems; the patches only do anything if my small init is running.

FWIW, my 2¢.

Folks,

I am back to working on this bug, and commit abef2fc adds the ability to override the start, stop and status functions.

There is another approach to making runit available to supervise services on an OpenRC system, without replacing init, which we haven't considered. The idea is that runit itself will be an OpenRC service, and any services supervised by runit will have a need dependency on the runit service. The downside is that runit itself will not be supervised. However, I think the runit process should be stable enough that we don't have to worry too much about it crashing.

Also, I don't think we should be trying to automatically generate runit services; I think building the services should be left to the service authors.

I will post a new patch shortly.

The commit I just cited has an issue. It forces the supervisor to be set in /etc/conf.d/* or /etc/rc.conf. This is not really correct, since we want script authors to control this, not users. I will make a change in git soon to deal with this issue.

https://github.com/openrc/openrc/commit/f62253b

This adds runit support that is very similar to the s6 support. It will be included in 0.22.
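William's proposal above, running runit itself as an OpenRC service that supervised services declare a need dependency on, could look roughly like the init script below. This is a sketch, not the script from commit f62253b: the service name runsvdir, the binary path /sbin/runsvdir and the scan directory /run/runit/service are hypothetical. It uses standard openrc-run variables (command, command_args, command_background, pidfile, retry) and the checkpath helper; `runsvdir -P` runs each runsv in its own session and process group.

```sh
#!/sbin/openrc-run
# Hypothetical /etc/init.d/runsvdir: runit's scanner as a plain OpenRC service.
# ASSUMPTIONS: binary path, scan directory and service name are illustrative.

description="runit service directory scanner"
command=/sbin/runsvdir
command_args="-P /run/runit/service"
# runsvdir stays in the foreground, so let start-stop-daemon background it;
# command_background requires pidfile to be set.
command_background=yes
pidfile=/run/runsvdir.pid
# Per runsvdir(8), SIGTERM makes it exit without stopping its runsv children,
# while SIGHUP makes it send TERM to each runsv first and then exit.
retry="SIGHUP/10/SIGKILL/5"

start_pre() {
    # Keep the scan directory on tmpfs so a crashed boot leaves no junk.
    checkpath -d -m 0755 /run/runit/service
}

# Each runit-supervised service would then declare:
#   depend() { need runsvdir; }
```

As noted in the comment above, nothing restarts runsvdir if it dies in this setup; the trade-off is that no inittab edit is needed and OpenRC's dependency ordering applies.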