948143 – sci-misc/boinc: excessive shutdown delays

Status:	CONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Science Related Packages

Reported:	2025-01-15 11:47 UTC by peter@prh.myzen.co.uk
Modified:	2025-02-10 01:07 UTC
CC List:	5 users

Attachments
Patch: Use standard $pidfile and $retry variables (0001-sci-misc-boinc-7.24.1-r2-Use-standard-pidfile-and-re.patch, 8.98 KB, patch) 2025-01-20 07:18 UTC, Alexis
Patch: Use standard $pidfile and $retry variables (0001-sci-misc-boinc-7.24.1-r2-Use-standard-pidfile-and-re.patch, 8.94 KB, patch) 2025-01-21 00:51 UTC, Alexis

Description peter@prh.myzen.co.uk 2025-01-15 11:47:43 UTC

Whenever I stop boinc, nothing happens for a full minute; then a second SIGTERM stops it immediately. /etc/init.d/boinc includes a custom shutdown command, which portage restores on update, overwriting any change I make. I have two machines running BOINC projects: one machine runs a single job at a time, the other runs 18-24 jobs concurrently. In both cases, waiting 60s after the first SIGTERM is excessive.

I asked on the linux-users email list how I could prevent portage from auto-updating /etc/init.d/boinc, and received the following reply:

------------
In this case the init script is using a custom variable for the
timeout, and setting that variable unconditionally:

  stop() {
    local stop_timeout="SIGTERM/60/SIGTERM/30/SIGKILL/30"
    ...
  }

What would be much nicer is if it

 1. Used the standard $retry variable for this (man openrc-run)
 2. Set $retry only if it's unset

Then you could simply provide your own $retry in boinc.conf. Going a
bit further, it could move the env_check into stop_pre(), and use
$pidfile instead of the custom $BOINC_PIDFILE. That would make the
entire stop() function redundant.
------------

Please implement this more elegant solution.
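
The "set $retry only if it's unset" suggestion can be sketched as follows. This is a minimal illustration, not the actual patch: it assumes the conf.d file is read before the init script body (as openrc-run normally does), and `load_conf`, `init_script_defaults`, and the override value are hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch: mimic the order in which settings are read. The conf.d file comes
# first, then the init script fills in a default only if nothing was set.

load_conf() {
    # Stand-in for /etc/conf.d/boinc; the value is a hypothetical user choice.
    retry="SIGTERM/10/SIGKILL/5"
}

init_script_defaults() {
    # Keep the user's value if present; otherwise use the schedule that the
    # current init script hard-codes in stop().
    : "${retry:=SIGTERM/60/SIGTERM/30/SIGKILL/30}"
}

# Case 1: nothing configured, so the default applies.
unset retry
init_script_defaults
echo "default:  $retry"

# Case 2: the user configured a value, which is left untouched.
unset retry
load_conf
init_script_defaults
echo "override: $retry"
```

Because the default is applied with `:=`, an empty `retry=""` in the conf file would also be replaced by the default.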

Comment 1 Alexis 2025-01-20 07:18:19 UTC

Created attachment 917118
Patch: Use standard $pidfile and $retry variables

Initial patch attached. i'm not a sci-misc/boinc user myself, but i'd like to become more familiar with the workings of OpenRC, so here we are.

Although the comment suggests that the entire stop() function could be removed, i've not yet done so, because i don't understand how the value of $retry could be utilised otherwise. Would a non-existent stop() function cause start-stop-daemon to be called with `--stop --pidfile "${pidfile}" --retry $retry` (and maybe `--progress`) as arguments? Happy to be enlightened. :-)

Comment 2 Michael Orlitzky gentoo-dev 2025-01-20 14:32:09 UTC

(In reply to Alexis from comment #1)
> Although the comment suggests that the entire stop() function could be
> removed, i've not yet done so, because i don't understand how the value of
> $retry could be utilised otherwise. Would a non-existent stop() function
> cause start-stop-daemon to be called with `--stop --pidfile "${pidfile}"
> --retry $retry` (and maybe `--progress`) as arguments? Happy to be
> enlightened. :-)

There are a bunch of variables that OpenRC will supply to start-stop-daemon by default. They're all listed in the openrc-run man page, but it might be quicker to just look in /lib/rc/sh/start-stop-daemon.sh:

        eval start-stop-daemon --start \
                --exec $command \
                ${chroot:+--chroot} $chroot \
                ${directory:+--chdir} $directory \
                ${output_log+--stdout} $output_log \
                ${error_log+--stderr} $error_log \
                ${output_logger:+--stdout-logger \"$output_logger\"} \
                ${error_logger:+--stderr-logger \"$error_logger\"} \
                ${capabilities+--capabilities} "$capabilities" \
                ${secbits:+--secbits} "$secbits" \
                ${no_new_privs:+--no-new-privs} \
                ${procname:+--name} $procname \
                ${pidfile:+--pidfile} $pidfile \
                ${command_user+--user} $command_user \
                ${umask+--umask} $umask \
                $_background $start_stop_daemon_args \
                -- $command_args $command_args_background

If you set all of those variables carefully and if the daemon isn't too weird, you can usually omit the boilerplate start() and stop() functions entirely.

More on all of that here:

  https://github.com/OpenRC/openrc/blob/master/service-script-guide.md

Comment 3 Alexis 2025-01-20 23:58:00 UTC

(In reply to Michael Orlitzky from comment #2)

> There are a bunch of variables that OpenRC will supply to start-stop-daemon
> by default. They're all listed in the openrc-run man page, but it might be
> quicker to just look in /lib/rc/sh/start-stop-daemon.sh

Yeah, i looked at the man page, but it doesn't seem to specify exactly which of the variables are utilised by default in the various functions? (i might create a patch to address this.) Still, yes, sorry, i should have just looked at the source for start-stop-daemon.sh ....

Initially i incorrectly assumed the source you quoted was for stop(), but it's clearly for start(). :-P The source for stop() is:

  start-stop-daemon --stop \
          ${retry:+--retry} $retry \
          ${command:+--exec} $command \
          ${procname:+--name} $procname \
          ${pidfile:+--pidfile} $chroot$pidfile \
          ${stopsig:+--signal} $stopsig \
          ${_progress}

So i'll update the patch to remove the custom stop() function.
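
To see why the default stop() picks up `$retry` without any custom code, here is a small sketch of how the `${var:+--flag} $var` pattern in the snippet above expands. `show_stop_args` is a hypothetical helper that only prints the command line, and the pidfile path is an assumed example, not necessarily the one the package uses.

```shell
#!/bin/sh
# Hypothetical helper: print the arguments that the quoted pattern would
# produce, instead of actually calling start-stop-daemon.
show_stop_args() {
    echo start-stop-daemon --stop \
        ${retry:+--retry} $retry \
        ${pidfile:+--pidfile} $pidfile
}

pidfile="/run/boinc.pid"   # assumed path, for illustration only

# With retry set, both the flag and its value appear.
retry="SIGTERM/10/SIGKILL/5"
show_stop_args

# With retry unset, the flag and the value both expand to nothing.
unset retry
show_stop_args
```

When a variable is unset or empty, `${var:+--flag}` yields nothing and the unquoted `$var` disappears as well, so the option is simply omitted.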

Comment 4 Alexis 2025-01-21 00:51:40 UTC

Created attachment 917192
Patch: Use standard $pidfile and $retry variables

Comment 5 Alexis 2025-01-21 00:52:18 UTC

Patch updated.

Comment 6 peter@prh.myzen.co.uk 2025-02-06 15:09:57 UTC

(In reply to Alexis from comment #5)
> Patch updated.

Please see this post on the gentoo-users list on Friday 31 January 2025:

https://archives.gentoo.org/gentoo-user/23817797.6Emhk5qWAg@cube/

...with this correction:

> *  'tail -f /var/lib/boinc/stdoutdae.txt' showed boinc exiting instantly,
> and gkrellm showed CPU use dropping to zero. It's hard to be definite about
> what /bin/top shows, as it only updates every 3s, gkrellm every 2s. That
> caveat applies to all the times I've mentioned in this thread.

That should have been "gkrellm every second."

Comment 7 Sven Eden 2025-02-07 13:54:17 UTC

> waiting 60s after the first SIGTERM is excessive.

The shutdown isn't waiting. It constantly checks whether the processes have ended and returns as soon as they have.
At least that is what is supposed to happen.

If BOINC shuts down before the 60 seconds, start-stop-daemon returns.

From its man page:
>       -R, --retry timeout | signal/timeout
>               The retry specification can be either a timeout in seconds or
>               multiple signal/timeout pairs (like SIGTERM/5). If this option is
>               not given, the default is SIGTERM/5.

The timeouts are that long because some BOINC projects, especially late in their calculations, can take a long time to shut down gracefully, that is, without losing progress or corrupting data.

If the start-stop-daemon suddenly waits for the timeout although the boinc client and all its projects are already ended, then the bug lies in the start-stop-daemon.

Unless you set another supervisor, openrc-run uses start-stop-daemon, so its retry value would simply be passed through to it, which would change nothing.

Here is the bug that caused the stop command to be like it is today.
https://bugs.gentoo.org/584386

However, if there is a good way to modernize it that makes sense, I am all for it.
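
The intended behaviour of a retry schedule such as SIGTERM/60/SIGTERM/30/SIGKILL/30 can be sketched like this. It is an illustration of the semantics described above, not start-stop-daemon's actual implementation: `signal_and_wait` and `run_schedule` are hypothetical names, and the one-second polling interval is an assumption.

```shell
#!/bin/sh
# Illustration only: send one signal, then poll until the process is gone or
# the timeout (in seconds) expires. Returns 0 as soon as the process is gone.
signal_and_wait() {
    pid=$1 sig=$2 timeout=$3
    kill -s "$sig" "$pid" 2>/dev/null || return 0   # nothing left to signal
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0      # process has exited
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1                                        # still alive at timeout
}

# Walk a "SIGNAL/seconds/SIGNAL/seconds/..." schedule, stopping at the first
# step after which the process is gone.
run_schedule() {
    pid=$1
    old_ifs=$IFS
    IFS=/
    set -- $2            # split the schedule into signal/timeout pairs
    IFS=$old_ifs
    while [ "$#" -ge 2 ]; do
        # Strip the SIG prefix, since kill -s portably takes the bare name.
        signal_and_wait "$pid" "${1#SIG}" "$2" && return 0
        shift 2
    done
    return 1
}
```

For example, `run_schedule "$pid" "SIGTERM/60/SIGTERM/30/SIGKILL/30"` only reaches the second SIGTERM if the process is still alive 60 seconds after the first one. A process that exits promptly ends the schedule early, which is why a full 60-second delay suggests the process had not yet exited.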

Comment 8 peter@prh.myzen.co.uk 2025-02-07 14:28:25 UTC

(In reply to Sven Eden from comment #7)

> If the start-stop-daemon suddenly waits for the timeout although the boinc
> client and all its projects are already ended, then the bug lies in the
> start-stop-daemon.

That's the difficulty: how can we tell whether the client is reporting its completion properly?

Meanwhile, Alexis's patch enables me to evade the problem by setting my own values in boinc.conf to suit me and the physics projects I run.

PS. Since bug 584386 I've reverted to the standard layout under /var/lib/boinc and added myself to the boinc group. This choice of directory is not connected with the present bug.

Comment 9 Sven Eden 2025-02-07 14:38:05 UTC

(In reply to peter@prh.myzen.co.uk from comment #8)
> (In reply to Sven Eden from comment #7)
> 
> > If the start-stop-daemon suddenly waits for the timeout although the boinc
> > client and all its projects are already ended, then the bug lies in the
> > start-stop-daemon.
> 
> That's the difficulty: how can we tell whether the client is reporting its
> completion properly?

Easy, it is gone

> 
> Meanwhile, Alexis's patch enables me to evade the problem by setting my own
> values in boinc.conf to suit me and the physics projects I run.

Interesting, because if no different supervisor is set, openrc-run calls start-stop-daemon, which means that the patch does not actually change anything but putting openrc-run in between as a proxy.
If that alone helps, okay, but it doesn't make any sense to me.
(Unless I am overlooking some important detail here, of course.)

> 
> PS. Since bug 584386 I've reverted to the standard layout under
> /var/lib/boinc and added myself to the boinc group. This choice of directory
> is not connected with the present bug.

That bug ended up being about boinccmd not being suitable to end boinc in an init script, hence we changed to start-stop-daemon.

Comment 10 Sven Eden 2025-02-07 14:40:23 UTC

(In reply to Sven Eden from comment #9)
> start-stop-daemon, which means that the patch does not actually change
> anything but putting openrc-run in between as a proxy.

It does make the retry string configurable, and that is a good thing that we absolutely should add.

Just be sure you know what you are doing, because killing boinc projects while they save their data... Well, let's just say I have had bad experiences with that. ;-)

Comment 11 Alexis 2025-02-07 21:59:23 UTC

(In reply to Sven Eden from comment #9)
 
> Interesting, because if no different supervisor is set, openrc-run calls
> start-stop-daemon, which means that the patch does not actually change
> anything but putting openrc-run in between as a proxy.
> If that alone helps, okay, but it doesn't make any sense to me.
> (Unless I am overlooking some important detail here, of course.)

Further to Peter's most recent comment, all i have done is make some modifications to the existing setup of boinc.init and boinc.conf in -r1: primarily to allow user configuration of the `retry` variable in the boinc.conf file, but also to make use of the standard `pidfile` variable, and then to remove the unnecessary `stop()` function. i have not somehow added "openrc-run in between as a proxy" where that wasn't being done before, and i'm not sure which of the proposed changes in boinc.init and boinc.conf led you to that conclusion?

Comment 12 Michael Orlitzky gentoo-dev 2025-02-08 14:54:23 UTC

Does boinc let you attach to more than one project at the same time? Or in other words, is there any reason you would want to run two instances of the boinc service at the same time? (If you can attach to more than one project using the same instance, it's a lot less likely that you would want to run two or more instances simultaneously.)

I'm asking because there are more simplifications that can be made to the init script if we optimize for the common case (one instance):

  * We can set the ownership of /var/lib/boinc in the ebuild.
  * Likewise, we can symlink the cert bundle into /var/lib/boinc in the ebuild.
  * RUNTIMEDIR no longer needs to be configurable.
  * The whole create_work_directory() function can then be deleted.
  * The BOINC_USER and BOINC_GROUP can be hard-coded to "boinc" to match
    the acct-user and acct-group packages that we install for it.

In any case, I think it's pretty unlikely that BOINCBIN and BOINCCMD are needed. We install those programs from the same ebuild that installs the init script. If they ever change in a new version of boinc, we would just change them in a new revision of the init script to match the new version of boinc. (It may still make sense to use a variable for them in the init script, but hard-coded rather than set to $(which ...), and without the corresponding entries in boinc.conf.)

Afterwards, I'm pretty sure we could put

  : ${NICELEVEL:="19"}

at the top of the script, and the whole env_check() function would be redundant, too.

Comment 13 peter@prh.myzen.co.uk 2025-02-08 16:32:14 UTC

(In reply to Michael Orlitzky from comment #12)
> Does boinc let you attach to more than one project at the same time? Or in
> other words, is there any reason you would want to run two instances of the
> boinc service at the same time? (If you can attach to more than one project
> using the same instance, it's a lot less likely that you would want to run
> two or more instances simultaneously.)

Boinc will not start if another instance is already running. The client manages the attachment of projects and the running of their tasks, together with requesting data batches and uploading results.

> I'm asking because there are more simplifications that can be made to the
> init script if we optimize for the common case (one instance):
> 
>   * We can set the ownership of /var/lib/boinc in the ebuild.
>   * Likewise, we can symlink the cert bundle into /var/lib/boinc in the
> ebuild.
>   * RUNTIMEDIR no longer needs to be configurable.
>   * The whole create_work_directory() function can then be deleted.
>   * The BOINC_USER and BOINC_GROUP can be hard-coded to "boinc" to match
>     the acct-user and acct-group packages that we install for it.
>
> In any case, I think it's pretty unlikely that BOINCBIN and BOINCCMD are
> needed. We install those programs from the same ebuild that installs the
> init script.

I'm inclined to agree with that point.

> If they ever change in a new version of boinc, we would just
> change them in a new revision of the init script to match the new version of
> boinc. (It may still make sense to use a variable for them in the init
> script, but hard-coded rather than set to $(which ...), and without the
> corresponding entries in boinc.conf.)
> 
> Afterwards, I'm pretty sure we could put
> 
>   : ${NICELEVEL:="19"}
> 
> at the top of the script, and the whole env_check() function would be
> redundant, too.

I think all this flexibility is built into BOINC (the application) as part of the need to be installed on several OSes, and it's a matter of judgement how much of the upstream flexibility should be propagated into the Gentoo case.

Comment 14 Michael Orlitzky gentoo-dev 2025-02-10 01:07:47 UTC

(In reply to peter@prh.myzen.co.uk from comment #13)
> 
> Boinc will not start if another instance is already running. The client
> manages the attachment of projects and the running of their tasks, together
> with requesting data batches and uploading results.

You can probably trick it into running two instances with enough effort, by giving them separate users, separate data dirs, separate pidfiles...

OpenRC makes it easy to run multiple instances of a service, each with a different name, by creating symlinks to the main service. (Your net.<iface> services are most likely symlinks to net.lo.) If there is a good reason for someone to do that, you can take some extra care in the service script to make sure that the user, data dir, pidfile, etc. all depend on the service name... but in this case it doesn't sound like there is a good reason to do that.
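
For completeness, making settings "depend on the service name" might look roughly like the sketch below. `RC_SVCNAME` is the variable OpenRC sets to the name the service was invoked under; the function name, the paths, and the fallback value are hypothetical and are not taken from the boinc init script.

```shell
#!/bin/sh
# Sketch only: derive per-instance paths from the service name, so that a
# symlinked service (e.g. a hypothetical boinc.second) gets its own files.
instance_paths() {
    svc=${RC_SVCNAME:-boinc}        # fallback is for running outside OpenRC
    pidfile="/run/${svc}.pid"       # hypothetical pidfile location
    datadir="/var/lib/${svc}"       # hypothetical per-instance data dir
}

RC_SVCNAME="boinc.second"
instance_paths
echo "$pidfile $datadir"
```

As noted above, this extra care only pays off if there is a real need for more than one instance.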