Bug 584386 - sci-misc/boinc-7.6.31-r3: * start-stop-daemon: boinccmd [ !! ], * ERROR: boinc failed to stop
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Sven Eden
Reported: 2016-05-28 10:18 UTC by peter@prh.myzen.co.uk
Modified: 2017-01-09 09:05 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge info (emerge.info,5.54 KB, text/plain)
2016-06-09 08:00 UTC, peter@prh.myzen.co.uk
Description peter@prh.myzen.co.uk 2016-05-28 10:18:03 UTC
# /etc/init.d/boinc stop
 * Caching service dependencies                                       [ ok ]
 * Stopping boinc ...
 * start-stop-daemon: boinccmd                                        [ !! ]
 * ERROR: boinc failed to stop
#

Boinc does end up stopped, but before I can start it again I have to zap it.


Reproducible: Always
Comment 1 Sven Eden 2016-06-08 15:31:13 UTC
I haven't seen this yet.

>  * Caching service dependencies                                       [ ok ]

What did you change that triggered the update message?

For me it always looks like this:

# /etc/init.d/boinc start
 * Starting boinc ...                                                   [ ok ]
# /etc/init.d/boinc stop
 * Stopping boinc ...                                                   [ ok ]

What is the content of /etc/conf.d/boinc ?
Comment 2 peter@prh.myzen.co.uk 2016-06-09 07:59:00 UTC
I've always run boinc as myself, in a directory below my home - actually, in a separate partition which I mount there at boot time.

$ cat /etc/conf.d/boinc
# Config file for /etc/init.d/boinc
# Owner of BOINC process (must be existing)
USER="prh"
GROUP="prh"
# Directory with runtime data: Work units, project binaries, user info etc.
RUNTIMEDIR="/home/prh/boinc"
# Location of the boinc command line binary
BOINCBIN="/usr/bin/boinc_client"
# Allow remote gui RPC yes or no
ALLOW_REMOTE_RPC="no"
# nice level
NICELEVEL="19"

----------------------

$ grep boinc /etc/fstab
/dev/nvme0n1p8  /home/prh/boinc         ext4    relatime        1 3

----------------------

In case it matters, these are the USE flags I set:

$ cat /etc/portage/package.use/boinc
app-emulation/virtualbox        additions extensions java python
x11-libs/wxGTK                  webkit

This is now a ~amd64 setup. I did that because my graphics card was only released last November and its drivers and microcode are developing rapidly, and that gets complex to manage without the blanket ~arch.

I should have included emerge --info the first time; I'll attach it now.
Comment 3 peter@prh.myzen.co.uk 2016-06-09 08:00:16 UTC
Created attachment 436926
emerge info
Comment 4 peter@prh.myzen.co.uk 2016-06-10 12:02:02 UTC
I've just noticed another thing. I had shut boinc down and zapped it, both via the init script, then I had cause to reboot. During system shutdown, the init script tried to start boinc and failed. And now I find, after /etc/init.d/boinc start, no boinc entry under /run. Doesn't start-stop-daemon write a pid file there?
Comment 5 peter@prh.myzen.co.uk 2016-10-17 10:12:09 UTC
New behaviour: during system shutdown, I get an error to the effect that a null character or field is being dropped at line 62 of /etc/init.d/boinc, which is the first line of the function need_passwd_arg():

local vers=$(${BOINCBIN} --version | cut -d '.' --output-delimiter='' -f 1,2)

Does that help? Boinc is at 7.6.33 here, and has been so since my record began on 1 Sept 16.
Comment 6 Sven Eden 2016-10-26 12:16:45 UTC
I'll be taking a look at the init script next week, as there is a feature request that is still to be added.
Hopefully I'll find out what the issue is though...
Comment 7 peter@prh.myzen.co.uk 2016-10-26 12:40:07 UTC
I've just noticed something that might help, Sven. I had "tail -f ~/boinc/stdoutdae.txt" running in one Konsole, then as root in another I called "/etc/init.d/boinc stop".

The init script returned this straight away:
 * Stopping boinc ...
 * start-stop-daemon: boinccmd died                      [ !! ]
 * ERROR: boinc failed to stop

But stdoutdae.txt didn't report "Exiting" until several seconds later. Should the script wait for a return code from boinccmd before deciding between success and failure?

Good luck!
Comment 8 Sven Eden 2016-11-04 13:16:02 UTC
(In reply to Peter Humphrey from comment #5)
> New behaviour: during system shutdown, I get an error to the effect that a
> null character or field is being dropped at line 62 of /etc/init.d/boinc,
> which is the first line of the function need_passwd_arg():

I just got this myself. Finally I have managed to reproduce at least something!
Comment 9 Sven Eden 2016-11-04 13:31:54 UTC
Okay, this is evil.

The original idea of that line was to get a two-digit number to compare against.

But "cut --output-delimiter=''" actually emits a NUL byte as the delimiter. The output of the command reads:

--------
 $ hexdump -C foo
00000000  37 00 36 0a                                       |7.6.|
--------

Nasty.

However, I have come up with a different solution.
 1) just use "tr -d ." to delete the dots.
 2) Then compare against $(expr substr "$vers" 1 2)

This should be fully /bin/sh compatible.
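
The two steps above can be sketched as follows. This is only an illustration, not the code from the init script: `sample_version` is an assumed stand-in for the output of "${BOINCBIN} --version", and the variable names `short` and `short_alt` are made up for the example. Note that `expr substr` is widely available (GNU coreutils, BusyBox) but, as far as I know, not required by POSIX, so a second variant using only `cut` and `tr` is shown as well.

```shell
# Assumed sample standing in for the output of "${BOINCBIN} --version".
sample_version='7.6.33 x86_64-pc-linux-gnu'

# Step 1: delete the dots instead of asking cut for an empty output delimiter.
vers=$(printf '%s\n' "$sample_version" | tr -d .)

# Step 2: keep the first two characters as the number to compare against.
short=$(expr substr "$vers" 1 2)

# Variant without expr: take the first two dot-separated fields, then drop the dot.
short_alt=$(printf '%s\n' "$sample_version" | cut -d . -f 1,2 | tr -d .)

echo "$short $short_alt"
```

Either value can then be used in a numeric test such as `[ "$short" -ge 76 ]`, with no NUL byte ending up in the variable.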
Comment 10 Sven Eden 2016-11-07 08:46:37 UTC
The fix for the "null byte input" is in a commit that is part of this pull request:

https://github.com/gentoo/gentoo/pull/2768/commits/5866ce9875cfde0e7ff7f2ef0dd5a0bb83bcd85e

I do hope I can find out what it is with the initial matter, too.
Comment 11 peter@prh.myzen.co.uk 2016-11-07 09:03:42 UTC
In case it isn't clear, I should point out that I only get the failed-to-stop error on this ~amd64 box which has boinc under my home directory, with me as user and group. On another box, x86, with boinc in its standard place, I don't get the error.
Comment 12 Sven Eden 2016-11-07 09:37:52 UTC
(In reply to Peter Humphrey from comment #11)
> In case it isn't clear, I should point out that I only get the
> failed-to-stop error on this ~amd64 box which has boinc under my home
> directory, with me as user and group. On another box, x86, with boinc in its
> standard place, I don't get the error.

Oh hell! That's exactly what I never tried to do. So here we've got a good chance we can get this fixed eventually. I'll try this out once my PR is through.
Comment 13 Sven Eden 2016-11-09 11:43:12 UTC
Okay, so here is what I tried:

 1: Copy /var/lib/boinc to $HOME/boinc_test
 2: chown -R <user>:<group> $HOME/boinc_test (was boinc:boinc after rsync)
 3: Changed USER, GROUP and RUNTIMEDIR in /etc/conf.d/boinc to reflect the move
 4: /etc/init.d/boinc start

This works so far, my projects are running.

 5: Started "optirun boincmgr"
 6: Connected to local, showing all running projects as ever.
    (Well, my nvidia GPU could not be found, because I forgot (as always)
    to start something with opti/primusrun before starting boinc.)

So until now, everything is normal. Let's see whether I can (eventually!) reproduce the stop bug.

 7: Shut down boincmgr
 8:  ~ # /etc/init.d/boinc stop
 * Stopping boinc ...                                                    [ ok ]

Argh... no. No error. dammit!

Okay. To use boincmgr I have set ALLOW_REMOTE_RPC to "yes". Let's switch to "no" and see what happens.

 9: Disable ALLOW_REMOTE_RPC
10: Start boinc and wait for projects to start.
    (boincmgr can still connect btw.)
11:  ~ # /etc/init.d/boinc stop
 * Stopping boinc ...                                                    [ ok ]

There must be something else. I just cannot reproduce this.

Once my PR is through, which should happen today, boinc-7.6.33-r1 (which I use for testing) will become available.
Please test whether you still get the error with that version.
Comment 14 Sven Eden 2016-11-14 08:19:43 UTC
The most recent version sci-misc/boinc-7.6.33-r1 is in the tree now.

Could you please check whether stopping boinc is still a problem with that version?
Comment 15 peter@prh.myzen.co.uk 2016-11-18 11:02:24 UTC
Sorry, Sven, but it's made no difference here.
Comment 16 Sven Eden 2016-11-24 09:42:47 UTC
(In reply to Peter Humphrey from comment #15)
> Sorry, Sven, but it's made no difference here.

That is quite unfortunate...

Well, I guess the line

> * start-stop-daemon: boinccmd died

means that boinccmd segfaulted. Maybe we are better off trying to find out how and where?

Could you please try the following: (as root)

> ~ # ulimit -c unlimited
> ~ # /etc/init.d/boinc start
(... wait for some projects to start ...)
> ~ # cd /home/prh/boinc
> boinc # boinccmd --quit
> boinc # echo $?
> 0

See whether it segfaults, and if so, whether you get a core dump. I want that dump. ;-)

And the exit value would be good to know, too.
Comment 17 peter@prh.myzen.co.uk 2016-11-24 16:38:50 UTC
Okay, I did as you asked. I got no misbehaviour at all :-(

peak ~ # /etc/init.d/boinc stop
 * Stopping boinc ...
 * start-stop-daemon: boinccmd died                              [ !! ]
 * ERROR: boinc failed to stop
peak ~ # /etc/init.d/boinc zap
 * Manually resetting boinc to stopped state
peak ~ # ps ax | grep boinc
 1875 ?        SNsl   2:25 /usr/bin/boinc_client --daemon --dir /home/prh/boinc --redirectio
peak ~ # ps ax | grep boinc
peak ~ # ulimit -c unlimited
peak ~ # /etc/init.d/boinc start
 * Starting boinc ...                                                             [ ok ]
peak ~ # cd /home/prh/boinc
peak boinc # boinccmd --quit
peak boinc # echo $?
0

I'll try testing with different values of boinc user and boinc directory when I get a minute.
Comment 18 Sven Eden 2016-12-08 09:47:45 UTC
(In reply to Peter Humphrey from comment #7)
> Should the script wait for a return code from boinccmd before deciding
> between success and failure?

This is what I have in my current PR. Once it gets through, you'll have to upgrade to the -r2 revision, which installs an updated init script.

The new behaviour is not to use boinccmd to send a quit signal, which returns at once, but to have start-stop-daemon patiently make boinc_client quit.

I do hope that this eventually fixes this issue. I really do not want to carry this over to 2017! ;-)
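
The general idea of waiting for the process to exit before reporting success can be sketched in plain shell. This is not the code from the updated init script, which relies on start-stop-daemon; `stop_and_wait` is a hypothetical helper, and a `sleep` process stands in for boinc_client.

```shell
# stop_and_wait PID TIMEOUT: send SIGTERM, then poll once per second until the
# process is gone or TIMEOUT seconds have passed.
# Returns 0 if the process exited (or could not be signalled), 1 on timeout.
stop_and_wait() {
    pid=$1
    timeout=$2
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing left to signal
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demonstration with a stand-in process instead of boinc_client.
sleep 30 &
demo_pid=$!
stop_and_wait "$demo_pid" 10
stop_status=$?
echo "stop status: $stop_status"
```

The point of the loop is that success is reported only after the process has actually disappeared, rather than right after the quit request was sent.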
Comment 19 peter@prh.myzen.co.uk 2017-01-07 15:19:34 UTC
Seems to have worked!

It arrived here a couple of days ago and I've watched lots of reboots; the change just adds an extra 6 to 13 seconds to the shut-down, mostly at the low end of the range.

Thanks for your efforts Sven, and I hope I didn't spoil your Christmas!
Comment 20 Sven Eden 2017-01-09 09:05:15 UTC
I am very glad it worked out for you as well!

The delay is necessary for boinc to really shut down all running projects and to save all data. My longest delay so far was almost 70 seconds. (There was a lot going on with some pending up- and downloads, too.)

Really, you didn't spoil anything. I am just happy it worked out eventually.

And because of your bug report, the init script is now safe. The previous method of just calling the client to quit and then moving on *might* have caused data loss or even corruption. Rarely, I believe, but nevertheless highly unpleasant!

So thank you very much!