I have a laptop which I often connect via wireless. After rebooting, I discovered that boinc had not started but was waiting for a network connection. At that point I had two tasks processing and another downloaded and waiting, so I had hours of work I could have been doing. To work around this I simply commented out "need net" in the depend section of the /etc/init.d/boinc init script. It is clear why this dependency was added, but it breaks the ability of boinc to work with an intermittent connection. Running boinc without a connection causes the scheduler to ping once a minute to see whether a connection has been established. I am not sure what the best way to resolve this is, but personally I would like to be able to continue processing while the machine is not connected. EBo --
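For reference, the workaround described above amounts to something like the following. This is only a sketch of an OpenRC-style depend() function; any other lines in the real /etc/init.d/boinc are omitted, and the ":" is there only because POSIX sh does not allow an empty function body.

```shell
# /etc/init.d/boinc (excerpt; the rest of the real script is omitted)
depend() {
	# need net   <- commented out so boinc can start without a network
	:            # no-op: keeps the function body non-empty
}
```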
Well, I understand your point. But problems with boinc running without a network are more common, so the dependency was added. You can start ANY net interface to get boinc running (for example, I have net.eth0 (cable) started all the time, even when no cable is plugged in, so that I can run boinc).
Actually, that is not working for me. I have eth0 started and waiting for a connection (via ifplugd), and lo as the loopback. boinc keeps waiting as long as none of the devices has a valid IPv4/IPv6 address (or at least that is the way it appears). As a note, I do not configure a fallback IP address, since that has interfered with ifplugd's (re)connection in the past. It makes sense for you to leave it the way it is, but it does behave differently from what I saw advertised somewhere. Maybe this could be handled with a configuration option (a REQUIRE_NET or ALLOW_DISCONNECTED_RUN...). Should we go ahead and change this to WONTFIX, or is there room for further discussion?
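The configuration option suggested above could be sketched roughly as follows. REQUIRE_NET is a hypothetical variable name taken from the suggestion, not an existing option of the boinc init script; the sketch assumes it is set in /etc/conf.d/boinc, which OpenRC sources before evaluating depend(). With "yes" the current hard dependency is kept; with anything else boinc only uses net if it is available.

```shell
# /etc/conf.d/boinc (hypothetical option, not part of the current package)
REQUIRE_NET="no"

# /etc/init.d/boinc (excerpt)
depend() {
	if [ "${REQUIRE_NET}" = "yes" ]; then
		need net   # hard dependency: boinc waits for a working network
	else
		use net    # soft dependency: start net first if it is in the runlevel, but do not require it
	fi
}
```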
No, no, don't close it. I will try to think of something, but I can't promise it will be soon (a few weeks or so).
Fair enough... I'll see if I can think of something too. Thanks for considering this ;-)
Tomáš,

More info... I decided to take a look at what was going on with my WCG statistics since they did not seem quite right. I have been running DDDT in the background for the last week on my Core 2 Duo machine (with SMP enabled and two tasks running essentially constantly). I should be returning 6 to 8 results/day; instead I am returning 2 to 4... Looking at my message log I see:

Tue Feb 24 20:48:42 2009|World Community Grid|Restarting task dddt0902b0075_100479_0 using dddt version 606
Tue Feb 24 20:48:42 2009|World Community Grid|Scheduler request failed: Couldn't resolve host name
Tue Feb 24 20:48:43 2009|World Community Grid|Task dddt0902b0084_100038_0 exited with zero status but no 'finished' file
Tue Feb 24 20:48:43 2009|World Community Grid|If this happens repeatedly you may need to reset the project.
Tue Feb 24 20:49:44 2009|World Community Grid|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
Tue Feb 24 20:49:49 2009|World Community Grid|Scheduler request completed: got 0 new tasks

It appears that I am dropping between 50 and 75% of all my results, and the problem appears to be related to the dropped network connection. My current network connection is flaky at best, and will continue to be so until the new cable to the pole is installed (due to hurricane damage). I expect this problem to continue for maybe another month. But this may give us a stress test for computing with intermittent connectivity. Anyway, I thought I would mention this as something else to add to your head-scratching. It might help point to other scheduling scenarios and tests. Hope this helps, and best regards. EBo --
Well, one could just edit /etc/rc.conf (baselayout-2!) this way:

# Do we allow any started service in the runlevel to satisfy the dependency
# or do we want all of them regardless of state? For example, if net.eth0
# and net.eth1 are in the default runlevel then with rc_depend_strict="NO"
# both will be started, but services that depend on 'net' will work if either
# one comes up. With rc_depend_strict="YES" we would require them both to
# come up.
rc_depend_strict="NO"

Then net.lo is enough to satisfy the net dependency. The downside is that some other services that depend on net may fail to start (e.g. ntp-client).
baselayout-2 is currently marked as unstable on my distro (Gentoo). If there are no other solutions, then I guess I will not participate when my laptop is connected wirelessly (since I appear to be losing roughly 50% of the results), and/or wait until baselayout-2 becomes stable. I have too many things going on at the moment to comfortably try that change. Thanks, EBo --
There is a similar approach in baselayout 1, but I don't recall the exact file and conf var.
Sorry for the long delay...

(In reply to comment #8)
> There is a similar approach in baselayout 1, but I don't recall the exact file
> and conf var.

RC_NET_STRICT_CHECKING="NO" appears to be the equivalent. I have not fully tested it, but it seems to work as expected. I am still losing maybe 50% of my tasks, but that is unrelated to this. I will see if I can track it down and either file another bug or discuss it on IRC or something. Thanks, EBo --
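For reference, the baselayout-1 setting discussed above might look like this. The file name /etc/conf.d/rc is an assumption (comment #8 did not name the file), and the variable is spelled RC_NET_STRICT_CHECKING here, which is how I believe baselayout-1 names it; check where it is actually defined on your system before editing.

```shell
# /etc/conf.d/rc (baselayout-1; assumed location, please verify)
# Relax the strict check on the 'net' dependency so that not every
# interface has to be up before services needing 'net' can start.
RC_NET_STRICT_CHECKING="NO"
```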
I was going through my open issues and came across this again. For the last couple of months I stopped using boinc because I am still dropping 50-75% of my results. I have spent a little more time trying to figure out what is going on and have the following to add. As a note, I have only briefly looked at the code and cannot offer a patch, but I can describe the overall behaviour:

1) Sometime within the last hour or so, the system is notified to download the next packet.
2) As soon as a job is finished, the new job is loaded and run.
3) When the new job is started, the server is contacted and initial handshaking is done to announce that a result is done; however, the result is not uploaded.
4) The job continues to run until it is ready to load another job (see 1 above). At this point the previous results are finally uploaded.

Now if I lose my net connection any time in the 4 to 6 hours between steps 3) and 4), I seem to lose that job entirely. My guess at this point is that the scheduler needs a loop around the handshake/upload portion of step 3) that runs until all results that are ready have been uploaded. Hope this helps. EBo --
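The retry loop suggested above can be sketched as follows. This is not BOINC client code: upload_result is a hypothetical stand-in that fails until a simulated outage ends, and the doubling delay capped at max_delay is an illustrative choice. The point is only that a finished result stays queued and is retried instead of being discarded when the handshake fails.

```shell
#!/bin/sh
# Hypothetical sketch, NOT BOINC client code: keep retrying the upload of a
# finished result until it succeeds instead of discarding it.

fail_count=2   # simulated outage: the first 2 upload attempts fail
tries=0

# Hypothetical stand-in for the real report/upload step.
upload_result() {
    tries=$((tries + 1))
    [ "$tries" -gt "$fail_count" ]
}

retry_upload() {
    delay=1       # seconds to wait before the first retry
    max_delay=8   # upper bound for the growing delay
    while ! upload_result "$1"; do
        echo "upload of $1 failed, retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        if [ "$delay" -gt "$max_delay" ]; then
            delay=$max_delay
        fi
    done
    echo "uploaded $1"
}

retry_upload dddt_result_1
```

With the simulated outage, the first two attempts fail (waiting 1 s and then 2 s) and the third succeeds, so the result is reported once the link is back rather than lost.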
I just reviewed this again, and the thing I forgot to mention before is that I have a dual-core machine and allow 100% usage when it is not otherwise occupied. The problem appears to be related to the fact that two processes are running simultaneously. I am running a new test case with only 49% of the processing capacity (i.e. 98% of a single processor). I'll report the behaviour I find... EBo --
OK... that's weird. At 49%, boinc_gui tells me that it "Won't get new processes". We will see what I get with 51%...
OK... 51% runs, but only after I cancel one of the two processes it wants to run on this two-core machine... I think the scheduler needs to be re-examined with respect to multi-core/multi-processor machines. I'll report back what I see with 51% utilization... EBo --
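One possible reading of the 49%/51% results above, offered as a hypothesis only: if the client converts the processor percentage into a whole number of CPUs by truncating, then on a two-core machine 49% gives 2 × 0.49 = 0.98, which truncates to 0 CPUs (nothing to run, hence no new tasks), while 51% gives 1.02, which truncates to 1. The function name usable_cpus is made up for this sketch, and the truncation rule has not been checked against the BOINC source.

```shell
#!/bin/sh
# Hypothetical illustration: assumes the "use at most N% of the processors"
# preference becomes a whole number of CPUs via integer truncation.
usable_cpus() {
    ncpus=$1   # number of cores in the machine
    pct=$2     # processor percentage from the preferences
    echo $(( ncpus * pct / 100 ))   # shell arithmetic truncates toward zero
}

for pct in 49 51 100; do
    echo "${pct}% of 2 cores -> $(usable_cpus 2 "$pct") CPU(s)"
done
```

If that assumption holds, any setting from 50% up to 99% would run exactly one task on this machine.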
Just a quick note: I am reading through this, but I still have no clue why it hates you.
(In reply to comment #14)
> Just a quick note: I am reading through this, but I still have no clue why it hates
> you.

When I first read this I thought you meant that "you" hated me, not "boinc", and I was REALLY confused?!?!?! I was trying to figure out how I had offended you... OK, I get it now ;-)

When I set the multiprocessor CPU usage above 50% (to 51%), it seems to work correctly, running only a single process. It was strange that it choked the way it did when I gave it 49%. My guess is that there is a bug in the scheduler that marks a result as bad if it cannot confirm each step (sending, waiting for the result, waiting for confirmation, etc.). The problem of dropped results seems to happen only when I lose the net connection in the middle of reporting the results and getting credit for them.

BTW, the fan on my laptop is about to die, so until I can fix or replace it I will be shutting off boinc to keep the heat down. I'll let the current job run and post tonight, but I need to keep things cool... Thanks for all your help, and with any luck we can get this sorted out sometime...
What exactly are those problems with Boinc and no net? I've run perfectly fine for years with "after net" instead of "need net" in the init script.
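The "after net" variant mentioned above, as a sketch assuming an OpenRC-style init script (any other lines of the real depend() are omitted). "after" only orders boinc behind net when net is being started; it does not make net a requirement.

```shell
# /etc/init.d/boinc (excerpt)
depend() {
	after net   # ordering only: start after net if net is starting, but do not require it
}
```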
Martin,

It has been over a year since I looked at this. "after net" should work fine, but last time I checked I was still dropping completed work packets more than 50% of the time. This had something to do with the handshake failing to report the results while the net was down, so the results were thrown away instead of being uploaded later. I do consider the dropped results a bug. I mean, why spend the electricity and wear and tear when the results are just being thrown away? Unfortunately, the laptop I was using for this has developed *issues* with the fan, and I have since throttled the processor to keep it from overheating (replacing the fan would cost 25% of the price of a new laptop), so I am not running boinc at the moment... Hope this helps.
Is this bug still relevant?
Is this still present in version 7.2.0?
(In reply to Justin Lecher from comment #19)
> Is this still present in version 7.2.0?

It will take a little while to test this. Will try to report back in the next week or two.