Bug 69854 - parallel startup scripts is not parallel enough
Summary: parallel startup scripts is not parallel enough
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: All Linux
Importance: High enhancement
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords: InVCS
Duplicates: 73566 83903
Depends on:
Blocks: 69579
Reported: 2004-11-02 09:39 UTC by Paul Pacheco
Modified: 2006-07-09 23:45 UTC
CC List: 16 users

See Also:
Package list:
Runtime testing required: ---


Attachments
parallel_startup.patch (parallel.patch,7.91 KB, patch)
2004-11-09 07:41 UTC, Paul Pacheco
parallel_startup.patch (parallel.patch,7.55 KB, patch)
2004-11-10 08:35 UTC, Paul Pacheco
parallel_startup.patch (parallel.patch,8.42 KB, patch)
2004-11-17 07:40 UTC, Paul Pacheco
parallel_startup.patch (parallel.patch,8.82 KB, patch)
2004-11-30 09:59 UTC, Paul Pacheco
parallel patch for baselayout 1.11.9-r1 (parallel.patch,8.87 KB, patch)
2005-02-17 06:41 UTC, Paul Pacheco

Description Paul Pacheco 2004-11-02 09:39:38 UTC
I understand there is a variable RC_PARALLEL_STARTUP="yes" that supposedly makes the init scripts start in parallel as long as there is no dependency between them.

But looking at the code, this is what I found in /lib/rcscripts/sh/rc-services.sh:

In line 557:

# void schedule_service_startup(service)
#
#   Schedule 'service' for startup, in parallel if possible.
#
schedule_service_startup() {
 local count=0
 local current_job=

 if [ "${RC_PARALLEL_STARTUP}" = "yes" ]
 then
  set -m +b

  if [ "$(jobs | grep -c "Running")" -gt 0 ]
  then
   if [ "$(jobs | grep -c "Running")" -eq 1 ]
   then
    if [ -n "$(jobs)" ]
    then
     current_job="$(jobs | awk '/Running/ { print $4}')"
    fi
    
    # Wait if we cannot start this service with the already running
    # one (running one might start this one ...).
    query_before "$1" "${current_job}" && wait

   elif [ "$(jobs | grep -c "Running")" -ge 2 ]
   then
    count="$(jobs | grep -c "Running")"

    # Wait until we have only one service running
    while [ "${count}" -gt 1 ]
    do
     count="$(jobs | grep -c "Running")"
    done

    if [ -n "$(jobs)" ]
    then
     current_job="$(jobs | awk '/Running/ { print $4}')"
    fi

    # Wait if we cannot start this service with the already running
    # one (running one might start this one ...).
    query_before "$1" "${current_job}" && wait
   fi
  fi

  if iparallel "$1"
  then
   eval start_service "$1" \&
  else
   # Do not start with any service running if we cannot start
   # this service in parallel ...
#   wait
   
   start_service "$1"
  fi
 else
  start_service "$1"
 fi

 # We do not need to check the return value here, as svc_{start,stop}() do
 # their own error handling ...
 return 0
}



Note that if two or more processes are running, a busy wait starts until only one remains. This effectively means that at most 2 init scripts can run at the same time. Also, the busy wait eats CPU and slows down startup, killing any benefit from parallel startup.

A better approach would be to start them all, and let each one start and wait for its dependencies to finish, with no busy waits. The complication here is that two scripts might depend on a common third script, which must not be started twice. I don't know how to wait for something to happen in bash in a suitable manner. A few pointers would be appreciated.
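
For reference, the shell can already block without polling on its own background jobs through the wait builtin. The following is only a minimal sketch under assumptions: the services a, b and c are hypothetical, c is assumed to need both a and b, and the start function is a stand-in for a real init script. Note that wait only covers children of the calling shell, so on its own it does not help when init scripts are started by different processes.

```shell
#!/bin/sh
# Stand-in for an init script: pretend to work, then log completion.
log=$(mktemp)
start() {
    sleep 1
    echo "$1 started" >>"$log"
}

# a and b do not depend on each other, so run them concurrently.
start a &
pid_a=$!
start b &
pid_b=$!

# Block (no polling loop) until both prerequisites of c have exited.
wait "$pid_a" "$pid_b"
start c

last=$(tail -n 1 "$log")
count=$(wc -l <"$log" | tr -d ' ')
echo "last: $last, total: $count"
rm -f "$log"
```

The contrast with the loop in schedule_service_startup is that the shell sleeps inside wait instead of repeatedly running jobs and grep.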


Reproducible: Always
Comment 1 Gustavo Sverzut Barbieri 2004-11-02 19:02:01 UTC
Any idea on how to solve this?

Since I plan to flatten the whole thing into one file, you can take that into account if it helps.

The hard part IMO is how to start up the services once their prerequisites are done. Is there any way to make them wait for a signal or for a file and then start? Dbus? Something like that?

I'm sleepy now, but one thing is to call a "satisfied( prerequisite )" which will write to a file that need this prerequisite and if it's the last one, start it.
   So the process is:

- start script with every dependency satisfied.
- when script stops, check for every service that depends on this, write to its list of satisfied prerequisites.
- check if every prerequisite is satisfied (for the services depending on this one); if so and the service is not started (or starting), start it.

Any better solution? Tomorrow (after some sleep today) I'll think better about it.


Comment 2 Gustavo Sverzut Barbieri 2004-11-02 19:33:54 UTC
For the impatient (like me):

the service could start and check for its dependencies, if one of them is not satisfied it could:
   echo $$
   kill -19 $$   # SigSTOP

That will hang it (no busy wait) until SIGCONT is sent. We just need to figure out a mechanism to start those services when needed... so every service will send SIGCONT to the services that depend on it after it is up. With this we could start ALL needed scripts at once and they will block themselves waiting for dependencies. Something like:

# net.eth0. An example!
DEPENDS_ON_ME="$(depends_on net)"
DEPENDENCIES="hotplug pcmcia"
for dependency in $DEPENDENCIES; do
   if ! dependencySatisfied "$dependency"; then
      kill -19 $$    #blocks and wait for dependency.
   fi
done

# do things...

for d in $DEPENDS_ON_ME; do
   kill -18 $(pidof $d)   #unblock services that depends on this.
done
Comment 3 Paul Pacheco 2004-11-03 07:20:33 UTC
The idea is good, but I see a race condition:

Script A executes the if

 if ! dependencySatisfied "$dependency"; then

which succeeds because dependency B is still running.

Right there, the dependency B finishes and sends the signal to A
  kill -18 $(pidof $d)   #unblock services that depends on this.

Because A is still not sleeping, the signal will get lost.

Now A script goes ahead and executes:
 kill -19 $$    #blocks and wait for dependency.

and A will sleep forever.

I honestly don't know how to solve the problem yet.
Comment 4 Paul Pacheco 2004-11-03 07:31:00 UTC
Another issue is this:

Script A depends on B and C

A sleeps on B

C finishes and sends signal to A

A wakes up and checks for C which finished, and A continues.

So we end up with A and B running in parallel.
Comment 5 Paul Pacheco 2004-11-03 07:38:33 UTC
I think we are going to have to do a busy wait with a small sleep inside.

something like this:

dependenciesdone=false
while ! $dependenciesdone; do
  dependenciesdone=true
  for dependency in $dependencies; do
      # finished is a placeholder for "has this dependency completed?"
      if ! finished "$dependency"; then
           dependenciesdone=false
           sleep 1
      fi
  done
done

# do stuff

# mark me as finished

I hate busy waits and unnecessary sleeps, but I don't see a way around it, do you?
Comment 6 Gustavo Sverzut Barbieri 2004-11-03 07:42:41 UTC
Oh god... dealing with race conditions in shell script is no good :(

A dirty solution is to have a mechanism "unblock( service )" that is called instead of "kill -18 service", this will check if service is unblocked (maybe a file) and put it in pending_unblock list if it's still blocked. Then this could sleep for some microseconds (ugly!!!) and iterate over the list again.

Ugly points:
   - parse this pending_unblock every time. In the common case it will be small.
   - how to wakeup the "process" that will unblock process.
Comment 7 Gustavo Sverzut Barbieri 2004-11-03 14:23:06 UTC
A friend of mine is writing a C program that will take care of this.

The method is:
   - launch every program that has no left dependency.
   - when this program exits:
      * Success: go through the services that depend on this one and decrement their counters; if a counter reaches 0, launch that service.
      * Failed: fail every service that depends on this one.
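
The counter method above can be sketched in plain shell, purely as an illustration and not as the C program mentioned here. Everything in it is an assumption: the dependency table is made up, services are "launched" sequentially by appending them to a list instead of being forked, and only the success path is modelled.

```shell
#!/bin/sh
# Hypothetical dependency table, one "service:prerequisites" entry per line.
table='localmount:
hotplug:localmount
syslog:localmount
net:hotplug
sshd:net syslog'

# count_<svc> holds the number of prerequisites that have not finished yet.
services=
while IFS=: read -r svc needs; do
    services="$services $svc"
    set -- $needs
    eval "count_$svc=$#"
    eval "needs_$svc=\$needs"
done <<EOF
$table
EOF

# Services with no prerequisites are ready immediately.
ready=
for svc in $services; do
    eval "n=\$count_$svc"
    if [ "$n" -eq 0 ]; then ready="$ready $svc"; fi
done

order=
while [ -n "$ready" ]; do
    set -- $ready
    current=$1
    shift
    ready="$*"
    order="$order $current"    # a real launcher would fork and exec here
    # current succeeded: decrement the counter of every service needing it.
    for svc in $services; do
        eval "needs=\$needs_$svc"
        for need in $needs; do
            if [ "$need" = "$current" ]; then
                eval "count_$svc=\$((count_$svc - 1))"
                eval "n=\$count_$svc"
                if [ "$n" -eq 0 ]; then ready="$ready $svc"; fi
            fi
        done
    done
done
order=${order# }
echo "$order"
```

In a real launcher every entry in the ready list could be started at once; the sequential loop only shows when each service becomes eligible.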

Also, some things will have to change, like the messages. Right now we print "ebegin ...", do the work and print "eend ..."; this will become a mess with parallel starts, so in parallel mode we'll have to change to: ebegin starting service X; eend; possibly building a single message/string.
Comment 8 Gustavo Sverzut Barbieri 2004-11-03 15:40:11 UTC
I have a question that many will probably shoot me, but I'm bulletproof! :)

Would it hurt that much to write the init scripts in a language other than bash? Python (since Gentoo already needs it for portage), perl or some other _real_ programming language? That would solve _MANY_ problems, like threads, protection, compilation, optimizations, ... Implementing those things in bash is a pain; it's not a real language and some things are really annoying, like dealing with regular expressions, many sed's, ...

Possible problems and solutions:
 - Problem: python interpreter at startup. Solution: pyfreeze.
 - Problem: python is slower than b?ash. Solution: none so far.

I tried a simple test that loops over the files in a directory and prints each one, and python is not that much slower:

#!/bin/bash
for f in /usr/share/doc/*; do
        echo $f
done


#!/usr/bin/python2.3
import os
for f in os.listdir( "/usr/share/doc" ):
        print "/usr/share/doc/" + f

Times:
     | Python -OO  | Python   | Bash     | Ash
-----+-------------+----------+----------+------
real |  0m0.063s   | 0m0.068s | 0m0.065s | 0m0.020s
user |  0m0.013s   | 0m0.017s | 0m0.023s | 0m0.004s
sys  |  0m0.008s   | 0m0.009s | 0m0.004s | 0m0.003s

Ash is the clear winner, but porting things to it is a pain :(
Comment 9 Paul Pacheco 2004-11-04 06:34:06 UTC
/usr is not mounted at that time on all computers. So when doing init scripts python may not be available. That was a show stopper for bug 55329.

C would work however.
Comment 10 Paul Pacheco 2004-11-04 07:13:32 UTC
IMHO, I believe the approach you are describing in the C program is a bit complicated and requires a lot of changes because now we have to calculate which are the scripts that depend on the current one. Also all the scripts will have to be calculated ahead of time and start running them. This will require a lot of changes.

A possible simpler approach would be to let each script start its dependencies (as it does today) and wait for each one of them to finish. What would be required is a mechanism where a script can wait for another to finish, and a way for the script to notify everybody that is waiting on it.

So we would require 2 programs:

waitForDependency <dependency name>

waits until a dependency finishes and gives the exit status of the dependency. It would block until the dependency finishes executing. If the dependency already finished, it would not block and it would still return the exit status of the dependency.

notifyDependants <dependency name> <exitstatus>

Notify all the scripts that are waiting on <dependency name> that it has finished. Exit status would get stored in disk so if someone calls waitForDependency after this, it will still get the exit status. All the calls to waitForDependency that were blocked will wake up and return the exit status.

If two scripts try to start the same dependency, only one of them should succeed; the other one should just wait until the dependency finishes. This can be done safely using ln -sn <dependency name>, which can atomically create a link and fail if one already exists.
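
The atomic claim can be illustrated with a few lines of shell. This is only a sketch under assumptions: the claim function, the temporary directory and the .lock suffix are invented for the example, and the link target (the caller's PID) is arbitrary. It relies on symlink creation failing when the name already exists.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical helper: try to become the one process that starts service $1.
# Creating the symlink is atomic and fails if the name is already taken.
claim() {
    ln -sn "$$" "$dir/$1.lock" 2>/dev/null
}

if claim net; then first=won; else first=lost; fi
if claim net; then second=won; else second=lost; fi

echo "first claim: $first, second claim: $second"
rm -rf "$dir"
```

The caller that loses the claim would then fall through to waitForDependency instead of starting the service a second time.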

Can you talk to your friend about this idea to see what he thinks? Maybe you can add him to the cc of this bug. I can do this, but since he already offered and I am a little busy ...
Comment 11 Gustavo Sverzut Barbieri 2004-11-04 07:36:19 UTC
Reply to Comment #9:
That's why I mentioned pyfreeze or something that builds a package with everything needed, so you just need this binary.

C would be OK also, but there I see more problems, like lost pointers; every basic bash operation is painful to do in C and stuff like that. Python/Perl and other very-high-level languages are as easy as shell, but in C you need to deal with pointers and memory allocation, and that's a source of problems.

Anyway, after we have our launcher done in C it will ease things a lot. It will fork() itself and then exec the other app. I already have a very basic version (just 'need' flag support) in python; after it's complete we'll convert it to pure C and optimize it.

Some points so far:
   - http://www.linuxbase.org/spec/refspecs/LSB_2.0.1/LSB-generic/LSB-generic/sysinit.html is almost as good as Gentoo. I'll try to talk to them and ask for the After and Before flags. With these we can eliminate those nonsense numbers really easily. The LSB standard is easier to parse, and adopting it will open the door to it being used in other distros, like Debian, Red Hat, Suse and others.
  - Also to propose to LSB is the adoption of functions instead of the 'case' statement. It's cleaner IMHO.
  - Using the parallel startup will need to rethink message system. We should think it in ways that it allows external displays (ie: powerpc/pSeries have a LCD that can be used for this purpose). For that I have some ideas:
     * Another LSB keyword: Service-Name. It will provide the name to show with "Starting service:". Defaults to Provides, which defaults to the service name.
     * Some extra functions:
        # msg_starting service
        # msg_stopping service
        # msg_started  service
        # msg_stopped  service
        # msg_failed   service
        # msg_status   status service # This can replace started, stopped, failed.
       In parallel builds msg_{starting,stopping} would keep those until the msg_{started,stopped,failed} (or msg_status status service). Then it locks the display and outputs it. Maybe those functions should go into the launcher, this is easier IMHO. Maybe this should go into runscript and our new runscript will provide these functions, together with the "case" statement said before.


About Comment #10:
   I think the C version is still more reliable and faster. Implementing waitForDependency and friends is not easy. I'll talk to him but I think we'll implement it anyway, just to check how well it works.
Comment 12 Paul Pacheco 2004-11-04 08:38:44 UTC
I agree python is better for this kind of task. I was not aware of pyfreeze, and I could not find anything in google. Can you give a link?
Comment 13 Gustavo Sverzut Barbieri 2004-11-04 10:41:49 UTC
http://www.python.org/moin/Freeze

But let's not spend too long on this; let's solve it with bash and leave this for a future discussion.
Comment 14 Seemant Kulleen (RETIRED) gentoo-dev 2004-11-04 11:31:09 UTC
listen, instead of coding up your own stuff from scratch, maybe you guys could look into simpleinit (from util-linux tarball) or simpleinit-msb (google it) and start there.
Comment 15 Paul Pacheco 2004-11-04 15:09:59 UTC
It would take a lot of work to adapt to gentoo init scripts. And it would take a miracle for gentoo developers to accept something so disruptive.

simpleinit-msb uses runlevels like sysv 1, 2, 3, 4, ... (Gentoo's runlevels are named)

simpleinit uses the case based init scripts and a daemon based controller. Gentoo uses function based init scripts that simplify considerably the amount of work required.

Also, in both systems the runlevel thing is very different to gentoo's and rc-update would have to be rewritten along with many of the init scripts.

IMHO Gentoo init scripts are better except in the parallel aspect, which I think can be solved without major surgery. I thought of a simple and safe way to do it using fifos and plain bash. It will not require too many changes to the current scripts, and it won't contain any busy wait. I will have something working soon.
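
As an independent illustration (not the code of the patch attached to this bug), here is one way a fifo can make a dependent block without polling. The service name net, the file names and the subshell standing in for the service are all assumptions. The stand-in holds the write end of the fifo open while it "starts"; the reader only sees end-of-file once that write end is closed.

```shell
#!/bin/sh
dir=$(mktemp -d)
mkfifo "$dir/net.fifo"

# Stand-in for the service: keep the write end open while starting,
# record the exit status, then close the fifo to release any waiter.
(
    exec 3>"$dir/net.fifo"
    sleep 1
    echo 0 >"$dir/net.status"
    exec 3>&-
) &

# Dependent: sleeps in the kernel until the writer closes the fifo.
cat "$dir/net.fifo" >/dev/null
status=$(cat "$dir/net.status")
wait

echo "net finished with status $status"
rm -rf "$dir"
```

A waiter that arrives after the service has already closed the fifo would block on open, so a real implementation would also have to check the stored status first and guard that check against the race discussed in comment 3.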
Comment 16 Thomas Eckert 2004-11-06 03:47:55 UTC
guys, I only did a quick scan over the comments and read things like "code some parts of the init-process in C", "use python for the init-process", ...

well, speeding up the init-process is one thing but it should remain _maintainable_ and should depend on only a few things. if we can speed up the init-process by a factor of 2 .. 3 (which saves me no more than ~10 secs on my box/setup!) we should carefully weigh whether this relatively small improvement is worth complicating the overall init-process. additionally, in the server-world a machine is booted once or twice a year!
Comment 17 Gustavo Sverzut Barbieri 2004-11-06 08:56:30 UTC
Since I read the LSB standard I want to make the boot process compliant; there's no reason to keep it non-standard. If it's standard we just have to do it once! Debian will be able to use it, as will Red Hat and Suse, and Gentoo will be able to run scripts from these distros without a problem.

I know that sticking to some standard can keep us from doing cool things, but this is not the case here; the standard fulfils our needs. (Almost: I need to ask them for a 'Before' keyword, but we can do it with the X- prefix.) I just think it's stupid for Gentoo to use different system-service names if LSB already provides some, and things like that.

About C code, it's just the launcher. Scripts won't have to know of its existence. I'm doing it in C since it's fast and has some data structures we don't have in bash (linked lists for example). Actually this launcher will remove code from /sbin/runscripts.sh, which is a mess right now.

If you want to have a preview, I have a prototype in python here:
   http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_5.py
It's just to test the algorithm; it uses the dependency list I found in my /etc/init.d but doesn't cover provide, after and before, as these can be done using preprocessors. In my _simulations_ with scripts that sleep for 1s I reduced the time from 45s to 9s, 5x!

Code itself is in http://ltc08.ic.unicamp.br/~gustavo/pboot/src/, but there's no core algorithm, just a still unfinished parser of the preprocessed file.
Comment 18 Paul Pacheco 2004-11-07 06:34:04 UTC
Reply to Comment #16 :

This is not targeted at servers. Like you say, they are rebooted once or twice a year and this type of improvement is irrelevant. We do not care about servers in this one. This is for people who dual boot, people who turn the machine off at night, or people with laptops (like me).

I do agree however that avoiding C is desirable from the maintenance POV. 

Reply to Comment #17: 

I like LSB, they seem to have their stuff together and they provide pretty much everything we need (except before like you say, but it can be solved). The major work here is converting all the existing scripts.

That said, I don't think we need to have an implementation in C either. It can all be done in bash (yes, even the parallel stuff for which I intend to have a patch some time this week. You can consider it a friendly race :)  ). 

IMHO, whether the scripts are LSB compliant or not is orthogonal to the parallel stuff. The parallel algorithm can work the same if dependencies are specified in a depend() function or in comments, the only change is how to read the dependencies. I think maybe it should be a separate bug/feature. I would suggest you open another bug and we can continue to talk about it there, just let me know so I can help.
Comment 19 Gustavo Sverzut Barbieri 2004-11-07 11:24:13 UTC
Ok.

I'll open the bug as soon as I have something ready :)

About doing everything in bash: even better! Easier to get accepted.

I just disagree about converting scripts to LSB. We don't need to convert anything; almost every package has RH initscripts and they're already LSB compliant. Maybe they don't have the dependency info (never checked), but this can be added based on the current Gentoo scripts.

Ok, LSB topic is over... now just parallel.
Comment 20 Paul Pacheco 2004-11-09 07:41:34 UTC
Created attachment 43606
parallel_startup.patch

apply like this:
cd /
patch -p0 < /tmp/parallel_startup.patch
Comment 21 Paul Pacheco 2004-11-09 07:48:31 UTC
That is a first draft. There are a few things I would like to clean up a bit, but the important parts are there. No deadlocks, no busy waits, fully parallel (except for critical services such as mounting filesystems), and 100% plain bash.

Gustavo can you give it a try?

From grub to kdm, I get the following results  (minutes:seconds)

RC_PARALLEL_STARTUP="no"                1:41
old RC_PARALLEL_STARTUP="yes"           1:34
RC_PARALLEL_STARTUP="yes"               1:25

The last one is using this patch and setting RC_PARALLEL_STARTUP to "yes".

YMMV especially because mine takes a long time for the kernel to find the scsi drives and do some audit which can not be parallelized.

As can be expected, the ebegin and eend messages get totally screwed up. We need to think about how to solve this.
Comment 22 Gustavo Sverzut Barbieri 2004-11-09 08:09:20 UTC
Sure, I'll test and report it ASAP.
Comment 23 Gustavo Sverzut Barbieri 2004-11-09 08:13:39 UTC
Paul,

I'm using =sys-apps/baselayout-1.11.6 and it doesn't apply correctly.

(I'll look at the rejects on Thursday; I have a test tomorrow (Wednesday))
Comment 24 Paul Pacheco 2004-11-09 08:18:48 UTC
crap.

It seems the patch is good for baselayout-1.9.4-r6 (the last stable one) but not newer ones.

I'll have to upgrade to the latest one and redo the patch.
Comment 25 Paul Pacheco 2004-11-10 08:35:49 UTC
Created attachment 43665
parallel_startup.patch

This one is applied against baselayout-1.11.6-r1

from grub to kdm my times are (minutes:seconds) :
Before                            2:00
RC_PARALLEL_STARTUP="yes" (old)   1:59
RC_PARALLEL_STARTUP="yes" (new)   1:40

Gustavo, can you give it a try and put your times here (before and after)?
Comment 26 Gustavo Sverzut Barbieri 2004-11-11 17:18:23 UTC
It worked here and solved a problem: with the old parallel startup I got stuck many times, and killing (C-c) the script always yielded errors on rc-services.sh, line 622. But your new parallel startup fixed that. Wonderful!

My boot times are:
 - serial: 55s
 - old parallel: ? I can't measure, it's broken.
 - new parallel: 48s
 - new parallel + xdm fast: 45s

I tried readahead but it made things worse. I ensured every file in readahead.*files was on my system, but it didn't help.

My machine is a Pentium IV 2.66GHz, 512MB of RAM and an IDE HD.
Comment 27 Paul Pacheco 2004-11-17 07:40:03 UTC
Created attachment 44157
parallel_startup.patch

Now I properly check if a critical service fails.
I would like to see this included in cvs, I will gladly fix any problem that
arises to see that happen. If you have any problem with the patch, please let
me know.
Comment 28 Paul Pacheco 2004-11-30 09:59:45 UTC
Created attachment 45004
parallel_startup.patch

use local service="$1" instead of $1 everywhere for better readability as
suggested by UberLord
Comment 29 SpanKY gentoo-dev 2004-12-06 08:45:04 UTC
*** Bug 73566 has been marked as a duplicate of this bug. ***
Comment 30 Heinrich Wendel (RETIRED) gentoo-dev 2004-12-29 10:37:40 UTC
base-system: I have been using this patch since it became available and it works fine. Any chance to include it soon?
Comment 31 Paul Pacheco 2005-02-17 06:41:14 UTC
Created attachment 51435
parallel patch for baselayout 1.11.9-r1

Same patch but for baselayout 1.11.9-r1
Comment 32 César Fernández 2005-02-28 04:20:09 UTC
It works for me.
Comment 33 Heinrich Wendel (RETIRED) gentoo-dev 2005-05-16 15:30:21 UTC
uberlord: what about this?
Comment 34 Heinrich Wendel (RETIRED) gentoo-dev 2005-05-16 15:33:16 UTC
and what about making it the default even?
Comment 35 Roy Marples (RETIRED) gentoo-dev 2005-05-19 08:30:23 UTC
I've merged this into baselayout CVS

Regarding ebegin/eend not lining up - this has been solved by hiding output from
the init scripts and just outputting "Service ${myservice}
starting/started/stopping/stopped/failed" lines instead.
Not as pretty, but it works :)

Will be in baselayout-1.12.0-alpha3
Comment 36 Ed Catmur 2005-06-28 12:49:42 UTC
*** Bug 83903 has been marked as a duplicate of this bug. ***
Comment 37 Prakash Punnoor 2005-07-20 01:43:37 UTC
I am now trying the new baselayout-1.12.0_pre1-r1 and I am not sure whether this
parallelizing patch was merged properly, because I see something which I had
with the old parallel mod but *not* with the above patch and old baselayout:

Some services print a warning that they are already starting.

Is this expected behaviour or are these races?
Comment 38 Roy Marples (RETIRED) gentoo-dev 2005-07-20 01:55:25 UTC
(In reply to comment #37)
> I am now trying new baselayout-1.12.0_pre1-r1 and I am not sure whether this  
> parallelizing patch was merged properly, because I see something with I had  
> with the old parallel mod but *not* with above patch and old baselayout:

The problem is that we also have some new status levels - here's a summary

starting - started - stopping - stopped - inactive

Whereas we just had

started - stopped

Yes, you are seeing a race - the difference is that with this version you are
seeing a warning, whereas the old baselayout + patch just bailed out without a
warning.

If you think that's an issue then feel free to open a new bug as this one is now
fixed :)
Comment 39 Kai Krakow 2006-07-09 15:08:00 UTC
Other distributions like SuSE use "make" to start up services in parallel. "make" should be available on every Gentoo box by default, plus it resolves all the parallelism questions in this issue.

One should only "compile" a Makefile from the depends found in the init scripts. Would this be an option?
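
As a rough illustration of that idea (not something SuSE or baselayout is confirmed to do this way), the Makefile could be generated from a dependency table. The table below is hypothetical, and the sketch only writes the Makefile text; it does not run make.

```shell
#!/bin/sh
# Hypothetical dependency table, one "service:prerequisites" entry per line.
table='localmount:
net:localmount
sshd:net'

out=$(mktemp)
targets=$(printf '%s\n' "$table" | cut -d: -f1 | tr '\n' ' ')
{
    printf '.PHONY: all %s\n' "$targets"
    printf 'all: %s\n' "$targets"
    # One rule per service; its prerequisites become make prerequisites.
    printf '%s\n' "$table" | while IFS=: read -r svc needs; do
        printf '%s: %s\n\t/etc/init.d/%s start\n' "$svc" "$needs" "$svc"
    done
} >"$out"

makefile=$(cat "$out")
printf '%s\n' "$makefile"
rm -f "$out"
```

Saved to a file, this could be run with make -j so that make itself starts independent services concurrently and orders the rest by their prerequisites.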
Comment 40 Roy Marples (RETIRED) gentoo-dev 2006-07-09 23:45:39 UTC
(In reply to comment #39)
> One should only "compile" a Makefile from the depends found in the init
> scripts. Would this be an option?

It's always an option, but we have this fixed in baselayout-1.12 now, so why bother?