Bug 112161 - baselayout-1.12.0_pre10 (and earlier) doesn't change init levels correctly
Summary: baselayout-1.12.0_pre10 (and earlier) doesn't change init levels correctly
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: AMD64 Linux
Importance: High major
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords: InVCS
Duplicates: 115032
Depends on:
Blocks:
 
Reported: 2005-11-11 04:22 UTC by Duncan
Modified: 2005-12-23 11:26 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge info output (einfo, 3.04 KB, text/plain)
2005-11-11 04:25 UTC, Duncan
/etc/inittab (inittab, 1.63 KB, text/plain)
2005-11-11 08:41 UTC, Duncan
/etc/profile (profile, 447 bytes, text/plain)
2005-11-11 08:45 UTC, Duncan
/etc/(jed)bashrc (jedbashrc, 267 bytes, text/plain)
2005-11-11 08:48 UTC, Duncan
Only check interactive if we have a keyboard (rc.patch, 391 bytes, patch)
2005-12-16 06:39 UTC, Roy Marples (RETIRED)
Checks stty for icanon (rc-icanon.patch, 630 bytes, patch)
2005-12-19 03:13 UTC, Roy Marples (RETIRED)

Description Duncan 2005-11-11 04:22:25 UTC
OK, I was sure this was a known issue and someone had bugged it long before
now, but it's still here and a search on ALL baselayout-1.12 didn't turn up
anything that looked relevant, so it's time for me to get off my butt and file this
bug before we get out of the pre stage!
   
Due to an issue with my router, I have initdefault set to init level 2 (nonet)   
in inittab.  After booting to that and manually resetting my router if   
necessary, I issue the command "init 3" (with-net default) as root, to   
initialize the network and everything depending on it (bind/named, ntp-client,   
ntpd, privoxy, etc).  Occasionally, if I forgot to reset the router and I don't   
get network (ntp-client fails), I have to init 2, reset the router, and init 3.   
   
The issue is that after issuing the init X command, nothing appears to happen   
unless I hold down the enter key.  If I hold it down, getting repeated command   
prompts, each item in turn will start (or stop, as appropriate) as it should.    
Within a multi-step item, such as within ntp-client, I can release the enter   
key and it will continue running the init-script for that item to   
success/failure (with ntp-client, finding or failing on each step-ticker in   
turn, eventually succeeding/failing on the ntp-client script itself), but will   
then stop until I hit enter a bunch more times to start the next init-script on   
the level.   
   
Hitting enter works, but needless to say, isn't ideal! =8^(   
  
This worked correctly with baselayout-1.11, or whatever it was.  
 
Setting "major" as this is broken behavior that shouldn't make it past the pre 
stage. 
 
I see pre10-r1 is out, but don't see anything in the changelog indicating this
may be fixed, and decided I'd better quit ignoring this bug and file it while I
was thinking about it.  Now that it's filed, I'll confirm whether pre10-r1
still has the issue after I merge it and reboot.
  

Reproducible: Always
Steps to Reproduce:
Switch init levels with "init X" (where X is an init level number). 
Actual Results:  
Won't start/stop anything unless I hold down the enter key. 

Expected Results:  
init X should just do it, without holding down the enter key until everything 
is done. 

emerge info to be attached.
Comment 1 Duncan 2005-11-11 04:25:06 UTC
Created attachment 72638 [details]
emerge info output
Comment 2 Roy Marples (RETIRED) gentoo-dev 2005-11-11 07:29:49 UTC
init 2

Works For Me .....
Care to post your /etc/inittab ?
And have you ever emerged init-ng?
Comment 3 Duncan 2005-11-11 08:41:47 UTC
Created attachment 72665 [details]
/etc/inittab

Here's inittab.  Nothing special except that I maintained the runlevel 1
agettys when they were removed at some point.

I did confirm that pre10-r1 still has the same behavior.

Possibly of interest, I have a somewhat customized bash initialization setup. 
Basically, /etc/profile and all the bashrc's including root's do little more
than source the collection of shell-scripts in /etc/profile.d, each of which
does its own little task such as setting up $EDITOR (excerpted from the Gentoo
scripts) or setting noclobber (set -o noclobber, from a GWN tips-N-tricks
segment) or setting ls color and the like (a bit of a combination of Gentoo's
solution and that of Mandrake, my former distro).  /etc/profile only
sources the profile.d files, while the individual bashrcs have various
user-specific stuff in addition to sourcing the profile.d files.  However, aren't
the startup scripts supposed to be entirely agnostic to how they are called
now?  I guess they aren't, though: the services in init level 2 start just fine
when it's the default level, entered automatically at boot, but switching levels
from the command line doesn't work.  Still, the individual initscripts run fine
on their own.  It's the level switch, starting each initscript in order, that doesn't.
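A minimal sketch of that kind of arrangement, where the profile and bashrcs do little more than source each snippet in /etc/profile.d. It assumes the snippets end in .sh (the actual file names aren't given in this bug), and the function name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: source every *.sh script in a profile.d-style
# directory, the way a minimal /etc/profile or bashrc might.
# Caveat: because the scripts are sourced inside a function, any variable
# they create with "declare" or "local" stays local to this function;
# plain assignments, exports and "set -o" options still take effect.
source_profile_d() {
    local profile_dir=${1:-/etc/profile.d} profile_script
    for profile_script in "$profile_dir"/*.sh; do
        # Skips the unexpanded pattern when nothing matches, and unreadable files.
        [ -r "$profile_script" ] && . "$profile_script"
    done
    return 0
}
```

Each snippet can then stay tiny, e.g. one file that only exports $EDITOR and another that only runs set -o noclobber.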

I don't know if /etc/profile is even run, but I guess I'll post it too.  If you
need the profile.d files, I'll post them, too, but not unless asked, as there
are several of them altho they are small, and I'm not sure they are needed yet,
so don't want to spam the bug.

I'm also running the "extreme" setup with a tmpfs svcdir, parallel startup
turned OFF, due to issues earlier in the 1.12.0-preX cycle.  (I really should
try turning that back on, I guess, to see if that's been fixed, but not until
after this one's fixed.)

Duncan
Comment 4 Duncan 2005-11-11 08:45:50 UTC
Created attachment 72666 [details]
/etc/profile

OK, this just sources jedbashrc (named so it won't be overwritten), so I guess I'll post
that too. =8^)
Comment 5 Duncan 2005-11-11 08:48:25 UTC
Created attachment 72668 [details]
/etc/(jed)bashrc

As mentioned, if you need the files from /etc/profile.d, just holler!
Comment 6 Duncan 2005-11-11 08:53:41 UTC
Doh!  Forgot to answer your question about init-ng! 
 
No, never (IIRC) tried it.  BTW, according to emerge --pretend, no such ebuild 
init-ng, it's initng (no dash), and yes, it says it would be a "new" emerge, so 
it's not merged. 
 
Duncan 
Comment 7 Roy Marples (RETIRED) gentoo-dev 2005-12-10 04:12:52 UTC
*** Bug 115032 has been marked as a duplicate of this bug. ***
Comment 8 Duncan 2005-12-11 02:08:10 UTC
I noticed something the other day that may be of help tracing this. 
 
If I exec init X, so that it runs in the same process as my bash prompt, it
often (but not always) seems to run on its own, without waiting for Enter.  I
haven't got it 100% repeatable, so something else is entering the picture as
well, but when I simply enter "init X" (with X a number, of course), it waits
for Enter 100% of the time, while somewhat /less/ dependably, but /usually/, if I
exec it in the same process, it goes right ahead without further prompts.
 
I suspect the failure to do it /all/ the time with exec has something to do
with whether I'm logged in as root, or su-ed to root, or run sudo init, or run
exec init X in a script, or...  Apparently, exec-ing from some of those works
reliably, while others "go interactive", but I haven't yet figured out which do what.
Still, it's a clue.
 
Perhaps it's the presence or absence of something in the environment, 
non-exported so it isn't there when I run in a new process, but there when I 
exec in, say, a login shell (only?)? 
 
I'll see about doing some more testing... particularly of exported variables 
and login vs. non-login shells. 
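For reference, the distinction this comment relies on: exec makes the shell replace itself with the command instead of forking a child, so the command keeps the shell's PID and the shell is gone afterward (which is why exec-ing from a login shell logs you out, as comment #10 notes). The demo below is a hypothetical illustration with made-up function names, not part of baselayout.

```shell
#!/bin/bash
# Hypothetical demo: compare the PID a command runs under when started
# normally (fork + exec) versus with the shell's exec builtin.

# Prints two PIDs: the outer shell's, then the command's.
# With exec, the command replaces the outer shell, so the PIDs match.
demo_exec_same_pid() {
    bash -c 'echo "$$"; exec bash -c "echo \$\$"'
}

# Without exec, the shell forks a child for the command, so the PIDs differ.
# The trailing "true" keeps the child from being the last command, so bash
# cannot quietly turn it into an exec as an optimization.
demo_fork_new_pid() {
    bash -c 'echo "$$"; bash -c "echo \$\$"; true'
}
```

Running each function prints two numbers; they are equal for the exec case and different for the fork case.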
 
Duncan 
Comment 9 Roy Marples (RETIRED) gentoo-dev 2005-12-16 06:39:01 UTC
Created attachment 74876 [details, diff]
Only check interactive if we have a keyboard

This should enable RC to only check for interactive if we have a keyboard
attached.

Sorry about this taking time - it's something I'm finding very hard to
replicate.
Comment 10 Duncan 2005-12-17 08:11:27 UTC
(In reply to comment #9)
> attachment (id=74876)
> This should enable RC to only check for interactive if we have a keyboard
> attached.

Unfortunately, doesn't help.

> Sorry about this taking time - it's something I'm finding very hard to
> replicate.

Not a big problem: once I figured out what was going on, it wasn't a big deal, as I don't boot that often.  Just something that should be fixed before it goes out of rc or into stable.

OK, I've done a bit more testing re my last comment (exec init X often working), and I think it *IS* login shell related.

exec init X appears to work reproducibly from a login shell (of course, logging me off in the process).  If I su to root and try exec init X from there, it's back to the old pausing behavior, just as if I execute the command in a separate shell (without using exec).  Confirming the login shell idea, if I su -, so it runs as a login shell, I get a working exec init X once again.

So, it looks to be environmental, related to login shell or not.  Naturally, running it in its own process won't give me a login shell environment, so it pauses.

There are VERY few differences between the two environments.  For each environment, login and su, I captured the output of declare, export, and shopt to files.  I then ran diff on the outputs to see how the environments differed.  Here are the results (short enough to post without using attachments):

#diff shopt.root.login shopt.root.su
4c4
< checkwinsize          off
---
> checkwinsize          on
22c22
< login_shell           on
---
> login_shell           off

#diff declare.root.login declare.root.su
44d43
< MAIL=/var/mail/root
54c53
< PPID=5729
---
> PPID=9461
66c65
< SHLVL=1
---
> SHLVL=2

#diff export.root.login export.root.su
23d22
< declare -x MAIL="/var/mail/root"
37c36
< declare -x SHLVL="1"
---
> declare -x SHLVL="2"

Other than login_shell, checkwinsize, and SHLVL (and the process IDs, of course), the only difference is MAIL, which is set only in the login shell...

** Do the init scripts (or runscript) require a non-empty MAIL variable now?  /That/ would explain things, if they do!

I'm going to set that specifically in my environment, and test to see if that fixes things.
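The capture-and-compare steps described in this comment could be scripted roughly as follows. This is a hypothetical helper, not something from the bug: it writes files named like the ones diffed above (declare.SUFFIX, export.SUFFIX, shopt.SUFFIX) into the current directory, and adds an is_login_shell check based on the login_shell option that showed up in the shopt diff.

```shell
#!/bin/bash
# Hypothetical helper: capture the three listings compared above.
# snapshot_env SUFFIX writes declare.SUFFIX, export.SUFFIX and shopt.SUFFIX
# into the current directory. ">|" overwrites existing files even when
# noclobber is set (as in the setup described in comment #3).
snapshot_env() {
    declare >| "declare.$1"   # all shell variables (plus function definitions)
    export  >| "export.$1"    # exported variables only, as "declare -x NAME=..."
    shopt   >| "shopt.$1"     # shell options, including login_shell
}

# Returns 0 in a login shell (e.g. after "su -"), 1 otherwise (plain "su").
is_login_shell() {
    shopt -q login_shell
}
```

For example, run snapshot_env root.login in a login shell and snapshot_env root.su after a plain su, then diff each matching pair of files.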

Duncan
Comment 11 Duncan 2005-12-17 09:24:39 UTC
(In reply to comment #10)
> ** Do the init scripts (or runscript) require a non-empty mail variable now? 
> /That/ would explain things, if they do!
> 
> I'm going to set that specifically in my environment, and test to see if that
> fixes things.

No such luck!

However, here's another strange clue to add to the mix.  When I go from init 3 (with network) to init 2 (without), it shuts down local, shuts down the network stuff (several scripts), then, after a pause (either something isn't quite shutting down right or it's the keyboard pause thing again), starts local up again.  This is without the "exec".  So it seems to shut stuff down just fine, all without pauses as it should, but it has trouble /starting/ scripts, unless I'm using "exec init X" from a login shell, in which case it does them normally.

Do you get the feeling we are both missing something big, right in front of our noses?  I do!

Could it be something to do with readline?  I've noticed that at times it takes only a single letter to unstick it.  Other times it takes a line, with any or no content, but one has to hit the Enter key.  I haven't a clue why the behavior would differ, but it seems to.  In any case, it single-steps on each script: within a script, it runs the entire script to completion, then pauses until I hit either a key (if it's acting that way that day) or specifically the Enter key (if it's being particular), before it will start the next script.

Duncan
Comment 12 Roy Marples (RETIRED) gentoo-dev 2005-12-18 03:23:30 UTC
Could you add this at line 94 in /sbin/rc
return 1

Just before the line user_want_interactive() {
Comment 13 Benno Schulenberg 2005-12-18 06:01:49 UTC
At #12: adding "return 1" just after the "user_want_interactive() {" line indeed makes the problem go away.
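Placed as described in comment #13, the workaround looks roughly like this. Only the function name comes from the bug; the original body of user_want_interactive() in /sbin/rc isn't shown here, so it is left as a placeholder comment rather than guessed. As comment #14 says, this bypasses the check rather than fixing it.

```shell
#!/bin/bash
# Sketch of the temporary workaround from comments #12-#14.
# "return 1" is the first statement *inside* the function, right after
# the opening line, so the function always answers "no" and the rest
# of its body is never reached.
user_want_interactive() {
    return 1    # workaround: never treat the boot as interactive
    # ...original body of user_want_interactive from /sbin/rc (not shown
    # in this bug) would follow here, now unreachable...
}
```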
Comment 14 Duncan 2005-12-18 07:11:30 UTC
(In reply to comment #13)
> At #12: adding "return 1" just after the "user_want_interactive() {" line
> indeed makes the problem go away.

Confirmed here.  Of course, that's just shorting out the problem, but at least the location of the problem is now verified!

Duncan

Comment 15 Roy Marples (RETIRED) gentoo-dev 2005-12-19 03:13:20 UTC
Created attachment 75077 [details, diff]
Checks stty for icanon

OK, this patch should fix things proper this time.
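The attachment's title says it checks stty for icanon; the patch itself isn't reproduced here, so the following is only a hypothetical sketch of that kind of check, with made-up function names. In canonical ("cooked") mode the terminal driver delivers input a line at a time, only after Enter, so a script waiting for a single key behaves very differently depending on this flag, which is consistent with the "sometimes any key, sometimes only Enter" behavior described in comment #11.

```shell
#!/bin/bash
# Hypothetical sketch, not the code in attachment 75077: decide whether the
# terminal is in canonical mode by looking for the bare "icanon" flag in
# `stty -a` output ("-icanon" means the flag is off).

# settings_have_icanon SETTINGS
# SETTINGS is text in the format printed by `stty -a`.
settings_have_icanon() {
    # Split on spaces and semicolons so each flag sits on its own line,
    # then require an exact "icanon" line (so "-icanon" does not match).
    printf '%s\n' "$1" | tr -s ' ;' '\n' | grep -qx 'icanon'
}

# tty_is_canonical: also returns 1 when stdin is not a terminal at all.
tty_is_canonical() {
    local settings
    settings=$(stty -a 2>/dev/null) || return 1
    settings_have_icanon "$settings"
}
```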
Comment 16 Duncan 2005-12-20 02:38:44 UTC
(In reply to comment #15)
> attachment (id=75077) Checks stty for icanon
> 
> OK, this patch should fix things proper this time.

Testing results so far are positive.  =8^)

Duncan
Comment 17 Roy Marples (RETIRED) gentoo-dev 2005-12-20 06:03:24 UTC
OK, I've committed the patch to our SVN; it will be in baselayout-1.12.0_pre12
Comment 18 Benno Schulenberg 2005-12-20 11:03:24 UTC
Re comment #15: yes, this solves the problem for me too.  Thanks.
Comment 19 Roy Marples (RETIRED) gentoo-dev 2005-12-21 12:42:14 UTC
pre12 is now out
Comment 20 Duncan 2005-12-23 06:11:33 UTC
(In reply to comment #15)
> Created an attachment (id=75077) Checks stty for icanon

The comment in the attachment asks if there's a better way.  I used to run Mandrake, and that got me thinking back to how they did it.  I learned bash scripting by taking apart their rc.sysinit script, so I know a bit about their setup.  Here's the way they do it, which is probably quite close to the way Red Hat does it as well, given they are supposed to be compatible.  Whether it's better... but anyway, it's different.

Unfortunately, back then I didn't quite grok what they did to get the interaction, so I downloaded the rc.sysinit file from their viewCVS and went to work figuring it out again.  It turns out I had misread a backgrounding "&" as a "continue if no error" "&&".  Once I figured /that/ out, the rest suddenly became clear!

URL for their viewCVS for rc.sysinit:
http://cvs.mandriva.com/cgi-bin/cvsweb.cgi/soft/initscripts/rc.d/rc.sysinit

Twice in their rc.sysinit, they do something like this:

--snip--
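{
	# do a bunch of stuff (elided here)
	# ...
	# Last backgrounded step: stop getkey so the foreground stops waiting.
	kill -TERM `/sbin/pidof getkey` >/dev/null 2>&1
} &
# Meanwhile, in the foreground, wait for an "i" unless prompting is disabled.
if [ "$PROMPT" != "no" ]; then
	/sbin/getkey i && touch /var/run/confirm
fi
# Don't continue until the backgrounded block has finished.
wait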
--snip--

So, that does several things in the background while running getkey in the foreground.  If getkey sees an i, it exits successfully and the confirm file is touched, turning on interactive mode; otherwise, the last task in the backgrounded block kills getkey.  The wait keeps the main script from continuing before the background stuff is finished, even if a key press made getkey exit early.

The first time they run this is while / is still mounted read-only, so all they do is set a variable.  (The tasks backgrounded the first time include printing a banner and the "hit i to enter interactive mode" prompt, mounting devfs/pts/shm if necessary, and checking whether brltty needs to be started for braille.)  Then all the lvm/raid/fsck/mounting/quota stuff runs.  Then they run getkey alongside other backgrounded tasks (cleaning up /tmp and various system lock files) again, this time touching the confirm file if interactive mode is triggered.

The biggest thing to note about the process is that getkey is a separate executable, so they don't have to worry about saving and restoring terminal state, because at that point, the getkey executable, not bash, is handling keyboard input, and it simply gets killed if the background stuff gets done before it gets a key.  Of course, that means the backgrounded tasks can't require any interactivity of their own, but they arrange it so nothing requiring interactivity is run in the backgrounded sections.

Note that I don't have a getkey on my path, so such a solution would apparently require another system dependency on Gentoo (getkey is part of the initscripts package on Mandriva).  getkey is a C program (GPLed, copyright Red Hat), getkey.c being all of 130 lines long and stable for over two years, so it should introduce few problems, and might even solve some if there are packages that expect getkey to exist, maybe because they were developed on Red Hat/Mandriva.  URL to the viewCVS page:
http://cvs.mandriva.com/cgi-bin/cvsweb.cgi/soft/initscripts/src/getkey.c

Something else to note: their solution won't pick up an "i" entered outside their backgrounded task sections, during the volume handling (raid/lvm/fsck/mounting/etc.) stuff, for instance.

Finally, note that their confirm file sticks around until it is removed, either by the cleanup step at the next reboot or manually.  Thus, if the system is in interactive mode at boot, further init level changes will be interactive as well, by default.  Further, setting interactive mode is as easy as touching that file manually, and voilà!
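The flag-file behavior can be sketched like this. It is a hypothetical illustration, not Mandriva's code: the function names are invented, /var/run/confirm is the path from the rc.sysinit excerpt above, and CONFIRM_FILE can be overridden (for testing, say).

```shell
#!/bin/bash
# Hypothetical sketch of a Mandriva-style "confirm" flag file.
# The default path is the one used in the rc.sysinit excerpt above;
# set CONFIRM_FILE beforehand to use a different location.
CONFIRM_FILE=${CONFIRM_FILE:-/var/run/confirm}

# Request interactive mode for this boot and any later runlevel changes.
enable_interactive() { touch "$CONFIRM_FILE"; }

# Drop back to normal mode (what the boot-time cleanup would do).
disable_interactive() { rm -f "$CONFIRM_FILE"; }

# Succeeds while the flag file exists.
interactive_requested() { [ -e "$CONFIRM_FILE" ]; }
```

Because the check is just "does the file exist", any later runlevel change sees the same answer until something removes the file.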

FWIW.  As I said, I don't know if it's better, but it's certainly a different option to consider.

Duncan
Comment 21 Duncan 2005-12-23 11:26:25 UTC
re comment #20:

What about using the backgrounded processing with a parallel read (bash built-in)?  read has switches to take only a single letter, if desired, and/or timeout, and puts the reply in a variable.  Of course, one doesn't want to kill the process doing the read, if it's the main rc script.  However, run a helper script that sets exit status based on the reply, then use the exit status to set a var in the main script, and we're pretty close to doing with bash alone what RH/Mandriva use getkey for, and Gentoo is currently using tty and stty for.
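A hypothetical bash-only sketch along the lines of this idea: the work runs in the background while the foreground polls for a key with the read builtin's -n (one character) and -t (timeout) options, so no separate getkey binary is needed. It is a polling variant rather than the exact helper-script design described above, and all names are made up. Because job control is off in a script, bash gives a backgrounded command /dev/null as its standard input, so the background work can't swallow the keypress.

```shell
#!/bin/bash
# Hypothetical sketch: run work in the background while the foreground
# watches for an "i" keypress using only bash builtins.

# wait_for_key_while_running PID POLL_SECONDS
# Returns 0 if "i" or "I" is read while PID is still running, 1 otherwise.
wait_for_key_while_running() {
    local pid=$1 poll=$2 key status
    while kill -0 "$pid" 2>/dev/null; do
        # -n 1: return after one character; -s: don't echo it;
        # -t: give up after POLL_SECONDS so we can re-check the worker.
        read -r -s -n 1 -t "$poll" key
        status=$?
        if [ "$status" -eq 0 ]; then
            case $key in i|I) return 0 ;; esac   # other keys are ignored
        elif [ "$status" -le 128 ]; then
            return 1    # end of input (no terminal to read from): stop asking
        fi
        # A status above 128 means the read timed out; loop and poll again.
    done
    return 1
}

# run_with_key_prompt POLL_SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background, sets INTERACTIVE=yes or no depending on
# whether "i" was pressed meanwhile, then returns COMMAND's exit status.
run_with_key_prompt() {
    local poll=$1 worker
    shift
    "$@" &          # job control is off, so its stdin is /dev/null
    worker=$!
    if wait_for_key_while_running "$worker" "$poll"; then
        INTERACTIVE=yes
    else
        INTERACTIVE=no
    fi
    wait "$worker"
}
```

The timeout only bounds how long each poll blocks; bash restores the terminal settings itself when each read returns, so nothing has to be killed mid-read.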

Duncan