Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 47111 - More severe depcache problems with hotplug
Summary: More severe depcache problems with hotplug
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system (show other bugs)
Hardware: x86 Linux
: High major (vote)
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
: 46954 47109 47334 47420 47993 (view as bug list)
Depends on:
Blocks:
 
Reported: 2004-04-07 08:52 UTC by dswhite42
Modified: 2004-11-10 14:10 UTC (History)
7 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
Contents of /var/lib/init.d/depcache (depcache,135.67 KB, text/plain)
2004-04-07 12:43 UTC, dswhite42
Details
Contents of /var/lib/init.d/deptree (deptree,22.32 KB, text/plain)
2004-04-07 12:44 UTC, dswhite42
Details
Contents of /sbin/depscan.sh (depscan.sh,1.25 KB, text/plain)
2004-04-14 10:15 UTC, dswhite42
Details
Output from "bash /var/lib/init.d/depcache" (Output of bash-depcache.txt,5.41 KB, text/plain)
2004-04-14 10:22 UTC, dswhite42
Details
Environment during bootime (env-output.txt,1.62 KB, text/plain)
2004-04-14 10:37 UTC, dswhite42
Details
Debugging output of "bash -x /sbin/depscan.sh" running at bootime (debug.txt,5.81 KB, text/plain)
2004-04-14 15:31 UTC, dswhite42
Details
Patch for /sbin/depscan.sh (depscan.sh_patch,367 bytes, patch)
2004-04-15 19:10 UTC, dswhite42
Details | Diff
Patch for /lib/rcscripts/awk/cachedepends.awk (cachedepends.awk_patch,6.18 KB, patch)
2004-04-15 19:11 UTC, dswhite42
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description dswhite42 2004-04-07 08:52:54 UTC
I thought it might be best to separate this error out from bug #47109 (http://bugs.gentoo.org/show_bug.cgi?id=47109).  While similar, it manifests more severe problems that make the machine pretty much unusable.  (However, they can be merged into a single bug if that's more helpful). 

I rebooted the machine yesterday and got this sequence:
-----------------------------------------------------------------------------
 * Checking all filesystems...
/dev/hda1: clean, 35/34136 files, 11554/136521 blocks
 * Mounting local filesystems
 * Activating (possibly) more swap
 * Caching service dependencies
/var/lib/init.d/depcache: line 824: /conf.d/basic: No such file or directory 
/var/lib/init.d/depcache: line 1023: /conf.d/basic: No such file or directory
/var/lib/init.d/depcache: line 2003: [: missing ']'
/var/lib/init.d/depcache: line 2199: [: missing ']'
/var/lib/init.d/depcache: line 2592: [: too many arguments
/var/lib/init.d/depcache: line 2789: [: too many arguments
/var/lib/init.d/depcache: line 3182: ec: command not found
/var/lib/init.d/depcache: line 3378: ec: command not found
/var/lib/init.d/depcache: line 4954: syntax error near unexpected token ')'
/var/lib/init.d/depcache: line 4954: ')'
 * Service 'alsasound' already provide 'alsa-modules'!;
 * Not adding service 'alsasound'...
 * Service 'alsasound' already provide 'alsa-modules'!;
 * Not adding service 'apmd'...
 * Service 'vixie-cron' already provide 'cron'!;
 * Not adding service 'apmd'...
 * Service 'vixie-cron' already provide 'cron'!;
 * Not adding service 'apmd'...
 * Can't find service 'hostname' needed by 'domainname'; continuing...
 * Can't find service 'hostname' needed by 'domainname'; continuing...
 * Can't find service 'hostname' needed by 'domainname'; continuing...
 * Can't find service 'portmap' needed by 'famd'; continuing...
 * Can't find service 'hostname' needed by 'modules'; continuing...
 * Can't find service 'portmap' needed by 'pwcheck'; continuing...  
 * Can't find service 'hostname' needed by 'modules'; continuing... 
 * Can't find service 'portmap' needed by 'pwcheck'; continuing...  
 * Can't find service 'hostname' needed by 'modules'; continuing... 
 * Can't find service 'logger' needed by 'vixie-cron'; continuing...
 * Can't find service 'logger' needed by 'amavisd'; continuing...
 * Can't find service 'logger' needed by 'apmd'; continuing...      
 * Can't find service 'logger' needed by 'amavisd'; continuing...   
 * Can't find service 'logger' needed by 'apmd'; continuing...      
 * Can't find service 'logger' needed by 'amavisd'; continuing...   
 * Can't find service 'hostname' needed by 'bootmisc'; continuing...
/sbin/rc: line 1220: 330DEPEND_TREE: value too great for base (error token is "330DEPEND_TREE")
/sbin/rc: line 1220: 330DEPEND_TREE: value too great for base (error token is "330DEPEND_TREE")
 * Bringing eth0 down...
 * Bringing eth1 down...
 * Remounting root filesystem read-only (if necessary)...
 * Checking root filesystem...
Reiserfs super block in block 16 on 0x303 of format 3.6 with standard journal
Blocks (total/free): 43795198/39118106 by 4096 bytes
Filesystem marked as cleanly unmounted
Filesystem seems mounted read-only. Skipping journal replay.
Checking internal tree..finished
 * Remounting root filesystem read/write...
/sbin/rc: line 1220: 330DEPEND_TREE: value too great for base (error token is "330DEPEND_TREE")
/sbin/rc: line 1220: 330DEPEND_TREE: value too great for base (error token is "330DEPEND_TREE")
/sbin/rc: line 1220: 330DEPEND_TREE: value too great for base (error token is "330DEPEND_TREE")
/sbin/rc: line 1220: 330DEPEND_TREE: value too great for base (error token is "330DEPEND_TREE")
 * Checking all filesystems...
/dev/hda1 is mounted.  e2fsck: Cannot continue, aborting.


 * Fsck could not correct all repairs, manual repair needed

Give root password for maintenance
(or type Control-D for normal startup):
-----------------------------------------------------------------------------

The weirdness about the depcache errors are in bug 47109.  What's new here is all the 330DEPEND_TREE errors apparently occurring somewhere in a script called by /sbin/rc.  If you skip the maintenance step at the bottom (by hitting Ctrl-D), the error message referring to 330DEPEND_TREE will occur almost everytime you try to do something (ls, cd, etc.)  About all you can sucessfully do at that point is "/sbin/poweroff".

Rebooting the machine had no effect - the same problems occurred.  However, shutting down the computer entirely, waiting a while, then booting it up again at least let me get back to the relatively less severe problems of bug 47109, without giving all these 330DEPEND_TREE errors.

I'm relatively sure it's not a hardware problem with my equipment, but I don't know enough to figure out the source of these error messages.

My system:

> emerge info

Portage 2.0.50-r3 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3_pre20040207-r0, 2.4.23_pre8-gss-r2)
=================================================================
System uname: 2.4.23_pre8-gss-r2 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz
Gentoo Base System version 1.4.3.13p1
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -mcpu=pentium4 -march=pentium4 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.1/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -mcpu=pentium4 -march=pentium4 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.mirrors.pair.com/ ftp://gentoo.mirrors.pair.com/ http://www.gtlib.cc.gatech.edu/pub/gentoo ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl alsa apache2 apm arts avi berkdb cdr crypt cups curl dga directfb dvd dvdr encode esd fbcon flash foomaticdb gd gdbm ggi gif gnome gphoto2 gpm gstreamer gtk gtk2 imap imlib java jpeg kde ldap libg++ libwww mad maildir matrox mbox mcal mikmod mmx motif mozilla mpeg mysql nas ncurses nls odbc oggvorbis opengl oss pam pdflib perl png ppds python qt quicktime readline samba sasl scanner sdl slang snmp spell sse ssl svga tcltk tcpd tiff truetype usb x86 xml xml2 xmms xv zlib"

I'll be happy to provide any other info you might need.
Comment 1 Tuan Van (RETIRED) gentoo-dev 2004-04-07 09:49:36 UTC
> Portage 2.0.50-r3 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3_pre20040207-r0, 2.4.23_pre8-gss-r2)
May be because your profile is outdated. I suggest that you update your default profile to default-x86-2004.0
Comment 2 dswhite42 2004-04-07 10:39:18 UTC
I have to confess ignorance here on how to "update my default profile".  

Google'ing around, I found this page: (http://www.gentoo.org/proj/en/releng/release/2004.0/x86/x86-release-notes.xml)  At the bottom, it says "If you already have a working installation of Gentoo Linux (Version 1.4) there is no need to reinstall. You will automatically get Gentoo 2004.0 if you sync your Portage tree and run emerge --update world. If you still have an installation with a Gentoo 1.2 profile it might make sense -- in some cases -- that you do a new installation."

Since I regularly do "update -uD world" commands, is there still something else I need to do?

Comment 3 Tuan Van (RETIRED) gentoo-dev 2004-04-07 11:42:45 UTC
Yes you are correct. 1.4 and 2004.0 is not much different. Looking around bug.gentoo.org I found bug #34478 which is similar with yours. You should look for something strange in /etc/init.d. The other reporter got a *.merged in there.
Comment 4 dswhite42 2004-04-07 12:42:15 UTC
The symptoms of bug 34478 does sound similar to my problem, but fortunately (or unfortunately, I guess) I don't seem to have the same root causes.  Here's my init.d files:

> cd /etc/init.d
> file *
acpid:        a /sbin/runscript script text executable
alsasound:    a /sbin/runscript script text executable
amavisd:      a /sbin/runscript script text executable
apache2:      a /sbin/runscript script text executable
apmd:         a /sbin/runscript script text executable
bootmisc:     a /sbin/runscript script text executable
checkfs:      a /sbin/runscript script text executable
checkroot:    a /sbin/runscript script text executable
clock:        a /sbin/runscript script text executable
consolefont:  a /sbin/runscript script text executable
crypto-loop:  a /sbin/runscript script text executable
cupsd:        a /sbin/runscript script text executable
depscan.sh:   symbolic link to `../../sbin/depscan.sh'
dhcp:         a /sbin/runscript script text executable
dhcrelay:     a /sbin/runscript script text executable
domainname:   a /sbin/runscript script text executable
esound:       a /sbin/runscript script text executable
famd:         a /sbin/runscript script text executable
firewall:     a /sbin/runscript script text executable
functions.sh: symbolic link to `../../sbin/functions.sh'
gpm:          a /sbin/runscript script text executable
halt.sh:      ASCII English text
hdparm:       a /sbin/runscript script text executable
helixserver:  a /sbin/runscript script text executable
hostname:     a /sbin/runscript script text executable
hotplug:      a /sbin/runscript script text executable
iptables:     a /sbin/runscript script text executable
keymaps:      a /sbin/runscript script text executable
lisa:         a /sbin/runscript script text executable
local:        a /sbin/runscript script text executable
localmount:   a /sbin/runscript script text executable
modules:      a /sbin/runscript script text executable
mysql:        a /sbin/runscript script text executable
nas:          a /sbin/runscript script text executable
net.eth0:     a /sbin/runscript script text executable
net.eth1:     symbolic link to `net.eth0'
net.lo:       a /sbin/runscript script text executable
net.ppp0:     a /sbin/runscript script text executable
netmount:     a /sbin/runscript script text executable
nscd:         a /sbin/runscript script text executable
ntp-client:   a /sbin/runscript script text executable
ntpd:         a /sbin/runscript script text executable
numlock:      a /sbin/runscript script text executable
popa3d:       a /sbin/runscript script text executable
portmap:      a /sbin/runscript script text executable
postfix:      a /sbin/runscript script text executable
prefirewall:  a /sbin/runscript script text executable
pure-ftpd:    a /sbin/runscript script text executable
pwcheck:      a /sbin/runscript script text executable
reboot.sh:    ASCII English text
reslisa:      a /sbin/runscript script text executable
rmnologin:    a /sbin/runscript script text executable
rsyncd:       a /sbin/runscript script text executable
runscript.sh: symbolic link to `../../sbin/runscript.sh'
samba:        a /sbin/runscript script text executable
saslauthd:    a /sbin/runscript script text executable
serial:       a /sbin/runscript script text executable
shutdown.sh:  ASCII English text
slapd:        a /sbin/runscript script text executable
slurpd:       a /sbin/runscript script text executable
snmpd:        a /sbin/runscript script text executable
spamd:        a /sbin/runscript script text executable
squid:        a /sbin/runscript script text executable
sshd:         a /sbin/runscript script text executable
switch:       a /sbin/runscript script text executable
syslog-ng:    a /sbin/runscript script text executable
upsd:         a /sbin/runscript script text executable
upsdrv:       a /sbin/runscript script text executable
upsmon:       a /sbin/runscript script text executable
urandom:      a /sbin/runscript script text executable
vixie-cron:   a /sbin/runscript script text executable
winbind:      a /sbin/runscript script text executable
xdm:          a /sbin/runscript script text executable
xfs:          a /sbin/runscript script text executable
xinetd:       a /sbin/runscript script text executable

Looks pretty normal - no binaries there.  And no *.merged files either.

I'll attach my /var/lib/init.d/deptree and /var/lib/init.d/depcache (as suggested in that bug) in case it's useful.

One other possible clue: bug 34478 had a link to a URL where someone was having a similar problem.  (http://forums.gentoo.org/viewtopic.php?t=103042) .   In that thread, someone suggested running the command:

/bin/gawk -v SVCDIR=/var/lib/init.d/ -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/cachedepends.awk  | bash 

to see if it produced any errors.  It took some tweaking to get it to work, but here's what I did:

> /bin/gawk -v SVCDIR=/var/lib/init.d/ -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/cachedepends.awk  | bash
bash: line 1:  command not found
bash: line 1: 01m*: command not found
> /bin/gawk -v SVCDIR=/var/lib/init.d/ -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/cachedepends.awk 
 * Could not get SVCDIR!
> SVCDIR=/var/lib/init.d
> export SVCDIR
> /bin/gawk -v SVCDIR=/var/lib/init.d/ -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/cachedepends.awk 
> 

So, once I define and export SVCDIR, the script ran with no output (meaning, I presume, no errors?)
Comment 5 dswhite42 2004-04-07 12:43:32 UTC
Created attachment 28843 [details]
Contents of /var/lib/init.d/depcache
Comment 6 dswhite42 2004-04-07 12:44:03 UTC
Created attachment 28844 [details]
Contents of /var/lib/init.d/deptree
Comment 7 Martin Holzer (RETIRED) gentoo-dev 2004-04-08 01:41:27 UTC
*** Bug 47109 has been marked as a duplicate of this bug. ***
Comment 8 Tuan Van (RETIRED) gentoo-dev 2004-04-10 09:58:20 UTC
*** Bug 47420 has been marked as a duplicate of this bug. ***
Comment 9 dswhite42 2004-04-14 10:15:15 UTC
Created attachment 29259 [details]
Contents of /sbin/depscan.sh
Comment 10 dswhite42 2004-04-14 10:22:53 UTC
Created attachment 29262 [details]
Output from "bash /var/lib/init.d/depcache"

This attachment shows what I see when I cd to /etc/init.d and run "bash
/var/lib/init.d/depcache"
Comment 11 Tuan Van (RETIRED) gentoo-dev 2004-04-14 10:29:03 UTC
dswhite42:
Have you tried baselayout-1.8.9? 
Comment 12 dswhite42 2004-04-14 10:37:45 UTC
Created attachment 29268 [details]
Environment during bootime

This may or may not help the bug wranglers, but here is what is displayed when
I insert an "/bin/env" debug command into /sbin/depscan.sh and capture the
output while the system is booting (and calculating dependencies).
Comment 13 dswhite42 2004-04-14 10:38:09 UTC
Through a series of debug "echo" statements, tee commands and "bash -x" commands in /sbin/rc and /sbin/depscan.sh, I've narrowed down the point where the errors take place.  It occurs when /sbin/depscan.sh runs this command:

bash "${svcdir}/depcache" | \
\
/bin/gawk \
  -f /lib/rcscripts/awk/functions.awk \
  -f /lib/rcscripts/awk/gendepends.awk || \
  retval=1

The error messages that are displayed during the boot process have been detailed in my first entry to this bug.  I CANNOT reproduce those error messages by logging in as root and doing any of the following, however:

1) 
"/sbin/depscan.sh"
(Displays " * Caching service dependencies..." and exits (return code 0))

2)
svcdir=/var/lib/init.d
SVCDIR=/var/lib/init.d
export svcdir SVCDIR
bash "${svcdir}/depcache" | /bin/gawk -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/gendepends.awk
(Exits with no output (return code 0))

3) 
svcdir=/var/lib/init.d
SVCDIR=/var/lib/init.d
export svcdir SVCDIR
cd /etc/init.d
bash /var/lib/init.d/depcache > /tmp/foo.txt
cat /tmp/foo.txt | /bin/gawk -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/gendepends.awk
(Exits with no output (return code 0))

So it seems that something catastrophic is happening during the depscan/depcache interaction ONLY at boottime (and it DOES give the error messages I've detailed consistently each time I reboot).  But running the commands manually as root by the time the system is accepting logins produces no errors.

Is there some low-level problem here?  Maybe some gawk bug?  (I have sys-apps/gawk 3.1.3-r1 installed)  I only suggest that because the author of this thread (http://forums.gentoo.org/viewtopic.php?t=103042&highlight=depscan) suggests that maybe a segfault related to the "split" command might be the culprit in his case).

Comment 14 Aron Griffis (RETIRED) gentoo-dev 2004-04-14 11:13:46 UTC
dswhite42, you're doing an awesome job debugging this, keep it up!  if you can narrow it all the way down to a patch, that would be excellent.  if you can't, I'll look into it ASAP (which might be a few days yet, unfortunately)
Comment 15 dswhite42 2004-04-14 15:29:56 UTC
Lang, thanks for your continued suggestions.  I tried upgrading from baselayout-1.8.8 to 1.8.9 today and rebooted, but sadly the errors persist.  

Aron, thanks for the encouragement - I'm more inclined to keep working on this if I don't think I'm just whistling in the dark.  I'm rapidly reaching the limits of my debugging capabilities, and it may take someone with a more in-depth knowledge of the Gentoo base system (and/or gawk) to solve this.

I will list a few other things I've tried, though, most of which involved hacking /sbin/depscan.sh.  Here's a little bit more narrowing down of the problem, and it's still totally bizarre.

Here's the setup: 

I modified this line in /sbin/rc:

  /sbin/depscan.sh

and changed it to this:

  (/sbin/depscan.sh | /bin/tee -a /tmp/debug.txt) 3>&1 1>&2 2>&3 | /bin/tee -a /tmp/debug.txt

(The syntax is weird, but it's the only way I know to redirect both stdout and stderr to a file while still displaying everything on the screen).  Anyway, this sends all the output of /sbin/depscan.sh (which is set to "#!/bin/bash -x" on line 1) to /tmp/debug.txt

Next, I modified a section of /sbin/depscan.sh so it now looks like this:
------------------------------------------------------------------------
echo "######### START DEBUG TEST #########"
(bash "${svcdir}/depcache" | /bin/gawk -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/gendepends.awk) > /tmp/depcache-run-output 2>&1
echo "########## END DEBUG TEST ##########"

bash "${svcdir}/depcache" | \
\
/bin/gawk \
        -f /lib/rcscripts/awk/functions.awk \
        -f /lib/rcscripts/awk/gendepends.awk || \
        retval=1
------------------------------------------------------------------------

See what the setup is here?  I'm essentially running the same "bash depcache | gawk" command TWICE.  The first time it's run, its output will go to /tmp/depcache-run-output .  The second time it will send output to the screen (in the normal manner).

And now here's the kicker.  When I rebooted the machine (and saw the usual errors), then looked at /tmp/depcache-run-output, I saw only these lines:

+ bash /var/lib/init.d/depcache
+ /bin/gawk -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/gendepends.awk

Those lines are simply debugging lines from depscan.sh (running "bash -x").  Which means there was no other output.  Which means the first time the command ran, there were apparently NO ERRORS.  

The second time the command ran, it displayed all the usual boot errors on the screen.

So now we have this bizarre scenario where the Bad Command is run twice in /sbin/depscan.sh, the first time sending its output to a text file, the second time running normally.  And in the first case, it appears to run normally without errors... and in the second case, it produces all the same errors as usual!

I'll attach the contents of my /tmp/debug.txt (which shows all the processing output of /depscan.sh, including the errors when the command runs the second time) in a separate file.



Comment 16 dswhite42 2004-04-14 15:31:48 UTC
Created attachment 29289 [details]
Debugging output of "bash -x /sbin/depscan.sh" running at bootime
Comment 17 dswhite42 2004-04-14 16:11:36 UTC
Hmmmm.  Looking at that debug.txt file I attached again, I see these lines:

...
 *  Services 'firewall' and 'reslisa' have circular
 *  dependency of type 'ibefore';  continuing...
...

That probably comes from my attempt to get my own init.d shell script ("firewall") to run as early as possible after the "net" service had launched.  After trying several things that didn't work, I eventually ended up with this in the firewall script:

depend() {
    need net checkroot
    before *
}

I don't know why it's complaining that "firewall" and "reslisa" have circular dependencies, since /etc/init.d/reslisa just contains this:

depend() {
    use net
}

just like other /etc/init.d files (e.g. /etc/init.d/rsyncd) .

Just for fun, though, I changed the depend() statement in /etc/init.d/firewall to just "need net", and rebooted.  This time, I get a debug.txt file that looks like this (picking it up near the bottom:)

+ echo -e ' * Caching service dependencies...'
+ return 0
+ '[' '!' -d /var/lib/init.d ']'
+ '[' '!' -d /var/lib/init.d/softscripts ']'
+ mkdir -p -m 0755 /var/lib/init.d/softscripts
+ '[' '!' -d /var/lib/init.d/snapshot ']'
+ mkdir -p -m 0755 /var/lib/init.d/snapshot
+ '[' '!' -d /var/lib/init.d/options ']'
+ mkdir -p -m 0755 /var/lib/init.d/options
+ '[' '!' -d /var/lib/init.d/started ']'
+ mkdir -p -m 0755 /var/lib/init.d/started
+ rm -rf /var/lib/init.d/depcache /var/lib/init.d/deptree '/var/lib/init.d/broken
/*' '/var/lib/init.d/snapshot/*'
+ retval=0
+ SVCDIR=/var/lib/init.d
+ DEPTYPES=need use
+ ORDTYPES=before after
+ export SVCDIR DEPTYPES ORDTYPES
+ cd /etc/init.d
+ /bin/gawk -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/cachedepend
s.awk
######### START DEBUG TEST #########
+ echo '######### START DEBUG TEST #########'
########## END DEBUG TEST ##########
+ echo '########## END DEBUG TEST ##########'
+ bash /var/lib/init.d/depcache
+ /bin/gawk -f /lib/rcscripts/awk/functions.awk -f /lib/rcscripts/awk/gendepends.
awk
 *  Can't find service 'localmount' needed by 'consolefont';  continuing...
 *  Can't find service 'checkroot' needed by 'crypto-loop';  continuing...
 *  Can't find service 'modules' needed by 'crypto-loop';  continuing...
/var/lib/init.d/depcache: line 3163: syntax error near unexpected token `('
+ exit 0


So.... different errors this time.  But look!  Look at this line from debug.txt that I just quoted above:

/var/lib/init.d/depcache: line 3163: `  [ -e /etc/rc.coal depend() function ...'

/etc/rc.coal ???  I don't have a file by that name, nor can I find any references to "rc.coal" in any grep search throughout /etc, /sbin, or /var/lib   

In fact it looks like that somehow the line(s) in /var/lib/init.d/depcache that say "[ -e /etc/conf.d/<whatever>]" and "# Actual depend() function ..."  are getting munged together in that error message.  Is gawk the culprit somehow??

Man, this is a weird bug...
Comment 18 dswhite42 2004-04-14 20:16:31 UTC
I want to draw attention to bug 47420 and bug 47334 , both of which share similar symptoms as this one.  One submitter suggests that maybe two processes are simultaneously trying to write to /var/lib/init.d/depcache.  If that's true, then it would perhaps explain why in some cases depcache ends up with duplicate entries (bug 47420), missing chunks of script (bug 47334) or two lines combined into a single nonsensical line (as in my case).  

I'll work on this more tomorrow, but can anyone closer to the Gentoo source think of a case in which more than one process (at boot time) would be working on /var/lib/init.d/depcache simultaneously?
Comment 19 Andy Wang 2004-04-14 23:20:02 UTC
I think i found the problem. It's half caused by hotplug and half caused by the boot process. There are indeed the possibility of two or more processes doing a depscan.sh at the same time.

In my modules.autoload.d files I have my network adapter modules autoloaded (2 of them). Once the modules load, it triggers hotplug to go through it's chain and eventually runs net.[interface]. Since the net.[interface] script uses runscript.sh which calls depscan.sh, this is doing a depscan. Hotplug performs this scan in the background, so the rc script continues processing and it ends up running depscan.sh on its own. Therefore multiple processes are running depscan.sh screwing up the file. This problem goes away as soon as I remove all my network devices from the modules.autoload.d file. Unfortunately, this is not an ideal solution for me since using hotplug to autodetect my interfaces is sketchy (realtek 8139 chip and hotplug insists on using the 8139cp driver which doesn't work. I need the 8139too. grrr)

This problem is slightly related to http://forums.gentoo.org/viewtopic.php?t=113383&highlight=

I think some sort of lock needs to be implemented on depscan.sh

I also plan on opening a seperate bug requesting that hotplug not execute on initial module load.  This just doesn't seem like a good idea
Comment 20 Thomas Bellman 2004-04-15 05:47:31 UTC
*** Bug 47334 has been marked as a duplicate of this bug. ***
Comment 21 Thomas Bellman 2004-04-15 06:07:59 UTC
Andy,

I think that a simpler, and more robust, solution than introducing locking, would be to change /lib/rcscripts/awk/cachedepends.awk (which is the script actually generating the depcache file) to write to a unique temporary file, and then rename the file to the correct name when it has finished.  That way there will never be a half-written depcache file.
Comment 22 Andy Wang 2004-04-15 07:25:49 UTC
I'm thinking worst case scenario here :) but what happens if the two processes decide to rename the file at the exact same time? I'm thinking aloud here now :).  Never tried that.  The probability of it happening should be absolutely tiny, but   if it can break, it will somewhere.  Locking seems a bit overkill though. 

Theoretically, this situation shouldn't even be happening though.  I really dislike the fact that any scripts even get run this early on in the boot process.  With udev becoming the standard, I think these types of problems potentially could get worse. Maybe the solution isn't fix the bug, but rather prevent it from happening by changing hotplug so the scripts aren't run during that phase of boot.

I really don't like the idea of this "black magic" thing that hotplug is doing stuff so early on in the boot process.
Comment 23 Thomas Bellman 2004-04-15 08:12:11 UTC
There is a reason I wrote "more robust". :-)  The beauty of it is
that the rename(2) system call is atomic.  Basically you let the
kernel handle the locking -- that is much more reliable than
doing it in userspace, especially from shell and/or awk scripts.

If the two processes try to rename their files at the same time,
the kernel will handle one rename first, and then, when the first
has finished, the other.  That means one of the processes will
replace the other's file, but that should be OK.  Both will
generate a correct depcache file by themselves, and the problem
only stems from them *writing* to the same file at the same time.
Comment 24 Andy Wang 2004-04-15 08:29:44 UTC
aaahh.. that's good to know.  if the kernel truly does handle the locking, then yeah, i like that solution :)
Comment 25 dswhite42 2004-04-15 19:09:49 UTC
Success!!

Thomas and Andy, thanks for helping with this bug.  I patched /sbin/depscan.sh and /lib/rcscripts/awk/cachedepends.awk so that the contents of depcache are first stored in a temporary file (with a unique filename), then renamed to /var/lib/init.d/depcache.  With that in place, I tried rebooting a couple of times, and voila!  No error messages!

This probably doesn't really address the issue Andy raised about hotplug running scripts at such an early level.  But at least this may serve as a temporary remedy, if nothing else.

As this is my first awk script ever :-) I would suggest that it only be taken as a starting point for whatever the final patch is (if indeed this is the best way to address to the problem).

First, depscan.sh generates a temporary number.  I wasn't sure what the best way to guarantee a unique number was.  I have "mktemp", but since I can't be sure everyone has sys-apps/debianutils installed, I opted instead to use a (large) number created by combining "date +%s" (seconds since the epoch) "date +%N" (current time's nanoseconds) and $$ (pid).  Seems unique enough for these purposes, but suggestions are welcome.

Next, depscan.sh passes the temporary number in as a variable to cachedepends.awk .  cachedepends.awk then does its regular thing, except sending its output to "/tmp/depcache_<tempnum>.tmp".  Finally, gawk moves the temporary file to /var/lib/init.d/depcache .

I'll attach the patches for /sbin/depscan.sh and /lib/rcscripts/awk/cachedepends.awk, both based on the original files as they appear in sys-apps/baselayout-1.8.9

I also added Greg Kroah-Hartman (gregkh@gentoo.org) to the CC: list since he's been doing a lot of hotplug work lately - hope you don't mind, Greg.  Feel free to drop your address from the list if you prefer.
Comment 26 dswhite42 2004-04-15 19:10:51 UTC
Created attachment 29388 [details, diff]
Patch for /sbin/depscan.sh
Comment 27 dswhite42 2004-04-15 19:11:22 UTC
Created attachment 29389 [details, diff]
Patch for /lib/rcscripts/awk/cachedepends.awk
Comment 28 Aron Griffis (RETIRED) gentoo-dev 2004-04-15 20:55:23 UTC
You guys are awesome.  I made a minor change... moved the tempfile creation entirely into the awk script just in case it ever gets called without the depscan.sh wrapper.  I tested it like this:

  sudo /sbin/depscan.sh & sudo /sbin/depscan.sh & sudo /sbin/depscan.sh & sudo /sbin/depscan.sh & sudo /sbin/depscan.sh & sudo /sbin/depscan.sh & sudo /sbin/depscan.sh & sudo /sbin/depscan.sh

It worked like a charm.  I'll get this into baselayout in a minute...
Comment 29 Aron Griffis (RETIRED) gentoo-dev 2004-04-15 20:55:40 UTC
*** Bug 47993 has been marked as a duplicate of this bug. ***
Comment 30 Aron Griffis (RETIRED) gentoo-dev 2004-04-15 21:05:43 UTC
It's in baselayout-1.8.11.  Enjoy!  Thanks again for all the help.
Comment 31 dswhite42 2004-04-16 08:48:01 UTC
Installed baselayout-1.8.11 and rebooted - everything seems to be great!  Thanks to Lang, Andy and Thomas for their help and Aron for developer support.

One last question - I see that you're using /bin/mktemp to make the temporary file.  Since mktemp is provided by sys-apps/debianutils, is it risky to assume everyone will have that package installed?

Comment 32 Andy Wang 2004-04-16 11:47:54 UTC
debianutils is part of the system profile, so an emerge system should have installed it.  Given that I think it's probably safe to assume that it should be installed.
Comment 33 Thomas Bellman 2004-04-16 14:36:58 UTC
Unfortunately, or something, the symtoms have disappeared for me, as I during this time have upgraded to new kernel, converted my system to use LVM, installed a number of other packages, et.c.  The timings of my boot process have apparently changed enough to not trigger the bug, and I am therefore unable to test and verify that the new baselayout version solves the problem.

But looking at the changes, it *looks* good, at least. :-)  (And I seem to have no problems with the new baselayout, so it is at the very least not any worse.)
Comment 34 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-04-16 16:51:57 UTC
Hm, it looks like the invocation of hotplug early in the system, when the
devices are loaded is causing some problems.  I originally had a test to
see if the network had started up before trying to actually bring up the
network devices, yet that made other people's systems broken :(

Anyway, we just need to be aware of the fact that hotplug can get called
at pretty much any moment in time in the boot process (or even in the shutdown
one too.)
Comment 35 Andy Wang 2004-04-16 17:03:15 UTC
Yup, Bug 46954 is also another example of an issue caused by hotplug being called early. 
Comment 36 Marcel Meckel 2004-10-13 01:56:53 UTC
Hi,

first: why can't i reopen the bug? is this only possible for admins?

second: the bug still exists at this time. I emerged yesterday so all installed packages are as recent as possible and then rebooted because of the new kernel. While booting some boot services were run twice (like modprobing, setting hostname) and at the point where fsck'ing was run the second time it stops with a rescue shell cause /boot was already mounted at this time.

/etc/modules.autoload.d/kernel-2.6 contained e1000 module to autoload on boot, hotplug is compiled into the kernel and installed as well in gentoo.

Portage 2.0.50-r11 (default-x86-2004.2, gcc-3.3.4, glibc-2.3.4.20040808-r1, 2.6.8-gentoo-r7)
=================================================================
System uname: 2.6.8-gentoo-r7 i686 Intel(R) Xeon(TM) CPU 2.80GHz
Gentoo Base System version 1.4.16
Autoconf: sys-devel/autoconf-2.59-r5
Automake: sys-devel/automake-1.8.5-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=pentium4 -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=pentium4 -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="ftp://localhost/gentoo ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="berkdb bitmap-fonts crypt curl devfs26 dvd f77 ftp gdbm ipv6 libg++ mime ncurses nls nocd nocstrike nodod pam perl posix python readline rtc samba sse ssl tcpd usb uudeview x86 xml2 xprint zlib"

sys-apps/baselayout 1.9.4-r3
sys-apps/hotplug 20040401
sys-apps/debianutils 1.16.7-r4

Marcel.
Comment 37 Seemant Kulleen (RETIRED) gentoo-dev 2004-10-13 13:45:17 UTC
reopening -- only the submitter or a gentoo dev can reopen
Comment 38 SpanKY gentoo-dev 2004-10-13 15:36:51 UTC
do you use the hotplug init script ?
Comment 39 Marcel Meckel 2004-10-20 05:15:46 UTC
> do you use the hotplug init script ?

Yes, i'm using the hotplug init script in default runlevel as suggested by gentoo install instructions.

Marcel.
Comment 40 SpanKY gentoo-dev 2004-10-20 05:38:50 UTC
so stop using it
Comment 41 SpanKY gentoo-dev 2004-10-31 01:38:54 UTC
*** Bug 46954 has been marked as a duplicate of this bug. ***
Comment 42 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-11-10 11:31:04 UTC
Not a hotplug specific bug.

Oh, and there's no hotplug startup service anymore :)
Comment 43 SpanKY gentoo-dev 2004-11-10 14:10:04 UTC
old bug