As I ran catalyst, I was encountering problems that occasionally left /dev in the chroot permanently mounted, forcing me to reboot to try again. Now that I've gotten everything to build, /dev still won't umount. Below is the output at the end of the emerge cycle automated by catalyst:

 * Regenerating GNU info directory index...
 * Processed 203 info files.
Running command "/bin/bash /usr/lib/catalyst/targets/livecd-stage1/livecd-stage1.sh preclean"
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
!!! catalyst: Couldn't umount bind mount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev
!!! catalyst: Couldn't umount one or more bind-mounts; aborting for safety.
!!! catalyst: could not complete build

icebox catalyst # lsof tmp/default/livecd-stage1-i686-methat-0/dev/
COMMAND   PID USER  FD  TYPE DEVICE SIZE NODE NAME
devfsd    954 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13086 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13087 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13088 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13089 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13090 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13091 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/

I can't finish building a livecd like this now, can I? :(
Created attachment 45283 [details]
lsof /dev output

Here's the output of lsof /dev for comparison.
I may have isolated the problem. Further tests show the problem manifesting and being worked around.

 * Regenerating GNU info directory index...
 * Processed 203 info files.
Running command "/bin/bash /usr/lib/catalyst/targets/livecd-stage1/livecd-stage1.sh preclean"
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
!!! catalyst: Couldn't umount bind mount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev
!!! catalyst: Couldn't umount one or more bind-mounts; aborting for safety.
!!! catalyst: could not complete build

icebox catalyst # umount tmp/default/livecd-stage1-i686-methat-0/dev
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
icebox catalyst # umount -f tmp/default/livecd-stage1-i686-methat-0/dev
umount2: Device or resource busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount2: Device or resource busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
icebox catalyst # ps -a
  PID TTY      TIME     CMD
13817 pts/0    00:00:00 bash
13863 pts/0    00:00:00 screen
15663 pts/10   00:00:00 modprobe
 9970 pts/2    00:00:00 man
 9973 pts/2    00:00:00 sh
 9974 pts/2    00:00:00 sh
 9979 pts/2    00:00:00 less
  563 pts/6    00:00:00 vim
 7808 pts/8    00:00:00 gconfd-2
19729 pts/8    00:00:00 gconfd-2
21689 pts/8    00:00:00 ps
icebox catalyst # kill 7808 19729
icebox catalyst # umount tmp/default/livecd-stage1-i686-methat-0/dev
My temporary fix is going to be to freeze catalyst near the end (^Z is Jesus or something) and kill off gconfd-2. Seems to work.
OK... so your *host system* is doing something to open files in the chroot, and this is supposed to be catalyst's fault? Honestly, the gconf-2 is running on your host machine, not within catalyst, so it cannot be a problem with catalyst. I would suggest doing catalyst runs outside of gnome, or see if there is a way to keep gconf-2 from opening that device.
Bug Wranglers: This is not a catalyst problem.
OK... found the right people
Try it without hald running (/etc/init.d/hald stop).
i think comment #4 is on track.. i can't see why gconf would be running unless you are running something gnome inside the chroot.
Right now I have catalyst making a GRP build. Here are the gconf processes:

17321 pts/2 S 0:00 /usr/libexec/gconfd-2 19
19903 pts/2 S 0:00 /usr/libexec/gconfd-2 21

I went into /proc/17321. Here is the output of "ls -l":

-r-------- 1 root root 0 Dec 11 17:34 auxv
-r--r--r-- 1 root root 0 Dec 11 17:34 cmdline
lrwxrwxrwx 1 root root 0 Dec 11 17:34 cwd -> /stuff/catalyst/tmp/desktop/grp-i686-snod-20041206-r2
-r-------- 1 root root 0 Dec 11 17:34 environ
lrwxrwxrwx 1 root root 0 Dec 11 17:34 exe -> /stuff/catalyst/tmp/desktop/grp-i686-snod-20041206-r2/usr/libexec/gconfd-2
dr-x------ 2 root root 0 Dec 11 17:34 fd
-r--r--r-- 1 root root 0 Dec 11 17:34 maps
-rw------- 1 root root 0 Dec 11 17:34 mem
-r--r--r-- 1 root root 0 Dec 11 17:34 mounts
lrwxrwxrwx 1 root root 0 Dec 11 17:34 root -> /stuff/catalyst/tmp/desktop/grp-i686-snod-20041206-r2
-r--r--r-- 1 root root 0 Dec 11 17:29 stat
-r--r--r-- 1 root root 0 Dec 11 17:34 statm
-r--r--r-- 1 root root 0 Dec 11 17:34 status
dr-xr-xr-x 3 root root 0 Dec 11 17:34 task
-r--r--r-- 1 root root 0 Dec 11 17:34 wchan

So the process is running IN the chroot environment. Why? The contents of 'cmdline' are:

/usr/libexec/gconfd-2 19

The processes are children of init. How can I find out which parent originally spawned them? Am I misunderstanding Comment #4?
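The /proc inspection above can be automated. A minimal sketch (the helper name `pids_rooted_in` is hypothetical; it just assumes Linux's /proc layout) that finds every process whose root directory points into the chroot, the same thing the `root ->` symlink shows:

```python
import os

def pids_rooted_in(chroot_path):
    """Return PIDs whose root directory lies inside chroot_path,
    read from the /proc/<pid>/root symlinks."""
    chroot_path = os.path.realpath(chroot_path)
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            root = os.readlink("/proc/%s/root" % entry)
        except OSError:
            continue  # process exited, or we lack permission
        if root == chroot_path or root.startswith(chroot_path + os.sep):
            hits.append(int(entry))
    return hits
```

Calling this with a chroot path should return only the stragglers started inside it, since everything else has root "/".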
Re comment #4: My "host system" is apparently creating gconfd-2 processes when certain packages (haven't tracked which ones) are installed by portage.
Sorry, the sarcasm wasn't obvious enough in that last post. What's going on is that when certain packages are installed by portage, gconfd-2 is run. This holds true inside catalyst's chroot, even when just untarring tbz2 packages. Catalyst doesn't know about this, and doesn't track its own children (can it?), so it doesn't kill off whatever it left running. This leaves the chroot filesystem held open (for some odd reason /dev is what's open in this case) and makes those branches impossible to unmount. The solution would be to make catalyst track all its children before doing the umount and send each one SIGTERM, sleep 5 seconds, then SIGKILL, sleep 5 seconds; however, I don't know how to do this.
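The term-sleep-kill sequence described above is simple to sketch (a hypothetical helper, not catalyst's actual code; the grace period defaults to the five seconds suggested here):

```python
import os
import signal
import time

def term_sleep_kill(pids, grace=5.0):
    """Ask each process to exit with SIGTERM, wait out the grace
    period, then SIGKILL whatever is still alive."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except OSError:
            pass  # already gone
    time.sleep(grace)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except OSError:
            pass  # exited during the grace period
```

The second pass is a no-op for processes that honored SIGTERM, so the function is safe to run unconditionally before the umount.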
Mmmm, well, what is "the host machine"? I understood "the host machine" to be the one running catalyst (in the real root, not in the chrooted environment of the system being created). Isn't it? Does anybody know if this started happening with a new catalyst version? Maybe it's related to a portage version. That gconfd-2 is never run on my "host system" (as I understand the term), only in the catalyst chroot environment. Re Comment #11: Catalyst is never the parent of the gconfd-2 processes. If the parent dies, init adopts the children, NOT the grandparent. It would be insane, IMO, to keep track of the whole 'family' created by catalyst. :) You say "even when just untarring tbz2 packages." What kind of situation is that? I don't understand.
gconf-2 does get run by gnome ebuilds, but it should shut down after some time of inactivity.
foser: thanks for your help so far in looking into this.

Strangely enough, I started getting bitten by this myself. I am not running hald, so I'm sure that cannot be it. However, I am not sure what is causing gconfd-2 to even start. I, too, have a completed build now, so the packages are simply being untarred. I have, however, added gnome to my build, so I am just wondering if there is any gnome package that is running gconfd-2 for any reason. Also, how long does it take to time out and die off? I haven't had it die off yet and I've left it overnight. What I am really wondering is what is causing gconfd-2 to latch onto /dev/null within the chroot?
It gets run in postinst to set up the base gconf values in /etc from the gconf schemas installed by a package; any gconf-using package (any gnome app) should run it. I'm not sure how catalyst does its thing, but I assumed based on this that it installs packages inside a chroot, so gconftool gets run there and opens /dev/null for itself. The real problem is gconfd-2 not dying after this; I cannot reproduce that with my current version, 2.8.1. I do remember seeing it not dying quickly not so long ago, so it might be fixed in more recent versions (please test), or otherwise it might be in a lower-level lib, which may be harder to track.
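One way to confirm what a lingering gconfd-2 has open (the /dev/null handle mentioned above) without lsof is to read /proc/<pid>/fd directly. A small sketch, assuming Linux's /proc layout (`open_files` is a hypothetical helper, not part of catalyst):

```python
import os

def open_files(pid):
    """Return the paths a process currently holds open, roughly what
    lsof would show, by resolving the symlinks in /proc/<pid>/fd."""
    fd_dir = "/proc/%d/fd" % pid
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # fd was closed between listdir and readlink
    return paths
```

Reading another user's fd directory needs root, which catalyst already runs as.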
Well, I am building with the current stable (gconf-2.8.0.1) version, so it could be something that has been fixed. As a temporary solution, I have added a "killall -9 gconfd-2" into the catalyst scripts, so it kills gconfd-2 before trying the umount. This is not acceptable, however, to go into production, as it will kill the copy running for any user who uses Gnome as their desktop. I'll test it with 2.8.1 shortly to see if that solves the problem and will give feedback here.
foser: According to Robert Paskowitz on the gentoo-catalyst mailing list, this is still happening with gconf-2.8.1-r1 and catalyst. Personally, I am just running a killall to clean it up. Is this the best solution? What caused the change in behavior?
I have added the killall back into catalyst pending further response from gnome@
This is in catalyst 1.1.4
I really don't know what exactly caused this, and as said, I don't see the process lingering here anymore. killall is not good; I'm not sure how harsh it is, but it might lead to corruption or to databases not being correctly updated. Better would be to use 'gconftool-2 --shutdown' if it works OK. Even better would be to really try and solve it, but I'm quite sure we won't have time to look into it in the near future.
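A gentler preclean step along these lines would ask gconfd-2 to exit cleanly before resorting to signals. A sketch of just the command construction (hypothetical helper; it assumes gconftool-2 is actually installed at that path inside the chroot):

```python
def gconf_shutdown_command(chroot_path):
    """Build the command that asks gconfd-2 inside the chroot to
    shut down cleanly via 'gconftool-2 --shutdown'."""
    return ["chroot", chroot_path, "/usr/bin/gconftool-2", "--shutdown"]
```

The list could be handed to subprocess.call(), with a kill used only as a fallback if the daemon is still around afterwards.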
I'm using foser's method for now. I really don't know why this is happening, and I won't have time to properly look at it until well after the release.
hmmm. fuser -m -k `catalyst build dir` (not the mount/dev but the mount/) should in fact terminate all such lingering child processes.
I'm testing that now, Spider... let's hope it works, as that would be a very valid solution.
Yeah... so "fuser -k -m ${clst_chroot_path}" killed every process that was accessing any file on my root partition, including my shell and X. Let's not do that one again, shall we? *grin* Anyway, it looks like the search for a proper solution still lives.
It looks like the only thing that will work is a kill. I've added the kill back into catalyst and it appears to work. If this has any ill effects on gnome, there's not much I can do without a better solution. As a gnome user myself, I have never noticed a problem, so I'm going to say that it must be "safe enough" to use.
OK... catalyst 1.1.8 is now in portage
here's a local change I'm using, from /usr/lib/modules/generic_stage_target.py:

    def unbind(self):
        ouch=0
        mypath=self.settings["chroot_path"]
        myrevmounts=self.mounts[:]
        myrevmounts.reverse()
+       os.system("fuser -kv " + mypath)
        # unmount in reverse order for nested bind-mounts
        for x in myrevmounts:
            if not os.path.exists(mypath+x):
                continue

Does exactly what you want, if not in the prettiest of ways. :)
*** Bug 80573 has been marked as a duplicate of this bug. ***
...
There's a new fix for this in CVS...
This is in the catalyst 1.1.10_pre-series in portage...