As I ran catalyst, I was encountering problems that occasionally left /dev in the chroot permanently mounted, forcing me to reboot to try again. Now that I've gotten everything to build, /dev still won't umount. Below is the output at the end of the emerge cycle automated by catalyst:

 * Regenerating GNU info directory index...
 * Processed 203 info files.
Running command "/bin/bash /usr/lib/catalyst/targets/livecd-stage1/livecd-stage1.sh preclean"
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
!!! catalyst: Couldn't umount bind mount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev
!!! catalyst: Couldn't umount one or more bind-mounts; aborting for safety.
!!! catalyst: could not complete build

icebox catalyst # lsof tmp/default/livecd-stage1-i686-methat-0/dev/
COMMAND   PID USER  FD  TYPE DEVICE SIZE NODE NAME
devfsd    954 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13086 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13087 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13088 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13089 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13090 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/
agetty  13091 root  cwd DIR  0,12   0    1    tmp/default/livecd-stage1-i686-methat-0/dev/

I can't finish building a livecd like this now, can I? :(
Created attachment 45283 [details]
lsof /dev output

Here's the output of lsof /dev for comparison.
I may have isolated the problem. Further tests show the problem manifesting and being worked around.

 * Regenerating GNU info directory index...
 * Processed 203 info files.
Running command "/bin/bash /usr/lib/catalyst/targets/livecd-stage1/livecd-stage1.sh preclean"
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
!!! catalyst: Couldn't umount bind mount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev
!!! catalyst: Couldn't umount one or more bind-mounts; aborting for safety.
!!! catalyst: could not complete build

icebox catalyst # umount tmp/default/livecd-stage1-i686-methat-0/dev
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
icebox catalyst # umount -f tmp/default/livecd-stage1-i686-methat-0/dev
umount2: Device or resource busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
umount2: Device or resource busy
umount: /home/shared/livecd/catalyst/tmp/default/livecd-stage1-i686-methat-0/dev: device is busy
icebox catalyst # ps -a
  PID TTY      TIME     CMD
13817 pts/0    00:00:00 bash
13863 pts/0    00:00:00 screen
15663 pts/10   00:00:00 modprobe
 9970 pts/2    00:00:00 man
 9973 pts/2    00:00:00 sh
 9974 pts/2    00:00:00 sh
 9979 pts/2    00:00:00 less
  563 pts/6    00:00:00 vim
 7808 pts/8    00:00:00 gconfd-2
19729 pts/8    00:00:00 gconfd-2
21689 pts/8    00:00:00 ps
icebox catalyst # kill 7808 19729
icebox catalyst # umount tmp/default/livecd-stage1-i686-methat-0/dev
My temporary fix is going to be to freeze catalyst near the end (^Z is Jesus or something) and kill off gconfd-2. Seems to work.
OK... so your *host system* is doing something to open files in the chroot, and this is supposed to be catalyst's fault? Honestly, the gconf-2 is running on your host machine, not within catalyst, so it cannot be a problem with catalyst. I would suggest doing catalyst runs outside of gnome, or see if there is a way to keep gconf-2 from opening that device.
Bug Wranglers: This is not a catalyst problem.
OK... found the right people
Try it without hald running (/etc/init.d/hald stop).
i think comment #4 is on track.. i can't see why gconf would be running unless you are running something gnome inside the chroot.
Right now I have catalyst making a GRP build. Here are the gconf processes:

17321 pts/2 S 0:00 /usr/libexec/gconfd-2 19
19903 pts/2 S 0:00 /usr/libexec/gconfd-2 21

I went into /proc/17321. Here is the output of "ls -l":

-r-------- 1 root root 0 Dec 11 17:34 auxv
-r--r--r-- 1 root root 0 Dec 11 17:34 cmdline
lrwxrwxrwx 1 root root 0 Dec 11 17:34 cwd -> /stuff/catalyst/tmp/desktop/grp-i686-snod-20041206-r2
-r-------- 1 root root 0 Dec 11 17:34 environ
lrwxrwxrwx 1 root root 0 Dec 11 17:34 exe -> /stuff/catalyst/tmp/desktop/grp-i686-snod-20041206-r2/usr/libexec/gconfd-2
dr-x------ 2 root root 0 Dec 11 17:34 fd
-r--r--r-- 1 root root 0 Dec 11 17:34 maps
-rw------- 1 root root 0 Dec 11 17:34 mem
-r--r--r-- 1 root root 0 Dec 11 17:34 mounts
lrwxrwxrwx 1 root root 0 Dec 11 17:34 root -> /stuff/catalyst/tmp/desktop/grp-i686-snod-20041206-r2
-r--r--r-- 1 root root 0 Dec 11 17:29 stat
-r--r--r-- 1 root root 0 Dec 11 17:34 statm
-r--r--r-- 1 root root 0 Dec 11 17:34 status
dr-xr-xr-x 3 root root 0 Dec 11 17:34 task
-r--r--r-- 1 root root 0 Dec 11 17:34 wchan

So the process is running IN the chroot environment. Why? The contents of 'cmdline' are:

/usr/libexec/gconfd-2 19

The processes are children of init. How can I find out which parent originally spawned them? Am I misunderstanding Comment #4?
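The /proc inspection above can be automated. A minimal sketch (the helper name `pids_rooted_in` is hypothetical; it just assumes Linux's /proc layout) that finds every process whose root directory points into the chroot, the same thing the `root ->` symlink shows:

```python
import os

def pids_rooted_in(chroot_path):
    """Return PIDs whose root directory lies inside chroot_path,
    read from the /proc/<pid>/root symlinks."""
    chroot_path = os.path.realpath(chroot_path)
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            root = os.readlink("/proc/%s/root" % entry)
        except OSError:
            continue  # process exited, or we lack permission
        if root == chroot_path or root.startswith(chroot_path + os.sep):
            hits.append(int(entry))
    return hits
```

Calling this with a chroot path should return only the stragglers started inside it, since everything else has root "/".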
Re comment #4: My "host system" is apparently creating gconfd-2 processes when certain packages (haven't tracked which ones) are installed by portage.
Sorry, the sarcasm wasn't obvious enough in that last post. What's going on is that when certain packages are installed by portage, gconfd-2 is run. This holds true inside catalyst's chroot, even when just untarring tbz2 packages. Catalyst doesn't know about this, and doesn't track its own children (can it?), so it doesn't kill off whatever it left running. This leaves the chroot filesystem held open (for some odd reason /dev is what's open in this case) and makes those branches impossible to unmount. The solution would be to make catalyst track all its children before doing the umount and send each one SIGTERM, sleep 5 seconds, then SIGKILL, sleep 5 seconds; however, I don't know how to do this.
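The term-sleep-kill sequence described above is simple to sketch (a hypothetical helper, not catalyst's actual code; the grace period defaults to the five seconds suggested here):

```python
import os
import signal
import time

def term_sleep_kill(pids, grace=5.0):
    """Ask each process to exit with SIGTERM, wait out the grace
    period, then SIGKILL whatever is still alive."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except OSError:
            pass  # already gone
    time.sleep(grace)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except OSError:
            pass  # exited during the grace period
```

The second pass is a no-op for processes that honored SIGTERM, so the function is safe to run unconditionally before the umount.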
Mmmm, well, what is "the host machine"? I understood "the host machine" to be the one running catalyst (in the real root, not in the chrooted environment of the system being created). Isn't it? Does anybody know if this started happening with a new catalyst version? Maybe it's related to a portage version. That gconfd-2 is never run on my "host system" (as I understand the term), only in the catalyst chroot environment. Re Comment #11: Catalyst is never the parent of the gconfd-2 processes. If the parent dies, init adopts the children, NOT the grandparent. It would be insane, IMO, to keep track of the whole 'family' created by catalyst. :) You say "even when just untarring tbz2 packages." What kind of situation is that? I don't understand.
gconf-2 does get run by gnome ebuilds, but it should shut down after some time of inactivity.
foser: thanks for your help so far in looking into this.

Strangely enough, I started getting bitten by this myself. I am not running hald, so I'm sure that cannot be it. However, I am not sure what is causing gconfd-2 to even start. I, too, have a completed build now, so the packages are simply being untarred. I have, however, added gnome to my build, so I am just wondering if there is any gnome package that is running gconfd-2 for any reason. Also, how long does it take to time out and die off? I haven't had it die off yet and I've left it overnight. What I am really wondering is what is causing gconfd-2 to latch onto /dev/null within the chroot?
It gets run in postinst to set up the base gconf values in /etc from the gconf schemas installed by a package; any gconf-using package (any gnome app) should run it. I'm not sure how catalyst does its thing, but I assumed based on this that it installs packages inside a chroot, so gconftool gets run there and opens /dev/null for itself. The real problem is gconfd-2 not dying after this; I cannot reproduce that with my current version, 2.8.1. I do remember seeing it not dying quickly not so long ago, so it might be fixed in more recent versions (please test), or otherwise it might be in a lower-level lib, which may be harder to track.
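One way to confirm what a lingering gconfd-2 has open (the /dev/null handle mentioned above) without lsof is to read /proc/<pid>/fd directly. A small sketch, assuming Linux's /proc layout (`open_files` is a hypothetical helper, not part of catalyst):

```python
import os

def open_files(pid):
    """Return the paths a process currently holds open, roughly what
    lsof would show, by resolving the symlinks in /proc/<pid>/fd."""
    fd_dir = "/proc/%d/fd" % pid
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # fd was closed between listdir and readlink
    return paths
```

Reading another user's fd directory needs root, which catalyst already runs as.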
Well, I am building with the current stable (gconf-2.8.0.1) version, so it could be something that has been fixed. As a temporary solution, I have added a "killall -9 gconfd-2" into the catalyst scripts, so it kills gconfd-2 before trying the umount. This is not acceptable, however, to go into production, as it will kill the copy running for any user who uses Gnome as their desktop. I'll test it with 2.8.1 shortly to see if that solves the problem and will give feedback here.
foser: According to Robert Paskowitz on the gentoo-catalyst mailing list, this is still happening with gconf-2.8.1-r1 and catalyst. Personally, I am just running a killall to clean it up. Is this the best solution? What caused the change in behavior?
I have added the killall back into catalyst pending further response from gnome@
This is in catalyst 1.1.4
I really don't know what exactly caused this, and as said, I don't see the process lingering here anymore. killall is not good; I'm not sure how harsh it is, but it might lead to corruption or to databases not being correctly updated. Better would be to use 'gconftool-2 --shutdown' if it works OK. Even better would be to really try and solve it, but I'm quite sure we won't have time to look into it in the near future.
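A gentler preclean step along these lines would ask gconfd-2 to exit cleanly before resorting to signals. A sketch of just the command construction (hypothetical helper; it assumes gconftool-2 is actually installed at that path inside the chroot):

```python
def gconf_shutdown_command(chroot_path):
    """Build the command that asks gconfd-2 inside the chroot to
    shut down cleanly via 'gconftool-2 --shutdown'."""
    return ["chroot", chroot_path, "/usr/bin/gconftool-2", "--shutdown"]
```

The list could be handed to subprocess.call(), with a kill used only as a fallback if the daemon is still around afterwards.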
I'm using foser's method for now. I really don't know why this is happening, and I won't have time to properly look at it until well after the release.
hmmm. fuser -m -k `catalyst build dir` (not the mount/dev but the mount/) should in fact terminate all such lingering child processes.
I'm testing that now, Spider... let's hope it works, as that would be a very valid solution.
Yeah... so "fuser -k -m ${clst_chroot_path}" killed every process that was accessing any file on my root partition, including my shell and X. Let's not do that one again, shall we? *grin* Anyway, it looks like the search for a proper solution still lives.
It looks like the only thing that will work is a kill. I've added the kill back into catalyst and it appears to work. If this has any ill effects on gnome, there's not much I can do without a better solution. As a gnome user myself, I have never noticed a problem, so I'm going to say that it must be "safe enough" to use.
OK... catalyst 1.1.8 is now in portage
here's a local change I'm using, from /usr/lib/modules/generic_stage_target.py:

    def unbind(self):
        ouch=0
        mypath=self.settings["chroot_path"]
        myrevmounts=self.mounts[:]
        myrevmounts.reverse()
+       os.system("fuser -kv " + mypath)
        # unmount in reverse order for nested bind-mounts
        for x in myrevmounts:
            if not os.path.exists(mypath+x):
                continue

Does exactly what you want, if not in the prettiest of ways. :)
*** Bug 80573 has been marked as a duplicate of this bug. ***
...
There's a new fix for this in CVS...
This is in the catalyst 1.1.10_pre-series in portage...