Recently I updated baselayout, and my system stopped halting and rebooting. After the messages "Unmounting filesystems..." and "INIT: no more processes left in this runlevel" it just stalled. After endless head-scratching I updated my baselayout to ~x86 (note: I otherwise have an exclusively x86 system) and things now work. It turns out the "stable" x86 baselayout update no longer shuts down my lvm2 partitions before halting; baselayout 1.11.14-r7 and -r8 are broken in this respect. I think the last working stable (x86) baselayout was 1.11.13-something. sys-apps/baselayout-1.12.0_pre17-r2, which is ~x86, works.
Which version of lvm are you using? Are there any other errors, boot logs, or anything else that might indicate what is happening? The latest stable baselayout is baselayout-1.11.14-r8.ebuild.
lvm2 is version 2.01.09. I saw nothing useful in the boot messages. But with the unstable baselayout (and with the old stable baselayout) I saw the shutdown message of each lvm partition at shutdown, exactly at the point where the bugged system hangs and says: Unmounting filesystems... INIT: no more processes left in this runlevel
baselayout-1.12.0_pre18 breaks unmounting at shutdown with lvm2-2.02.04-r1, though my system reboots fine. Reverting to baselayout-1.12.0_pre17-r3 fixed the unmounting problem.
(In reply to comment #3)
> baselayout-1.12.0_pre18 breaks unmounting at shutdown
That's fixed in pre18-r1. Can we mark this as closed, as the whole issue is fixed now in pre18-r1?
It would be sort of neat, though, if it were fixed in the latest stable package. Or will baselayout-1.12.0_pre18-r1 be marked stable? It is rather annoying to have a completely x86 system in which only baselayout has to be unstable for things to work.
FWIW, I agree with the last comment about the latest x86 baselayout (i.e., it should work properly with lvm2 and current kernels). Also, sys-apps/baselayout-1.12.0_pre18-r1 is not fixed (for me on amd64, anyway). Prior to that version, all 5 of my logical volumes failed to shut down properly (at least according to the messages from the baselayout function). After updating to the above version, I saw all of them unmount properly at least once, but since then the last 2 (/dev/vg/tmp and /dev/vg/usr) have always failed, while the first 3 volumes unmount properly. Go figure... This is all on a brand-spanking-new amd64 install (2006.0 profile) with several kernels in the 2.6.15 to 2.6.16.11 range (currently the latter). Although my arch is amd64, I have a large set of ~amd64 packages, including baselayout, portage, the toolchain, xorg 7, and other stuff.
Portage 2.1_pre10-r2 (default-linux/amd64/2006.0, gcc-3.4.5, glibc-2.4-r1, 2.6.16.11 x86_64)
=================================================================
System uname: 2.6.16.11 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.12.0_pre18
ccache version 2.3 [enabled]
dev-lang/python:     2.4.2
dev-util/ccache:     2.3
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r2
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r3
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/fax /usr/lib64/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control /var/spool/fax/etc"
CONFIG_PROTECT_MASK="/etc/eselect/compiler /etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig buildpkg ccache cvs distlocks metadata-transfer multilib-strict sandbox sfperms strict userpriv usersandbox"
GENTOO_MIRRORS="http://kuroshin.arnolds.bogus/gentoo/"
LDFLAGS="-Wl,-O1"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://kuroshin.arnolds.bogus/gentoo-portage"
USE="amd64 X a52 aac aalib accessibility acl acpi alsa ansi artworkextra avi bitmap-fonts bonobo browserplugin bzip2 cairo cdparanoia cdr cli crypt cups dbus dga directfb divx4linux dri dv dvd dvdr dvdread dynagraph eds emacs emboss encode esd evo f77 fam fame fbcon ffmpeg firefox fits flac foomaticdb fortran freetype freetype2 gb gcj gd gdbm geos gif gimp gmp gnome gphoto2 gpm gps graphviz grass gs gstreamer gtk gtk2 gtkhtml guile hal howl icq ieee1394 imagemagick imap imlib ipv6 isdnlog jabber jasper java jbig jikes jpeg jpg junit lame lapack lcms lesstif lirc lm_sensors logrotate lzw lzw-tiff mozilla mp3 mpeg mysql nas nautilus ncurses netcdf nfs nls nolvmstatic nptl nptlonly nsplugin numeric ogdi ogg oggvorbis opengl oss pam pcre pda pdflib perl plotutils png postgres pppd python qt quicktime readline reflection rtc samba sasl sdl session slp spamassassin spell spl ssl subtitles svg tcltk tcpd tetex tiff truetype truetype-fonts type1-fonts unicode usb v4l v4l2 vorbis wifi xanim xext xine xml xml2 xmms xorg xpm xv xvid xvmc zeo zlib zvbi elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux userland_GNU video_cards_via video_cards_radeon video_cards_vesa"
Unset: ASFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LINGUAS
I also experience problems with baselayout 1.11.14-r8. The halt/reboot process stalls when unmounting the filesystems. I do not use lvm, btw. In my case the culprit is the '/bin/fuser -k -m -9 "${x}" &>/dev/null' command in halt.sh. This kills the rc process when unmounting the /usr partition. After commenting out this line, the reboot and halt procedure works as it should. BTW, I don't know how the bugzilla interface works, so if I am making a mess of things I apologize in advance.
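One way to picture the failure described above: fuser -k kills every process using the mount point, including the script doing the unmounting. A minimal sketch of an alternative, assuming the PID list comes from fuser -m on stdout; filter_pids is a hypothetical helper, not something baselayout ships, and the kill step is left commented out so the sketch is harmless to run.

```shell
# Hypothetical helper (not part of baselayout): read whitespace-separated
# PIDs from stdin, as fuser -m prints them on stdout, and echo only those
# that are NOT in the protected list passed as arguments.
filter_pids() {
    for pid in $(cat); do
        keep=yes
        for protected in "$@"; do
            [ "$pid" = "$protected" ] && keep=no
        done
        [ "$keep" = yes ] && echo "$pid"
    done
    return 0
}

# Possible use inside the unmount loop of a halt script (assumption):
#   victims=$(/bin/fuser -m "${x}" 2>/dev/null | filter_pids $$ $PPID)
#   [ -n "$victims" ] && kill -9 $victims
```

Protecting both $$ and $PPID is a guess at what matters here; which processes really need protecting depends on how rc invokes the halt script.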
Updated info: Still using the following on the new amd64 install:
    sys-fs/lvm2 2.01.09
    sys-fs/mdadm 1.12.0
    sys-fs/device-mapper 1.02.02
    sys-apps/baselayout 1.12.0_pre19-r2
After some experimenting, I've found that the volume shutdown only works correctly if I do a shutdown from the console (i.e., a 3-fingered salute, a reboot command, etc.) with only gdm running (meaning I've logged out of X and switched to a vt). If I do anything else, such as the above without logging out of X, or using either the gnome or gdm "Shutdown" menu option, the last two of my 5 logical volumes fail to unmount properly (the culprits being /usr and /var). This points to some X-related and/or gdm-spawned process(es) holding onto some file handle (or whatever) until the RC_VOLUME stop-scripts kick in and generate the "Failure to shutdown volume blah" messages. At least it sounds good to me after a few beers...
Sounds as if your problem is different from mine. In my case shutdown or reboot simply did not unmount the lvm volumes, even if I shut down X and went to single user mode before giving the shutdown command from the console. In any case, things work with ~x86 baselayout, so I am now using an x86 system with ~x86 baselayout.
OK, this is definitely BAD! The bug has now been introduced into ~x86 baselayout, too. Baselayout-1.12.0-r1 has the same problem. Luckily, before installing I created a quickpkg version of baselayout-1.12.0_pre19-r2, which of course works. As of writing, therefore, lvm volumes are not unmounted at shutdown (which thus stalls indefinitely until a ctrl-alt-del) with both x86 and ~x86 baselayout. Could this be something in /etc/init.d/halt.sh? This script has changed a lot between 1.12.0_pre19-r2 and 1.12.0-r1. By the way, I think maybe 1.12.0 worked, but I'm not sure.
Could someone describe their fs layout so I can try to simulate this? Thanks
Here is my /etc/fstab (comments omitted); /dev/vg is the lvm2 volume, divided into several partitions:
    /dev/hda1           /boot          ext3
    /dev/hda5           /              reiserfs
    /dev/hda6           none           swap
    /dev/hdb6           none           swap
    /dev/vg/usr         /usr           reiserfs
    /dev/vg/local       /usr/local     reiserfs
    /dev/vg/mike        /home/mike     reiserfs
    /dev/vg/gabry       /home/gabry    reiserfs
    /dev/vg/giovy       /home/giovy    reiserfs
    /dev/vg/giacomo     /home/giacomo  reiserfs
    /dev/vg/caterina    /home/cate     reiserfs
    /dev/vg/pietro      /home/pietro   reiserfs
    /dev/vg/paolo       /home/paolo    reiserfs
    /dev/mapper/crypto  /mnt/crypto    reiserfs
    /dev/vg/opt         /opt           reiserfs
    /dev/vg/tmp         /tmp           reiserfs
    /dev/vg/var         /var           reiserfs
    /dev/hdc            /mnt/dvd       iso9660
    /dev/hdd            /mnt/cdrw      iso9660
    /dev/hdb1           /mnt/win98sys  vfat
    /dev/hdb9           /mnt/winwrite  vfat
    /dev/fd0            /mnt/floppy    vfat
    /dev/sda1           /mnt/camera    vfat
    none                /proc          proc
    none                /dev/shm       tmpfs
    /dev/sda1    /boot             ext3     noauto,noatime  1 1
    /dev/sda2    /                 ext3     noatime         0 0
    /dev/sda3    /var              ext3     noatime         0 0
    /dev/sda5    /opt              ext3     noatime         0 0
    /dev/sda6    /usr              ext3     noatime         0 0
    /dev/sda7    /usr/portage      ext3     noatime         0 0
    /dev/sda8    /var/tmp/portage  ext3     noatime         0 0
    /dev/sda9    /home             ext3     noatime         0 0
    /dev/hdb2    /home/share       ext3     noatime         0 0
    /dev/hdb1    none              swap     sw              0 0
    /dev/cdrom   /mnt/cdrom0       iso9660  noauto,ro,user  0 0
    /dev/cdrom1  /mnt/cdrom1       iso9660  noauto,ro       0 0
    none         /proc             proc     defaults        0 0
    none         /dev/shm          tmpfs    defaults        0 0
Just a comment: baselayout-1.12.1 is still broken, I tried it today. Last working baselayout is 1.12.0_pre19-r2 (~x86)
(In reply to comment #14)
> Just a comment: baselayout-1.12.1 is still broken, I tried it today.
> Last working baselayout is 1.12.0_pre19-r2 (~x86)
I have set up vg-usr and vg-portage mounted as /usr and /usr/portage respectively and cannot simulate your shutdown issue at all :(
Questions:
What is RC_VOLUME_ORDER set to in /etc/conf.d/rc?
Do you see any LVM messages when shutting down?
Does it work if you insert this on line 119 in /etc/init.d/halt.sh (after if and before /bin/fuser)?
    [[ ${x} == "/usr" ]] && continue
RC_VOLUME_ORDER="raid evms lvm dm"
I don't see any LVM messages.
Inserting this on line 119 in /etc/init.d/halt.sh (after if and before /bin/fuser)
    [[ ${x} == "/usr" ]] && continue
fixes baselayout 1.12.1 for me! Does it solve the problems for the others here too? I suspect Steve Arnold's bug is different ...
(In reply to comment #16)
> I don't see any LVM messages
>
> Inserting this on line 119 in /etc/init.d/halt.sh (after if
> and before /bin/fuser)
> [[ ${x} == "/usr" ]] && continue
>
> fixes baselayout 1.12.1 for me !
Alas, that is not a valid fix :( Do you see any LVM messages with that line in, though? Put this line just before the line you added:
    [[ ${x} == "/usr" ]] && ps -p $(fuser -m /usr 2>/dev/null)
Attach the output here to show which processes are still using /usr.
Here is the output you asked for:
      PID TTY  STAT  TIME COMMAND
     5504 ?    S     0:00 ddclient - sleeping for 140 seconds
     7147 ?    Ss    0:00 /bin/bash /etc/init.d/halt.sh reboot
Alas - removing ddclient from my system doesn't fix the bug.
I have posted a new halt.sh which logs to tty8 on the forums. http://forums.gentoo.org/viewtopic-p-3438893.html#3438893 The results are interesting; I have given them here for your convenience...
    Stopping udev
    Stopping swap
    Writing reboot record
    Removing loopback devices
    Unmounting filesystems
    /var/log /var /usr/portage/distfiles /usr/portage /usr /tmp /opt /home
    Processes now active
      PID TTY          TIME CMD
        1 ?        00:00:01 init
        2 ?        00:00:00 migration/0
        3 ?        00:00:00 ksoftirqd/0
        4 ?        00:00:00 migration/1
        5 ?        00:00:00 ksoftirqd/1
        6 ?        00:00:00 events/0
        7 ?        00:00:00 events/1
        8 ?        00:00:00 khelper
        9 ?        00:00:00 kthread
       12 ?        00:00:00 kblockd/0
       13 ?        00:00:00 kblockd/1
      198 ?        00:00:00 pdflush
      199 ?        00:00:00 pdflush
      200 ?        00:00:00 kswapd0
      201 ?        00:00:00 aio/0
      202 ?        00:00:00 aio/1
      277 ?        00:00:00 kseriod
      318 ?        00:00:00 scsi_eh_0
      352 ?        00:00:00 scsi_eh_1
      467 ?        00:00:00 kirqd
      468 ?        00:00:00 reiserfs/0
      469 ?        00:00:00 reiserfs/1
      470 ?        00:00:00 rsbacd
      579 ?        00:00:00 udevd
     2600 ?        00:00:01 rc
     5673 ?        00:00:00 ps
    /var/log /var /usr/portage/distfiles /usr/portage /usr
    umount: /usr: device is busy
    umount: /usr: device is busy
    /usr: 2600m(root)
    COMMAND  PID USER  FD  TYPE DEVICE    SIZE  NODE NAME
    rc      2600 root  mem REG   254,8   38652 22501 /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/libgcc_s.so.1
    rc      2600 root  mem REG   254,8 1243136 22510 /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/libstdc++.so.6.0.3
    rc      2600 root  mem REG   254,8  122320 75536 /usr/lib/librsbac.so.1.2.5
    rc      2600 root  mem REG   254,8   13740 75538 /usr/lib/libnss_rsbac.so.2.0.0
    lsof    5682 root  txt REG   254,8  773432  7593 /usr/sbin/lsof
    lsof    5683 root  txt REG   254,8  773432  7593 /usr/sbin/lsof
    umount2: Device or resource busy
    umount: /dev/mapper/main-usr busy - remounted read-only
    /tmp /opt /home
    Removing dm-crypt mappings
    Stopping LVM
    Making sure /etc/mtab and /proc/mounts agree
    Remounting remaining filesystems readonly
    mount: / is busy
    mount: / is busy
    umount: /dev/main/usr busy - remounted read-only
    umount: udev busy - remounted read-only
    umount: /: device is busy
    umount: /: device is busy
    umount: /: device is busy
Looks like the rc process should have been statically linked. :-)
sys-apps/baselayout 1.11.15-r3 USE="acpi berkdb crypt dlloader hardened logrotate mmx ncurses nls nptl pam pic readline sse ssl tcpd unicode urandom userlocales x86 xorg zlib elibc_glibc input_devices_mouse input_devices_keyboard kernel_linux userland_GNU"
(In reply to comment #19)
> I have posted a new halt.sh which logs to tty8 on the forums.
Thanks for a really good report! Could you please test with baselayout-1.12.1, and modify halt.sh again to log, OR make the below change to /sbin/rc and /etc/init.d/halt.sh.
Remove lines 714 - 720:
    source /etc/init.d/halt.sh
    if [[ ${SOFTLEVEL} == "reboot" ]] ; then
        source /etc/init.d/reboot.sh
    else
        source /etc/init.d/shutdown.sh
    fi
Replace with:
    LC_ALL=C exec /etc/init.d/halt.sh "${SOFTLEVEL}"
Then put this line at the start of /etc/init.d/halt.sh (well, after the first 3 lines):
    [[ ${RC_GOT_FUNCTIONS} != "yes" ]] && source /sbin/functions.sh
Then put this line at the end of /etc/init.d/halt.sh:
    [[ -e /etc/init.d/"$1".sh ]] && source /etc/init.d/"$1".sh
Huh! Emerging baselayout with the "static" USE flag has solved the problem. Now, why didn't I think of that before?
About the rsbac lib being loaded: as far as I know, if /sbin/rc (aka /sbin/runscript) had merely called an rsbac utility, lsof would not show /sbin/rc as the process mapping the rsbac lib. So it seems to me that your /sbin/runscript itself is using librsbac. Now, this shouldn't be possible, so maybe I'm missing something and it's late ;) I'll try to look a bit deeper tomorrow, though.
(In reply to comment #20)
I chose your first option, installing baselayout-1.12.1 so that we are working from the same version. I then modified halt.sh again to log. You will be sad to know that there is little change...
    Stopping udev
    umount: /dev: device is busy
    Stopping swap
    Writing reboot record
    Removing loopback devices
    Unmounting filesystems
    /var/log /var /usr/portage/distfiles /usr/portage /usr /tmp /opt /home
    Processes now active
      PID TTY          TIME CMD
        1 ?        00:00:02 init
        2 ?        00:00:00 migration/0
        3 ?        00:00:00 ksoftirqd/0
        4 ?        00:00:00 migration/1
        5 ?        00:00:00 ksoftirqd/1
        6 ?        00:00:00 events/0
        7 ?        00:00:00 events/1
        8 ?        00:00:00 khelper
        9 ?        00:00:00 kthread
       12 ?        00:00:00 kblockd/0
       13 ?        00:00:00 kblockd/1
      198 ?        00:00:00 pdflush
      199 ?        00:00:00 pdflush
      200 ?        00:00:00 kswapd0
      201 ?        00:00:00 aio/0
      202 ?        00:00:00 aio/1
      277 ?        00:00:00 kseriod
      318 ?        00:00:00 scsi_eh_0
      351 ?        00:00:00 scsi_eh_1
      457 ?        00:00:00 kirqd
      458 ?        00:00:00 reiserfs/0
      459 ?        00:00:00 reiserfs/1
      460 ?        00:00:00 rsbacd
      572 ?        00:00:00 udevd
     3815 ?        00:00:00 halt.sh
     5621 ?        00:00:00 ps
    /var/log /var /usr/portage/distfiles /usr/portage /usr
    umount: /usr: device is busy
    umount: /usr: device is busy
    /usr: 3815m(root)
    COMMAND  PID USER  FD  TYPE DEVICE    SIZE  NODE NAME
    halt.sh 3815 root  mem REG   254,8   38652 22501 /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/libgcc_s.so.1
    halt.sh 3815 root  mem REG   254,8 1243136 22510 /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/libstdc++.so.6.0.3
    halt.sh 3815 root  mem REG   254,8  122320 75536 /usr/lib/librsbac.so.1.2.5
    halt.sh 3815 root  mem REG   254,8   13740 75538 /usr/lib/libnss_rsbac.so.2.0.0
    lsof    5651 root  txt REG   254,8  773432  7593 /usr/sbin/lsof
    lsof    5652 root  txt REG   254,8  773432  7593 /usr/sbin/lsof
    umount2: Device or resource busy
    umount: /dev/mapper/main-usr busy - remounted read-only
    /tmp /opt /home
    Removing dm-crypt mappings
    Stopping LVM
    Remounting unionfs branches as readonly
    Making sure /etc/mtab and /proc/mounts agree
    mount: / is busy
    mount: / is busy
    umount: /dev/main/usr busy - remounted read-only
    umount: /: device is busy
    umount: /: device is busy
    umount: /: device is busy
(In reply to comment #22)
I wouldn't worry too much about RSBAC. I can replicate this on two or three other systems without RSBAC.
(In reply to comment #23)
> I wouldn't worry too much about RSBAC. I can replicate this on two or three
> other systems without RSBAC.
So attach a log from a system without RSBAC then, please.
(In reply to comment #24)
> (In reply to comment #23)
> > I wouldn't worry too much about RSBAC. I can replicate this on two or three
> > other systems without RSBAC.
>
> So attach a log from a system without RSBAC then please.
>
It'd be a waste of bandwidth. It looks exactly the same, just without the references to /usr/lib/librsbac.so.1.2.5 and /usr/lib/libnss_rsbac.so.2.0.0. Delete them from the previous log I sent and they are identical. ;-)
BTW: Compiling with the static use flag does fix this, including the rsbac related issues, so I'm happy to consider this closed if you all are. I'll be sticking with my halt.sh script, though, as it at least works under all circumstances. Perhaps a check to see if we are killing halt.sh would be a good idea?
(In reply to comment #25) (addition)
If you want to replicate it from scratch, I have just done so accidentally with another new install. The steps were as follows.
Simply use the 2006.0 Universal CD and do a normal install; just put /usr on an LVM2 or even a normal partition. All works fine.
    emerge --sync && emerge -uDN world
Switch to the hardened profile.
    emerge -e system && emerge -e world
Shutdown now fails with the above issues.
As I say, if you emerge baselayout with the static use flag set, it works fine. It just seems to me that most people won't know this until they lose their data. We don't have to merge anything else with this flag set for normal operation, afaik.
(In reply to comment #25) (correction) It would seem that I spoke too soon. The static use flag did fix it for one single reboot directly after the emerge. Subsequent reboots fail as before. I would be interested to know if Michele Alzetta (comment #21) had a similar experience. Seems it might be the hardened profile. I shall do a rebuild of the server I just installed and test at every stage to see what I find. This has become personal now! :-)
(In reply to comment #27)
> It would seem that I spoke too soon.
>
> The static use flag did fix it for one single reboot directly after the emerge.
> Subsequent reboots fail as before. I would be interested to know if Michele
> Alzetta (comment #21) had a similar experience.
No, it still works here. I am rebooting and shutting down without problems, with the "static" build of baselayout.
(In reply to comment #28)
> No, it still works here. I am rebooting and shutting down without problems,
> with "static" build of baselayout.
I've gone back to my previous unbugged ebuild. Baselayout built with the 'static' flag works, but complains no end on shutdown about not being able to unmount my cdrom (with no cd inside); nothing bad happens, but it is a nuisance. I wonder if this is somehow connected to our bug or if it is something else again ...
It seems this bug is actually a collection of problems of different kinds. The messages about the cdrom brought me to my personal solution ... I hope! Things now work after editing /etc/lvm/lvm.conf and substituting the default:
    filter = [ "a/.*/" ]
with:
    filter = [ "r|/dev/cdrom|" ]
Even the new baselayout without the static flag now works. So it seems that in my case things actually broke with a combination of new baselayout and new lvm, which required editing lvm.conf (now this could be a documentation bug, or a warning to be given on updating lvm). However, this can hardly apply to the problem described by tfb and bluedevils, and probably not to Steve Arnold's one either. Maybe we ought to close this bug and open a couple of new ones, trying to define the other problems more clearly.
hi all, on my box the problem was that /bin/bash was linked against /usr/lib/libgpm.so. my /usr is on a logical volume (/dev/system/usr) and /etc/init.d/halt.sh uses fuser to kill any process accessing /usr before unmounting. because halt.sh uses /bin/bash it was killing itself and then INIT complained about "no more processes left in this runlevel". my workaround was to temporarily move /usr/lib/libgpm.* to /tmp and recompile bash. now it looks like this:
    ~ # ldd /bin/bash
        linux-gate.so.1 => (0xffffe000)
        libncurses.so.5 => /lib/libncurses.so.5 (0xb7ed9000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7ed5000)
        libc.so.6 => /lib/libc.so.6 (0xb7dc5000)
        libgpm.so.1 => /lib/libgpm.so.1 (0xb7dc0000)
        /lib/ld-linux.so.2 (0xb7f45000)
and poweroff is working again. however i dont know why libgpm.so is installed in /usr/lib/ as well as in /lib/.
regards
the2nd
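The check described above can be automated. A minimal sketch that filters ldd output for libraries resolved under a given prefix; libs_under is a hypothetical helper name, and the sketch assumes the usual "name => path (address)" line format of ldd.

```shell
# Hypothetical helper: read ldd output on stdin and print every resolved
# library path that lives under the directory prefix given as $1.
libs_under() {
    awk -v p="$1" '$2 == "=>" && index($3, p "/") == 1 { print $3 }'
}

# Possible usage (commented out; prints nothing when the shell does not
# depend on anything below /usr):
#   ldd /bin/bash | libs_under /usr
```

Any path this prints for the shell that runs halt.sh is a library that keeps /usr busy for as long as that shell is alive.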
This should be fixed in baselayout-1.13.0_alpha9. Basically, if a mount point is used by the current process, we skip it. If we're shutting down, then we re-mount it read-only.
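The logic described above could look roughly like the following. This is only a sketch under assumptions, not the actual baselayout-1.13.0_alpha9 code: uses_mount and plan_unmount are hypothetical names, "in use" is approximated by scanning a /proc/<pid>/maps listing for files below the mount point (paths containing spaces are not handled), and the function prints the chosen action instead of performing it.

```shell
# Hypothetical helper: succeed if the maps listing on stdin (format of
# /proc/<pid>/maps) references a file below the mount point given as $1.
uses_mount() {
    awk -v mp="$1" '
        BEGIN { if (mp != "/") mp = mp "/" }
        index($6, mp) == 1 { found = 1 }
        END { exit !found }'
}

# Hypothetical helper: print what to do with one mount point.
#   $1 = mount point
#   $2 = maps file (assumed default: the maps of the current shell)
#   $3 = "shutdown" when halting, anything else otherwise
plan_unmount() {
    if uses_mount "$1" < "${2:-/proc/$$/maps}"; then
        if [ "$3" = shutdown ]; then
            echo "remount-ro $1"
        else
            echo "skip $1"
        fi
    else
        echo "umount $1"
    fi
}
```

Appending "/" to the mount point before comparing keeps /usr from matching paths such as /usr2/lib.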