Bug 347583 - sys-apps/openrc confuses cgroups automated per tty task groups with LXC systems
Summary: sys-apps/openrc confuses cgroups automated per tty task groups with LXC systems
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: OpenRC Team
URL:
Whiteboard: openrc:rc_sys
Keywords:
Depends on:
Blocks: 295613
Reported: 2010-12-02 17:25 UTC by Jouni Rinne
Modified: 2011-01-15 17:00 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
Collection of 4 straces (straces.tar.bz2,244.15 KB, application/octet-stream)
2010-12-24 11:26 UTC, Frank Ridderbusch
Description Jouni Rinne 2010-12-02 17:25:19 UTC
On rebooting after the upgrade to openrc-0.6.6, my system failed to boot properly. According to the messages I managed to read during boot, the root filesystem (RAID-1 on /dev/md2) was mounted read-only, which caused various daemons to fail to start and left the system unusable.

I managed to chroot into the system via a rescue CD and downgraded to openrc-0.6.3, which allowed the system to boot again. (The previous, since-removed openrc-0.6.5 worked well, too.)
Comment 1 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-02 20:45:50 UTC
This is near impossible.
0.6.5 and 0.6.6 are almost identical
You can see a diff here between them:
http://paste.pocoo.org/show/299181/

Can you please retest with both 0.6.5 and 0.6.6.
Either both of them should fail or both should pass.
Or you're wrong and it was localmount that failed, instead of the root service...
Comment 2 Jouni Rinne 2010-12-02 22:00:53 UTC
(In reply to comment #1)
> This is near impossible.
> 0.6.5 and 0.6.6 are almost identical
> You can see a diff here between them:
> http://paste.pocoo.org/show/299181/
> 
> Can you please retest with both 0.6.5 and 0.6.6.
> Either both of them should fail or both should pass.

Got openrc-0.6.5 back from cvs, so far no problems...

> Or you're wrong and it was localmount that failed, instead of the root
> service...
> 

That's entirely possible. I just assumed it was rootfs, since that would have explained the daemon/service failures. The startup error messages went by so fast that I could catch only bits and pieces. Eventually I got to the command prompt, but my USB keyboard was dead (I didn't try a PS/2 one).

I remember seeing something weird during the shutdown after 0.6.6 upgrade; normally mdadm shuts down as one of the last processes, but at that time I saw the mdadm messages very early in the shutdown sequence. Maybe 0.6.6 somehow starts/stops stuff in the wrong order???

It's almost midnight here, I need to go to sleep... Unfortunately I don't have time to test this properly until Saturday evening or Sunday morning at the earliest, if that's okay with you?
Comment 3 Chris Bandy 2010-12-03 06:13:20 UTC
I can confirm this behavior with 0.6.6 using root on LVM, genkernel and initrd. Downgrading to 0.6.5 fixes.
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-03 19:24:03 UTC
We need more information.
I just tested it w/ root-on-LVM per comment 3, since that's the config I have on my laptop, and it works perfectly for me.

1. "emerge --info" from both of you
2. Which other system packages did you change around the same time?
3. Again, localmount vs. root.
4. Ideally, turn on boot logging, or attach a serial console of some sort, and capture the exact failure output for us to debug with.
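
For item 4, a minimal sketch of switching on boot logging. It assumes that this OpenRC version reads an rc_logger setting from /etc/rc.conf and writes the log to /var/log/rc.log; neither is confirmed in this report, so check the comments in your own rc.conf first. The helper name enable_rc_logger is hypothetical. The log can only be written once a filesystem is writable, so a serial console may still be needed.

```shell
#!/bin/sh
# Hypothetical helper: set rc_logger="YES" in an rc.conf-style file.
# Assumption: OpenRC honours rc_logger in /etc/rc.conf on this version.
enable_rc_logger() {
    conf=$1
    if grep -Eq '^[[:space:]]*#?[[:space:]]*rc_logger=' "$conf"; then
        # Rewrite the existing (possibly commented-out) setting.
        sed -E 's|^[[:space:]]*#?[[:space:]]*rc_logger=.*|rc_logger="YES"|' \
            "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
    else
        # No such setting yet: append one.
        printf '%s\n' 'rc_logger="YES"' >> "$conf"
    fi
}

# Usage (as root, after backing up the file):
#   enable_rc_logger /etc/rc.conf
```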
Comment 5 Jouni Rinne 2010-12-05 11:00:54 UTC
Ok, just upgraded from 0.6.5 to 0.6.7, the latter of which is working perfectly again. Do you still need me to test 0.6.6?

Anyway, here is 1) emerge --info:

Portage 2.1.9.25 (default/linux/amd64/10.0, gcc-4.4.5, glibc-2.12.1-r3, 2.6.34-gentoo-r12 x86_64)
=================================================================
System uname: Linux-2.6.34-gentoo-r12-x86_64-Intel-R-_Core-TM-2_Duo_CPU_E8500_@_3.16GHz-with-gentoo-2.0.1
Timestamp of tree: Sat, 04 Dec 2010 17:45:01 +0000
distcc 3.1 x86_64-pc-linux-gnu [disabled]
ccache version 3.1.3 [disabled]
app-shells/bash:     4.1_p9
dev-java/java-config: 2.1.11-r2
dev-lang/python:     2.5.4-r4, 2.6.6-r1, 3.1.3
dev-util/ccache:     3.1.3
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1-r1
sys-apps/openrc:     0.6.7
sys-apps/sandbox:    2.4
sys-devel/autoconf:  2.13, 2.68
sys-devel/automake:  1.7.9-r2, 1.8.5-r4, 1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.18-r3, 2.20.1-r1
sys-devel/gcc:       3.4.6-r2, 4.2.4-r1, 4.4.5
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.4-r1
sys-devel/make:      3.82
virtual/os-headers:  2.6.36.1 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=nocona -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -march=nocona -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests binpkg-logs distlocks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo/"
LANG="fi_FI.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="fi"
MAKEOPTS="-j5"
PKGDIR="/usr/local/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /var/lib/layman/kde-sunset /var/lib/layman/lightscribe /var/lib/layman/csound-wii /var/lib/layman/pd-overlay"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa amd64 audiofile avahi berkdb bluetooth bonjour bzip2 cairo cdinstall cli crypt cups curl cxx dbus dirac dri dv dvd dvdread enca encode exif ffmpeg flac fontforge fortran gdbm gif git gnome gnutls gpm gtk2 hal iconv id3tag ieee1394 imagemagick ipv6 jack joystick jpeg kde ladspa lash lcms libnotify lzo mad matroska mikmod mjpeg mmap mmx mmxext modules motif mp3 mp4 mpeg mtp mudflap multilib ncurses nls nptl nptlonly nsplugin ogg openexr opengl openmp osc pam pcre pdf perl phonon png policykit pppd python qt3 qt3support qt4 quicktime readline ruby scanner sdl session smp sndfile sox speex sqlite sqlite3 sse sse2 ssl ssse3 svg sysfs tcpd theora tiff truetype udev unicode usb v4l v4l2 vorbis vpx wavpack x264 xcb xine xinerama xml xorg xulrunner xv xvid xvmc zeroconf zlib" ALSA_CARDS="seq-dummy dummy virmidi mtpav mts64 serial-u16550 mpu401 loopback portman2x4 ad1889 als300 als4000 ali5451 atiixp atiixp-modem au8810 au8820 au8830 azt3328 bt87x ca0106 cmipci cs4281 cs46xx darla20 gina20 layla20 darla24 gina24 layla24 mona mia echo3g indigo indigoio indigodj emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel hdsp hdspm ice1712 ice1724 intel8x0 intel8x0m korg1212 maestro3 mixart nm256 pcxhr riptide rme32 rme96 rme9652 sonicvibes trident via82xx via82xx-modem vx222 ymfpci pdplus asihpi usb-audio usb-usx2y vxpocket pdaudiocf soc aica emi26 emu1212 emu1616 emu1820" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status 
unique_id userdir usertrack vhost_alias" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="evdev keyboard mouse joystick" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="fi" PHP_TARGETS="php5-2" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="vesa fglrx radeon" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

2) list of packages upgraded at the same time as 0.6.6:
(1 of 8) dev-libs/lzo-2.04
(2 of 8) media-libs/freetype-2.4.4
(3 of 8) app-portage/eix-0.22.5
(4 of 8) sys-apps/openrc-0.6.6
(5 of 8) net-p2p/freenet-0.7.5_p1306
(6 of 8) net-print/cups-1.4.5
(7 of 8) games-strategy/ufo-ai-2.3.1
(8 of 8) www-plugins/adobe-flash-10.2.161.23_pre20101117
Comment 6 Jouni Rinne 2010-12-05 11:17:28 UTC
I noticed that both Chris and I are using an initrd/initramfs: Chris uses genkernel plus an initrd, and I use a manually compiled kernel from stable gentoo-sources plus a custom initramfs. Does this have anything to do with the problem?
Comment 7 Frank Ridderbusch 2010-12-05 13:34:11 UTC
Hi, I'd like to join the discussion, since I have also had some trouble with system startup this past week. Bear with me for a little bit of prose.

My problem started on Sunday, 28 Nov. openrc 0.6.3 had been in operation since sometime in September, and during the week before the 28th I updated to a newer version (I think it was openrc 0.6.5) and booted without any problems, including on Saturday, 27 Nov. Since I keep binary packages, I can confirm that no new packages were emerged on Saturday. Then the next boot, on Sunday the 28th, went wrong: after the initial initramfs, the root fs wouldn't be remounted writable. After some experimentation I could bring up the system by basically playing openrc manually (using the "i" key, then calling and executing the missing scripts by hand). During that Sunday I downgraded to openrc 0.6.3, but that did not fix my problem. (This downgrade was a recompile of 0.6.3 and not an installation of the binary package, which was built sometime in September. The usual amount of changes had happened on a ~arch system in the meantime: new GCC, new Python, etc.) Single-stepping also didn't change the problems in the boot process.

Anyway, for the whole of last week I always booted in single-step mode, hitting the "i" key the moment I saw the openrc message, and each boot went correctly, including after the openrc updates to 0.6.6 and 0.6.7. Then, thinking that my problem had disappeared, on Saturday I didn't hit the "i" key and, voilà, the problem was back. At this point I can't make any sense of why it sometimes works and sometimes doesn't.

Here are some observations however, that I collected. This is my configuration in the sysinit and boot runlevels.

# cd /etc/runlevels
# ls sysinit
devfs  dmesg  udev
# ls boot
alsasound    device-mapper  hwclock     lvm      net.lo  swap          urandom
bootmisc     fsck           keymaps     modules  procfs  sysctl
consolefont  hostname       localmount  mtab     root    termencoding

Now I execute rc-status.

# rc-status sysinit
 * Caching service dependencies ....                [ ok ]
Runlevel: sysinit
 dmesg                                              [  started  ]
 devfs                                              [  started  ]

udev is not displayed. Why? Executing rc-config:

# rc-config show sysinit
Status of init scripts in runlevel "sysinit"
  devfs                     [started]
  dmesg                     [started]
  udev                      [started]

The same is true for the boot runlevel:
# COLUMNS=40 rc-status boot
Runlevel: boot
 mtab                      [  started  ]
 hostname                  [  started  ]
 sysctl                    [  started  ]
 bootmisc                  [  started  ]
 lvm                       [  started  ]
 device-mapper             [  started  ]
 termencoding              [  started  ]
 urandom                   [  started  ]
 net.lo                    [  started  ]
 alsasound                 [  started  ]

Again, not all scripts that reside in the /etc/runlevels/boot directory are displayed. rc-config displays them correctly:

# rc-config show boot
Status of init scripts in runlevel "boot"
  alsasound                 [started]
  bootmisc                  [started]
  consolefont               [started]
  device-mapper             [started]
  fsck                      [started]
  hostname                  [started]
  hwclock                   [started]
  keymaps                   [started]
  localmount                [started]
  lvm                       [started]
  modules                   [started]
  mtab                      [started]
  net.lo                    [started]
  procfs                    [started]
  root                      [started]
  swap                      [started]
  sysctl                    [started]
  termencoding              [started]
  urandom                   [started]

I'm seeing these differences on a system that has correctly invoked all init scripts.

I don't know if this adds anything useful to the discussion.
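
The comparison above (directory listing versus rc-status output) can be sketched as a small helper. The function name missing_from_status is hypothetical, and the awk filter in the usage comment assumes the rc-status output format shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper: print services that are linked into a runlevel
# directory but absent from a list of reported names read from stdin.
missing_from_status() {
    dir=$1
    reported=$(cat)
    for path in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$path" ] || [ -L "$path" ] || continue
        name=${path##*/}
        # -F: fixed string, -x: whole line must match the service name.
        printf '%s\n' "$reported" | grep -Fxq -- "$name" || printf '%s\n' "$name"
    done
}

# Usage, assuming one " name   [  status  ]" line per service:
#   rc-status boot | awk '/\[/ {print $1}' | missing_from_status /etc/runlevels/boot
```

With the sysinit output shown earlier in this comment (dmesg and devfs reported, udev present in the directory), the helper would print udev.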

For completeness I include emerge --info

# emerge --info
Portage 2.1.9.25 (default/linux/amd64/10.0/desktop, gcc-4.5.1, glibc-2.12.1-r3, 2.6.36-tuxonice-r2 x86_64)
=================================================================
System uname: Linux-2.6.36-tuxonice-r2-x86_64-Intel-R-_Core-TM-2_Duo_CPU_E6750_@_2.66GHz-with-gentoo-2.0.1
Timestamp of tree: Sun, 05 Dec 2010 10:30:01 +0000
app-shells/bash:     4.1_p9
dev-java/java-config: 2.1.11-r2
dev-lang/python:     2.6.6-r1, 2.7.1, 3.1.3
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1-r1
sys-apps/openrc:     0.6.3
sys-apps/sandbox:    2.4
sys-devel/autoconf:  2.13, 2.68
sys-devel/automake:  1.4_p6-r1, 1.5-r1, 1.6.3-r1, 1.7.9-r2, 1.8.5-r4, 1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.3.5, 4.4.5, 4.5.1-r1
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.4-r1
sys-devel/make:      3.82
virtual/os-headers:  2.6.36.1 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/genkernel/arch/x86_64 /usr/share/genkernel/x86_64 /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/portage/distfiles"
FEATURES="assume-digests binpkg-logs buildpkg distlocks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://mirror.cambrium.nl/pub/os/linux/gentoo/ http://mirror.jamit.de/gentoo/ http://mirror.cambrium.nl/pub/os/linux/gentoo/ ftp://mirror.leaseweb.com/gentoo/ ftp://mirror.netcologne.de/gentoo/"
LANG="de_DE.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="de"
MAKEOPTS="-j2"
PKGDIR="/portage/packages-64bit"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/systemd /portage/local"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac aalib acl acpi ads aio akode alsa amd64 amr amrnb amrwb animation-rtl ao apache2 applet asf aspell athena audiofile automount avahi bash-completion berkdb bluetooth branding browserplugin bzip2 cairo calendar cdio cdparanoia cdr cgi cleartype cli clock connection-sharing consolekit consolkit cpudetection cracklib crypt css ctype cue cups curl cxx dbus device-mapper dhclient dhcpcd digitalradio djvu dnd drawing dri dts dv dvd dvdr emacs23icons emboss enca encode exchange exif expat expoblending extensions fam fbcondecor ffmpeg firefox flac font-server fontconfig foomaticdb fortran fts3 fuse gcj gdbm gdu geolocation gif gimp glade glib glitz gnome gnome-keyring gnutls gphoto2 gstreamer gtk gtk2 gtkhtml gzip-el hal hddtemp hdri iconv id3 id3tag idn ieee1394 imlib ipv6 jack java java5 java6 jce jpeg jruby kde kde4 kdrive kerberos keyboard kig-scripting kipi lame lcms ldap lensfun libedit libffi libmms libnotify libproxy libsamplerate lm_sensors logitech-mouse lzma mad mbox mikmod mjpeg mmap mmx mng modules mp3 mp4 mpeg mplayer mtp mudflap multilib musicbrainz mysql nautilus ncurses neXt netjack network networkmanager nls npp nptl nptlonly nsplugin nss ntlm ntp ogg opengl openmp otr pam pango pcre pdf perl pidgin plasma png policykit ppds pppd pulseaudio python qt3support qt4 quicktime raw rdesktop-vrdp readline redeyes resolvconf ruby samba sasl scanner sdk secure-delete semantic-desktop sensord session sip slang sndfile sound speex spell sqlite sqlite3 srt sse sse2 ssh ssl ssse3 startup-notification svg swat sylpheed sysfs taglib tagwriting tcpd theora thunderbird tiff toolkit-scroll-bars totem truetype udev unicode upnp usb v4l v4l2 vaapi vdpau vim-syntax vorbis wacom wav weather webkit wifi winbind x264 xcb xcomposite xface xft xine xinerama xml xmp xorg xscreensaver xulrunner xv xvid xvmc zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident 
usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="*" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias auth_digest proxy proxy_ajp proxy_balancer proxy_connect proxy_ftp proxy_http" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="wacom evdev keyboard mouse synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" PHP_TARGETS="php5-2" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia nv intel fbdev radeon vesa" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 8 William Hubbs gentoo-dev 2010-12-05 17:45:56 UTC
(In reply to comment #5)
> Ok, just upgraded from 0.6.5 to 0.6.7, the latter of which is working 
> perfectly again. Do you still need me to test 0.6.6?

I would say no,  if 0.6.7 works don't worry about 0.6.6.
Comment 9 Chris Bandy 2010-12-07 01:40:04 UTC
I updated to openrc-0.6.6 on Dec 2. I don't recall exactly which packages I updated, but the following had messages in elog:

games-engines/scummvm-1.2.0-r1
sys-auth/pambase-20101024
sys-apps/openrc-0.6.6
sys-apps/attr-2.4.44

Today, I updated to openrc-0.6.7. I've booted cold, warm, on AC, on battery and most combinations between. I was not able to reproduce again. I am booting successfully for now.

# emerge --info openrc
Portage 2.1.9.24 (default/linux/amd64/10.0/desktop/kde, gcc-4.4.4, glibc-2.11.2-r3, 2.6.36-gentoo x86_64)
=================================================================
                        System Settings
=================================================================
System uname: Linux-2.6.36-gentoo-x86_64-Intel-R-_Core-TM-_i7_CPU_Q_720_@_1.60GHz-with-gentoo-2.0.1
Timestamp of tree: Sun, 05 Dec 2010 07:15:01 +0000
distcc 3.1 x86_64-pc-linux-gnu [enabled]
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.11-r1
dev-lang/python:     2.6.5-r3, 3.1.2-r4
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1-r1
sys-apps/openrc:     0.6.7
sys-apps/sandbox:    2.3-r1
sys-devel/autoconf:  2.13, 2.65-r1
sys-devel/automake:  1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.4-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
sys-devel/make:      3.81-r2
virtual/os-headers:  2.6.30-r1 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA dlj-1.1 AdobeFlash-10.1 PUEL"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=core2 -mcx16 -msahf -mpopcnt -msse4.2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=256 -O2 -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo"
CXXFLAGS="-march=core2 -mcx16 -msahf -mpopcnt -msse4.2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=256 -O2 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests binpkg-logs distcc distlocks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://brahe/gentoo http://gentoo.mirrors.tds.net/gentoo"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en_US en"
MAKEOPTS="-j12 --silent"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/kde /var/lib/layman/x11 /usr/local/portage"
SYNC="rsync://brahe/gentoo-portage"
USE="X a52 aac acl acpi alsa amd64 bash-completion branding bzip2 cairo cdr cli consolekit cracklib crypt cups cxx dbus dri dts dvd dvdr emboss encode exif ffmpeg firefox flac gdbm gif gnutls gpm gtk hal iconv java java6 jpeg kde lame lcms lua mad mikmod mmap mmx mng modules mp3 mp4 mpeg mudflap multilib musepack ncurses nls nptl nptlonly ogg opengl openmp pam pango pcre pdf perl png ppds pppd qt3support qt4 readline samba sdl session spell sse sse2 ssl ssse3 startup-notification svg sysfs tcpd theora threads tiff truetype unicode usb v4l2 vcd vorbis x264 xcb xcomposite xinerama xml xorg xulrunner xv xvid xvmc zip zlib" ALSA_CARDS="hda-intel usb-audio" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CAMERAS="canon" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="evdev keyboard mouse synaptics" KERNEL="linux" LINGUAS="en_US en" PHP_TARGETS="php5-3" QEMU_SOFTMMU_TARGETS="i386 x86_64" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="radeon" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

=================================================================
                        Package Settings
=================================================================

sys-apps/openrc-0.6.7 was built with the following:
USE="(multilib) ncurses pam unicode -debug"
Comment 10 Chris Bandy 2010-12-07 06:06:39 UTC
I spoke too soon. No versions >= 0.6.5 seem to be working for me. I see similar results as comment 7, except interactive mode does not help.

How can I debug this further? RC_LOGGER does not work afaics since no filesystems are writable.
Comment 11 Frank Ridderbusch 2010-12-07 07:30:32 UTC
(In reply to comment #10)
> I spoke too soon. No versions >= 0.6.5 seem to be working for me. I see similar
> results as comment 7, except interactive mode does not help.

Let me clarify this. Interactive mode does indeed not help. It's just a workaround that lets me interact immediately and basically fix the boot process by hand.

In my scenario here, if I see that sysfs is mounted directly after /proc, then I'm in good shape and can hit "3" to finish the boot process unattended. However, when I see that sysctl is executed, which should happen much later in the boot process, I use shell access to invoke sysfs, udev-mount, etc. manually.
Comment 12 Chris Bandy 2010-12-07 15:23:15 UTC
This morning, I chrooted from a LiveCD, installed 0.6.7, ran etc-update and ran rc-update -u. Two successful boots so far.

Could it simply be an outdated deptree? Since 0.6.4, bootmisc and sysctl have stopped depending on hostname.
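
One rough way to test the stale-deptree theory is to check whether any init script is newer than the cached dependency tree. This is only a sketch: the function name deptree_is_stale is hypothetical, the cache path in the usage comment is an assumption about this OpenRC version, and OpenRC applies its own rules for deciding when to regenerate the cache.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the cached deptree is missing
# or older than anything under the given init script directory.
deptree_is_stale() {
    deptree=$1
    initdir=$2
    # A missing cache is treated as stale.
    [ -e "$deptree" ] || return 0
    # find -newer lists entries modified more recently than the cache.
    [ -n "$(find "$initdir" -newer "$deptree" -print 2>/dev/null | head -n 1)" ]
}

# Usage (the deptree path is an assumption, adjust for your system):
#   if deptree_is_stale /lib/rc/init.d/deptree /etc/init.d; then
#       rc-update -u
#   fi
```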
Comment 13 Frank Ridderbusch 2010-12-08 22:18:48 UTC
(In reply to comment #7)
.....
> # cd /etc/runlevels
> # ls sysinit
> devfs  dmesg  udev
> # ls boot
> alsasound    device-mapper  hwclock     lvm      net.lo  swap          urandom
> bootmisc     fsck           keymaps     modules  procfs  sysctl
> consolefont  hostname       localmount  mtab     root    termencoding
> 
> Now I execute rc-status.
> 
> # rc-status sysinit
>  * Caching service dependencies ....                [ ok ]
> Runlevel: sysinit
>  dmesg                                              [  started  ]
>  devfs                                              [  started  ]
.... 
> # COLUMNS=40 rc-status boot
> Runlevel: boot
>  mtab                      [  started  ]
>  hostname                  [  started  ]
>  sysctl                    [  started  ]
>  bootmisc                  [  started  ]
>  lvm                       [  started  ]
>  device-mapper             [  started  ]
>  termencoding              [  started  ]
>  urandom                   [  started  ]
>  net.lo                    [  started  ]
>  alsasound                 [  started  ]

Well, I've just emerged openrc 0.6.8. Are there any changes in it that might explain why I'm now seeing the expected output from rc-status?

# COLUMNS=40 rc-status sysinit
Runlevel: sysinit
 dmesg                     [  started  ]
 udev                      [  started  ]
 devfs                     [  started  ]

# COLUMNS=40 rc-status boot
Runlevel: boot
 hwclock                   [  started  ]
 modules                   [  started  ]
 lvm                       [  started  ]
 device-mapper             [  started  ]
 fsck                      [  started  ]
 root                      [  started  ]
 mtab                      [  started  ]
 localmount                [  started  ]
 termencoding              [  started  ]
 sysctl                    [  started  ]
 bootmisc                  [  started  ]
 consolefont               [  started  ]
 swap                      [  started  ]
 keymaps                   [  started  ]
 hostname                  [  started  ]
 procfs                    [  started  ]
 urandom                   [  started  ]
 net.lo                    [  started  ]
 alsasound                 [  started  ]

No difference between rc-config and rc-status any more and the output is as I would expect. I have not yet rebooted though, but this looks promising.
Comment 14 Chris Bandy 2010-12-09 18:00:06 UTC
After some successful boots, I've run into this again. For some reason, the rootfs (on LVM using genkernel initrd) is read-only during sysinit.

Once this happens, it stays that way. Further reboots do not fix it. Even chrooting from a LiveCD and up/downgrading openrc, running rc-update -u, etc. gets nowhere.

Fortunately, I have found a way to regain read-write rootfs after a failed start:

1. Reboot, and enter interactive mode. By mashing the key, I am prompted immediately after /proc.
2. Enter the console.
3. Remount the rootfs read-write: # mount -o remount,rw /
4. Exit the console.
5. Continue the boot process. Many things will fail.
6. Login.
7. Update the openrc deptree: # rc-update -u
8. Reboot: # shutdown -r now
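
The commands from steps 3, 7 and 8 above, wrapped so they can be previewed before use. This is a sketch, not a tested recovery tool: the function names and the DRY_RUN variable are hypothetical, and the commands themselves are the ones quoted in the list.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 3: call this from the interactive-mode console.
remount_root_rw() {
    run mount -o remount,rw /
}

# Steps 7 and 8: call this after logging in on the degraded system.
refresh_deptree_and_reboot() {
    # Only reboot if the deptree update succeeded.
    run rc-update -u && run shutdown -r now
}

# Preview without changing anything:
#   DRY_RUN=1; remount_root_rw; refresh_deptree_and_reboot
```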

I noticed after step 6, as Frank did, that 'udev' did not appear during 'rc-status -a'. After running 'rc-update -u', it appeared in the sysinit runlevel.

The strange part is that when chrooted from a LiveCD, 'rc-status -a' showed udev in the sysinit runlevel.
Comment 15 Frank Ridderbusch 2010-12-09 21:51:26 UTC
Well, I can't say with 100% certainty after all this fiddling around, but I believe that I did execute "rc-update -u" and that the output of rc-status still didn't show all activated scripts in the sysinit and boot runlevels (as I described earlier).

Just yesterday I experienced the problem again, booted by manually executing all the init-scripts, executed "rc-update -u" but after reboot the problem persisted. 

Only when I had updated to openrc 0.6.8 did everything appear to be working correctly (3 successful boots so far). 

BTW, I'm also using a genkernel generated kernel and initramfs, with root residing on a LVM volume. 
Comment 16 Simone Scanzoni 2010-12-21 20:06:11 UTC
I hit this bug with both 0.6.7 and 0.6.8.
The first symptom was with 0.6.8. When I shut down a perfectly working system the partitions weren't unmounted (nor remounted read-only). The next boot the root partition was mount read-only. I tinkered for a while and I don't remember what I did but it began to work again. Some boots later it happened again, it didn't unmount from a perfectly working state and at the next boot the root partition was read-only. This time I noticed that some system services weren't loaded. I used the "single" option in the kernel command line and manually ran the services: sysfs udev localmount .
Those were some of the missing ones. After that the boot process proceeded almost cleanly. But when I shut down it didn't unmount and so on. To escape this tedious situation after the needed workarounds I downgraded to 0.6.7. It worked for some boots and shutdowns. This time when the problem arose I used "rc-status boot" and these were the missing services:
hwclock modules fsck swap keymaps consolefont procfs
vanished as per comment #7 .
I tried to escape again by downgrading to 0.6.6 and it is working, but after reading these comments I'll downgrade to 0.6.3.

I have a manually configured kernel with a physical root partition and I'm not using an initrd/initramfs.

emerge --info openrc
Portage 2.1.9.25 (default/linux/amd64/10.0, gcc-4.4.4, glibc-2.11.2-r3, 2.6.36-gentoo-r5 x86_64)
=================================================================
                         System Settings
=================================================================
System uname: Linux-2.6.36-gentoo-r5-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_4200+-with-gentoo-2.0.1
Timestamp of tree: Mon, 20 Dec 2010 13:45:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.11-r1
dev-lang/python:     2.6.5-r3, 3.1.2-r4
dev-util/ccache:     2.4-r7
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1-r1
sys-apps/openrc:     0.6.6
sys-apps/sandbox:    2.4
sys-devel/autoconf:  2.13::<unknown repository>, 2.65-r1
sys-devel/automake:  1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.4-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
sys-devel/make:      3.81-r2
virtual/os-headers:  2.6.36.1 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA PUEL Q3AEULA skype-eula ut2003 Introversion googleearth @EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=athlon64 -mfpmath=sse -pipe -fomit-frame-pointer -msse3 -fpredictive-commoning -fgcse-after-reload"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/games/angband/edit/ /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -march=athlon64 -mfpmath=sse -pipe -fomit-frame-pointer -msse3 -fpredictive-commoning -fgcse-after-reload"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests binpkg-logs ccache distlocks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="it_IT.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="it"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/gnustep /var/lib/layman/roslin /var/lib/layman/sunrise /var/lib/layman/gamerlay /var/lib/layman/x11 /usr/local/portage/nonno"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext R S3TC X a52 aac acpi additions aiglx alsa amd64 amr apache2 archive artworkextra automount bash-completion bdf bittorrent blas blender-game branding bullet bzip2 cairo caps ccache cdio cdr cli cracklib cups curl cxx dbus device-mapper disk-partition djvu dri dts dvb dvd dvdr dvdread dvi edb encode escreen exif expat extensions extra-dark faac fam fasttrack ffmpeg fftw flac fontconfig fontforge fortran ftp gd gdbm gif gimp gimpprint glib glibc-omitfp glitz gnet gnutella gnutls gpm gsl gstreamer gtk gtkhtml gzip-el hou hpn ical iconv icu imagemagick imap imlib inkjar inotify inquisitio iproute2 ithreads java java6 javascript joystick jpeg jpeg2k kdehiddenvisibility kqemu lapack laptop latex lcdfilter lcms liblockfile libmpd libnotify libsamplerate libyaml linuxthreads-tls lirc llvm lua lzma lzo mad maildir matroska mbrola mercurial midi mikmod mmap mmx mmxext mng modules moznocompose moznoirc moznomail mp3 mpeg mplayer msn mudflap multilib musicbrainz netplay network nfs nls nntp no-old-linux nokia nowin nptl nptlonly nsplugin ntp oav objc ods ogg opengl openmp optimized-qmake pam pccts pch pcre pdf perl plotutils png policykit portage postproc ppds pppd python python3 qt3support quicktime readline realmedia rle rtmp sasl session slp smp sms sndfile sound sox spell sse sse2 sse3 ssl startup-notification stream subtitles svg svgz sysfs t1lib taglib tagwriting tcpd tetex theora thin-splines threads threadsafe thunar tiff tordns totem track-src-odirect truetype udev udis86 ui unicode usb userlocales utempter v4l v4l2 vchroot vim vim-syntax visibility vorbis vpx websockets wmf wxwidgets wxwindows x264 xcb xcomposite xfce xft xls xml xmp xorg xrandr xulrunner xv xvid zlib zvbi" ALSA_CARDS="emu10k1" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon 
authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" DVB_CARDS="usb-dib0700" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse aiptek evdev joystick" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="it" LIRC_DEVICES="pctv" PHP_TARGETS="php5-2" QEMU_SOFTMMU_TARGETS="i386 x86_64" RUBY_TARGETS="ruby18" SANE_BACKENDS="canon_pp" USERLAND="GNU" VIDEO_CARDS="radeon v4l fglrx" XFCE_PLUGINS="logout menu" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

=================================================================
                        Package Settings
=================================================================

sys-apps/openrc-0.6.6 was built with the following:
USE="(multilib) pam unicode -debug -ncurses"

I think the Severity of this bug should be raised: it can cause loss of data and forces users to stay on an old version of a core component. The Summary should also say >=sys-apps/openrc-0.6.5, not only 0.6.6.
Comment 17 Frank Ridderbusch 2010-12-21 22:43:24 UTC
I believe I have discovered a pattern, at least as far as the rc-* commands are concerned, and it is related to new scripts being installed into /etc/init.d.

Original situation is best described by looking at the output of rc-status:

# rc-status sysinit
 * Caching service dependencies ...                                                 [ ok ]
Runlevel: sysinit
 dmesg                                                             [  started  ]
 devfs                                                             [  started  ]

As you can see, udev is missing. Then doing a "rc-update -u" and a new rc-status after that:

# rc-update -u
 * Caching service dependencies ...                                                 [ ok ]
# rc-status sysinit
Runlevel: sysinit
 dmesg                                                             [  started  ]
 devfs                                                             [  started  ]

As you can see, udev is still missing. Now to the funny thing. All the above commands were executed in an XFCE4 Terminal, in a tab where I changed to root with "sudo -i" (BTW, "su - " doesn't make a difference).

I then changed to Linux console 0 and executed "rc-update -u" again and, can you believe it, rc-status showed the proper output, both on the console and later back in the XFCE4-Terminal.

# rc-status sysinit
Runlevel: sysinit
 dmesg                                                             [  started  ]
 udev                                                              [  started  ]
 devfs                                                             [  started  ]

I observed this funny behaviour repeatedly (4 times) the last couple of days and I think it pretty firmly correlates with the times, when a package installed a new or updated script in /etc/init.d.

-rwxr-xr-x 1 root root   738 19. Dez 11:57 git-daemon
-rwxr-xr-x 1 root root   758 20. Dez 10:36 bluetooth
-rwxr-xr-x 1 root root  3504 21. Dez 15:44 tomcat-7
-rwxr-xr-x 1 root root  1152 21. Dez 22:30 dbus
-rwxr-xr-x 1 root root  1114 21. Dez 22:32 hald

In total I emerged about 50 packages over the last couple of days, in different batches.
However, each time I found something amiss with the output of rc-status, a new /etc/init.d script had been installed.

I haven't had any new boot problems yet, since I made sure that rc-status always produced the correct output. I also believe that the actual version of the openrc package doesn't make any difference; at least it didn't for me.
Comment 18 Simone Scanzoni 2010-12-22 12:23:24 UTC
Inspired by comment #17 I decided to emerge 0.6.8 and did some tests.
I was able to trigger the bug systematically (I tried at least 5 times).
Every time I run rc-update -u from a root shell that has, as an ancestor, a user shell with the following code in .bashrc (aka automated per tty task groups):
if [ "$PS1" ] ; then
       mkdir -m 0700 -p /cgroup/cpu/$$
       echo 1 > /cgroup/cpu/$$/notify_on_release
       echo $$ > /cgroup/cpu/$$/tasks
fi
the output of rc-status -a changes in this way:
--- rc-status-correct.log       2010-12-22 13:06:07.000000000 +0100
+++ rc-status-broken.log        2010-12-22 13:06:07.000000000 +0100
@@ -1,18 +1,12 @@
 Runlevel: boot
- hwclock                       [  started  ]
- modules                       [  started  ]
  mtab                          [  started  ]
- hibernate-cleanup             [  started  ]
- swap                          [  started  ]
+ hostname                      [  started  ]
  sysctl                        [  started  ]
  bootmisc                      [  started  ]
- keymaps                       [  started  ]
- hostname                      [  started  ]
  acpid                         [  started  ]
- consolefont                   [  started  ]
+ hibernate-cleanup             [  started  ]
  net.lo                        [  started  ]
  sfxload                       [  stopped  ]
- procfs                        [  started  ]
  urandom                       [  started  ]
 Runlevel: default
  metalog                       [  started  ]
@@ -36,23 +30,18 @@
 Runlevel: shutdown
  savecache                     [  stopped  ]
  killprocs                     [  stopped  ]
- mount-ro                      [  stopped  ]
 Runlevel: nonetwork
  local                         [  started  ]
 Runlevel: sysinit
- dmesg                         [  started  ]
- udev                          [  started  ]
  devfs                         [  started  ]
+ dmesg                         [  started  ]
 Runlevel: single
 Dynamic Runlevel: hotplugged
  net.eth0                      [  started  ]
 Dynamic Runlevel: needed
- sysfs                         [  started  ]
- udev-mount                    [  started  ]
- fsck                          [  started  ]
- root                          [  started  ]
- localmount                    [  started  ]
  mdnsd                         [  started  ]
- termencoding                  [  started  ]
  xdm-setup                     [  started  ]
+ udev-mount                    [  started  ]
 Dynamic Runlevel: manual
+ sysfs                         [  started  ]
+ termencoding                  [  started  ]
Running rc-update -u from a root shell that doesn't have such an ancestor fixes everything.
Comment 19 Frank Ridderbusch 2010-12-22 16:10:31 UTC
Well, good stuff. You've probably nailed it.

I have the same cgroup stuff in my .bashrc, slightly different, but basically the same. The Lennart Poettering stuff, that made the rounds over the internet some days ago.
Comment 20 Frank Ridderbusch 2010-12-22 19:18:31 UTC
As further confirmation to comment #18 here are some additional command lines. 

1st. Everything is OK.
# COLUMNS=40 rc-status sysinit
Runlevel: sysinit
 dmesg                     [  started  ]
 udev                      [  started  ]
 devfs                     [  started  ]

2nd. Then touch some script in /etc/init.d
# touch /etc/init.d/mdev 

3rd. rc-status is missing udev.
# COLUMNS=40 rc-status sysinit
 * Caching service dependencies . [ ok ]
Runlevel: sysinit
 dmesg                     [  started  ]
 devfs                     [  started  ]

4th. Force update.
# COLUMNS=40 rc-update -u
 * Caching service dependencies . [ ok ]

5th. Nothing changed.
# COLUMNS=40 rc-status sysinit
Runlevel: sysinit
 dmesg                     [  started  ]
 devfs                     [  started  ]

Alternative environment, freshly started tcsh (which of course doesn't read .bashrc) directly launched with ALT-F2 in a terminal window (within XFCE4, doesn't inherit the cgroup stuff).

1st. Missing udev.
# COLUMNS=40 rc-status sysinit
Runlevel: sysinit
 dmesg                     [  started  ]
 devfs                     [  started  ]

2nd. Force Update.
# COLUMNS=40 rc-update -u
 * Caching service dependencies . [ ok ]

3rd. Everything is OK.
# COLUMNS=40 rc-status sysinit
Runlevel: sysinit
 dmesg                     [  started  ]
 udev                      [  started  ]
 devfs                     [  started  ]

4th. Again, touch some /etc/init.d script.
# touch /etc/init.d/mdev 

5th. Everything stays OK.
# COLUMNS=40 rc-status sysinit
 * Caching service dependencies . [ ok ]
Runlevel: sysinit
 dmesg                     [  started  ]
 udev                      [  started  ]
 devfs                     [  started  ]
Comment 21 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-23 01:31:40 UTC
ALL:
Does this happen with only the bash version of the cgroups, or the kernel patch as well?

Jouni Rinne:
Are you using the cgroups as well?

Frank+Simone:
Can you please trace way down and show where the first place udev turns up is?
Comment 22 Frank Ridderbusch 2010-12-23 10:18:48 UTC
(In reply to comment #21)
...
> Frank+Simone:
> Can you please trace way down and how where the first place the udev turns up
> is.

Well, not sure if I'm reading you correctly. I've been using "rc-status sysinit" as an example, since this runlevel only has three entries and it always appears, that udev is missing.

However, the next runlevel, boot, is missing scripts as well (10 vs. 19) when the problem is present.

# COLUMNS=40 rc-status boot
Runlevel: boot
 mtab                      [  started  ]
 sysctl                    [  started  ]
 bootmisc                  [  started  ]
 lvm                       [  started  ]
 device-mapper             [  started  ]
 termencoding              [  started  ]
 hostname                  [  started  ]
 urandom                   [  started  ]
 net.lo                    [  started  ]
 alsasound                 [  started  ]

# COLUMNS=40 rc-update -u
 * Caching service dependencies . [ ok ]

# COLUMNS=40 rc-status boot
Runlevel: boot
 hwclock                   [  started  ]
 modules                   [  started  ]
 lvm                       [  started  ]
 device-mapper             [  started  ]
 fsck                      [  started  ]
 root                      [  started  ]
 mtab                      [  started  ]
 localmount                [  started  ]
 termencoding              [  started  ]
 sysctl                    [  started  ]
 bootmisc                  [  started  ]
 consolefont               [  started  ]
 swap                      [  started  ]
 keymaps                   [  started  ]
 hostname                  [  started  ]
 procfs                    [  started  ]
 urandom                   [  started  ]
 net.lo                    [  started  ]
 alsasound                 [  started  ]

I guess, if it would help, I could provide straces with cgroup in effect and without.
Comment 23 Jouni Rinne 2010-12-23 14:23:52 UTC
(In reply to comment #21)
> 
> Jouni Rinne:
> Are you using the cgroups as well?
>
(Sigh) Yes, I was playing around with Lennart Poettering's cgroups at the time, too...

Damn Lennart... His designs seem good in theory, but in practice they cause much more trouble than they are worth (first pulseaudio, now cgroups...). In theory I'd like to say that he is a moron, but in practice I won't say it, because he would be offended :D

Merry Xmas to you all (even Lennart)!
Comment 24 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-23 19:13:49 UTC
(In reply to comment #22)
> (In reply to comment #21)
> ...
> > Frank+Simone:
> > Can you please trace way down and how where the first place the udev turns up
> > is.
...
> I guess, if it would help, I could provide strace's with cgroup in effect and
> without. 
Yes. Focus on the sysinit runlevel, since it's the smallest place we've seen the error. strace or whatever your preferred debugging tools are. auditd might be useful in this context for debugging as well.

ALL:
- Can we agree what the last version that worked was? There's conflicting reports of 0.6.3 vs. 0.6.5 here.
- We also need the first version where it was broken.
- Can everybody please confirm that the following files exist in /etc/runlevels/sysinit/? devfs dmesg sysfs udev (if you have anything else, I'd like to know as well)
Comment 25 Simone Scanzoni 2010-12-23 23:27:27 UTC
(In reply to comment #24)
> (In reply to comment #22)
> > (In reply to comment #21)
> > ...
> > > Frank+Simone:
> > > Can you please trace way down and how where the first place the udev turns up
> > > is.
> ...
> > I guess, if it would help, I could provide strace's with cgroup in effect and
> > without. 
> Yes. Focus on the sysinit runlevel, since it's the smallest place we've seen
> the error. strace or whatever your preferred debugging tools are. auditd might
> be useful in this context for debugging as well.
> 
> ALL:
> - Can we agree what the last version that worked was? There's conflicting
> reports of 0.6.3 vs. 0.6.5 here.
> - We also need the first version where it was broken.
> - Can everybody please confirm that the following files exist in
> /etc/runlevels/sysinit/? devfs dmesg sysfs udev (if you have anything else, I'd
> like to know as well)
> 

I just tested 0.6.3 (downgraded to 0.6.3, rebooted, touched /etc/init.d/netmount from a shell with cgroup) and it suffers the problem. I suppose we thought it wasn't affected just because we didn't try the cgroup stuff before 0.6.5 .
I don't have /etc/runlevels/sysinit/sysfs, and neither does Frank (see comment #7).

Happy holidays to everybody.
Comment 26 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-24 02:11:01 UTC
1. Can you please add sysfs to your sysinit runlevels and test?
2. What other cgroup setup code do you have on your systems?
Comment 27 Frank Ridderbusch 2010-12-24 11:26:52 UTC
Created attachment 257934 [details]
Collection of 4 straces

The tar-archive contains these files:
-rw-rw-r-- root/root     91011 2010-12-24 11:33 strace-cgroup.txt
-rw-rw-r-- root/root     92124 2010-12-24 11:46 strace.txt
-rw-rw-r-- root/root   4627145 2010-12-24 12:03 strace-cgroup-f.txt
-rw-rw-r-- root/root   4614312 2010-12-24 12:04 strace-f.txt
The *-f.txt are traces with child processes included.
Comment 28 Frank Ridderbusch 2010-12-24 11:42:14 UTC
As for the straces, I looked at them with

  wdiff -l /tmp/strace-cgroup-f.txt /tmp/strace-f.txt |less

This highlights the differences quite nicely, better, I think, than a standard line-oriented diff would.

As for the sysfs not being included in sysinit. 

# rc-update add sysfs sysinit 
 * service sysfs added to runlevel sysinit
# rc-status sysinit
Runlevel: sysinit
 dmesg         [  started  ]
 sysfs         [  started  ]
 udev          [  started  ]
 devfs         [  started  ]
# touch /etc/init.d/mdev 
# rc-status sysinit
 * Caching service dependencies ...    [ ok ]
Runlevel: sysinit
 dmesg         [  started  ]
 sysfs         [  started  ]
 devfs         [  started  ]

As you can see, udev is still missing.

As for the cgroup stuff:

In local_start() in /etc/conf.d/local

	mount -t tmpfs cgroupfs /sys/fs/cgroup
	mkdir -p /sys/fs/cgroup/cpu
	mount -t cgroup -o cpu cgroup /sys/fs/cgroup/cpu
	mkdir -m 0777 /sys/fs/cgroup/cpu/user

and in .bashrc in the 'if [ "$PS1" ] ; then' clause

        mkdir -m 0700 /sys/fs/cgroup/cpu/user/$$
        echo $$ > /sys/fs/cgroup/cpu/user/$$/tasks

BTW, one thing, that I'm not quite understanding. All rc-* commands are symlinks to /sbin/rc. Why is the "rc-status sysinit" output different from "rc-config show sysinit"? Am I missing some intended semantic difference?

Otherwise also Happy Holidays to everybody from me.
Comment 29 Simone Scanzoni 2010-12-24 17:32:58 UTC
(In reply to comment #26)
> 1. Can you please add sysfs to your sysinit runlevels and test?
> 2. What other cgroup setup code do you have on your systems?
> 

1. I did. This time I tested creating an empty file
# touch /etc/init.d/test
to keep the original mtimes. udev isn't seen anyway.


2. In local_start() (this alone doesn't create any problem afaik) :
        echo "/usr/bin/rmcgroup" > /cgroup/cpu/release_agent
        chmod 777 /cgroup/cpu

In a user's .bashrc :
if [ "$PS1" ] ; then
        mkdir -m 0700 -p /cgroup/cpu/$$
        echo 1 > /cgroup/cpu/$$/notify_on_release
        echo $$ > /cgroup/cpu/$$/tasks
fi

In /etc/fstab :
none    /cgroup/cpu    cgroup    cpu    0 0

And
# grep CGROUP -1 /usr/src/linux/.config
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_RESOURCE_COUNTERS is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_SYSFS_DEPRECATED_V2 is not set

I think it's everything.
Comment 30 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-24 18:43:21 UTC
Frank, can you please redo the -f straces with the following added to your strace options:
-q -v -s 65536

Interesting observations so far:
(I pre-filtered the -f.txt files for addresses and PIDs that would change between runs, and then diffed them)
1.
- execve("/sbin/rc-update", ["rc-update", "-u"], [/* 57 vars */]) = 0
+ execve("/sbin/rc-update", ["rc-update", "-u"], [/* 59 vars */]) = 0
what was different in the ENV?

2.
This is the REALLY interesting part...
- read(3, "1:cpu:/\n", 1024)        = 8
- read(3, "", 1024)                 = 0
- close(3)                          = 0
- munmap(0xDEADBEEF, 4096)      = 0
- open("/proc/self/status", O_RDONLY) = 3
- fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
- mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xDEADBEEF
- read(3, "Name:\trc-update\nState:\tR (runnin"..., 1024) = 755
- read(3, "", 1024)                 = 0
+ read(3, "1:cpu:/user/5128\n", 1024) = 17
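For anyone who wants to repeat the comparison: the pre-filtering mentioned above (stripping PIDs and addresses before diffing) can be sketched roughly as follows. This is not the exact filter that was used. It assumes the -f traces prefix each line with the PID, either bare or as "[pid N]", and the function name normalize_strace is made up for this example.

```shell
# Hypothetical helper: strip per-run noise from strace output read on stdin
# so that two runs can be diffed line by line.
normalize_strace() {
    # 1st rule: drop a leading "[pid N] " prefix (strace -f to stderr)
    # 2nd rule: drop a leading bare PID column (strace -f -o file)
    # 3rd rule: replace every hex address with a fixed placeholder
    sed -E \
        -e 's/^\[pid +[0-9]+\] //' \
        -e 's/^[0-9]+ +//' \
        -e 's/0x[0-9a-fA-F]+/0xDEADBEEF/g'
}

# Possible usage (file names taken from the attachment):
#   normalize_strace < strace-f.txt        > plain.txt
#   normalize_strace < strace-cgroup-f.txt > cgroup.txt
#   diff -u plain.txt cgroup.txt
```

Anything else that varies between runs (timestamps, for instance, if strace was asked to print them) would need its own rule.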
Comment 31 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-24 18:46:46 UTC
I think I found it

openrc/src/librc/librc.c:
236     else if (file_regex("/proc/self/cgroup", ":/.+$"))
237         return RC_SYS_LXC;

Can you please test by changing line 236 there to:
236     else if (0 && file_regex("/proc/self/cgroup", ":/.+$"))

It'll break for LXC users, but we need to know if that fixes your systems first.
(alternatively, make a backup of your init.d/ and remove all mentions of 'nolxc' under the 'keyword' command in depend() blocks.)
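To see why that check misfires, here is a small shell sketch that applies the same pattern to the two /proc/self/cgroup lines from the straces in comment #30. It uses grep -E as a stand-in for file_regex(), on the assumption that the C code treats the pattern as an extended regex, so take it as an approximation rather than a copy of what librc does. The function name looks_like_lxc is made up.

```shell
# Hypothetical stand-in for file_regex("/proc/self/cgroup", ":/.+$"):
# succeeds if any line on stdin names a cgroup path other than the root "/".
looks_like_lxc() {
    grep -Eq ':/.+$'
}

# Shell in the root cgroup: nothing follows ":/", so there is no match.
printf '1:cpu:/\n' | looks_like_lxc && echo "lxc" || echo "not lxc"

# Shell moved into its own group by the .bashrc snippet: the pattern matches,
# although no container is involved.
printf '1:cpu:/user/5128\n' | looks_like_lxc && echo "lxc" || echo "not lxc"
```

The first call prints "not lxc" and the second prints "lxc", which is consistent with the pattern detecting "process is in a non-root cgroup" rather than LXC itself.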
Comment 32 Simone Scanzoni 2010-12-25 02:22:47 UTC
(In reply to comment #31)
> I think I found it
> 
> openrc/src/librc/librc.c:
> 236     else if (file_regex("/proc/self/cgroup", ":/.+$"))
> 237         return RC_SYS_LXC;
> 
> and you please test by changing line 236 there to:
> 236     else if (0 && file_regex("/proc/self/cgroup", ":/.+$"))
> 
> It'll break for LXC users, but we need to know if that fixes your systems
> first.
> (alternatively, make a backup of your init.d/ and remove all mentions of
> 'nolxc' under the 'keyword' command in depend() blocks.)
> 

It fixes my system! :)
Comment 33 Frank Ridderbusch 2010-12-25 12:01:00 UTC
(In reply to comment #32)
> (In reply to comment #31)
> > I think I found it
> > 
> > openrc/src/librc/librc.c:
> > 236     else if (file_regex("/proc/self/cgroup", ":/.+$"))
> > 237         return RC_SYS_LXC;
> > 
> > and you please test by changing line 236 there to:
> > 236     else if (0 && file_regex("/proc/self/cgroup", ":/.+$"))
> > 
> > It'll break for LXC users, but we need to know if that fixes your systems
> > first.
> > (alternatively, make a backup of your init.d/ and remove all mentions of
> > 'nolxc' under the 'keyword' command in depend() blocks.)
> > 
> 
> It fixes my system! :)
> 

Yes, here too. :-) as well!
Comment 34 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-25 22:50:26 UTC
Ok, so we need to come up with a better test case for LXC.

flameeyes:
ping. The existing testcase for LXC in openrc is:
if (file_regex("/proc/self/cgroup", ":/.+$")) return RC_SYS_LXC;

This is failing under normal usage of cgroups, so we need some better way to detect LXC. I don't know who else is using LXC, or has an LXC box that I can test on for detection.
Comment 35 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-12-25 23:56:18 UTC
And here we hit another problem with the loose way LXC userland has been designed: there is _no_ way to identify whether we're running inside an LXC or not. I've been thinking about it since OpenVZ moved to use cgroups as well, but it might get very _very_ tricky.
Comment 36 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-12-26 08:03:24 UTC
flameeyes:
Can I get access to a LXC VM to test please?

At a glance, the only thing that comes to mind so far is that the rc_sys function could really use a way to override it, maybe a magic file /etc/rc.sys that can contain the possible values for special systems. It would help the prefix people as well (they presently have a special define).
Comment 37 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-12-26 12:37:24 UTC
I don't have a VM with it but I'll see if I can work on getting one up. I think the only other solution would be to have /etc/rc.conf have a rc_system_type="lxc" and get users to explicitly set it there.
Comment 38 William Hubbs gentoo-dev 2010-12-28 20:24:19 UTC
(In reply to comment #37)
> the only other solution would be to have /etc/rc.conf have a
> rc_system_type="lxc" and get users to explicitly set it there.

My vote is for this solution.  I think we can get rid of all of the automatic tests for the system type and make the user configure it with rc_system_type.

What do others think?
Comment 39 Duncan 2011-01-06 00:37:23 UTC
(In reply to comment #38)
> (In reply to comment #37)
> > the only other solution would be to have /etc/rc.conf have a
> > rc_system_type="lxc" and get users to explicitly set it there.
> 
> My vote is for this solution.  I think we can get rid of all of the automatic
> tests for the system type and make the user configure it with rc_system_type.
> 
> What do others think?

+1

I don't use any of the keywords here (and wasn't affected by this bug as admin intuition said better let the task groups stuff cook a bit... now I know why!), but being an openrc tester since it was still part of baselayout (baselayout-1.13, anyone?) and seeing the keywords in various initscripts, I always wondered how they worked.

It seems to me that in the interest of good initscript troubleshooting, anyone using a keyword should /know/ they're using it, because they set it!  "Automagic" is fine when it works, but all too often it doesn't, and I know for my system at least, when it comes to booting, I want to KNOW the status of such control factors.

And the best way to KNOW the status is if you set it yourself. No automagic!  So I too vote to rip out the automagic keyword setup and replace it with a simple variable to which the user can add various keywords as appropriate.  =:^)
Comment 40 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2011-01-06 02:16:07 UTC
Ok, for now I propose we support as many of the automagic detections as possible still, and throw a warning if we had to detect it manually.

The following will continue to have automagic support:
FreeBSD Jail
Xen0 (Linux+NetBSD)
XenU (Linux+NetBSD)
UML
Vserver
OpenVZ

The following move to the variable-only immediately:
Prefix (pre-populated during ebuild)
LXC

Comment 41 Diego Elio Pettenò (RETIRED) gentoo-dev 2011-01-06 02:24:38 UTC
Looks fine to me Robin.
Comment 42 Duncan 2011-01-06 03:12:17 UTC
(In reply to comment #40)
> Ok, for now I propose we support as many of the automagic detections as
> possible still, and throw a warning if we had to detect it manually.
> 
> The following will continue to have automagic support:
> FreeBSD Jail
> Xen0 (Linux+NetBSD)
> XenU (Linux+NetBSD)
> UML
> Vserver
> OpenVZ

I take it those have robust enough detection that they aren't additional bugs waiting to spring on the unwary simply making use of normal system capacities in other ways?

> The following move to the variable-only immediately:
> Prefix (pre-populated during ebuild)
> LXC

Someone mentioned a manual override.  Will this variable take, for instance, both UML and -UML as overrides, for testing and as a quick workaround should another bug like this pop up?

Is breaking keyword handling consistency (some automagic some manual) worth it?  Is the automagic long-term maintainable?  Behavior's a lot easier to change before we have an implementation in stable.

But as they say, he who makes the code, makes the rules. =:^)
Comment 43 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2011-01-06 03:22:01 UTC
This is the logic I had in mind:

if variable exists; then
  use variable ONLY
else
  use automagic
  (and throw a warning somewhere if the automagic was non-NULL)
endif

We'll be shipping an updated rc.conf, so only users that DON'T update with etc-update are going to get the automagic branch.
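The same logic as a runnable shell sketch, for illustration only. It uses the rc_system_type name suggested in comment #37 as a placeholder (the final variable name may differ), the function names detect_automagic and pick_system_type are invented, and the detection itself is a stub driven by a FAKE_DETECTED variable; the real detection is done in C inside librc.

```shell
# Stub for the old auto-detection: prints a system type, or nothing.
# (Hypothetical; FAKE_DETECTED exists only to drive this example.)
detect_automagic() {
    printf '%s' "${FAKE_DETECTED:-}"
}

# Print the effective system type.
# If rc_system_type is set at all, even to "", it wins and detection is skipped.
pick_system_type() {
    if [ "${rc_system_type+set}" = "set" ]; then
        printf '%s\n' "$rc_system_type"
    else
        detected=$(detect_automagic)
        if [ -n "$detected" ]; then
            # Warn only when the automagic result was non-empty.
            echo "warning: system type not configured, auto-detected '$detected'" >&2
        fi
        printf '%s\n' "$detected"
    fi
}
```

The ${var+set} expansion is what separates "set but empty" from "unset": an empty value still counts as an explicit choice and suppresses the automagic branch.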
Comment 44 William Hubbs gentoo-dev 2011-01-06 04:42:29 UTC
My concern about the automagic branch is that we know it is flakey at least for lxc and openvz (the openvz issue is in bug #349389).
Comment 45 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2011-01-06 05:06:50 UTC
(In reply to comment #44)
> My concern about the automagic branch is that we know it is flakey at least for
> lxc and openvz (the openvz issue is in bug #349389).
I'm intending on dropping the LXC check in the automagic branch, since it's broken entirely (it's detecting cgroups, not LXC). That in itself will fix both this bug and bug #349389.

Give me a few hours to finish the patch, and I'll commit it and link from here.
Comment 46 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2011-01-06 06:33:59 UTC
Fixed in Git as of 647df8c.
Please test the 9999 ebuild.

Please note the introduction of the new rc_sys variable. It's REQUIRED for LXC users: there is no longer any way of automatically detecting LXC.

For all other users, we should be checking for the existence of /^rc_sys/ in /etc/rc.conf and telling them to etc-update the file. The new default in the file is rc_sys="", which is NOT the same as rc_sys being unset. If it is unset, then the old automatic detection code kicks in.
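A rough sketch of such a check, assuming rc.conf uses plain rc_sys=... assignments at the start of a line, so that commented-out lines don't count. It is narrower than the /^rc_sys/ pattern above, the function name rc_sys_state is made up, and this is not the code the ebuild uses.

```shell
# Hypothetical helper: report how rc_sys appears in an rc.conf-style file.
# Prints "unset" if there is no uncommented rc_sys= line,
# "empty" for rc_sys="" (or rc_sys= / rc_sys=''), and "set" otherwise.
rc_sys_state() {
    conf=${1:-/etc/rc.conf}
    # The last assignment wins, as it would when the file is sourced.
    line=$(grep -E '^rc_sys=' "$conf" 2>/dev/null | tail -n 1)
    if [ -z "$line" ]; then
        echo "unset"
        return
    fi
    value=${line#rc_sys=}
    case $value in
        ''|'""'|"''") echo "empty" ;;
        *)            echo "set" ;;
    esac
}
```

A result of "unset" is the case where users would be told to etc-update the file. Trailing comments or whitespace after the value are not handled here and would be reported as "set".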
Comment 47 Duncan 2011-01-06 07:36:13 UTC
(In reply to comment #43)
> if variable exists; then
>   use variable ONLY
> else
>   use automagic
>   (and throw a warning somewhere if the automagic was non-NULL)
> endif
> 
> We'll be shipping an updated rc.conf, so only users that DON'T update with
> etc-update are going to get the automagic branch.

+1
Comment 48 Joakim 2011-01-15 11:43:45 UTC
Sorry for being totally stupid, but about rc_sys: it isn't clear from what has been said here whether, e.g. on an OpenVZ system, it needs to be set on the host node, in the containers, or both, and in the latter case whether it should be set to the same value on both host and containers. I understand "subsystem" is the magic keyword here, but somehow I suspect it doesn't mean the containers, i.e. the systems that are sub to the main system (the host node), so I'm confused...

In simple terms, I think it needs to be made clearer whether this must be set on the host node or in the containers, especially since leaving it unset or setting it wrong may take down the whole system.
Comment 49 William Hubbs gentoo-dev 2011-01-15 17:00:32 UTC
(In reply to comment #48)
> Sorry for be totally stupid but, rc_sys, it isn't clear from what been said
> here e.g. on an openvz system if it need to be set on the host node, the
> containers or both and if the latter if it should be set to the same value in
> both host and containers? I understand "subsystem" is the magic keyword here
> but somehow I suspect it doesn't mean the containers aka the systems sub of
> main system the host node, so confused...
> In simple terms I think it need to be clearer whether this is crucial to be set
> on host node or containers, especially as if not set or wrong it may take down
> the whole system.

Can you please open a separate bug for this instead of commenting on a closed bug?

Thanks,

William