Bug 99413 - grsec kernels: reboot/shutdown problem for processes not started via init scripts
Summary: grsec kernels: reboot/shutdown problem for processes not started via init scr...
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: All Linux
Importance: High critical
Assignee: The Gentoo Linux Hardened Team
URL:
Whiteboard:
Keywords:
Duplicates: 116832
Depends on: 95565
Blocks:
Reported: 2005-07-18 02:56 UTC by Alexander Stoll
Modified: 2006-03-16 04:39 UTC (History)

Attachments
grsec-2.1.8-fix-kill.patch (grsec-2.1.8-fix-kill.patch,708 bytes, patch)
2006-02-05 20:35 UTC, kfm

Description Alexander Stoll 2005-07-18 02:56:53 UTC
When processes are not started via init scripts (for example a self-compiled
apache, started manually), or are left over because the init scripts did not
shut them down, then on reboot / shutdown the unmount of "/" fails because the
filesystem is kept busy by the processes that were not terminated. Then the
famous "mounting read only / hit CTRL-D" message appears, followed by a hard
reset/halt with an inconsistent filesystem.
This should be improved by sending a SIGKILL to such remaining processes
before unmounting. Today I found garbage in my .bash_history after such an
unclean reboot (on a Reiserfs partition!)

Reproducible: Always
Steps to Reproduce:
1. start a process that is not killed via init scripts on shutdown/reboot
2. reboot, watch busy / error message followed by a hard reboot
3. enjoy fsck for an inconsistent filesystem

Actual Results:  
file system corruption due to hard reboot from not unmounting fs, busy from
leftover processes

Expected Results:  
SIGKILL of all leftover processes, proper unmount of / and no fscking
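
For illustration, the requested behaviour (ask politely first, then force) could look roughly like the sketch below. This is not the baselayout code; the function name terminate_then_kill and the grace period argument are made up for this example, and a real shutdown script would have to pick the PIDs itself (for instance everything outside the init scripts' own session).

```shell
# Sketch only, not the baselayout implementation.
# terminate_then_kill GRACE PID...   (hypothetical helper)
# Sends SIGTERM to the given PIDs, polls for up to GRACE seconds,
# then sends SIGKILL to whatever is still around.
terminate_then_kill() {
    grace=$1
    shift
    kill -TERM "$@" 2>/dev/null || true
    while [ "$grace" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            # kill -0 only tests whether the PID can still be signalled
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        grace=$((grace - 1))
    done
    # Grace period is over: SIGKILL cannot be caught or ignored in userland
    kill -KILL "$@" 2>/dev/null || true
    return 0
}
```

Called as `terminate_then_kill 5 1234 5678`, it would give both processes five seconds before forcing them down. Note that on the affected grsec kernels even the final SIGKILL apparently did not reach the processes, so a script like this would not have helped there.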

Portage 2.0.51.22-r1 (default-linux/x86/2005.0, gcc-3.3.5-20050130, glibc-2.3.5-r0, 2.4.31-grsec i686)
=================================================================
System uname: 2.4.31-grsec i686 Pentium III (Katmai)
Gentoo Base System version 1.6.12
dev-lang/python:     2.3.5
sys-apps/sandbox:    1.2.10
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils:  2.15.92.0.2-r10
sys-devel/libtool:   1.5.18-r1
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=pentium3 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium3 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
LC_ALL="de_DE@euro"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 bcmath bitmap-fonts bzlib crypt ctype dbm dbx emboss erandom ethereal flatfile fortran ftp gd gd-external gdbm gif gnutls hardened hardenedphp icc iconv ifc imagemagick imap imlib innodb jpeg maildir mbox memlimit mhash mime mmap mmx mp3 mysql mysqli ncurses nls pam pcntl pcre perl pic pie png posix readline recode shared sharedmem slang sockets socks5 sse ssl sysvipc szip tcpd tiff truetype truetype-fonts type1-fonts unicode xml2 zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LDFLAGS, LINGUAS, MAKEOPTS, PORTDIR_OVERLAY
Comment 1 SpanKY gentoo-dev 2005-07-18 06:06:16 UTC
there already is code that tries to kill all non-init-related processes

the last time we saw troubles like this it was due to the user running a
hardened kernel ... i see you too are running grsec ... if you disable all grsec
/ pax features and reboot, does it work ?  also, if you login at the end when it
asks for your root pwd, do you see anything fishy in the output of `dmesg` ?
Comment 3 Alexander Stoll 2005-07-26 04:29:50 UTC
(In reply to comment #1)
sorry for the long delay, my DSL Line broke away for several days and this
caused a lot of trouble...

> there already is code that tries to kill all non-init-related processes
> the last time we saw troubles like this it was due to the user running a
> hardened kernel ... i see you too are running grsec ... if you disable all grsec
> / pax features and reboot, does it work ?  also, if you login at the end when it
> asks for your root pwd, do you see anything fishy in the output of `dmesg` ?

I've tested it with the same kernel and the whole grsec stuff disabled and it
all worked like it should. It seems only hardened kernels are affected; tested
with grsec option "high" in kernel config.
This reboot / halt issue is the only strange thing on this system, no strange
dmesg output or anything.
This issue is very critical: last time I tested it with the grsec kernel,
reiserfsck broke locale in glibc and it had to be re-emerged to restore a working
system...
Please let me know if I could test anything, since this really puts the
integrity of the filesystem at risk.
Comment 4 SpanKY gentoo-dev 2005-07-26 06:48:42 UTC
guess it's up to the hardened team to figure out why grsec prevents proper
signal handling at shut down
Comment 5 solar (RETIRED) gentoo-dev 2005-07-26 14:31:35 UTC
Anybody else seeing this problem? 
Comment 6 SpanKY gentoo-dev 2005-07-26 15:09:26 UTC
yes, see Bug 95565
Comment 7 Kevin F. Quinn (RETIRED) gentoo-dev 2005-07-26 15:46:13 UTC
Yeah; I get it sometimes - hadn't tried to chase it down though.
Comment 8 petre rodan (RETIRED) gentoo-dev 2005-07-26 22:57:18 UTC
I also encounter that bug, but given the fact that it happens on production
machines I do not have the possibility to debug it.
Comment 9 solar (RETIRED) gentoo-dev 2005-08-02 04:51:27 UTC
Can't say I've ever encountered this. Probably won't look into trying to track it
down either. If you think it's a bug with grsec I'd suggest reporting upstream.
Comment 10 Alexander Stoll 2005-08-03 07:37:28 UTC
(In reply to comment #8)
> Can't say I've ever encountered this. Probably won't look into trying to track it
> down either. If you think it's a bug with grsec I'd suggest reporting upstream.
It is reported upstream, waiting for feedback...
Comment 12 Alexander Stoll 2005-08-09 02:30:09 UTC
(In reply to comment #8)
> Can't say I've ever encountered this. Probably won't look into trying to track it
> down either. If you think it's a bug with grsec I'd suggest reporting upstream.

No reaction from upstream. I suggest that a Gentoo maintainer contacts spender,
maybe he gets heard...
The problem itself deserves a little more attention than it gets now, because it
is a _critical_ issue: anyone could lose data when shutting down the system or on
a simple reboot. It is _not_ an issue of sloppy system administration (make sure
not to leave running processes and all is fine): another reproducible cause is
compiling a new glibc (update), after which you can kill what you want but never
get a proper unmount/remount ro for the root fs...
So keeping your system in sync leaves you with the chance of a defective root fs.
Further testing shows that it is clearly a Gentoo specific issue, the same
kernel running on an old SuSE system never shows that weird behavior.
It is clearly either a grsec bug triggered by Gentoo specific handling of system
shutdown or something related to INIT/hardened profile.
Be sure that I will assist you in testing things as required in any way I can;
sadly I'm not a gifted (kernel) hacker who could help track this down
in reasonable time on the source level...
Comment 13 petre rodan (RETIRED) gentoo-dev 2005-08-22 09:36:44 UTC
I can replicate this bug anytime if I place something like

SV:123456:respawn:/usr/bin/svscanboot

in my inittab - (svscanboot is part of daemontools).

in this case I end up with a lot of processes that are not at all affected by
`killall5 -9`. 

 * Unmounting filesystems ...                                             [ ok ]
 * Removing dm-crypt mappings
 * Removing dm-crypt mapping for: crypt-swap ...                          [ ok ]
 * Remounting remaining filesystems readonly ...su(pam_unix)[10648]: session closed for user root
umount: devpts busy - remounted read-only
Segmentation fault
umount: /: device is busy
umount: /: device is busy
umount: /: device is busy
                          [ !! ]
Give root password for maintenance
(or type Control-D to continue): 
pandora ~ # ps ax | grep -v '\[.*\]'
24051 ?        Ss     0:00 /bin/sh /usr/bin/svscanboot
26519 ?        S      0:00 svscan /service
 5449 ?        S      0:00 readproctitle service errors: .......................
30066 ?        S      0:00 supervise qmail-send
13024 ?        S      0:00 supervise log
 2499 ?        S      0:00 supervise sshd
22689 ?        S      0:00 supervise httpd
 5442 ?        S      0:00 supervise log
11461 ?        S      0:00 /usr/sbin/sshd -D
17919 ?        S      0:00 qmail-send
29039 ?        S      0:00 /usr/bin/multilog t s2500000 n10 /var/log/qmail/qmail
 3071 ?        S      0:00 /usr/bin/tcpserver -vDRHl0 -b50 -c100 0 80 /usr/bin/h
29970 ?        S      0:00 /usr/bin/multilog t /var/log/httpd -* +* * status: *
15792 ?        S      0:00 qmail-lspawn ./.maildir/
 7254 ?        S      0:00 qmail-rspawn
30147 ?        S      0:00 qmail-clean
15841 ?        Ss     0:00 sshd: prodan [priv] 
 4628 ?        S      0:00 sshd: prodan@notty  
14089 ?        Ss     0:00 /bin/bash /sbin/rc reboot
19349 ttyS0    Ss     0:00 bash
14461 ttyS0    R+     0:00 ps ax
pandora ~ # ps ax | wc -l
55
pandora ~ # killall5 -9
pandora ~ # killall5 -9
pandora ~ # killall5 -9
pandora ~ # ps ax | wc -l
55
pandora ~ # epm -qf `which killall5`
sysvinit-2.86
pandora ~ # uname -a
Linux pandora 2.6.11-hardened-r15-sunspire #3 SMP Sat Jul 2 01:19:18 EEST 2005 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
Comment 14 Alexander Stoll 2005-08-22 10:17:26 UTC
(In reply to comment #11)
> I can replicate this bug anytime if I place something like
> SV:123456:respawn:/usr/bin/svscanboot
please remove runlevel 6 and check again, please have a look at
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=4#doc_chap1

if you specify runlevel 6 (reboot) for svscan, every time you kill svscan, init
will start this process again and in runlevel 6 you never get a root fs that is
not busy...
Init is responsible for keeping processes running for the specified runlevel,
this is completely desired behavior. If you comment out the svscan line in
inittab and type "init q" and svscan is not gone, then you stepped over a real
issue...

Please check this and report. It's important to validate reported test cases
until it's clear where the (real) problem is located.
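
To rule out this kind of false positive before reporting, the inittab can be checked mechanically for respawn entries that are also active in the halt (0) or reboot (6) runlevel. A rough sketch follows; the function name find_shutdown_respawns is made up for this example, and it only assumes the usual id:runlevels:action:process layout.

```shell
# Hypothetical helper, assuming the standard inittab line format
#   id:runlevels:action:process
# Reads inittab text on stdin and prints every respawn entry whose
# runlevel field contains 0 (halt) or 6 (reboot).
find_shutdown_respawns() {
    awk -F: '
        /^[[:space:]]*#/ { next }    # skip comment lines
        NF >= 4 && $3 == "respawn" && $2 ~ /[06]/ {
            print $1 ": runlevels " $2
        }
    '
}
```

Run it as `find_shutdown_respawns < /etc/inittab`. Any line it prints is a process that init will keep restarting during shutdown, so a busy root fs in that case is expected and not this bug.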
Comment 15 petre rodan (RETIRED) gentoo-dev 2005-08-22 10:35:00 UTC
(In reply to comment #12)
> Please check this and report. It's important to validate reported test cases
> until it's clear where the (real) problem is located.

oops, you were right. comment #11 is not a good test case for this bug.
Comment 16 Alexander Stoll 2005-12-10 08:56:37 UTC
I have found a way to circumvent the odd filesystem checks when you upgrade your
system with a new glibc that causes a busy rootfs on reboot/halt. Type the
following on console/term:

1. "init q"
2. "init u"
3. "init s"

When the system comes up again in default runlevel 3, every time I follow this
procedure, I can safely reboot/halt the system without getting a busy rootfs and
an inconsistent filesystem.
This brings up a new approach to this odd thing, is it an init related issue???
All people involved, please test...
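
One possible explanation for why this helps, which is an assumption and not something confirmed in this bug: after a glibc upgrade, running processes (init included) still map the old, now deleted library files, and "init u" asks init to re-execute itself. Whether deleted files are still mapped can be checked from /proc/<pid>/maps. A minimal sketch, where the function name deleted_mappings is made up for this example:

```shell
# Sketch under the assumption stated above.
# Reads the text of a /proc/<pid>/maps file on stdin and prints the
# unique paths of mapped files that have since been deleted.
# Limitation: paths containing spaces are not handled.
deleted_mappings() {
    awk '$NF == "(deleted)" { print $(NF - 1) }' | sort -u
}
```

For example, `deleted_mappings < /proc/1/maps` (as root) shows whether init itself still holds an old library. If it prints something before "init u" and nothing afterwards, that would support the explanation above.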

Comment 17 kfm 2005-12-28 18:02:05 UTC
*** Bug 116832 has been marked as a duplicate of this bug. ***
Comment 18 SpanKY gentoo-dev 2006-02-05 19:59:50 UTC
could someone with this issue give one of the newer grsec patches a test:
http://www.grsecurity.net/~spender/

Brad updated them and he thinks he fixed this bug, he just needs someone to verify
Comment 19 kfm 2006-02-05 20:34:00 UTC
The new patch (grsecurity-2.1.8-2.6.14.7-200602052251) addresses two things over the prior release:

1) Hopefully this bug (for which I shall attach an incremental patch here)
2) bug 121250

Will report back if anything conclusive comes of it on my setup.
Comment 20 kfm 2006-02-05 20:35:20 UTC
Created attachment 79003
grsec-2.1.8-fix-kill.patch

This should apply cleanly on top of a hardened-sources-2.6.14-r5 or grsec-2.1.8-2.6.14.6-200601211647 tree.
Comment 21 kfm 2006-02-05 21:20:17 UTC
Excellent. I had been able to trigger this issue consistently by simply running a screen session and neglecting to close it completely before rebooting. I applied the above patch and set up a screen with a typical workload (irssi, few text editor instances etc) before issuing a reboot command. And now the problem has completely vanished :)
Comment 22 SpanKY gentoo-dev 2006-02-06 05:43:16 UTC
awesome, thanks for testing
Comment 23 kfm 2006-03-16 04:39:47 UTC
Fixed in hardened-sources-2.6.14-r6.