Summary: | grsec kernels: reboot/shutdown problem for processes not started via init scripts | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Alexander Stoll <technoworx> |
Component: | [OLD] baselayout | Assignee: | The Gentoo Linux Hardened Team <hardened> |
Status: | RESOLVED UPSTREAM | ||
Severity: | critical | CC: | base-system, kfm, ragnaroc |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | 95565 | ||
Bug Blocks: | |||
Attachments: | grsec-2.1.8-fix-kill.patch |
Description
Alexander Stoll
2005-07-18 02:56:53 UTC
there already is code that tries to kill all non-init-related processes

the last time we saw troubles like this it was due to the user running a hardened kernel ... i see you too are running grsec ... if you disable all grsec / pax features and reboot, does it work ? also, if you login at the end when it asks for your root pwd, do you see anything fishy in the output of `dmesg` ?

(In reply to comment #1)
Sorry for the long delay, my DSL line was down for several days and this caused a lot of trouble...

> there already is code that tries to kill all non-init-related processes
> the last time we saw troubles like this it was due to the user running a
> hardened kernel ... i see you too are running grsec ... if you disable all grsec
> / pax features and reboot, does it work ? also, if you login at the end when it
> asks for your root pwd, do you see anything fishy in the output of `dmesg` ?

I've tested it with the same kernel and all grsec features disabled, and everything worked as it should. It seems only hardened kernels are affected; tested with the grsec option "high" in the kernel config. This reboot/halt issue is the only strange thing on this system; there is no strange dmesg output or anything like that. This issue is very critical: the last time I tested it with the grsec kernel, reiserfsck broke the locale in glibc, and glibc had to be re-emerged to restore a working system...

Please let me know if I can test anything, since this really puts the integrity of the filesystem at risk.

guess it's up to the hardened team to figure out why grsec prevents proper signal handling at shutdown

Anybody else seeing this problem?

Yeah; I get it sometimes - hadn't tried to chase it down though.

I also encounter that bug, but since it happens on production machines I have no opportunity to debug it.

Can't say I've ever encountered this. Probably won't look into trying to track it down either. If you think it's a bug with grsec I'd suggest reporting upstream.

(In reply to comment #8)
> Can't say I've ever encountered this. Probably won't look into trying to track it
> down either. If you think it's a bug with grsec I'd suggest reporting upstream.

It is reported upstream; waiting for feedback...

(In reply to comment #8)
> Can't say I've ever encountered this. Probably won't look into trying to track it
> down either. If you think it's a bug with grsec I'd suggest reporting upstream.

No reaction from upstream. I suggest that a Gentoo maintainer contact spender; maybe he gets heard...

The problem deserves more attention than it gets now, because it is a _critical_ issue: anyone could lose data when shutting down the system or on a simple reboot. It is _not_ an issue of sloppy system administration ("make sure not to leave running processes and all is fine"): another reproducible cause is compiling a new glibc (update), after which you can kill whatever you want but never get a proper unmount/remount ro for the root fs... So just keeping your system in sync leaves you with the chance of a defective root fs. Further testing shows that it is clearly a Gentoo-specific issue; the same kernel running on an old SuSE system never shows this weird behavior. It is clearly either a grsec bug triggered by Gentoo-specific handling of system shutdown or something related to INIT/hardened profile.

Rest assured that I will assist with testing in any way I can; sadly I'm not a gifted (kernel) hacker who could track this down at the source level in reasonable time...

I can replicate this bug anytime if I place something like
SV:123456:respawn:/usr/bin/svscanboot
in my inittab (svscanboot is part of daemontools). In this case I end up with a lot of processes that are not at all affected by `killall5 -9`.

 * Unmounting filesystems ... [ ok ]
 * Removing dm-crypt mappings
 * Removing dm-crypt mapping for: crypt-swap ... [ ok ]
 * Remounting remaining filesystems readonly ...su(pam_unix)[10648]: session closed for user root
umount: devpts busy - remounted read-only
Segmentation fault
umount: /: device is busy
umount: /: device is busy
umount: /: device is busy
 [ !! ]
Give root password for maintenance
(or type Control-D to continue):
pandora ~ # ps ax | grep -v '\[.*\]'
24051 ?      Ss   0:00 /bin/sh /usr/bin/svscanboot
26519 ?      S    0:00 svscan /service
 5449 ?      S    0:00 readproctitle service errors: .......................
30066 ?      S    0:00 supervise qmail-send
13024 ?      S    0:00 supervise log
 2499 ?      S    0:00 supervise sshd
22689 ?      S    0:00 supervise httpd
 5442 ?      S    0:00 supervise log
11461 ?      S    0:00 /usr/sbin/sshd -D
17919 ?      S    0:00 qmail-send
29039 ?      S    0:00 /usr/bin/multilog t s2500000 n10 /var/log/qmail/qmail
 3071 ?      S    0:00 /usr/bin/tcpserver -vDRHl0 -b50 -c100 0 80 /usr/bin/h
29970 ?      S    0:00 /usr/bin/multilog t /var/log/httpd -* +* * status: *
15792 ?      S    0:00 qmail-lspawn ./.maildir/
 7254 ?      S    0:00 qmail-rspawn
30147 ?      S    0:00 qmail-clean
15841 ?      Ss   0:00 sshd: prodan [priv]
 4628 ?      S    0:00 sshd: prodan@notty
14089 ?      Ss   0:00 /bin/bash /sbin/rc reboot
19349 ttyS0  Ss   0:00 bash
14461 ttyS0  R+   0:00 ps ax
pandora ~ # ps ax | wc -l
55
pandora ~ # killall5 -9
pandora ~ # killall5 -9
pandora ~ # killall5 -9
pandora ~ # ps ax | wc -l
55
pandora ~ # epm -qf `which killall5`
sysvinit-2.86
pandora ~ # uname -a
Linux pandora 2.6.11-hardened-r15-sunspire #3 SMP Sat Jul 2 01:19:18 EEST 2005 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

(In reply to comment #11)
> I can replicate this bug anytime if I place something like
> SV:123456:respawn:/usr/bin/svscanboot

Please remove runlevel 6 and check again, and have a look at http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=4#doc_chap1

If you specify runlevel 6 (reboot) for svscan, init will start the process again every time you kill it, so in runlevel 6 you never get a root fs that is not busy...
Init is responsible for keeping processes running in the specified runlevels; this is entirely intended behavior. If you comment out the svscan line in inittab, type "init q", and svscan is still not gone, then you have stumbled upon a real issue... Please check this and report back; it's important to validate reported test cases until it's clear where the (real) problem is located.

(In reply to comment #12)
> Please check this and report back; it's important to validate reported test cases
> until it's clear where the (real) problem is located.

oops, you were right. comment #11 is not a good test case for this bug.

I have found a way to circumvent the odd filechecks when you upgrade your system with a new glibc that causes a busy rootfs on reboot/halt - type the following on console/term:
1. "init q"
2. "init u"
3. "init s"
When the system comes up again in the default runlevel 3, every time I follow this procedure I can safely reboot/halt the system without getting a busy rootfs and an inconsistent filesystem. This suggests a new angle on this odd problem: is it an init-related issue??? All people involved, please test...

*** Bug 116832 has been marked as a duplicate of this bug. ***

could someone with this issue give one of the newer grsec patches a test: http://www.grsecurity.net/~spender/
Brad updated them and he thinks he fixed this bug, he just needs someone to verify

The new patch (grsecurity-2.1.8-2.6.14.7-200602052251) addresses two things over the prior release:
1) Hopefully this bug (for which I shall attach an incremental patch here)
2) bug 121250
Will report back if anything conclusive comes of it on my setup.

Created attachment 79003
grsec-2.1.8-fix-kill.patch
This should apply cleanly on top of a hardened-sources-2.6.14-r5 or grsec-2.1.8-2.6.14.6-200601211647 tree.
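
For anyone testing the attachment, a dry run can confirm that it fits the tree before any file is touched. The following is only a minimal sketch: it assumes GNU patch is installed and that the patch was generated with one leading path component (hence `-p1`, which is a guess, not something stated in this bug). The helper name `check_patch` is hypothetical.

```shell
# Hypothetical helper: report whether a patch would apply, without modifying the tree.
# $1 = kernel source directory, $2 = absolute path to the patch file.
check_patch() {
    src_dir=$1
    patch_file=$2
    # --dry-run only simulates the run; -p1 is an assumption about the patch layout.
    if (cd "$src_dir" && patch -p1 --dry-run < "$patch_file" >/dev/null 2>&1); then
        echo "applies cleanly"
    else
        echo "does not apply"
        return 1
    fi
}
```

Usage would look something like `check_patch /usr/src/linux /root/grsec-2.1.8-fix-kill.patch`, where both paths are placeholders for wherever the tree and the attachment actually live.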
Excellent. I had been able to trigger this issue consistently by simply running a screen session and neglecting to close it completely before rebooting. I applied the above patch and set up a screen with a typical workload (irssi, a few text editor instances, etc.) before issuing a reboot command. And now the problem has completely vanished :)

awesome, thanks for testing

Fixed in hardened-sources-2.6.14-r6.
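
Separately from the kernel fix, the inittab pitfall from comment #12 (a respawn entry that includes runlevel 6) can be checked for mechanically. The sketch below assumes the standard sysvinit `id:runlevels:action:process` line format; the function name `respawn_in_reboot` is hypothetical, and it only reads text on stdin, so it changes nothing on the system.

```shell
# Hypothetical helper: print the ids of inittab entries that init would keep
# respawning in runlevel 6 (reboot). Reads inittab-formatted text on stdin.
respawn_in_reboot() {
    awk -F: '
        /^[ \t]*#/ { next }    # skip comment lines
        # fields: id:runlevels:action:process (the process field may contain colons)
        NF >= 4 && $3 == "respawn" && index($2, "6") > 0 { print $1 }
    '
}
```

Running `respawn_in_reboot < /etc/inittab` on the setup from comment #11 should list the `SV` entry, which is the line to restrict to non-reboot runlevels before treating a busy root fs as a symptom of this bug.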