Bug 209947 - Recursive process loops hang (crash) Linux
Summary: Recursive process loops hang (crash) Linux
Status: VERIFIED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-02-13 01:08 UTC by Robert Bradbury
Modified: 2008-02-14 12:43 UTC

See Also:
Package list:
Runtime testing required: ---


Description Robert Bradbury 2008-02-13 01:08:36 UTC
System is a Pentium 4 (prescott) with 1.5GB of main memory and 9+GB of swap space.
Running 2.6.23-gentoo-r8 #1 PREEMPT.

It is possible to "hang" the system such that it is entirely unresponsive to X commands and a reboot is required.

I have run into this situation on two occasions.  The first I believe was with a make command from a downloaded package which may have run a shell script which re-executed the make command (i.e. an infinite process loop) [The Makefile was buggy].  The second involved firefox which invoked a video via gtk-gnash.  I believe the firefox/javascript/video itself was infinitely replaying and perhaps nesting the player (gtk-gnash) commands (firefox has a "bug" that it doesn't "collect" child processes involving "flash" videos properly).

The symptoms were identical.  The gnome-panel "system monitor" traces first showed an increase in CPU time, then an increase in swapping activity; then the CPU time dropped to near zero while swapping activity remained high, and finally the system monitor itself "hung".  At that point it was impossible to bring any window (existing terminals, firefox, etc.) into the foreground on the current workspace and impossible to switch workspaces.  One could still move the mouse cursor around but it remained a vertical bar and would never return to an arrow.  So the system was "alive" but completely unusable.  In the "firefox" situation, an mplayer playing background music on a loop ceased playing music.

It is my opinion that the Linux kernel swapped out all "active" programs and simply refused to swap them back in -- presumably due to the "program creation" infinite loops causing all programs (or the windows running them) to be swapped out and thus non-executable.

This IMO is a fundamental bug for desktop Linux systems (or even virtual OS systems) since it seems to be an easy way to bring Linux to its knees.


Reproducible: Always

Steps to Reproduce:
1. I would test either nested shell invocation loops or programs which call programs which call the initial program or infinite "fork()" loops, e.g. "while (1) fork();" or some equivalent, perhaps with an exec of some program which  waits for input (i.e. will force swapping once memory is exhausted).

Fundamentally, you should be able to run an infinite recursive shell loop and still have a usable system.  The loop should eventually exhaust swap space and be terminated.  Throughout, the system *should* remain usable.
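
A bounded variant of the fork loop described in the steps above could look like the following. This is only a sketch for testing, not the reporter's actual reproduction: the function name bounded_fork_test and the max_children cap are hypothetical, and the cap is there precisely so the test cannot hang the machine the way the unbounded "while (1) fork();" does.

```python
import os


def bounded_fork_test(max_children=50):
    """Fork up to max_children idle children and return how many were created.

    Hypothetical helper: the hard cap keeps this from becoming a real fork bomb.
    """
    read_fd, write_fd = os.pipe()
    pids = []
    try:
        for _ in range(max_children):
            try:
                pid = os.fork()
            except OSError:
                # fork() failed, e.g. EAGAIN once a process limit is reached.
                break
            if pid == 0:
                # Child: wait for input that never arrives, then exit when the
                # parent closes its end of the pipe (read returns EOF).
                try:
                    os.close(write_fd)
                    os.read(read_fd, 1)
                finally:
                    os._exit(0)
            pids.append(pid)
    finally:
        # Parent: closing the write end releases every blocked child.
        os.close(write_fd)
        os.close(read_fd)
        for pid in pids:
            os.waitpid(pid, 0)
    return len(pids)
```

Each child blocks on a pipe read, which mirrors the "program which waits for input" idea in step 1. Raising max_children gradually while watching swap activity is a safer way to approach the hang than an unbounded loop.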

Actual Results:  
X (Linux?) hangs and is unusable.

Expected Results:  
System should always immediately bring to the foreground a window or a gnome workspace you have clicked on.  The problem may require fixes to Linux (so "active" processes are forced forward and "fork/exec/swap" take secondary priority) and/or changes to X/Gnome to produce "preemptive" behavior for mouse or keyboard activity.

For people reading this, I believe this is a fundamental Linux kernel bug, as Linux does not swap/page at anywhere near the capacity of the disks (a few hundred to a few thousand blocks vs. ten thousand+ blocks).  If you know the best list on which to file Linux tuning/critical kernel questions/bugs please append it to this bug report.  I will be happy to take the problem there.

I believe there should be a Linux kernel test which (a) maximizes CPU usage while (b) driving the swap disk at its maximal transfer rate and (c) maintaining a highly responsive desktop environment (firefox is good at stretching the system when you get 50-100 windows and 300-400 tabs [typically firefox is consuming 40-70% of main memory] -- in this state, both firefox and X CPU usage start to increase significantly).
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2008-02-13 01:18:34 UTC
And this is a Gentoo-specific issue exactly why? What are you expecting from us?
Comment 2 Robert Bradbury 2008-02-13 09:56:21 UTC
Jakub, I thought I would start with Gentoo because "uname -a" indicates the kernel is "PREEMPT", which I presume to mean it is "preemptive".  I was hoping that meant it was tuned for desktop use, but this bug report suggests it is not.

If you feel it isn't a Gentoo problem, then I would like suggestions as to:
a) what news group, blog, BBS, etc. I should post it to for feedback; and/or
b) precisely where one goes to file a Linux kernel bug report (to the best of my knowledge there is no bugzilla database).

I am aware that there have been some "wars" in the past (a year ago or more) regarding configuration of the kernel for desktop vs. server use (including changing vm.swappiness settings), but I've never noticed much difference in the system behavior when I change that, and I thought the kernel had been modified to resolve these problems.

If you resolve a bug as UPSTREAM, the bug report should point to where the bug report has been filed.
Comment 3 Jakub Moc (RETIRED) gentoo-dev 2008-02-13 10:10:34 UTC
Well... so for your issues - if you complain about kernel design, that's what LKML [1] is for, but... you'd better get some insight first on the inner workings of that thing to not have completely unrealistic expectations - such as that spawning an infinite number of processes which exhaust all available RAM should leave your system perfectly responsive. It will become responsive again once the OOM killer has done its job. :P And yeah, swap is slower than RAM as well, by several orders of magnitude. ;)

Wrt FF, the 2.0 series still uses X server as storage for uncompressed images, so wondering why X server CPU usage skyrockets when you launch hundreds of windows with thousands of tabs is uhm... kinda weird. There's no point in filing any bugs about it, FF3 is completely reworked wrt this, you can try the (unsupported!) ebuilds from mozilla overlay [2]

[1] http://lkml.org/
[2] http://overlays.gentoo.org/proj/mozilla/browser/www-client/mozilla-firefox

Closing, there's nothing we could fix here, at all.
Comment 4 Robert Bradbury 2008-02-14 12:43:53 UTC
Actually there is something that can be done about this bug by Gentoo, and that is to have the default system login definitions for user process limits be more reasonable.

The shell is about 700 kB, so a reasonable process limit might be of the form:
/proc/meminfo [MemTotal] / 700 which works out to ~2200 on my system.  Or, given scans of my "typical" process usage, perhaps between 200 and 400.  Then have the default login do a ulimit -Su $NUMPROC_LIMIT.
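
The calculation above can be sketched as follows. This is an illustration under stated assumptions rather than a proposed Gentoo default: the function names and the per_process_kb parameter are hypothetical, and the 700 kB figure is simply the shell size quoted above. The soft limit is set through Python's resource module, which corresponds to what "ulimit -Su" does in the shell.

```python
import resource


def nproc_limit_from_meminfo(meminfo_text, per_process_kb=700):
    """Return MemTotal (in kB) divided by an assumed per-process footprint."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])  # field is reported in kB
            return total_kb // per_process_kb
    raise ValueError("MemTotal not found in meminfo text")


def apply_soft_nproc_limit(limit):
    """Set the soft per-user process limit, never exceeding the hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, (limit, hard))
```

A login hook could read /proc/meminfo, pass its contents to nproc_limit_from_meminfo, and hand the result to apply_soft_nproc_limit. Note that such limits are generally not enforced for root, so this only helps for ordinary user sessions.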

The OOM killer is a very bad solution, as I believe it kills the largest processes first (which in my case would be firefox, X and OpenOffice) -- not runaway recursive shells or shells and makes (for which it should be looking at the process count per process name and the process creation times/rates).
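
The "process count per process name" idea could be prototyped in user space roughly as below. This is a hypothetical sketch, not how the kernel's OOM killer works: the function names and the threshold value are made up for illustration, and it assumes a kernel that exposes /proc/<pid>/comm, which may not be the case for the kernel in this report.

```python
import os
from collections import Counter


def process_names(proc_root="/proc"):
    """Collect the command name of every process visible under proc_root."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
    return names


def runaway_candidates(names, threshold=100):
    """Return (name, count) pairs whose count meets the assumed threshold."""
    counts = Counter(names)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]
```

Calling runaway_candidates(process_names()) would list names such as a shell or make that appear suspiciously often. Creation times and rates, the other signal mentioned above, are not covered by this sketch.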