14466 – gentoo-sources-2.4.19-r9 kernel locks up

Summary: gentoo-sources-2.4.19-r9 kernel locks up

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	x86 Linux

Importance:	High normal
Assignee:	x86-kernel@gentoo.org (DEPRECATED)

Reported:	2003-01-23 17:57 UTC by Scott Beck
Modified:	2003-03-25 23:08 UTC
CC List:	2 users

Description Scott Beck 2003-01-23 17:57:27 UTC

Hi,

I've found that a simple perl one-liner that attempts to use a large amount
of memory causes my system to become non-responsive and never
recover. I tested this with gentoo-sources-2.4.19-r9. I also tried
vanilla-sources-2.4.19 and vanilla-sources-2.4.20; neither of these
kernels causes the lockups I see with gentoo-sources. Under both of those
kernels the process ends up being killed. I can only assume it is one of the
patches that is applied. My system is running with the ~x86 keyword and is
completely up to date. The perl command that causes this lockup is:
    perl -we 'print 0x7cff_ffff .. 0x7fff_ffff'

This is perl5.8.0 (latest for ~x86).
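
For a sense of scale, the size of that range can be worked out with shell
arithmetic. This is only a rough sketch: the element count follows directly
from the two endpoints in the one-liner, but the 4-bytes-per-integer figure is
an assumption chosen for illustration, and Perl's real per-value overhead is
larger, so the memory figure is at best a loose lower bound.

```shell
#!/bin/sh
# Endpoints copied from the one-liner. Perl ignores underscores in numeric
# literals, so 0x7cff_ffff is the same value as 0x7cffffff.
first=$(( 0x7cffffff ))
last=$(( 0x7fffffff ))

# An inclusive range holds (last - first + 1) elements.
count=$(( last - first + 1 ))
echo "elements in range: $count"

# ASSUMPTION (illustration only): 4 bytes of raw payload per integer.
bytes_per_int=4
echo "raw payload lower bound: $(( count * bytes_per_int / 1048576 )) MiB"
```

Since print takes a list, the range is presumably expanded in full before
anything is written, which would explain why the process grows until memory
runs out.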

If you would like, I can attach the .config file for each of the kernels I
tried.

Thanks,

Scott

Comment 1 Jason Rhinelander 2003-01-24 11:32:24 UTC

I played around with this a little more, and compiled a few more kernels. I
didn't have too much time to spend on this, but what I tried ought to be helpful.

First, I modified the ebuild to only apply patches 00 through 08 and recompiled
the kernel.  This did not fix it.

Thinking that it might have something to do with the preemptive kernel (just a
guess, I admit), I also compiled the same kernel with preemption turned off.
This also did not fix it.

However, in testing I did notice something very interesting, while watching it
with a -20 niced top.  Top seemed to work perfectly well _until_ the load
average got to 10 - at which point top stopped responding.  But this is not
what's interesting.  What's interesting is that the perl process got to a point
where it was occupying pretty much all free memory (about 980MB, if I remember
correctly), and stopped growing in memory.  The CPU usage wasn't high -
generally at 0%, with a spike every couple seconds to full CPU (top was running
with 0.1s intervals).

Something else I should point out is that the system does not become totally
unresponsive - you can still ping the system, and a Ctrl-C on the linux console
_did_ kill the process, but took anywhere from just a couple seconds to 30
seconds.  In a terminal in X (tested xterm and gnome-terminal), Ctrl-C didn't do
anything, even after 10 minutes (at which point I became too impatient and did a
hard reset).

Now, on a vanilla kernel, when a process tries to grab too much memory, it dies
and "Terminated" is generally displayed on the terminal.  From what I've seen
here, however, it looks as though the kernel is no longer killing processes that
run out of memory.
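
When comparing kernels from a script, the shell's exit status is one way to
tell whether the test process was killed by a signal. The following is a
minimal sketch, assuming a bash-like shell that reports death by signal as
128 plus the signal number; describe_status is a made-up helper name, not part
of any tool mentioned in this report.

```shell
#!/bin/sh
# describe_status is a hypothetical helper for illustration.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Assumed convention: status = 128 + signal number.
        echo "killed by signal $(( status - 128 ))"
    else
        echo "exited with status $status"
    fi
}

# SIGTERM is signal 15, so a "Terminated" process shows up as status 143.
describe_status 143
describe_status 0
```

In practice it would be called right after the test command, for example
`perl -we '...'; describe_status $?`. On the affected kernel the call may
simply never be reached, which is itself the symptom.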

So, as best Scott and I can guess, memory allocation is not failing properly,
thus causing a lockup.  This might be why top stopped responding - when the load
average reached 10, top probably tried to reallocate the memory for that line,
since the value became a character longer.

Also, a recommendation to anyone who tries to reproduce this - turn off swap
first!  Not doing so just makes the system take unnecessarily long to fill the
swap device before exhibiting the above behaviour.
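
A quick way to confirm that swap really is off before running the test is to
look at /proc/swaps, which lists one header line followed by one line per
active swap area. This is a sketch only; swap_active is a hypothetical helper
name, and the optional file argument exists just so the check can be tried
against a sample file.

```shell
#!/bin/sh
# swap_active is a hypothetical helper: succeeds if any swap area is listed.
swap_active() {
    swaps_file=${1:-/proc/swaps}
    # No readable file means we cannot see any active swap.
    [ -r "$swaps_file" ] || return 1
    # Count non-empty lines; anything beyond the header is an active area.
    lines=$(grep -c . "$swaps_file")
    [ "$lines" -gt 1 ]
}

if swap_active; then
    echo "swap is still enabled; run 'swapoff -a' as root before testing"
else
    echo "no active swap found"
fi
```

The script deliberately does not call swapoff itself, since that needs root
and changes system state; it only reports what it finds.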

Comment 2 Jason Rhinelander 2003-03-25 22:26:21 UTC

This is (or at least seems to be) fixed in gentoo-sources-2.4.20-r2 (not sure about -r1, since it refused to work on _several_ systems in our office, with different hardware and configurations).  My guess would be something in these patches:

  * removed ck4 O(1) sched, ll, preempt patch
  * added rml preempt & ll

Note that this bug still affects ck-sources, at least 2.4.20-r3, and possibly -r4.

Comment 3 Jay Pfeifer (RETIRED) gentoo-dev 2003-03-25 23:08:14 UTC

Glad to hear the new sources work. I'll try your little one-liner on subsequent
gentoo-sources kernel releases.

Thanks,

Jay