81509 – [2.4] gentoo-sources: Kernel Panic while under load

Summary: [2.4] gentoo-sources: Kernel Panic while under load

Status:	RESOLVED NEEDINFO

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	x86 Linux

Importance:	High critical
Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers

Reported:	2005-02-10 08:21 UTC by Steve Scaffidi
Modified:	2005-03-12 08:14 UTC
CC List:	0 users

Description Steve Scaffidi 2005-02-10 08:21:23 UTC

When the system is under load, the kernel panics. This has been happening on a number of systems where the kernel was upgraded. The systems with this issue are running kernels from gentoo-sources 2.4.26 through 2.4.28.

The postfix servers typically stay up for several hours, sometimes for several days. It appears that they crash when the load is higher than normal.

The OpenLDAP servers also crash under high load, and in addition, using slapadd causes a panic when importing files over 5MB.

Reproducible: Sometimes

Actual Results:  
Feb  2 20:47:20 clipper kernel BUG at exit.c:524!
Feb  2 20:47:20 clipper invalid operand: 0000
Feb  2 20:47:20 clipper CPU:    0
Feb  2 20:47:20 clipper EIP:    0010:[<80145e83>]    Not tainted
Feb  2 20:47:20 clipper EFLAGS: 00010282
Feb  2 20:47:20 clipper eax: 00000000   ebx: 8130e200   ecx: 00000000   edx: 00000000
Feb  2 20:47:20 clipper esi: 8e53a000   edi: 8e53a000   ebp: 8e53bee8   esp: 8e53bed0
Feb  2 20:47:20 clipper ds: 0018   es: 0018   ss: 0018
Feb  2 20:47:20 clipper Process slapadd (pid: 0, stackpage=8e53b000)
Feb  2 20:47:20 clipper Stack: 8e53a000 813a2e60 8f450f10 8e53a000 00000000 8e53bf2c 8e53bf08 8014cd97
Feb  2 20:47:20 clipper 00000002 00000002 8e53a67c 00000002 00000002 8e53bfc0 8e53bfb8 80129296
Feb  2 20:47:20 clipper 00000002 00000002 8e53bf2c 8e53bf44 8ea091e0 8fa8fad8 8e53a674 00000002
Feb  2 20:47:20 clipper Call Trace:    [<8014cd97>] [<80129296>] [<8013d164>] [<8013e8d7>] [<80129524>]
Feb  2 20:47:20 clipper
Feb  2 20:47:20 clipper Code: 0f 0b 0c 02 05 c0 35 80 e9 00 fe ff ff c7 04 24 01 00 00 00
Feb  2 20:47:20 clipper <0>Kernel panic: Attempted to kill the idle task!

Expected Results:  
Not crashed ;-)

This seems to be reproducible across several different servers, with different hardware. (Dell PowerApp 120, PowerEdge 750, Optiplex ??)

Also, this bug is NOT reproducible when using kernels based on vanilla-sources.

Servers we have not yet upgraded are still stable. These are running 2.4.20 and under.
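
As a cross-check on the trace above, the Code: bytes can be decoded by hand. The sketch below assumes the kernel was built with the verbose i386 BUG() layout used by 2.4 kernels: a ud2 opcode (0f 0b), followed by a 16-bit line number and a 32-bit pointer to the file name. That layout is an assumption about this build, not something the report states, and the function name decode_bug_bytes is made up for illustration.

```python
import struct


def decode_bug_bytes(code_hex):
    """Decode an assumed 2.4 i386 BUG() site: ud2, 16-bit line, 32-bit file pointer.

    Returns (line, file_pointer), or None if the bytes do not start with ud2
    or are too short to hold both fields.
    """
    raw = bytes.fromhex(code_hex)  # whitespace between byte pairs is ignored
    if len(raw) < 8 or raw[:2] != b"\x0f\x0b":
        return None
    # Little-endian, no padding: unsigned short (line), unsigned int (pointer).
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr


# "Code:" bytes copied from the oops in this report.
code = "0f 0b 0c 02 05 c0 35 80 e9 00 fe ff ff c7 04 24 01 00 00 00"

line, file_ptr = decode_bug_bytes(code)
print(line, hex(file_ptr))
```

With the bytes from this report it prints 524 0x8035c005. The line number agrees with the "kernel BUG at exit.c:524!" message, which suggests the log was captured intact.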

Comment 1 Tim Yamin (RETIRED) gentoo-dev 2005-02-10 09:32:17 UTC

Can you please:

a) Try vanilla-sources-2.4.28
b) Run that log trace through the 'ksymoops' utility on the server from where the stack trace was gathered and paste the output.
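
For readers unfamiliar with what symbol resolution involves, here is a rough sketch of the core idea only: map each raw address from the Call Trace to the nearest preceding symbol in a System.map-style table. This is not how ksymoops is implemented (it also handles module symbols and disassembles the Code: bytes). The helper names and the sample map entries are invented for illustration. Real output needs the System.map that matches the crashed kernel.

```python
import bisect
import re

# Hypothetical System.map excerpt: these symbol names and addresses are
# invented for the example and do not come from any real kernel build.
SAMPLE_MAP = """\
80129000 T sample_a
8013d000 T sample_b
8014c000 T sample_c
"""

# Call Trace line taken from the oops in this report.
CALL_TRACE = (
    "Call Trace:    [<8014cd97>] [<80129296>] [<8013d164>] "
    "[<8013e8d7>] [<80129524>]"
)


def parse_system_map(text):
    """Return (address, name) pairs sorted by address from 'addr type name' lines."""
    syms = []
    for entry in text.splitlines():
        parts = entry.split()
        if len(parts) >= 3:
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms


def trace_addresses(log_text):
    """Extract the bracketed 8-digit hex addresses, e.g. [<8014cd97>]."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-f]{8})>\]", log_text)]


def resolve(addr, syms):
    """Return 'symbol+0xoffset' for the nearest symbol at or below addr, else None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return f"{name}+0x{addr - base:x}"


syms = parse_system_map(SAMPLE_MAP)
for addr in trace_addresses(CALL_TRACE):
    print(f"{addr:08x} -> {resolve(addr, syms)}")
```

The lookup is only as good as the symbol table: a System.map from a different build will produce plausible-looking but wrong names, which is why the trace has to be decoded on the server where it was gathered.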

Comment 2 Steve Scaffidi 2005-02-11 10:34:02 UTC

Unfortunately, I'm no longer running a gentoo-sources kernel on any production servers, except the two that had not been upgraded, and thus have remained stable. Those are running 2.4.20-gentoo-r6 and 2.4.20-gentoo-r31.

However, I will try to appropriate a server from the lab and attempt to load test it to get some results. This may take several days to do. As a Gentoo fan, I wouldn't want management to have any reason to go sour on it.

As a side note, the production servers have been moved to kernels based on vanilla-sources 2.4.28 and have not crashed yet.

Comment 3 Steve Scaffidi 2005-03-12 08:14:25 UTC

Unfortunately, I have not been able to reproduce the problem on a test server. Also, one of the production servers that I moved to a vanilla kernel has been crashing at random. Luckily, it crashes far less often than before: it used to go down every other day, and now it tends to crash once a week or so. The frustrating part is that no stack dump, kernel trace, or even error message is left in any of the logs. Since this is a production server, it is set to reboot on a kernel panic, so I never get to see the message on the console. I plan on trying a few more kernel variations.


I'm going to close this bug for now, since I'm beginning to wonder if it may be a hardware issue of some sort.