When the system is under load, the kernel panics. This has been happening on a number of systems where the kernel was upgraded. The affected systems are running kernels from gentoo-sources 2.4.26 through 2.4.28.

The postfix servers typically stay up for several hours, sometimes for several days. They appear to crash when the load is higher than normal. The OpenLDAP servers also crash under high load, and in addition, using slapadd causes a panic when importing files over 5MB.

Reproducible: Sometimes

Steps to Reproduce:
1.
2.
3.

Actual Results:
Feb 2 20:47:20 clipper kernel BUG at exit.c:524!
Feb 2 20:47:20 clipper invalid operand: 0000
Feb 2 20:47:20 clipper CPU: 0
Feb 2 20:47:20 clipper EIP: 0010:[<80145e83>] Not tainted
Feb 2 20:47:20 clipper EFLAGS: 00010282
Feb 2 20:47:20 clipper eax: 00000000 ebx: 8130e200 ecx: 00000000 edx: 00000000
Feb 2 20:47:20 clipper esi: 8e53a000 edi: 8e53a000 ebp: 8e53bee8 esp: 8e53bed0
Feb 2 20:47:20 clipper ds: 0018 es: 0018 ss: 0018
Feb 2 20:47:20 clipper Process slapadd (pid: 0, stackpage=8e53b000)
Feb 2 20:47:20 clipper Stack: 8e53a000 813a2e60 8f450f10 8e53a000 00000000 8e53bf2c 8e53bf08 8014cd97
Feb 2 20:47:20 clipper 00000002 00000002 8e53a67c 00000002 00000002 8e53bfc0 8e53bfb8 80129296
Feb 2 20:47:20 clipper 00000002 00000002 8e53bf2c 8e53bf44 8ea091e0 8fa8fad8 8e53a674 00000002
Feb 2 20:47:20 clipper Call Trace: [<8014cd97>] [<80129296>] [<8013d164>] [<8013e8d7>] [<80129524>]
Feb 2 20:47:20 clipper
Feb 2 20:47:20 clipper Code: 0f 0b 0c 02 05 c0 35 80 e9 00 fe ff ff c7 04 24 01 00 00 00
Feb 2 20:47:20 clipper <0>Kernel panic: Attempted to kill the idle task!

Expected Results:
Not crashed ;-)

This seems to be reproducible across several different servers with different hardware (Dell PowerApp 120, PowerEdge 750, Optiplex ??). Also, this bug is NOT reproducible when using kernels based on vanilla-sources. Servers we have not yet upgraded are still stable. These are running 2.4.20 and under.
Can you please: a) Try vanilla-sources-2.4.28 b) Run that log trace through the 'ksymoops' utility on the server from where the stack trace was gathered and paste the output.
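For step b), it can help to isolate the oops text from the rest of the syslog before decoding it. The following is only a rough sketch in Python, assuming the log lines carry a "Mon D HH:MM:SS hostname" prefix like the trace in the report. The helper names (strip_syslog_prefix, extract_oops) are hypothetical and not part of ksymoops, and ksymoops may well accept the syslog-formatted lines directly.

```python
import re

# Assumed prefix shape: "Feb 2 20:47:20 clipper " (month, day, time, hostname).
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s?"
)


def strip_syslog_prefix(line):
    """Remove the timestamp and hostname from one syslog line."""
    return SYSLOG_PREFIX.sub("", line.rstrip("\n"), count=1)


def extract_oops(lines):
    """Return the de-prefixed lines from 'kernel BUG at' through 'Kernel panic'.

    The start and end markers are taken from the trace in this report;
    other oops reports may begin or end differently.
    """
    collected = []
    capturing = False
    for raw in lines:
        text = strip_syslog_prefix(raw)
        if not capturing and "kernel BUG at" in text:
            capturing = True
        if capturing:
            collected.append(text)
            if "Kernel panic" in text:
                break
    return collected
```

The returned lines could then be written to a file and passed to ksymoops on the machine that produced the trace, so that the addresses are resolved against the matching kernel and System.map.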
Unfortunately, I'm no longer running a gentoo-sources kernel on any production servers, except the two that had not been upgraded and have thus remained stable. Those are running 2.4.20-gentoo-r6 and 2.4.20-gentoo-r31. However, I will try to appropriate a server from the lab and load test it to get some results. This may take several days. As a Gentoo fan, I wouldn't want management to have any reason to go sour on it. As a side note, the production servers have been moved to kernels based on vanilla-sources 2.4.28 and have not crashed yet.
Unfortunately, I have not been able to reproduce the problem on a test server. Also, one of the production servers that I moved to a vanilla-sources kernel has been crashing at random. Luckily, it crashes far less often than before: it used to go down every other day, and now it tends to crash once a week or so. The frustrating part is that no stack dump, kernel trace, or even error message is left in any of the logs. Since this is a production server, it is set to reboot on a kernel panic, so I never get to see the message on the console. I plan on trying a few more kernel variations. I'm going to close this bug for now, since I'm beginning to wonder if it may be a hardware issue of some sort.