Bug 34360 - init process locks itself up

Summary: init process locks itself up

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	x86 Linux

Importance:	High major
Assignee:	Martin Schlemmer (RETIRED)

Duplicates (1):	34361

Reported:	2003-11-25 13:56 UTC by Rafal Rzepecki
Modified:	2004-10-15 23:52 UTC
CC List:	1 user

Description Rafal Rzepecki 2003-11-25 13:56:25 UTC

Recently I have seen strange sysvinit behaviour on my system. Sometimes init just locks up, using all available CPU. Thanks to various gentoo-sources realtime kernel enhancements (thanks guys!), it does not lock up the machine completely. This allows me to give you the top line showing the init process statistics (it is locked up right now):
1 root 25 0 8244 7328 4 R 69.8 2.9 157:35.31 init

I should mention that this is a complete lockup: it doesn't respond to telinit U, and examining /proc/1/fd/10 shows that it doesn't even read from the FIFO. It looks like an infinite loop or a deadlock of some kind. I wanted to debug it, but gdb reported that it was unable to attach to the process, and I'm not really sure how to debug it anyway. I would appreciate it if someone could try to debug it; the problem may be specific to my system, though, so any hints on how to debug the init process would be helpful either way.
I haven't been able to find any condition that triggers this behaviour; it seems as if it's completely random.
I'm going to reboot now and then look at the sysvinit source code; maybe I can spot where a lockup could occur, or find a way to add some detailed logging.

If anybody wants to help and would like any additional information, I will be happy to extract it at the next lockup and provide it here.

My baselayout version was 1.8.6.10-r1; I have just updated to 1.8.6.12 (not rebooted yet), although I don't think it will make any difference, as I don't see any init-specific changes in the ChangeLog.
My machine is an x86 (Coppermine), running Linux 2.4.20-gentoo-r5 kernel.

Comment 1 Marius Mauch (RETIRED) gentoo-dev 2003-11-25 14:00:44 UTC

*** Bug 34361 has been marked as a duplicate of this bug. ***

Comment 2 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-26 11:51:37 UTC

Please try another kernel (vanilla, maybe?). Also, please try running:

 # rm -rf /lib/dev-state/*

Comment 3 Rafal Rzepecki 2003-11-26 12:12:36 UTC

I'll try the vanilla kernel. 

I also did the rm, but I doubt it will change anything, as initctl was not present in /lib/dev-state.

How would the kernel make any difference? Are you just guessing, or do you have a specific cause of this problem in mind?

Anyway, it will be hard to tell whether a change made any difference, because the lockups seem to be completely random. For example, my current uptime is over 20 hours (no lockup); another time it locked up after only a few hours of uptime, and yet another time after a few days.

Comment 4 Tim Yamin (RETIRED) gentoo-dev 2003-11-26 15:14:53 UTC

We're suggesting a possible kernel problem because this could be a memory mapping bug [ quite unlikely ]. More likely, your system has a memory leak. In these situations, if something is taking up the memory [ or allocations are failing due to, for example, a hardware fault ], init usually starts scrambling for every last byte it can find and uses 100% of the CPU as a result.

Can I suggest that you run 'memtest86' to see whether any problems show up with your RAM? Another good test is to run some intensive compiling...

Comment 5 SpanKY gentoo-dev 2004-10-15 23:52:12 UTC

try a newer kernel