
Bug 62343

Summary: [2.4] System hangs with gentoo-sources-2.4.26 when running X and exiting/opening certain applications (like Crossover Office)
Product: Gentoo Linux
Reporter: Greg Tassone <gtgentoo>
Component: [OLD] Core system
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED NEEDINFO
Severity: critical
CC: eonwe, x11
Priority: High
Version: unspecified
Hardware: x86
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Kernel config for gentoo-sources-2.4.26-r9
Kernel .config for 2.4.26 vanilla-sources

Description Greg Tassone 2004-08-30 22:20:30 UTC
Whenever I am running any of the current gentoo-sources kernels (like 2.4.26-r9) my system hangs during certain operations within X.  For example, whenever I launch Crossover Office ("Setup" app or whatever) the system hard-locks during the application load -- everything freezes and you can't even ping the system any more.

This condition also occurs when exiting certain applications, such as Enemy Territory.  In this case the terminal hangs as soon as I exit the game -- but it ran fine up to that point.

I have done extensive troubleshooting of this issue.  It seems to occur with or without the Nvidia kernel drivers installed and/or running.  Different Window Managers don't make any difference.  Older versions of glib and glibc don't seem to matter, either.  The only thing that I know fixes the problem FOR SURE is to run an older kernel, such as 2.4.25-r8.  I am doing this now and cannot get it to error under this configuration.

Reproducible: Always
Steps to Reproduce:
1. Boot up into any current gentoo-sources kernel, such as 2.4.26-r9
2. Go into X (any Window Manager -- doesn't matter)
3. Run certain applications, such as the CrossOver Office setup application (GTK-based app)

Actual Results:  
The system hard-locks, requiring a hard-reset.

Expected Results:  
The application should load and run properly just like it does on any of the
earlier kernels, such as 2.4.25-r8.

------------------------------------------------
Short output from "strace"
------------------------------------------------
I grabbed an strace, but since the system locks I can only hand-copy the stack
at that point.  The last few lines of the strace were:

gettimeofday({1093926389, 235893}, {420, 0}) = 0
select(5, [3 4], [], [], {1, 9166}

After several "gettimeofday" calls in a row, the "select" call stops at the last
point noted.
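
For reference, the trailing {1, 9166} in the unfinished select line is the timeout (seconds and microseconds), so the call was only asking to wait about one second. The snippet below is a minimal sketch, not part of the original report, for pulling that value out of a hand-copied strace line. The function names are hypothetical, and it is only meant for unfinished lines like the one above.

```python
import re

# Matches a "{seconds, microseconds}" pair as strace prints a struct timeval.
_TIMEVAL = re.compile(r"\{(\d+),\s*(\d+)\}")


def select_timeout_seconds(strace_line):
    """Return the timeout of an unfinished select() line in seconds, or None."""
    line = strace_line.strip()
    if not line.startswith("select("):
        return None
    pairs = _TIMEVAL.findall(line)
    if not pairs:
        return None  # e.g. a NULL timeout, which means "block forever"
    seconds, microseconds = pairs[-1]
    return int(seconds) + int(microseconds) / 1_000_000


def is_unfinished(strace_line):
    """True when the line has no ' = <result>' part, i.e. the call never returned."""
    return " = " not in strace_line


# The last line copied from the screen before the lock-up.
last_line = "select(5, [3 4], [], [], {1, 9166}"
print(select_timeout_seconds(last_line), is_unfinished(last_line))
```

A select with a roughly one-second timeout that never returns suggests the whole kernel stopped making progress, rather than the application simply blocking, which would be consistent with the machine no longer answering pings.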

------------------------------------------------
Output from "emerge info"
------------------------------------------------

Portage 2.0.50-r10 (default-x86-1.4, gcc-3.3.4, glibc-2.3.3.20040420-r1,
2.4.25-gentoo-r8)
=================================================================
System uname: 2.4.25-gentoo-r8 i686 AMD Athlon(tm) MP 2000+
Gentoo Base System version 1.4.16
Autoconf: sys-devel/autoconf-2.59-r4
Automake: sys-devel/automake-1.8.5-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-march=athlon-mp -O2 -pipe -Wall -finline-limit=1200 -falign-functions=32"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config
/usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref
/usr/share/config /usr/share/texmf/dvipdfm/config/
/usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/
/usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-mp -O2 -pipe -Wall -finline-limit=1200 -falign-functions=32"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo
http://gentoo.oregonstate.edu http://www.ibiblio.org/gentoo"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="3dnow X aalib acpi aim apm arts avi berkdb cdr crypt cups curl dedicated
divx4linux doc encode esd fbcon foomaticdb gdbm gif gpm gtk gtk2 icq imap imlib
jabber java jce joystick jpeg junit kde kerberos ldap libg++ libwww mad mcal
mikmod mmx motif mozilla mpeg ncurses nls oggvorbis opengl oscar oss pam pda
pdflib perl png python qt quicktime readline samba sasl scanner sdl slang spell
sse ssl svga tcpd tetex theora tiff truetype usb x86 xml2 xmms xv xvid yahoo zlib"
Comment 1 Tim Yamin (RETIRED) gentoo-dev 2004-08-31 03:49:47 UTC
Can you see if you can reproduce this with vanilla-sources-2.4.26 please?
Comment 2 Greg Tassone 2004-08-31 05:10:14 UTC
OK.  I have built vanilla-sources 2.4.26 and completed some testing.  I am NOT able to create/reproduce the problem on this kernel build (with the same config).  Therefore, it appears that the problem is isolated to something within the gentoo-sources kernel patches.

Let me know if I can help in some other way.
Comment 3 Tim Yamin (RETIRED) gentoo-dev 2004-08-31 05:41:12 UTC
Can you attach your kernel configuration please?
Comment 4 Greg Tassone 2004-08-31 23:39:41 UTC
Created attachment 38650 [details]
Kernel config for gentoo-sources-2.4.26-r9

Adding .config for gentoo-sources 2.4.26-r9 (experiences the problem)
Comment 5 Greg Tassone 2004-08-31 23:40:48 UTC
Created attachment 38651 [details]
Kernel .config for 2.4.26 vanilla-sources

Adding .config for vanilla-sources 2.4.26 (does NOT experience the problem)
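
For anyone comparing the two attached configs, the following is a rough sketch of how the differing options could be listed. It assumes the usual .config line formats ("CONFIG_X=value" and "# CONFIG_X is not set"); the function names and the file names in the usage comment are placeholders, not anything taken from the attachments.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Map each option name to its value; options marked 'is not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kernel_configs(left_text, right_text):
    """Return {option: (left_value, right_value)} for every option that differs.

    A value of None means the option does not appear in that file at all,
    which is expected for options that only a patched kernel provides.
    """
    left = parse_kernel_config(left_text)
    right = parse_kernel_config(right_text)
    changes = {}
    for name in sorted(set(left) | set(right)):
        if left.get(name) != right.get(name):
            changes[name] = (left.get(name), right.get(name))
    return changes


# Usage (placeholder file names):
#   with open("config-gentoo") as a, open("config-vanilla") as b:
#       for name, (old, new) in diff_kernel_configs(a.read(), b.read()).items():
#           print(name, old, new)
```

Options that show up only on the gentoo-sources side are a reasonable first place to look, since they point at features added by the extra patches.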
Comment 6 Tim Yamin (RETIRED) gentoo-dev 2004-09-03 02:43:34 UTC
Hrm, interesting - could you give gentoo-sources-2.4.27 a go and see if that gives you this issue too? If so, then one of the patches is causing this and we can get rid of the problematic one.
Comment 7 Jules Gagnon 2004-09-03 07:35:43 UTC
I think I have a similar problem. Since last week my system has been hanging about twice a day. Sometimes it hangs while I am watching mythtv: the sound keeps looping and I have to hit reset. At other times it hangs while the screen is in power-saving mode. In that case I am able to use alt-sysrq-(K|S|U|B). Sync and umount don't work, but they produce a message about killing the interrupt handler. If it helps, I will try to note the exact messages next time. However, my evms-raid5 array takes too long to resync for more intensive testing.
Comment 8 Greg Tassone 2004-09-04 15:43:15 UTC
OK.  I've tried gentoo-sources-2.4.27-r1 today and it is also having the problem.  My system hard-locks within seconds of running one of the commands I mentioned.

There must have been a patch added somewhere in the 2.4.26 arena that has done this...  If I switch back to the gentoo-sources-2.4.25 series I have no problems no matter how many times I try it.
Comment 9 eyel 2004-09-20 17:50:11 UTC
Please see my comment (#24) for bug #37860
http://bugs.gentoo.org/show_bug.cgi?id=37860#c24
It might have some insight on the subject.

This might seem like a weird question, but does CrossOver Office have any GL utilization?
Comment 10 Jules Gagnon 2004-09-23 09:52:29 UTC
I still don't know if I am having the same bug...

But my system hangs were occurring (most of the time) while flagging commercials with mythtv.

While doing commercial flagging, mythtv is not using the capture card (so it isn't related to the capture driver). It is only reading a video file, decoding some frames, looking for black frames and pausing a little before continuing.

It pauses to keep cpu usage low (~30%). Do you know if this behavior can cause problems with the scheduler?

Please tell me if I should open another bug (or report to mythtv instead).
Comment 11 Greg Tassone 2004-09-23 22:19:45 UTC
Well, I'm not sure from your description if you're having the same issue as me or not.  However, if I had to guess as to the cause, it does appear to be a timing or syncing issue in the kernel.

If you want to know whether you have the same issue or not, go back to kernel gentoo-sources 2.4.25-r8 (or 2.4.25-r9 if r8 is no longer in Portage).  I seem to not be having the problem on r9 and I have been using it for quite some time now.
Comment 12 Greg Tassone 2004-09-23 22:31:36 UTC
Regarding comment #9 (sorry for the delay but I've been busy lately):

No, CrossOver does not have any GL utilization.  It is a standard C-style GUI app -- the widgets appear to be either GTK 1 or TCL.

--------------

Regarding your bug posting (#37860):

This could be the same problem, although I've ruled out some of the issues being discussed there.  In particular, my issue doesn't seem to be strictly limited to GL at all -- as I mentioned I can reproduce this with non-GL software.

I've noticed that exiting from Enemy Territory (heavy GL game) causes the error at least 60% of the time (when I'm using one of the affected kernels).  Since I can reproduce the error 100% of the time using Crossover (without GL, on an affected kernel version), something else must be causing this.
----------------------

To bolster one of my earlier comments, this really appears to be a timing or syncing issue within the kernel.  I've noticed that when loading Crossover it goes through several loading stages, often with a few noticeable pauses between each stage.  It seems to be at one of these "pause points" that the system hang occurs -- and at the same place every time.
Comment 13 eyel 2004-09-24 00:44:05 UTC
Is the hard-lock reproducible when the affected kernel is compiled with the "preemptive kernel" option set to N?  I'm curious.
Comment 14 Greg Tassone 2004-09-26 14:29:12 UTC
Nope, it still happens just the same.  I tried a fresh install of gentoo-sources-2.4.26-r9 with the same config as attached to this bug, with the exception of the Preemptible Kernel option (which was off).

I actually was able to start the CrossOver Office main application one time without locking the system.  However, I ran it a second time and it hard-locked exactly as before.  Either the first run was a fluke or the lack of Preemptible Kernel makes it only fail half the time, etc.
Comment 15 Jules Gagnon 2004-10-07 06:42:31 UTC
This weekend I compiled 2.4.25-r9, it seems more stable than 2.4.26. In three days I got one hang. 

With 2.4.26 it's difficult to get more than 12 hours uptime.

Maybe this hang was unrelated. I will continue using 2.4.25 and see what happens.

I also noticed something which may or may not be related, see Bug #66643
Comment 16 Greg Tassone 2004-10-13 22:19:06 UTC
Re: #15:
It doesn't appear that the bug you referenced is related to my issue.  First, I don't use RAID at all (straight IDE only on these systems).  Secondly, I have left the system in this condition for a LONG time (I forget exactly how long), and it never seems to recover.
Comment 17 Jules Gagnon 2004-10-14 04:55:45 UTC
I was just saying that the two bugs may be related, in the sense that both may be caused by a problem with the scheduler. If so, fixing one will probably fix the other.
Comment 18 Jules Gagnon 2004-10-25 07:40:54 UTC
Are there any other tests we could do to help locate the source of this problem?
Comment 19 Greg Tassone 2004-12-09 02:40:28 UTC
Bump...

I just wanted to add my recent testing results.  I have tested the recent gentoo-sources kernels (like gentoo-sources-r13) to no avail.  The problem still exists and is immediately reproducible for me.

Any thoughts on what kernel patches might be causing this?  If you give me some ideas I can hack the builds and remove the patches and do some test builds for you.
Comment 20 Tim Yamin (RETIRED) gentoo-dev 2005-02-11 15:20:53 UTC
Want to try gentoo-sources-2.4.28?
Comment 21 Greg Tassone 2005-02-13 18:20:25 UTC
OK, I've tested 2.4.28-gentoo-r5.

At first I thought the tests were going very well... I couldn't get it to fail.  Then, at about the 6th repeat, the whole system hard-locked as usual.

I'm not sure why it didn't fail the first time.  Each time it worked I performed various tasks within the application (Crossover office, in this case) just to test for any problems -- I didn't find any.

However, once it locked the system it locked everything.  I let it sit for several minutes in an unresponsive state.  I tested network connectivity from a separate machine and it wouldn't even ping (which normally works fine, of course).
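
Since the lock-up can take several launches to appear, a couple of clean runs says little about whether a given kernel is affected. The sketch below is a rough estimate only: it assumes every launch is independent and locks up with the same fixed probability on an affected kernel, which is a simplification and not something established in this bug. The function name is hypothetical.

```python
import math


def clean_runs_needed(failure_probability, confidence=0.95):
    """Consecutive clean runs needed before an affected kernel would have
    locked up at least once with probability `confidence`, assuming it locks
    up with `failure_probability` on each independent run."""
    if not 0.0 < failure_probability < 1.0:
        raise ValueError("failure_probability must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # An affected kernel survives n runs with probability (1 - p) ** n.
    # We want the smallest n for which that is at most 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_probability))


print(clean_runs_needed(0.5))    # prints 5
print(clean_runs_needed(1 / 6))  # prints 17
```

Under those assumptions, a kernel that fails about one launch in six would need roughly 17 clean launches in a row before it could reasonably be called unaffected, so a kernel or patch set should not be cleared after only a few successful starts.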
Comment 22 Tim Yamin (RETIRED) gentoo-dev 2005-02-14 01:05:03 UTC
Ok, when it hard locks do the indicator lights on your keyboard flash? Either way, if you could get a serial console enabled I'd imagine the kernel should dump something there; that should help track down what patch is causing this.
Comment 23 Greg Tassone 2005-02-15 01:13:35 UTC
Sorry, but no lights were blinking.  The system was completely frozen, in exactly the state it normally ends up in.  None of the services appeared to be functioning... network interfaces were completely dead, and all input controls that I have at my disposal were absolutely unresponsive.

I don't have any additional equipment for further debugging.  If you have any ideas that I can attempt with relative ease, I'd be happy to try.

However, my time and resources are a bit limited, so no promises.
Comment 24 Tim Yamin (RETIRED) gentoo-dev 2005-04-28 10:05:36 UTC
Ok, this might be related to the scheduler. Can you give ck-sources-2.4 a go and see if that has the same issue?
Comment 25 Greg Tassone 2005-04-28 10:43:40 UTC
Sorry, but I no longer have my x86 hardware.  I am using amd64 now on all systems that had been having the problem (I upgraded hardware).

Can anyone else test it?  Jules?
Comment 26 Jules Gagnon 2005-04-28 11:38:35 UTC
I am no longer using 2.4, and since I know of no reliable way to reproduce this bug every time, I am not very inclined to run ck-sources for a long time. Moreover, does ck-sources include the EVMS patches?

Oh.. and there are errors when I try to emerge ck-sources-2.4.28-r3 (digest and patch errors). Do you want me to file a bug report on that?

In any case, I won't be able to do any real testing for the next 3 weeks.
Comment 27 Tim Yamin (RETIRED) gentoo-dev 2005-07-20 08:15:08 UTC
Closing bug - if you are able to test gentoo-sources-2.4.31 and see if the issue
still exists then please reopen this bug. Thanks!