Whenever I am running any of the current gentoo-sources kernels (like 2.4.26-r9) my system hangs during certain operations within X. For example, whenever I launch Crossover Office ("Setup" app or whatever) the system hard-locks during the application load -- everything freezes and you can't even ping the system any more. This condition also occurs when exiting certain applications, such as Enemy Territory. In this case the terminal hangs as soon as I exit the game -- but it ran fine up to that point.

I have done extensive troubleshooting of this issue. It seems to occur with or without the Nvidia kernel drivers installed and/or running. Different Window Managers don't make any difference. Older versions of glib and glibc don't seem to matter, either.

The only thing that I know fixes the problem FOR SURE is to run an older kernel, such as 2.4.25-r8. I am doing this now and cannot get it to error under this configuration.

Reproducible: Always

Steps to Reproduce:
1. Boot up into any current gentoo-sources kernel, such as 2.4.26-r9
2. Go into X (any Window Manager -- doesn't matter)
3. Run certain applications, such as the CrossOver Office setup application (GTK-based app)

Actual Results:
The system hard-locks, requiring a hard-reset.

Expected Results:
The application should load and run properly just like it does on any of the earlier kernels, such as 2.4.25-r8.

------------------------------------------------
Short output from "strace"
------------------------------------------------
I grabbed an strace, but since the system locks I can only hand-copy the stack at that point. The last few lines of the strace were:

gettimeofday({1093926389, 235893}, {420, 0}) = 0
select(5, [3 4], [], [], {1, 9166}

After several "gettimeofday" calls in a row, the "select" call stops at the last point noted.
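For anyone collecting a longer trace to a file (or over a network or serial log) before the lock-up, here is a minimal Python sketch that picks out the call that never returned. It assumes the trace is in plain strace text format, one call per line. The helper name `last_unfinished_call` is made up for illustration, and the sample lines are the two copied by hand above.

```python
import re

# A completed strace line looks like "name(args) = result".
COMPLETE = re.compile(r"^\w+\(.*\)\s+=\s+\S+")

def last_unfinished_call(lines):
    """Return the last non-empty line if it has no return value, else None."""
    nonempty = [ln.strip() for ln in lines if ln.strip()]
    if not nonempty:
        return None
    last = nonempty[-1]
    return None if COMPLETE.match(last) else last

# The two lines hand-copied in the report above.
sample = [
    "gettimeofday({1093926389, 235893}, {420, 0}) = 0",
    'select(5, [3 4], [], [], {1, 9166}',
]

print(last_unfinished_call(sample))
```

In the trace above this points at the `select` call, whose last argument is a timeout of 1 second and 9166 microseconds. The call was entered, but the machine froze before it could return or time out.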
------------------------------------------------
Output from "emerge info"
------------------------------------------------
Portage 2.0.50-r10 (default-x86-1.4, gcc-3.3.4, glibc-2.3.3.20040420-r1, 2.4.25-gentoo-r8)
=================================================================
System uname: 2.4.25-gentoo-r8 i686 AMD Athlon(tm) MP 2000+
Gentoo Base System version 1.4.16
Autoconf: sys-devel/autoconf-2.59-r4
Automake: sys-devel/automake-1.8.5-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-march=athlon-mp -O2 -pipe -Wall -finline-limit=1200 -falign-functions=32"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-mp -O2 -pipe -Wall -finline-limit=1200 -falign-functions=32"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo http://gentoo.oregonstate.edu http://www.ibiblio.org/gentoo"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="3dnow X aalib acpi aim apm arts avi berkdb cdr crypt cups curl dedicated divx4linux doc encode esd fbcon foomaticdb gdbm gif gpm gtk gtk2 icq imap imlib jabber java jce joystick jpeg junit kde kerberos ldap libg++ libwww mad mcal mikmod mmx motif mozilla mpeg ncurses nls oggvorbis opengl oscar oss pam pda pdflib perl png python qt quicktime readline samba sasl scanner sdl slang spell sse ssl svga tcpd tetex theora tiff truetype usb x86 xml2 xmms xv xvid yahoo zlib"
Can you see if you can reproduce this with vanilla-sources-2.4.26 please?
OK. I have built vanilla-sources 2.4.26 and completed some testing. I am NOT able to create/reproduce the problem on this kernel build (with the same config). Therefore, it appears that the problem is isolated to something within the gentoo-sources kernel patches. Let me know if I can help in some other way.
Can you attach your kernel configuration please?
Created attachment 38650
Kernel config for gentoo-sources-2.4.26-r9

Adding .config for gentoo-sources 2.4.26-r9 (experiences the problem)
Created attachment 38651
Kernel .config for 2.4.26 vanilla-sources

Adding .config for vanilla-sources 2.4.26 (does NOT experience the problem)
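Since the two attached configs come from a failing and a working tree, it may help to list only the options that differ. The following is a rough Python sketch, assuming the usual `.config` text layout (`CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments). The function names and the sample option values are illustrative only and are not taken from the attachments.

```python
def parse_config(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' is recorded as 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def diff_configs(a, b):
    """Return {name: (value_in_a, value_in_b)} for every option that differs.

    An option missing from one side shows up as None on that side, which is
    how options added by a patch set appear when compared against vanilla.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}

# Hypothetical snippets, NOT the contents of the attachments.
patched = parse_config("CONFIG_SMP=y\nCONFIG_PREEMPT=y\n")
vanilla = parse_config("CONFIG_SMP=y\n")

for name, (left, right) in diff_configs(patched, vanilla).items():
    print(name, left, right)
```

Options that exist only in the gentoo-sources config are the most interesting ones here, because they point at features added by the patch set rather than at ordinary configuration differences.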
Hrm, interesting - could you give gentoo-sources-2.4.27 a go and see if that gives you this issue too? If so, then one of the patches is causing this and we can get rid of the problematic one.
I think I have a similar problem. Since last week my system hangs about twice a day. Sometimes the system hangs while I watch mythtv: the sound keeps looping and I have to hit reset. But sometimes the system hangs while my screen is in power-saving mode. In this case, I am able to use alt-sysrq-(K|S|U|B). Sync and umount don't work, but they produce a message about killing the interrupt handler. If it helps, I will try to note more precise messages next time. But my evms-raid5 takes too long to resync for more intensive testing.
OK. I've tried gentoo-sources-2.4.27-r1 today and it is also having the problem. My system hard-locks within seconds of running one of the commands I mentioned. There must have been a patch added somewhere in the 2.4.26 arena that has done this... If I switch back to the gentoo-sources-2.4.25 series I have no problems no matter how many times I try it.
Please see my comment (#24) for bug #37860 http://bugs.gentoo.org/show_bug.cgi?id=37860#c24 It might have some insight on the subject. This might seem like a weird question, but does CrossOver Office have any GL utilization?
I still don't know if I am having the same bug... But my system hangs were occurring (most of the time) while flagging commercials with mythtv. While doing commercial flagging, mythtv is not using the capture card (so it isn't related to the capture driver). It is only reading a video file, decoding some frames, looking for black frames and pausing a little before continuing. It pauses to keep CPU usage low (~30%). Do you know if this behavior can cause problems with the scheduler? Please tell me if I should open another bug (or report to mythtv instead).
Well, I'm not sure from your description if you're having the same issue as me or not. However, if I had to guess as to the cause, it does appear to be a timing or syncing issue in the kernel. If you want to know if you have the same issue or not, go back to kernel gentoo-sources 2.4.25-r8 (or 2.4.25-r9 if r8 is no longer in Portage). I seem to not be having the problem on r9 and I have been using it for quite some time now.
Regarding comment #9 (sorry for the delay but I've been busy lately): No, CrossOver does not have any GL utilization. It is a standard C-style GUI app -- the widgets appear to be either GTK 1 or TCL.

--------------
Regarding your bug posting (#37860): This could be the same problem, although I've ruled out some of the issues being discussed there. In particular, my issue doesn't seem to be strictly limited to GL at all -- as I mentioned, I can reproduce this with non-GL software. I've noticed that exiting from Enemy Territory (heavy GL game) causes the error at least 60% of the time (when I'm using one of the affected kernels). Since I can reproduce the error 100% of the time using Crossover (without GL, on an affected kernel version), something else must be causing this.

----------------------
To bolster one of my earlier comments, this really appears to be a timing or syncing issue within the kernel. I've noticed that when loading Crossover it goes through several loading stages, often with a few noticeable pauses between each stage. It seems to be at one of these "pause points" that the system hang occurs -- and at the same place every time.
Is the hard-lock reproducible when the affected kernel is compiled with the "preemptive kernel" option set to N? I'm curious.
Nope, it still happens just the same. I tried a fresh install of gentoo-sources-2.4.26-r9 with the same config as attached to this bug, with the exception of the Preemptible Kernel option (which was off). I actually was able to start the CrossOver Office main application one time without locking the system. However, I ran it a second time and it hard-locked exactly as before. Either the first run was a fluke or the lack of Preemptible Kernel makes it only fail half the time, etc.
This weekend I compiled 2.4.25-r9; it seems more stable than 2.4.26. In three days I got one hang. With 2.4.26 it's difficult to get more than 12 hours of uptime. Maybe this hang was unrelated; I will continue using 2.4.25 and see what happens. I also noticed something which may or may not be related, see Bug #66643.
Re: #15: It doesn't appear that the bug you referenced is related to my issue. First, I don't use RAID at all (straight IDE only on these systems). Secondly, I have left the system in this condition for a LONG time (I forget exactly how long), and it never seems to recover.
I was just saying that the two bugs may be related, in the sense that both may be caused by a problem with the scheduler. If so, fixing one will probably fix the other.
Are there any other tests we could do to help locate the source of this problem?
Bump... I just wanted to add my recent testing results. I have tested the recent gentoo-sources kernels (like gentoo-sources-r13) to no avail. The problem still exists and is immediately reproducible for me. Any thoughts on what kernel patches might be causing this? If you give me some ideas I can hack the builds, remove the patches and do some test builds for you.
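One way to organise the "remove patches and rebuild" testing offered above is a binary search over the ordered patch list. The Python sketch below rests on several assumptions: exactly one patch is responsible, the patches can be applied as a prefix of the series without breaking the build, and the hang test is reliable. The patch names and the `hangs_with` callback are placeholders; in practice the callback is a manual build, boot and test cycle.

```python
def find_bad_patch(patches, hangs_with):
    """Binary-search an ordered patch list for the patch that introduces the hang.

    hangs_with(applied) must return True if a kernel built from vanilla plus
    the given prefix of patches hangs. Assumes vanilla (no patches) is good
    and the full series is bad, as reported in this bug.
    """
    lo, hi = 0, len(patches)  # prefix of length lo is good, length hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs_with(patches[:mid]):
            hi = mid   # culprit is within the first mid patches
        else:
            lo = mid   # culprit comes after the first mid patches
    return patches[hi - 1]

# Placeholder patch names and a simulated culprit, for illustration only.
series = ["patch-%02d" % i for i in range(1, 9)]
builds = []

def simulated_test(applied):
    builds.append(len(applied))
    return "patch-05" in applied

print(find_bad_patch(series, simulated_test), "found after", len(builds), "builds")
```

With N patches this needs roughly log2(N) kernel builds instead of N. Because the hang does not always trigger on the first launch, each "good" verdict should be backed by several repeated runs before it is trusted.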
Want to try gentoo-sources-2.4.28?
OK, I've tested 2.4.28-gentoo-r5. At first I thought the tests were going very well... I couldn't get it to fail. Then, at about the 6th repeat, the whole system hard-locked as usual. I'm not sure why it didn't fail the first time. Each time it worked I performed various tasks within the application (Crossover office, in this case) just to test for any problems -- I didn't find any. However, once it locked the system it locked everything. I let it sit for several minutes in an unresponsive state. I tested network connectivity from a separate machine and it wouldn't even ping (which normally works fine, of course).
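Because the hang only showed up on about the sixth launch here, a handful of clean runs does not prove a kernel is good. As a back-of-the-envelope Python sketch: if each launch hangs independently with probability p, then n clean launches in a row happen by chance with probability (1 - p)^n. The per-launch rate of 1/6 below is only a guess based on this single observation, and launches may well not be independent.

```python
import math

def clean_runs_needed(p_hang, confidence):
    """Smallest n such that n clean runs in a row would occur with probability
    at most (1 - confidence) if every run hung independently with p_hang."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_hang))

# Assumed per-launch hang rate of 1/6; this is a guess, not a measured value.
print(clean_runs_needed(1 / 6, 0.95))
```

The lower the per-launch hang rate, the more clean launches are needed before calling a kernel or patch set good.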
Ok, when it hard locks do the indicator lights on your keyboard flash? Either way, if you could get a serial console enabled I'd imagine the kernel should dump something there; that should help track down what patch is causing this.
Sorry, but no lights were blinking. The system was completely frozen in the exact state it is normally in. None of the services appeared to be functioning... network interfaces were completely dead, and all input controls that I have at my disposal were absolutely unresponsive. I don't have any additional equipment to debug this further. If you have any ideas and if I can attempt them with relative ease, I'd be happy to try. However, my time and resources are a bit limited, so no promises.
Ok, this might be related to the scheduler. Can you give ck-sources-2.4 a go and see if that has the same issue?
Sorry, but I no longer have my x86 hardware. I am using amd64 now on all systems that had been having the problem (I upgraded hardware). Can anyone else test it? Jules?
I am no longer using 2.4, and since I know no reliable way to reproduce this bug every time, I am not very inclined to run ck-sources for a long time. Moreover, does ck-sources include the EVMS patches? Oh.. and there are errors when I try to emerge ck-sources-2.4.28-r3 (digest and patch errors). Do you want me to file a bug report on that? In any case, I won't be able to do any real testing for the next 3 weeks.
Closing bug - if you are able to test gentoo-sources-2.4.31 and see if the issue still exists then please reopen this bug. Thanks!