93866 – Switching to virtual console while using X sometimes freezes system

Summary: Switching to virtual console while using X sometimes freezes system

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Unspecified
Hardware:	x86 Linux

Importance:	High critical
Assignee:	Gentoo X packagers

URL:	http://bugs.freedesktop.org/show_bug....
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2005-05-24 17:03 UTC by Frank T. Lofaro Jr.
Modified:	2005-06-05 18:53 UTC
CC List:	1 user

See Also:
Package list:
Runtime testing required:	---

Description Frank T. Lofaro Jr. 2005-05-24 17:03:06 UTC

If I switch to a virtual console while using X, the system sometimes freezes. It
does not respond to pings, and does not reboot even with reboot on panic enabled.

Reproducible: Sometimes
Steps to Reproduce:
1. Start X
2. Switch to a text mode virtual console (CTRL-ALT-F#)
3. Keep trying until your system freezes. Likely will take many attempts.
Sometimes you have to go and do other things and try it again.

Actual Results:  
System frozen. Blank text-mode VC displayed (monitor reports a proper signal).
System ignores pings and won't reboot itself even if reboot on panic is enabled.

Expected Results:  
Switch to VC without crash.

All I usually have running is X, xmms, firefox and konsole when this happens.

My friend also experiences freezes when VC switching from X.

He is using an Intel Pentium 4 and is NOT using ~x86 in ACCEPT_KEYWORDS.
I am using an Intel Pentium 2 and AM using ~x86 in ACCEPT_KEYWORDS.

My system is very reliable aside from this issue. It will not crash except due
to this bug.

# emerge --info
Portage 2.0.51.22-r1 (default-linux/x86/2005.0, gcc-3.4.3-20050110,
glibc-2.3.5-r0, 2.6.11-gentoo-r9 i686)
=================================================================
System uname: 2.6.11-gentoo-r9 i686 Pentium II (Deschutes)
Gentoo Base System version 1.6.12
dev-lang/python:     2.3.5
sys-apps/sandbox:    1.2.8
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils:  2.16
sys-devel/libtool:   1.5.18
virtual/os-headers:  2.6.11
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability
-fsignaling-nans -mieee-fp"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env
/usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability
-fsignaling-nans -mieee-fp"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict userpriv"
GENTOO_MIRRORS="ftp://ftp.ussg.iu.edu/pub/linux/gentoo http://gentoo.osuosl.org
http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="x86 X Xaw3d acpi alsa apache2 apm arts avi bash-completion bcmath berkdb
bitmap-fonts bmp bzlib cdparanoia cdr crypt cups curl emacs emacs-w3 emboss
encode examples fam fastcgi fbcon foomaticdb fortran gd gdbm gif glut gpm gtk
gtk2 imagemagick imlib ipv6 jpeg kde lcms libg++ libwww lmsensors mad matrox
mikmod mp3 mpeg ncurses nls nocd offensive ogg oggvorbis opengl oss pam pdflib
perl png postgres python qt quicktime readline sdl spell ssl svga sysvipc tcltk
tcpd tidy tiff truetype truetype-fonts type1-fonts usb vorbis wddx x-face xml2
xmms xv yahoo zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY

Is there anything I can do to help give more info?

There are no messages on the screen, no logs, and it can't be reached from the
Internet (I tried but the bug kills TCP/IP access to it).

Do I need to get a serial console?

Comment 1 Donnie Berkholz (RETIRED) gentoo-dev

2005-05-25 16:50:33 UTC

(In reply to comment #0)
> CFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability
> -fsignaling-nans -mieee-fp"

That's cute. Try using flags that aren't so wack.

Also please unmask and try 6.8.99.8 to see whether that fixes your problem.
Which driver are you using?

Comment 2 Frank T. Lofaro Jr. 2005-05-25 19:38:47 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > CFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability
> > -fsignaling-nans -mieee-fp"
> 
> That's cute. Try using flags that aren't so wack.

Well, it happens to my friend with normal CFLAGS (-O2 and selecting the Pentium 4
arch), no ~x86 in ACCEPT_KEYWORDS and a Pentium 4 CPU, and it happens to me with
my "wack" CFLAGS, ~x86 in ACCEPT_KEYWORDS and a Pentium 2 CPU. So I think the
cause is something other than my "wack" CFLAGS or my setup in general, and is
probably a bit more universal than that.

> 
> Also please unmask and try 6.8.99.8 to see whether that fixes your problem.

I am emerging that now with just -O2 CFLAGS, it'll take a while (see my CPU,
above :) I wanted to get the above and the following info on my driver out there
in the mean time. I'll post again when I test it after it is done.

> Which driver are you using?

Straight from /etc/X11/xorg.conf:

	Identifier  "Card0"
	Driver      "mga"
	VendorName  "Matrox Graphics, Inc."
	BoardName   "MGA G200 AGP"
	BusID       "PCI:1:0:0"

Comment 3 Frank T. Lofaro Jr. 2005-05-25 19:46:58 UTC

Also, the bug completely kills my machine - no network, etc. Shouldn't we
suspect a kernel bug too? Any idea how that happens? And if the kernel is
involved, why doesn't panic=3 help?
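
For reference, the reboot-on-panic timeout mentioned above can be checked at runtime. The following is only a sketch: the function name panic_timeout and its optional file argument are made up for illustration, while /proc/sys/kernel/panic is the standard location of the setting.

```shell
# Sketch: report the reboot-on-panic timeout currently in effect.
# panic_timeout and its optional argument are illustrative names only.
panic_timeout() {
    file=${1:-/proc/sys/kernel/panic}
    secs=$(cat "$file") || return 1
    if [ "$secs" -eq 0 ]; then
        echo "reboot on panic is disabled"
    else
        echo "panic timeout: $secs"
    fi
}
```

Booting with panic=3 should make this report a timeout of 3. Note that the timeout only takes effect if the kernel actually panics; a hard hang in which no panic is ever raised never reaches it, which would be consistent with the machine staying frozen.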

Comment 4 Donnie Berkholz (RETIRED) gentoo-dev

2005-05-25 19:58:50 UTC

No, X directly accesses hardware etc so it can lock up your machine without any
extra help.

Comment 5 Frank T. Lofaro Jr. 2005-05-25 21:15:44 UTC

(In reply to comment #4)
> No, X directly accesses hardware etc so it can lock up your machine without any
> extra help.

Strangely enough, it NEVER, EVER crashed on shutdown of X, only on a VC switch -
doesn't it do the same stuff to the HW (switch back down to text) in each case?

When it crashed it ALWAYS had the display in a text mode with a blank screen and
a good text mode (640x400x70 Hz) signal, so it seemed to get the HW mode restore
done as far as I could tell.

Could there possibly be some interaction with the kernel VC switch code?

I emerge'd the version of X you suggested and compiled it with just -O2 and it
seems to be working so far - but the bug has a low reproducibility so I don't
think of myself as out of the woods yet.

Both chvt and ctrl-alt-f# used to crash the system on occasion so I am testing
with both - so far so good.

I am doing provocative testing now - switching to occupied VCs, unoccupied ones,
under no load, under high load, etc. Anything I should be doing to make any bug
express itself?

I'll try to get my friend with the same issue to try this fix also. His solution
was to not switch VCs while in X. :)

Comment 6 Frank T. Lofaro Jr. 2005-05-25 21:29:13 UTC

It just now crashed using your X version compiled with just -O2, during a VC switch.

I run kde, so shutting down X means a lot of stuff has to shutdown, and is
probably generating a lot of disk I/O and memory access (I only have 128M) and
it never crashes then but it does on a VC switch which is a much faster, simpler
case.

Could it be a race condition?

Comment 7 Donnie Berkholz (RETIRED) gentoo-dev

2005-05-25 22:18:42 UTC

It's more likely X not handling the save and restore of its state properly.

Are you using framebuffer consoles? If so, which framebuffer? Try without
framebuffer if you have it on now, or try vesafb.

Comment 8 Frank T. Lofaro Jr. 2005-05-26 08:39:24 UTC

I am using normal text mode consoles. I'll try the vesafb consoles next, any
particular mode I should use?

During the night I had a shell loop running which switched VCs back and forth
every 5 seconds (it also ran date and sync; I am running ext3). The machine did
2056 switches from X to text and didn't crash once. I had swap disabled for this
test. I then stopped the shell loop, enabled swap and let it run again; it
crashed after 195 switches.
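
The actual script was not posted, so the following is only a rough sketch of what such a stress loop might look like. The function name vt_stress, the CHVT and DELAY overrides, and the console numbers in the usage note are assumptions; chvt, date, sync and sleep are the standard tools.

```shell
# Hypothetical reconstruction of the overnight stress loop; the original
# script was not posted. Names and defaults here are assumptions.
#
# vt_stress TEXT_VT X_VT [MAX_SWITCHES]
# Alternates between a text console and the X console, logging each switch
# and syncing so the last logged count survives a hard freeze.
vt_stress() {
    text_vt=$1
    x_vt=$2
    max=${3:-0}             # 0 means "run until stopped or frozen"
    chvt_cmd=${CHVT:-chvt}  # override for a dry run, e.g. CHVT=true
    delay=${DELAY:-5}       # seconds between switches, as in the report
    count=0
    while [ "$max" -eq 0 ] || [ "$count" -lt "$max" ]; do
        "$chvt_cmd" "$text_vt" || return 1   # X -> text: the switch that freezes
        count=$((count + 1))
        echo "switch $count at $(date)"
        sync                                 # flush the log line to disk
        sleep "$delay"
        "$chvt_cmd" "$x_vt" || return 1      # back to X
        sleep "$delay"
    done
}
```

Run as root with something like "vt_stress 1 7 >> /root/vt-stress.log", assuming X is on VT 7 and VT 1 is a text console. Running "swapoff -a" or "swapon -a" beforehand would reproduce the two cases compared above.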

Comment 9 Donnie Berkholz (RETIRED) gentoo-dev

2005-05-26 09:18:08 UTC

No, if you're using normal that should be good. It goes roughly like this in
order of bad ideas:

1) text mode
2) vesafb
3) custom fb

Comment 10 Frank T. Lofaro Jr. 2005-05-27 07:52:37 UTC

Is there a reasonable way to lock X in memory so it can't get swapped out at all?

I can run my stress test on X and see if that helps. It has never crashed when
no swap is configured.

That might help you narrow it down.

Comment 11 Frank T. Lofaro Jr. 2005-05-30 10:38:41 UTC

It crashed one time with swap off, but free memory was very low. I think the
kernel might've thrown out pages of X, thinking it can swap them in later. I was
thinking of using mlockall to lock X in memory - but I don't want to have to
recompile all of xorg to do so just for a temporary workaround.

Is there an easier way to do that?

Comment 12 Frank T. Lofaro Jr. 2005-05-31 22:03:59 UTC

I think I'll take this upstream on June 2; this is probably not a
Gentoo-specific issue anyway.

Comment 13 Joshua Baergen (RETIRED) gentoo-dev

2005-06-04 10:12:05 UTC

Alright, post the URL of the upstream bug here when you do so please.

Comment 14 Frank T. Lofaro Jr. 2005-06-05 16:03:54 UTC

(In reply to comment #13)

Posted bug to upstream (bug 3473 on bugs.freedesktop.org):
http://bugs.freedesktop.org/show_bug.cgi?id=3473
"Complete system lockup on switch from X to text mode virtual console"

Comment 15 Joshua Baergen (RETIRED) gentoo-dev

2005-06-05 18:53:19 UTC

URL added here and the bug marked as such.  You should reference this bug in
the upstream one so that they have everything that has been discussed here.