If I switch to a virtual console while using X, the system sometimes freezes. It does not respond to pings, and it does not reboot even when reboot-on-panic is set.

Reproducible: Sometimes

Steps to Reproduce:
1. Start X
2. Switch to a text mode virtual console (CTRL-ALT-F#)
3. Keep trying until your system freezes. It will likely take many attempts. Sometimes you have to go and do other things and then try again.

Actual Results: System frozen. Blank text-mode VC displayed (monitor reports a proper signal). System ignores pings and won't reboot itself even if reboot on panic is enabled.

Expected Results: Switch to VC without crash.

All I usually have running when this happens is X, xmms, firefox and konsole. My friend also experiences freezes when switching VCs from X. He is using an Intel Pentium 4 and is NOT using ~x86 in ACCEPT_KEYWORDS. I am using an Intel Pentium 2 and AM using ~x86 in ACCEPT_KEYWORDS. My system is very reliable aside from this issue. It will not crash except due to this bug.

# emerge --info
Portage 2.0.51.22-r1 (default-linux/x86/2005.0, gcc-3.4.3-20050110, glibc-2.3.5-r0, 2.6.11-gentoo-r9 i686)
=================================================================
System uname: 2.6.11-gentoo-r9 i686 Pentium II (Deschutes)
Gentoo Base System version 1.6.12
dev-lang/python: 2.3.5
sys-apps/sandbox: 1.2.8
sys-devel/autoconf: 2.13, 2.59-r6
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils: 2.16
sys-devel/libtool: 1.5.18
virtual/os-headers: 2.6.11
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability -fsignaling-nans -mieee-fp"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability -fsignaling-nans -mieee-fp"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict userpriv"
GENTOO_MIRRORS="ftp://ftp.ussg.iu.edu/pub/linux/gentoo http://gentoo.osuosl.org http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="x86 X Xaw3d acpi alsa apache2 apm arts avi bash-completion bcmath berkdb bitmap-fonts bmp bzlib cdparanoia cdr crypt cups curl emacs emacs-w3 emboss encode examples fam fastcgi fbcon foomaticdb fortran gd gdbm gif glut gpm gtk gtk2 imagemagick imlib ipv6 jpeg kde lcms libg++ libwww lmsensors mad matrox mikmod mp3 mpeg ncurses nls nocd offensive ogg oggvorbis opengl oss pam pdflib perl png postgres python qt quicktime readline sdl spell ssl svga sysvipc tcltk tcpd tidy tiff truetype truetype-fonts type1-fonts usb vorbis wddx x-face xml2 xmms xv yahoo zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY

Is there anything I can do to help give more info? There are no messages on the screen, no logs, and the machine can't be reached from the Internet (I tried, but the bug kills TCP/IP access to it). Do I need to get a serial console?
(In reply to comment #0)
> CFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability
> -fsignaling-nans -mieee-fp"

That's cute. Try using flags that aren't so wack.

Also please unmask and try 6.8.99.8 to see whether that fixes your problem. Which driver are you using?
(In reply to comment #1)
> (In reply to comment #0)
> > CFLAGS="-O9 -march=pentium2 -fomit-frame-pointer -fno-guess-branch-probability
> > -fsignaling-nans -mieee-fp"
>
> That's cute. Try using flags that aren't so wack.

Well, it happens to my friend with normal CFLAGS (-O2 and selecting the Pentium 4 arch), no ~x86 in ACCEPT_KEYWORDS and a Pentium 4 CPU, and it happens to me with my "wack" CFLAGS, ~x86 in ACCEPT_KEYWORDS and a Pentium 2 CPU. So I think the cause is something other than my "wack" CFLAGS or my setup in general. It is probably a bit more universal than that.

> Also please unmask and try 6.8.99.8 to see whether that fixes your problem.

I am emerging that now with just -O2 CFLAGS. It'll take a while (see my CPU, above :) I wanted to get the above and the following info on my driver out there in the meantime. I'll post again once the build is done and I have tested it.

> Which driver are you using?

Straight from /etc/X11/xorg.conf:

    Identifier "Card0"
    Driver "mga"
    VendorName "Matrox Graphics, Inc."
    BoardName "MGA G200 AGP"
    BusID "PCI:1:0:0"
Also, the bug completely kills my machine: no network, etc. Shouldn't we suspect a kernel bug too? Any idea how that happens? And if the kernel is involved, why doesn't panic=3 help?
No. X accesses the hardware directly, so it can lock up your machine without any extra help.
(In reply to comment #4)
> No, X directly accesses hardware etc so it can lock up your machine without any
> extra help.

Strangely enough, it has NEVER, EVER crashed on shutdown of X, only on a VC switch. Doesn't it do the same stuff to the HW (switch back down to text) in each case? When it crashed, it ALWAYS had the display in a text mode with a blank screen and a good text mode (640x400x70 Hz) signal, so as far as I could tell it got the HW mode restore done. Could there possibly be some interaction with the kernel VC switch code?

I emerged the version of X you suggested, compiled with just -O2, and it seems to be working so far. But the bug has low reproducibility, so I don't consider myself out of the woods yet. Both chvt and ctrl-alt-f# used to crash the system on occasion, so I am testing with both. So far so good.

I am doing provocative testing now: switching to occupied VCs, unoccupied ones, under no load, under high load, etc. Is there anything else I should be doing to make the bug show itself? I'll try to get my friend with the same issue to try this fix also. His solution was to not switch VCs while in X. :)
It just now crashed during a VC switch, using your X version compiled with just -O2. I run KDE, so shutting down X means a lot of stuff has to shut down, which probably generates a lot of disk I/O and memory access (I only have 128M). It never crashes then, but it does crash on a VC switch, which is a much faster, simpler case. Could it be a race condition?
It's more likely X not handling the save and restore of its state properly. Are you using framebuffer consoles? If so, which framebuffer? Try without framebuffer if you have it on now, or try vesafb.
I am using normal text mode consoles. I'll try the vesafb consoles next; is there any particular mode I should use? During the night I had a shell loop running that switched VCs back and forth every 5 seconds (it also ran date and sync; I am running ext3). The machine did 2056 switches from X to text and didn't crash once. I had swap disabled for this test. I then stopped the shell loop, added swap and let it run again. It crashed after 195 switches.
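For anyone who wants to repeat the overnight test, here is a minimal sketch of that kind of stress loop. It is not the reporter's actual script: the function name `stress_vt_switch`, the VT numbers in the usage comment, and the injectable `run`/`sleep` parameters (there so the loop can be dry-run without touching the console) are all assumptions. It relies only on the `chvt`, `sync` and `date` commands mentioned in this thread, and `chvt` normally needs root or ownership of the console.

```python
import subprocess
import time


def stress_vt_switch(x_vt, text_vt, cycles, delay=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Switch from X to a text VT and back, `cycles` times.

    Returns the number of X-to-text switches that completed.
    x_vt and text_vt are VT numbers chosen by the caller (assumptions,
    they differ between systems).
    """
    switches = 0
    for _ in range(cycles):
        # Flush dirty buffers first so a hard freeze loses as little as possible.
        run(["sync"], check=False)
        # Print a timestamp; the last one seen approximates the time of a freeze.
        run(["date"], check=False)
        # X -> text console: the switch direction that freezes in this report.
        run(["chvt", str(text_vt)], check=True)
        switches += 1
        print(f"switch {switches} to VT {text_vt} done", flush=True)
        sleep(delay)
        # Text console -> back to X.
        run(["chvt", str(x_vt)], check=True)
        sleep(delay)
    return switches


# Hypothetical usage, assuming X is on VT 7 and VT 1 is a text console:
#     stress_vt_switch(x_vt=7, text_vt=1, cycles=2000)
```

Redirecting the output to a file on disk keeps a record of the last completed switch across the reset, since the `sync` at the top of each cycle flushes what was written during the previous one.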
No, if you're using normal text mode that should be good. It goes roughly like this, in increasing order of bad ideas: 1) text mode 2) vesafb 3) custom fb
Is there a reasonable way to lock X in memory so it can't get swapped out at all? I can run my stress test on X and see if that helps. It has never crashed when no swap is configured. That might help you narrow it down.
It crashed one time with swap off, but free memory was very low. I think the kernel might have thrown out pages of X, expecting to be able to page them back in later. I was thinking of using mlockall to lock X in memory, but I don't want to have to recompile all of xorg just for a temporary workaround. Is there an easier way to do that?
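As background on why this is awkward: mlockall(2) locks the pages of the calling process only, and its man page states that memory locks are not inherited across fork and are removed on execve. So a separate launcher program cannot lock X from outside; the call has to be made from inside the X server process itself. The sketch below only illustrates what the call looks like, using Python's ctypes on its own process. The helper names are hypothetical, and the flag values are the Linux x86 ones (an assumption; some other architectures use different values).

```python
import ctypes
import os

# Flag values from the Linux x86 headers (assumption: other architectures differ).
MCL_CURRENT = 1  # lock all pages currently mapped into the process
MCL_FUTURE = 2   # also lock pages mapped in the future

# Load the C library symbols already linked into this process (Linux).
_libc = ctypes.CDLL(None, use_errno=True)


def lock_all_memory(flags=MCL_CURRENT | MCL_FUTURE):
    """Call mlockall(2) for the current process; raise OSError on failure."""
    if _libc.mlockall(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def unlock_all_memory():
    """Call munlockall(2) to undo lock_all_memory()."""
    if _libc.munlockall() != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Without root privileges the call typically fails because of the RLIMIT_MEMLOCK resource limit, so the caller should expect and handle `OSError`.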
I think I'll take this upstream on June 2; this is probably not a Gentoo-specific issue anyway.
Alright, post the URL of the upstream bug here when you do so please.
(In reply to comment #13) Posted bug to upstream (bug 3473 on bugs.freedesktop.org): http://bugs.freedesktop.org/show_bug.cgi?id=3473 "Complete system lockup on switch from X to text mode virtual console"
URL added here and marked as such. You should link the upstream bug back to this one so that they have everything that has been discussed here.