I have been testing all the different permutations of this problem for the past few days. I have tried with and without FrameBuffer, with and without Video Mode Selection Support, and with both versions of the drivers that work with 2.6.x kernels, all with the same result. I can make X freeze every time.
Steps to reproduce:
1. Log into X
2. Press ctrl-alt-f1
3. Log in to the TTY as root
4. emerge any package (I used monkey-bubble for testing, as it is short but not too short to compile)
5. Press ctrl-alt-f7
6. Press ctrl-alt-f1
7. Repeat steps 5 & 6 until X freezes after pressing ctrl-alt-f1
It usually takes 3-5 switches to get the freeze. With FrameBuffer it takes less time, usually about 3. There are no errors in any logs.
Hardware: 1.8 GHz Pentium4, 1.25GB RAM, NVidia GeForce FX 5600
There is nothing special in my configuration. I do have NPTL enabled in my glibc, but I have also recompiled glibc without NPTL and tested just as thoroughly with the same results. I am not sure if this is an NVidia problem, a kernel problem, or an XFree problem. I do know that I did not get these lockups with the previous XFree builds, before the latest bugfix builds came out. Please let me know if you require any further information or if you think I should report this to NVidia and/or the kernel devs.
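The switching in steps 5-7 can be scripted so the trigger is easier to repeat. What follows is only a minimal sketch, assuming the `chvt` utility is installed, that X sits on VT 7 and the text console on VT 1 as in the steps above, and that it is run as root. The function name, the pause length and the cycle count are illustrative choices, not part of the original report.

```python
import subprocess
import time


def switch_cycle(cycles=5, x_vt=7, text_vt=1, pause=2.0,
                 run=subprocess.run, sleep=time.sleep):
    """Flip between the X VT and a text VT, mirroring steps 5 and 6.

    `run` and `sleep` are injectable so the loop can be dry-run without
    touching the console. Returns the list of commands issued.
    """
    issued = []
    for _ in range(cycles):
        for vt in (x_vt, text_vt):
            cmd = ["chvt", str(vt)]  # chvt N makes VT N the foreground console
            run(cmd, check=True)
            issued.append(cmd)
            sleep(pause)
    return issued
```

Calling `switch_cycle()` from an ssh session while an emerge runs on the text console approximates the manual procedure. If X hangs during a switch, the `chvt` call may simply block, which is itself a usable signal that the freeze was reproduced.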
I'm going to ask the question before the devs do: which version of xfree are you using? Can you give the 'emerge info' output as well, and which 2.6 kernel version? I've seen intermittent lock-ups with X recently as well, but I'm using xfree-4.3.99.902-r2, a 2.6.3 win4lin-patched kernel and nvidia-{kernel,glx}-5336-r1. I've observed X running at 99% processor usage (after ssh'ing into the locked box, with xscreensaver running on the locked terminal). ssh'ing in from another box and killing the one X session eating the CPU cycles restores normal activity; gdm restarts the X server automatically and I can log in again. It's one of those 'confounded intermittents', but if you can reproduce it, that's real handy: it means a handle can be found somehow, but more info is needed first.
[ ddicks@linuxbox ~ ] $ emerge info
Portage 2.0.50-r1 (default-x86-1.4, gcc-3.3.2, glibc-2.3.2-r9, 2.6.3)
=================================================================
System uname: 2.6.3 i686 Intel(R) Pentium(R) 4 CPU 1.80GHz
Gentoo Base System version 1.4.3.13
distcc 2.12.1 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.58-r1
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -mfpmath=sse -msse -msse2 -mmmx -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -mfpmath=sse -msse -msse2 -mmmx -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://mirror.cpsc.ucalgary.ca/mirror/gentoo.org"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.ca.gentoo.org/gentoo-portage"
USE="X aalib alsa avi berkdb bonobo cdr crypt cups dvd encode esd foomaticdb gdbm gif gimpprint gnome gstreamer gtk gtk2 gtkhtml imagemagick imap imlib java javascript joystick jpeg ldap libg++ libwww mad mikmod mmx motif mozilla mpeg mpeg4 ncurses nls nptl oggvorbis opengl oss pam pdflib perl png ppds python quicktime readline samba sdl slang spell sse ssl svga tcltk tcpd tiff truetype usb v4l x86 xml2 xmms xv zlib"
* x11-base/xfree
    Latest version available: 4.3.0-r5
    Latest version installed: 4.3.0-r5
    Size of downloaded files: 54,146 kB
    Homepage: http://www.xfree.org
    Description: Xfree86: famous and free X server
    License: X11 MSttfEULA
* media-video/nvidia-glx
    Latest version available: 1.0.5336-r1
    Latest version installed: 1.0.5336-r1
    Size of downloaded files: 6,661 kB
    Homepage: http://www.nvidia.com/
    Description: XFree86 GLX libraries for the NVIDIA's X driver
    License: NVIDIA
* media-video/nvidia-kernel
    Latest version available: 1.0.5336-r1
    Latest version installed: 1.0.5336-r1
    Size of downloaded files: 6,661 kB
    Homepage: http://www.nvidia.com/
    Description: Linux kernel module for the NVIDIA's X driver
    License: NVIDIA
*** This lockup happens with all versions of the nvidia drivers that I have tried with a 2.6.x kernel.
I have kernel 2.6.3. Forgot to add that.
Created attachment 26378: a new ebuild and the latest minion.de patch
I've been having a similar problem. I noticed today that the most recent 4496 ebuild, "nvidia-kernel-1.0.4496-r4.ebuild", uses a fairly dated www.minion.de patch. Today I hacked the ebuild to use the most recent www.minion.de patch for 4496 and have not seen any hangs since. I suspect we could do a new ebuild (r5) using the latest minion.de patch fairly easily. This really needs to be done, since the newer versions of the nvidia driver are known to have dpms problems. I've attached a tgz file containing my hacked ebuild and the minion.de patch, though I'm not quite up to speed on making ebuilds.
No such luck, still getting the lock-up with the newer patch. One other note though: I only began noticing these problems after upgrading from xfree-4.3.0-r4 to xfree-4.3.0-r5, and I have seen this with both 2.4.21-ac and 2.6.3-gentoo-r2 kernels. Also, I don't have any lock-ups with the 'nv' driver, though this costs me real dpms support (screen blanks, but does not power off).
Yes, I think that this all started for me after the latest XFree update as well (-r5). I just saw your post about the patch and that it didn't work, so I'm going to leave mine as is and just not switch to a TTY until someone fixes this :)
Probably related: I started getting X server crashes (sometimes lockups) after upgrading from xfree-4.3.0-r3 to -r5, with nvidia-kernel-1.0.5328-r1.
I have the same problem on 2.4.22-gentoo-r7 with xfree-4.3.0-r5, nvidia-glx-1.0.5336-r1 and nvidia-kernel-1.0.5336-r1. I found another way to lock up the system: I used a GL screensaver (Euphoria). When I entered the password to unlock, X began to display the windows and the system locked up in the middle. Perhaps the problem comes from nvidia-glx?
Oh yes, if it can help:
- Shuttle SN41G2 system (nforce2 / geforce 4 mx)
- Athlon XP 2800+
- no vesa bootsplash :-)
# emerge info
Portage 2.0.50-r1 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3_pre20040207-r0, 2.4.22-gentoo-r7)
=================================================================
System uname: 2.4.22-gentoo-r7 i686 AMD Athlon(tm) XP 2800+
Gentoo Base System version 1.4.3.13p1
distcc 2.12.1 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-march=athlon-xp -O3 -pipe -fforce-addr -fomit-frame-pointer -fprefetch-loop-arrays -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -mmmx -msse -m3dnow -mfpmath=387,sse -momit-leaf-frame-pointer"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-march=athlon-xp -O3 -pipe -fforce-addr -fomit-frame-pointer -fprefetch-loop-arrays -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -mmmx -msse -m3dnow -mfpmath=387,sse -momit-leaf-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.oregonstate.edu http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X aalib acl acpi alsa arts avi berkdb bonobo canna cdr cjk crypt cups curl dga directfb doc dvb dvd encode fbcon foomaticdb freewnn gdbm ggi gif gphoto2 gstreamer gtk guile imap imlib jack java joystick jpeg kde libg++ linguas_ar linguas_de linguas_en linguas_fr linguas_hu linguas_jp linguas_ro linguas_ru linguas_sp lirc mad maildir mbox mmx motif mozilla mpeg mysql nas ncurses nls oggvorbis opengl oss pam pdflib perl png postgres prelude python qt quicktime readline samba scanner sdl slang slp spell sse ssl svga tcltk tcpd tetex tiff truetype unicode usb vim-with-x wmf x86 xinerama xml xml2 xmms xv zlib"
Is there any chance we could find an archived version of the -r4 ebuild to confirm whether this is really a problem with -r5 and nvidia, or whether this is just a coincidence?
www.gentoo.org/cgi-bin/viewcvs.cgi/media-video/ Check the Attic in nvidia-kernel and nvidia-glx.
I switched to Linux-2.6.3-wolk1.0 _with_ framebuffer and have not had a lockup yet (2-3 days). Still using 5336-r1 nvidia driver.
Well, I spoke too soon. I was able to make it lock up by using the same method as before.
I retrieved the old xfree-4.3.0-r4 ebuild and rolled back to that version. I'm still having problems with both the 2.4 and 2.6 kernels, but I'm fairly sure the problems started at about the time the xfree-4.3.0-r5 ebuild came out. I suspect now it could be a gcc or binutils bug that just hadn't shown up till we recompiled xfree. I guess now it might make sense to either roll back further or try a binary install of xfree.
I reverted to 4.3.0-r4 and had no crash ever since.
Okay, this is just weird - but kind of not. I did something today to my configuration and now I can no longer replicate the bug. I removed devfs from my configuration and emerged udev per the instructions on how to switch over that are available in the Gentoo forums. I'd like to see if anyone else who is thinking of dropping devfs sees the same _good_ side-effect.
Just another me too... 2.4.25 kernel, xfree 4.3.0-r5, nvidia 1.0.4496, gcc-3.3.2-r5, glibc-2.3.2-r9, etc, etc... Except for me it appears to happen more often when VMWare is running. In fact I can't say that I've noticed the problem when VMWare wasn't running. Let me know if more information on the system/config will be helpful.
I should probably also mention that it only freezes under a moderate to high CPU / disk IO load. If the box stays idle or relatively so, it doesn't seem to freeze. I've also noticed that a number of processes appear in disk/IO wait (status D in ps) when X freezes. So I'll have X in 'DL' status, and VMWare, gcc/c++ and perhaps a shell all in status D. Not everything is stuck there, though; for instance, I can still log in most of the time. The disk is *not* thrashing out of control; rather, it's mostly idle.
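To capture which processes are stuck at the moment of a freeze, the state field can be read straight from /proc over ssh instead of eyeballing ps. This is a minimal sketch, assuming a Linux /proc where each /proc/<pid>/stat line has the form `pid (comm) state ...`; the helper names are made up for illustration.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm may itself contain spaces or parentheses, so split on the
    last ')' rather than on whitespace.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Running `blocked_processes()` a few times during a freeze and saving the output would show whether the same set of processes stays in D. The extra 'L' that ps shows for X is a separate ps flag (pages locked in memory), so only the 'D' appears in the stat file.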
I rolled back gcc, binutils and glibc and recompiled the kernel, xfree and nvidia, and still had problems. Most recently I removed driverloader (www.linuxant.com) and have been using an orinoco pccard for wireless access. Since doing so I have not had any lockups. So my next plan is to try to roll things forward again and see if I have any problems. Anyway, I'm curious if anyone else having problems also has an unusual config (i.e. driverloader). The driverloader list doesn't have anyone else reporting problems, so this could still be a combination type of problem, and I'm still hesitant to point any fingers. My guess at the moment is that it may be some kind of weird PCI interrupt problem; what is to blame is still a question.
What's happening, guys? How did the rolling forward go?
Still have lockups, but they're infrequent enough that I'm just living with it. I've switched from driverloader to ndiswrapper and upgraded to the 5336-r1 nvidia driver. Also, I'm using a 2.6.5-gentoo kernel... Anyway, no real change...
I have not had a lockup since I got rid of devfsd.
Still happening for me too, but mine are relatively frequent. I needed to reboot the VM about 4 times in 8 hours today... 2.4.25. Will try with 2.4.26 after compilation completes.
I'll agree with Dale: I'm using udev and haven't experienced one lockup with it _yet_, and I've been going for a month or so now. However, for those of you that are still having difficulties, have you tried the different AGP drivers, e.g. using the kernel to control AGP versus using the nvidia-kernel drivers to control AGP? Do the results vary? (See info on the NvAGP option in XF86Config.)
I've had this problem ever since upgrading to the 2.6.x kernel series. I've tried a bunch of versions of X and am now using xorg-x11. Right now I'm doing two things that are interesting.
1) I tried switching away from nvidia-kernel. I switched Driver to "nv" instead of "nvidia" in xorg.conf and then ran "opengl-update xorg-x11". The interesting thing is that now it crashes every time I try to boot up gnome, specifically when there is a lot of disk I/O. As far as I can tell that is not using the nvidia stuff, and it crashes way more often. The effect is the same though: I can ssh in and see X taking 99% cpu, kill, reboot etc.
2) I switched to udev and it does the exact same thing. I've switched back to nvidia while still using udev. I hope things are stable like others have suggested... Time will tell.
OK, I just ran some tests on my machine here. I copied about 500M worth to an NFS share, then back again. I moved the 500M worth to the NFS share, then back again. From the command prompt there were no problems whatsoever, and from Xorg running under the "nv" driver there were no problems whatsoever. But the biggest pain is with "nvidia": copying the files to the server works fine, copying them back causes a lockup, and moving in either direction causes a lockup. Anyone else see this?
I see the same type of behavior. X seems to lock when some disk activity starts. It doesn't have to be as much as 500M, and it is quite random. By the way, after switching to udev and back to nvidia-kernel, it just happened again; it had been running since my last post. So switching to udev did NOT solve the problem for me.
I recompiled my kernel without agpgart support and then changed my xorg.conf file to use:
Option "NvAGP" "1" # - use nvidia's agp drivers
I used the system for about half a day and then had another lockup. Before, I had been using the agpgart drivers in the kernel.
Just got an X lockup and oops while my comp was idle (just running xscreensaver). Hope this will help.
From /var/log/syslog:
May 4 04:39:54 tux Badness in pci_find_subsys at drivers/pci/search.c:167
May 4 04:39:54 tux Call Trace:
May 4 04:39:54 tux [<c0275610>] pci_find_subsys+0x110/0x120
May 4 04:39:54 tux [<c027564f>] pci_find_device+0x2f/0x40
May 4 04:39:54 tux [<c0275408>] pci_find_slot+0x28/0x50
May 4 04:39:54 tux [<e0bcb37f>] os_pci_init_handle+0x35/0x62 [nvidia]
May 4 04:39:54 tux [<e0be51ff>] _nv001243rm+0x1f/0x24 [nvidia]
May 4 04:39:54 tux [<e0d2bab5>] _nv000816rm+0x2f5/0x384 [nvidia]
May 4 04:39:54 tux [<e0c942cc>] _nv003801rm+0xd8/0x100 [nvidia]
May 4 04:39:54 tux [<e0d2b5ef>] _nv000809rm+0x2f/0x34 [nvidia]
May 4 04:39:54 tux [<e0cc34e8>] _nv003606rm+0xe4/0x114 [nvidia]
May 4 04:39:54 tux [<e0cc2ee3>] _nv003564rm+0x513/0x908 [nvidia]
May 4 04:39:54 tux [<e0bfdc07>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May 4 04:39:54 tux [<e0cff4a3>] _nv001476rm+0x1d3/0x45c [nvidia]
May 4 04:39:54 tux [<e0be7d3a>] _nv000896rm+0x4a/0x64 [nvidia]
May 4 04:39:54 tux [<e0be9554>] rm_isr_bh+0xc/0x10 [nvidia]
May 4 04:39:54 tux [<e0bc8afa>] nv_kern_isr_bh+0x11/0x15 [nvidia]
May 4 04:39:54 tux [<c0125f35>] tasklet_action+0x65/0xc0
May 4 04:39:54 tux [<c0125c87>] do_softirq+0xc7/0xd0
May 4 04:39:54 tux [<c0109d38>] do_IRQ+0x138/0x190
May 4 04:39:54 tux [<c0107ee8>] common_interrupt+0x18/0x20
May 4 04:39:54 tux
May 4 04:39:54 tux Badness in pci_find_subsys at drivers/pci/search.c:167
May 4 04:39:54 tux Call Trace:
May 4 04:39:54 tux [<c0275610>] pci_find_subsys+0x110/0x120
May 4 04:39:54 tux [<c027564f>] pci_find_device+0x2f/0x40
May 4 04:39:54 tux [<c0275408>] pci_find_slot+0x28/0x50
May 4 04:39:54 tux [<e0bcb37f>] os_pci_init_handle+0x35/0x62 [nvidia]
May 4 04:39:54 tux [<e0cb1fff>] _nv001613rm+0x6f/0x7c [nvidia]
May 4 04:39:54 tux [<e0be51ff>] _nv001243rm+0x1f/0x24 [nvidia]
May 4 04:39:54 tux [<e0c963fd>] _nv003797rm+0xa9/0x128 [nvidia]
May 4 04:39:54 tux [<e0d02e41>] _nv001490rm+0x55/0xe4 [nvidia]
May 4 04:39:54 tux [<e0d2baf4>] _nv000816rm+0x334/0x384 [nvidia]
May 4 04:39:54 tux [<e0c942cc>] _nv003801rm+0xd8/0x100 [nvidia]
May 4 04:39:54 tux [<e0d2b5ef>] _nv000809rm+0x2f/0x34 [nvidia]
May 4 04:39:54 tux [<e0cc34e8>] _nv003606rm+0xe4/0x114 [nvidia]
May 4 04:39:54 tux [<e0cc2ee3>] _nv003564rm+0x513/0x908 [nvidia]
May 4 04:39:54 tux [<e0bfdc07>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May 4 04:39:54 tux [<e0cff4a3>] _nv001476rm+0x1d3/0x45c [nvidia]
May 4 04:39:54 tux [<e0be7d3a>] _nv000896rm+0x4a/0x64 [nvidia]
May 4 04:39:54 tux [<e0be9554>] rm_isr_bh+0xc/0x10 [nvidia]
May 4 04:39:54 tux [<e0bc8afa>] nv_kern_isr_bh+0x11/0x15 [nvidia]
May 4 04:39:54 tux [<c0125f35>] tasklet_action+0x65/0xc0
May 4 04:39:54 tux [<c0125c87>] do_softirq+0xc7/0xd0
May 4 04:39:54 tux [<c0109d38>] do_IRQ+0x138/0x190
May 4 04:39:54 tux [<c0107ee8>] common_interrupt+0x18/0x20
# lspci
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:09.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
0000:00:0c.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 43)
0000:00:0d.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
0000:00:0d.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
0000:00:0e.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
0000:00:0e.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
0000:00:0f.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 01)
0000:00:10.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a4)
# emerge -pv nvidia-kernel nvidia-glx xfree
[ebuild R ] media-video/nvidia-kernel-1.0.5336-r2 0 kB
[ebuild R ] media-video/nvidia-glx-1.0.5336-r2 0 kB
[ebuild R ] x11-base/xfree-4.3.0-r5 -3dfx +3dnow -bindist -cjk -debug -doc -ipv6 +mmx +nls +pam -sdk +sse -static +truetype +xml2 16,984 kB
# emerge info
Portage 2.0.50-r6 (default-x86-2004.0, gcc-3.3.3, glibc-2.3.3_pre20040420-r0, 2.6.5-gentoo-r1)
=================================================================
System uname: 2.6.5-gentoo-r1 i686 AMD Athlon(TM) XP 1800+
Gentoo Base System version 1.4.10
distcc 2.14 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-mcpu=athlon-xp -O2 -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-mcpu=athlon-xp -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow X alsa apm avi berkdb crypt cups dvd encode foomaticdb gdbm gif gnome gtk gtk2 imlib java jpeg kde libg++ libwww mad mikmod mmx motif mpeg ncurses nls oggvorbis opengl oss pam pdflib perl png python qt quicktime readline sdl slang spell sse ssl svga tcltk tcpd tetex truetype x86 xml xml2 xmms xv zlib"
Please test nvidia-kernel-5336-r3 (preferably with a 2.6.6 kernel, but that is not essential) and report back.
I upgraded to 2.6.6 development-sources and nvidia 5336-r3. The problem is the same as it was before. All I have to do to get it to lock up is run glxgears. Of course, it still locks up at other times too.
I noticed part of the syslog in comment #29 above. My syslog had the same line:
--snip--
May 11 16:01:30 [kernel] Badness in pci_find_subsys at drivers/pci/search.c:167
--snip--
But I didn't have all the other info.
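To check quickly whether a syslog contains this warning, and whether the nvidia module is in the call path, the log can be scanned rather than read by eye. This is a minimal sketch, assuming the line layout seen in the traces quoted in this thread (a "Badness in ..." header followed by `[<addr>] symbol+off/len [module]` frames); the function names and the return shape are illustrative.

```python
import re

HEADER = re.compile(r"Badness in (\w+) at (\S+):(\d+)")
FRAME = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def find_badness(lines):
    """Group 'Badness in ...' warnings with the stack frames that follow.

    Returns a list of dicts with keys func, file, line and frames, where
    frames is a list of (symbol, module_or_None) tuples.
    """
    reports = []
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = {"func": header.group(1), "file": header.group(2),
                       "line": int(header.group(3)), "frames": []}
            reports.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            current["frames"].append((frame.group(2), frame.group(3)))
    return reports


def involves_module(report, module="nvidia"):
    """True if any frame of the report belongs to the given kernel module."""
    return any(mod == module for _, mod in report["frames"])
```

Feeding it the lines of /var/log/syslog (for example `find_badness(open("/var/log/syslog"))`) returns one entry per warning. A header with an empty frame list, as in this comment, would mean the call trace was not logged or was filtered out.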
I have tried to get this to happen again, but it looks like it was a one-timer. I will reply again if I get it once more. Thanks for your support!
Specifically, if you are experiencing problems similar to what's shown in comment #29, try out the following; if you are having lock-ups, try it anyway. Please download http://dev.gentoo.org/~cyfred/nvidia-kernel-1.0.5336-r3.tar.bz2 and extract it to your $PORTDIR_OVERLAY/media-video/ directory. Then try the ebuild provided in this tarball. It patches the kernel api driver so that proper kernel function calls are actually used, not deprecated ones. I have notified nvidia of this; we will see what they say.
I used the tarball from comment #34 and just got this:
May 24 21:58:49 azoff kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
May 24 21:58:49 azoff kernel: Call Trace:
May 24 21:58:49 azoff kernel: [<c0275610>] pci_find_subsys+0x110/0x120
May 24 21:58:49 azoff kernel: [<c027564f>] pci_find_device+0x2f/0x40
May 24 21:58:49 azoff kernel: [<c0275408>] pci_find_slot+0x28/0x50
May 24 21:58:49 azoff kernel: [<e0e963f2>] os_pci_init_handle+0x39/0x68 [nvidia]
May 24 21:58:49 azoff kernel: [<e0d2a85f>] _nv001243rm+0x1f/0x24 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e71115>] _nv000816rm+0x2f5/0x384 [nvidia]
May 24 21:58:49 azoff kernel: [<e0dd992c>] _nv003801rm+0xd8/0x100 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e70c4f>] _nv000809rm+0x2f/0x34 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e08b48>] _nv003606rm+0xe4/0x114 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e087f9>] _nv003564rm+0x7c9/0x908 [nvidia]
May 24 21:58:49 azoff kernel: [<e0d43267>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e44b03>] _nv001476rm+0x1d3/0x45c [nvidia]
May 24 21:58:49 azoff kernel: [<e0d2d39a>] _nv000896rm+0x4a/0x64 [nvidia]
May 24 21:58:49 azoff kernel: [<e0d2ebb4>] rm_isr_bh+0xc/0x10 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e93b0b>] nv_kern_isr_bh+0xf/0x13 [nvidia]
May 24 21:58:49 azoff kernel: [<c0125f35>] tasklet_action+0x65/0xc0
May 24 21:58:49 azoff kernel: [<c0125c87>] do_softirq+0xc7/0xd0
May 24 21:58:49 azoff kernel: [<c0109d38>] do_IRQ+0x138/0x190
May 24 21:58:49 azoff kernel: [<c0107ee8>] common_interrupt+0x18/0x20
May 24 21:58:49 azoff kernel:
May 24 21:58:49 azoff kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
May 24 21:58:49 azoff kernel: Call Trace:
May 24 21:58:49 azoff kernel: [<c0275610>] pci_find_subsys+0x110/0x120
May 24 21:58:49 azoff kernel: [<c027564f>] pci_find_device+0x2f/0x40
May 24 21:58:49 azoff kernel: [<c0275408>] pci_find_slot+0x28/0x50
May 24 21:58:49 azoff kernel: [<e0e963f2>] os_pci_init_handle+0x39/0x68 [nvidia]
May 24 21:58:49 azoff kernel: [<e0d2a85f>] _nv001243rm+0x1f/0x24 [nvidia]
May 24 21:58:49 azoff kernel: [<e0ddba5d>] _nv003797rm+0xa9/0x128 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e484a1>] _nv001490rm+0x55/0xe4 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e71154>] _nv000816rm+0x334/0x384 [nvidia]
May 24 21:58:49 azoff kernel: [<e0dd992c>] _nv003801rm+0xd8/0x100 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e70c4f>] _nv000809rm+0x2f/0x34 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e08b48>] _nv003606rm+0xe4/0x114 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e087f9>] _nv003564rm+0x7c9/0x908 [nvidia]
May 24 21:58:49 azoff kernel: [<e0d43267>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e44b03>] _nv001476rm+0x1d3/0x45c [nvidia]
May 24 21:58:49 azoff kernel: [<e0d2d39a>] _nv000896rm+0x4a/0x64 [nvidia]
May 24 21:58:49 azoff kernel: [<e0d2ebb4>] rm_isr_bh+0xc/0x10 [nvidia]
May 24 21:58:49 azoff kernel: [<e0e93b0b>] nv_kern_isr_bh+0xf/0x13 [nvidia]
May 24 21:58:49 azoff kernel: [<c0125f35>] tasklet_action+0x65/0xc0
May 24 21:58:49 azoff kernel: [<c0125c87>] do_softirq+0xc7/0xd0
May 24 21:58:49 azoff kernel: [<c0109d38>] do_IRQ+0x138/0x190
May 24 21:58:49 azoff kernel: [<c0107ee8>] common_interrupt+0x18/0x20
May 24 21:58:49 azoff kernel:
It's the same setting and the same machine as in my last post. Could it be the gentoo (2.6.5-gentoo-r1) kernel that's fscking the nvidia mod? The uptime is 20 days and 13h. If I get this once again with this kernel I will try the vanilla sources. Thanks for working on gentoo!
Yep, that's nvidia adding a depend... I'm sorry guys, but I'm waiting for word from nvidia on scheduling control in their drivers. @Torbjorn: what chipset do you have for your AGP bus? Post the output of lspci as well please.
I just received a lockup today while VMWare was open and I was emerging wine. I've since jumped to XOrg-X11 so we'll see what tomorrow brings.
I am using agpgart in the kernel. This lockup came when I had some aterms and tvtime running and was compiling. I have done this _many_ times without any lockups. Here's the lspci:
# /sbin/lspci
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:09.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
0000:00:0c.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 43)
0000:00:0d.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
0000:00:0d.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
0000:00:0e.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
0000:00:0e.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
0000:00:0f.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 01)
0000:00:10.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a4)
# uname -a
Linux tux 2.6.5-gentoo-r1 #2 SMP Wed Apr 28 19:06:48 CEST 2004 i686 AMD Athlon(TM) XP 1800+ AuthenticAMD GNU/Linux
# X -version
This is a pre-release version of XFree86, and is not supported in any way. Bugs may be reported to XFree86@XFree86.Org and patches submitted to fixes@XFree86.Org. Before reporting bugs in pre-release versions, please check the latest version in the XFree86 CVS repository (http://www.XFree86.Org/cvs).
XFree86 Version 4.3.0.1
Release Date: 15 August 2003
X Protocol Version 11, Revision 0, Release 6.6
Build Operating System: Linux 2.6.5-gentoo-r1 i686 [ELF]
Build Date: 28 April 2004
Before reporting problems, check http://www.XFree86.Org/ to make sure that you have the latest version.
Module Loader present
* media-video/nvidia-kernel
    Latest version available: 1.0.5336-r3
    Latest version installed: 1.0.5336-r3
    Size of downloaded files: 6,661 kB
    Homepage: http://www.nvidia.com/
    Description: Linux kernel module for the NVIDIA's X driver
I can't find anything else that you need to know. Tell me if there is and I will put it here :)
Bug 51524 mentions needing a large PSU to stop instability in the AGP bus. Torbjorn, what is the current output power of your PSU?
I don't think that's my problem. It's a GFS2 card (2xAGP) and I don't think it needs that much power. It says 340W on it, but how do I get the current? :)
@Torbjorn: you're running SMP though, aren't you? Power consumption in that case is going to be "more". Still, GFS2 does sort of make me think that it might not be the whole story. Can you post the output of: lspci -xxx -s 0:0 (you'll need to be root)?
I wrote that wrong, it's a 4xAGP port. Anyway, here comes the output. If there is anything else you need, just tell me :)
# lspci -xxx -s 0:0
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00: 06 11 99 30 06 00 30 22 00 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 7f 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 18 88 80 82 45 01 00 18 24 88 80 82 44 00 00
50: 16 f4 69 ea 20 05 20 20 e0 ee 10 20 20 20 20 20
60: aa aa 00 a0 e6 99 c0 1e 68 ed 54 70 c1 68 00 00
70: 80 c8 00 01 00 01 10 00 01 00 00 00 00 00 00 03
80: 0f 00 00 00 c0 00 00 00 02 00 70 1d 00 00 00 00
90: 16 f4 69 ea 07 1c f1 0b 21 ff 00 00 21 23 74 00
a0: 02 c0 20 00 07 02 00 1f 04 01 00 00 2f 08 04 9a
b0: 7f 9a 18 00 40 00 00 00 80 00 00 00 00 00 00 84
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 44 99 99 99 00 44 00 00
f0: 00 20 00 00 00 94 94 00 00 00 00 00 00 00 00 00
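The dump can be decoded by hand, but a few lines of code make the relevant fields easier to read. This is a minimal sketch, assuming the standard PCI type-0 header layout (vendor ID at 0x00, device ID at 0x02, capability pointer at 0x34) and the AGP 2.0 capability layout (capability ID 0x02, status register at +4, command register at +8). The function names are illustrative, and only the three rows needed are copied in as sample data.

```python
def parse_dump(text):
    """Turn `lspci -xxx` hex rows ('00: 06 11 ...') into a bytes object."""
    data = bytearray(256)
    for row in text.strip().splitlines():
        offset, _, rest = row.partition(":")
        base = int(offset, 16)
        for i, byte in enumerate(rest.split()):
            data[base + i] = int(byte, 16)
    return bytes(data)


def le16(cfg, off):
    return int.from_bytes(cfg[off:off + 2], "little")


def le32(cfg, off):
    return int.from_bytes(cfg[off:off + 4], "little")


def find_capability(cfg, cap_id):
    """Walk the capability list; return the offset of cap_id or None."""
    ptr = cfg[0x34] & 0xFC  # low two bits of the pointer are reserved
    seen = set()
    while ptr and ptr not in seen:
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def agp_summary(cfg):
    """Decode supported and selected AGP rates (AGP 2.0 bit encoding assumed)."""
    cap = find_capability(cfg, 0x02)  # 0x02 = AGP capability ID
    if cap is None:
        return None
    status, command = le32(cfg, cap + 4), le32(cfg, cap + 8)
    rates = [f"{1 << bit}x" for bit in range(3) if status & (1 << bit)]
    return {"supported": rates,
            "selected_rate_bits": command & 0x7,
            "agp_enabled": bool(command & 0x100)}


# Rows copied from the dump in this comment (only those needed here).
SAMPLE = """
00: 06 11 99 30 06 00 30 22 00 00 00 06 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 04 01 00 00 2f 08 04 9a
"""
cfg = parse_dump(SAMPLE)
print(hex(le16(cfg, 0x00)), hex(le16(cfg, 0x02)))  # vendor ID, device ID
print(agp_summary(cfg))
```

For this dump the sketch reports vendor 0x1106 (VIA) and device 0x3099, with the AGP command register selecting the 4x rate and AGP enabled, which agrees with the 4x port mentioned in the comment.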
Torbjorn: Try a larger PSU if at all possible. The above looks normal though, so I'm thinking it's either the hardware or the actual drivers, of which only one can be tested :) BTW, there's a new -r4 driver in the tree which would be worth testing out as well.
There are new drivers out (6106); see bug #55714.
This is something in nvidia's code, so we can't do much about it. In summary:
1) Use the newest 6106 drivers, as they at least don't oops
2) Bug nvidia about it
3) Try a larger PSU