42870 – NVIDIA Drivers (4496 and 5336-r1) cause lockups with XFree86

Status:	RESOLVED CANTFIX

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All All

Importance:	High normal
Assignee:	Gentoo X packagers

Depends on:	51524

Reported:	2004-02-25 05:03 UTC by Dale K Dicks
Modified:	2004-07-20 19:20 UTC
CC List:	3 users

Attachments
a new ebuild and the latest minion.de patch (nvidiar5.tgz, 29.48 KB, application/x-compressed-tar) 2004-02-25 21:15 UTC, Tod Morrison

Description Dale K Dicks 2004-02-25 05:03:01 UTC

I have been testing all the different permutations with this problem for the past few days.

I have tried both with and without FrameBuffer, with and without Video Mode Selection Support, both versions of the drivers that work with 2.6.x kernels, all with the same result.

I can make X freeze every time.

Steps to reproduce:

1. log into X
2. press ctrl-alt-f1
3. log in to TTY as root
4. emerge any package (I used monkey-bubble for testing, as its compile is short but not too short)
5. press ctrl-alt-f7
6. press ctrl-alt-f1
7. repeat steps 5 & 6 repeatedly until X freezes after pressing ctrl-alt-f1

It usually takes 3-5 switches to trigger the freeze. With FrameBuffer it takes fewer, usually about 3.
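
For anyone who wants to reproduce this without pressing the keys by hand, steps 5 and 6 could be scripted. This is only a sketch and not part of the original report: it assumes the `chvt` utility from the kbd package is installed, that X sits on VT 7 and the text console on VT 1, and the function names are made up for illustration. It defaults to a dry run that only prints the commands, since real switching needs root.

```python
import subprocess
import time


def vt_switch_sequence(text_vt=1, x_vt=7, cycles=5):
    """Return the VT numbers to visit: X, text console, X, text console, ..."""
    sequence = []
    for _ in range(cycles):
        sequence.append(x_vt)     # step 5: ctrl-alt-f7
        sequence.append(text_vt)  # step 6: ctrl-alt-f1
    return sequence


def run_switches(sequence, delay=2.0, dry_run=True):
    """Visit each VT in turn with chvt; with dry_run=True only print the commands."""
    for vt in sequence:
        command = ["chvt", str(vt)]
        if dry_run:
            print(" ".join(command))
        else:
            # Assumption: run as root, with the emerge from step 4 still compiling.
            subprocess.run(command, check=True)
        time.sleep(delay)
```

Calling `run_switches(vt_switch_sequence(cycles=5), dry_run=False)` as root, ideally from an ssh session so the script is not tied to the console that may freeze, would perform five round trips between the two terminals.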

There are no errors in any logs.

Hardware: 1.8 GHz Pentium4, 1.25GB RAM, NVidia GeForce FX 5600

There is nothing special in my configuration.  I do have NPTL enabled in my glibc but I have also recompiled glibc without NPTL and tested just as thoroughly with the same results.

I am not sure if this is an NVidia problem, a kernel problem, or an XFree problem.  I do know that I did not notice/get these lockups with the XFree builds before the latest bugfix builds that came out.

Please let me know if you require any further information or if you think I should report this to NVidia and/or the kernel devs.

Comment 1 Derk W te Bokkel 2004-02-25 10:53:24 UTC

I'm going to ask the question before the devs do: which version of xfree are you using? Can you give the 'emerge info' output as well? And which 2.6 kernel version?

I've seen intermittent lock-ups with X recently as well, but I'm using xfree-4.3.99.902-r2, a 2.6.3 win4lin-patched kernel and nvidia-{kernel glx}-5336-r1. I've observed X running at 99% processor usage after ssh'ing into the locked box (with xscreensaver running on the locked terminal). Ssh'ing in from another box and killing the one X session eating the CPU cycles restores normal activity; gdm then restarts the X server automatically and I can log in again.

It's one of those 'confounded intermittents', but if you can reproduce it, that's real handy. It means a handle can be found somehow, but more info is needed first.

Comment 2 Dale K Dicks 2004-02-25 11:03:09 UTC

[ ddicks@linuxbox ~ ] $ emerge info
Portage 2.0.50-r1 (default-x86-1.4, gcc-3.3.2, glibc-2.3.2-r9, 2.6.3)
=================================================================
System uname: 2.6.3 i686 Intel(R) Pentium(R) 4 CPU 1.80GHz
Gentoo Base System version 1.4.3.13
distcc 2.12.1 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.58-r1
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -mfpmath=sse -msse -msse2 -mmmx -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -mfpmath=sse -msse -msse2 -mmmx -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://mirror.cpsc.ucalgary.ca/mirror/gentoo.org"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.ca.gentoo.org/gentoo-portage"
USE="X aalib alsa avi berkdb bonobo cdr crypt cups dvd encode esd foomaticdb gdbm gif gimpprint gnome gstreamer gtk gtk2 gtkhtml imagemagick imap imlib java javascript joystick jpeg ldap libg++ libwww mad mikmod mmx motif mozilla mpeg mpeg4 ncurses nls nptl oggvorbis opengl oss pam pdflib perl png ppds python quicktime readline samba sdl slang spell sse ssl svga tcltk tcpd tiff truetype usb v4l x86 xml2 xmms xv zlib"

*  x11-base/xfree
      Latest version available: 4.3.0-r5
      Latest version installed: 4.3.0-r5
      Size of downloaded files: 54,146 kB
      Homepage:    http://www.xfree.org
      Description: Xfree86: famous and free X server
      License:     X11 MSttfEULA


*  media-video/nvidia-glx
      Latest version available: 1.0.5336-r1
      Latest version installed: 1.0.5336-r1
      Size of downloaded files: 6,661 kB
      Homepage:    http://www.nvidia.com/
      Description: XFree86 GLX libraries for the NVIDIA's X driver
      License:     NVIDIA


*  media-video/nvidia-kernel
      Latest version available: 1.0.5336-r1
      Latest version installed: 1.0.5336-r1
      Size of downloaded files: 6,661 kB
      Homepage:    http://www.nvidia.com/
      Description: Linux kernel module for the NVIDIA's X driver
      License:     NVIDIA


*** This lockup happens with all versions of the nvidia drivers that I have tried with a 2.6.x kernel.

Comment 3 Dale K Dicks 2004-02-25 11:03:47 UTC

I have Kernel 2.6.3

Forgot to add that.

Comment 4 Tod Morrison 2004-02-25 21:15:17 UTC

Created attachment 26378 [details]
a new ebuild and the latest minion.de patch

I've been having a similar problem. I noticed today that the most recent
version of 4496 ebuild, "nvidia-kernel-1.0.4496-r4.ebuild", is using a fairly
dated www.minion.de patch. Today I hacked the ebuild to use the most recent
www.minion.de patch for 4496 and have not seen any hangs since. I suspect we
could do a new ebuild (r5) using the latest minion.de patch, fairly easily.
This really needs to be done since the newer versions of the nvidia driver are
known to have dpms problems. 

I've attached a tgz file containing my hacked ebuild and the minion.de patch,
though I'm not quite up to speed on making ebuilds.

Comment 5 Tod Morrison 2004-02-25 23:18:01 UTC

No such luck, still getting the lock-up with the newer patch. One other note though: I only began noticing these problems after upgrading from xfree-4.3.0-r4 to xfree-4.3.0-r5, and I have seen this with both 2.4.21-ac and 2.6.3-gentoo-r2 kernels. Also, I don't have any lock-ups with the 'nv' driver, though this costs me real dpms support (screen blanks, but does not power off).

Comment 6 Dale K Dicks 2004-02-26 04:15:37 UTC

Yes, I think that this all started for me after the latest XFree update as well... (-r5)

I just saw your post about the patch and that it didn't work, so I'm going to leave mine as is and just not switch to a TTY until someone fixes this :)

Comment 7 REMOVED ACCOUNT 2004-02-28 14:50:23 UTC

Probably related: I started getting X server crashes (sometimes lockups) after upgrading from xfree-4.3.0-r3 to -r5, with nvidia-kernel-1.0.5328-r1.

Comment 8 Greisberger Christophe 2004-02-28 17:09:15 UTC

I have the same problem on a 2.4.22-gentoo-r7 with xfree-4.3.0-r5, nvidia-glx-1.0.5336-r1 and nvidia-kernel-1.0.5336-r1.

I found another way to lock up the system: I used a GL screensaver (Euphoria).
When I entered the password to unlock, X began to display the windows and the system locked up in the middle.
Perhaps the problem comes from nvidia-glx?

Comment 9 Greisberger Christophe 2004-02-28 17:13:21 UTC

Oh yes, if it can help:

-Shuttle SN41G2 system (nforce2 / geforce 4 mx)
-Athlon XP 2800+
-no vesa bootsplash :-)

# emerge info
Portage 2.0.50-r1 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3_pre20040207-r0, 2.4.22-gentoo-r7)
=================================================================
System uname: 2.4.22-gentoo-r7 i686 AMD Athlon(tm) XP 2800+
Gentoo Base System version 1.4.3.13p1
distcc 2.12.1 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-march=athlon-xp -O3 -pipe -fforce-addr -fomit-frame-pointer -fprefetch-loop-arrays -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -mmmx -msse -m3dnow -mfpmath=387,sse -momit-leaf-frame-pointer"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-march=athlon-xp -O3 -pipe -fforce-addr -fomit-frame-pointer -fprefetch-loop-arrays -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -mmmx -msse -m3dnow -mfpmath=387,sse -momit-leaf-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.oregonstate.edu http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X aalib acl acpi alsa arts avi berkdb bonobo canna cdr cjk crypt cups curl dga directfb doc dvb dvd encode fbcon foomaticdb freewnn gdbm ggi gif gphoto2 gstreamer gtk guile imap imlib jack java joystick jpeg kde libg++ linguas_ar linguas_de linguas_en linguas_fr linguas_hu linguas_jp linguas_ro linguas_ru linguas_sp lirc mad maildir mbox mmx motif mozilla mpeg mysql nas ncurses nls oggvorbis opengl oss pam pdflib perl png postgres prelude python qt quicktime readline samba scanner sdl slang slp spell sse ssl svga tcltk tcpd tetex tiff truetype unicode usb vim-with-x wmf x86 xinerama xml xml2 xmms xv zlib"

Comment 10 Tod Morrison 2004-02-28 20:14:44 UTC

Is there any chance we could find an archived version of the -r4 ebuild to confirm whether this is really a problem with -r5 and nvidia, or whether this is just a coincidence?

Comment 11 Donnie Berkholz (RETIRED) gentoo-dev 2004-02-28 21:42:37 UTC

www.gentoo.org/cgi-bin/viewcvs.cgi/media-video/

Check the Attic in nvidia-kernel and nvidia-glx.

Comment 12 Dale K Dicks 2004-02-29 03:57:10 UTC

I switched to Linux-2.6.3-wolk1.0 _with_ framebuffer and have not had a lockup yet  (2-3 days).  Still using 5336-r1 nvidia driver.

Comment 13 Dale K Dicks 2004-02-29 07:36:12 UTC

Well, I spoke too soon.  I was able to make it lock up by using the same method as before.

Comment 14 Tod Morrison 2004-03-03 15:33:20 UTC

I retrieved the old xfree-4.3.0-r4 ebuild and rolled back to that version. I'm still having problems with both the 2.4 and 2.6 kernels, but I'm fairly sure the problems started at about the time the xfree-4.3.0-r5 ebuild came out. I suspect now it could be a gcc or binutils bug that just hadn't shown up till we recompiled xfree. I guess now it might make sense to either roll back further or try a binary install of xfree.

Comment 15 REMOVED ACCOUNT 2004-03-05 06:56:55 UTC

I reverted to 4.3.0-r4 and had no crash ever since.

Comment 16 Dale K Dicks 2004-03-06 16:26:33 UTC

Okay, this is just weird - but kind of not.

I did something today to my configuration and now I can no longer replicate the bug.

I removed devfs from my configuration and emerged udev per the instructions on how to switch over that are available in the Gentoo forums.

I'd like to see if anyone else who is thinking of dropping devfs sees the same _good_ side-effect.

Comment 17 Paul Kronenwetter 2004-03-08 06:28:32 UTC

Just another me too...  2.4.25 kernel, xfree 4.3.0-r5, nvidia 1.0.4496, gcc-3.3.2-r5, glibc-2.3.2-r9, etc, etc...

Except for me it appears to happen more often when VMWare is running.  In fact I can't say that I've noticed the problem when VMWare wasn't running.  Let me know if more information on the system/config will be helpful.

Comment 18 Paul Kronenwetter 2004-03-08 09:16:01 UTC

I should probably also mention that it only freezes with a moderate to high CPU / disk IO load...  If the box stays idle or relatively so, it doesn't seem to freeze.  

I've also noticed that a number of processes appear in disk/IO wait (status D in ps) when X freezes.  So I'll have X in 'DL' status, VMWare, gcc/c++ and perhaps a shell are all status D.  

Although not everything is stuck there, for instance I can still log in most of the time.  The disk is *not* thrashing out of control rather it's mostly idle.
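
The check described above (looking for processes in status D when X freezes) could be scripted so it can be run quickly over ssh. The following is a rough sketch only; the function name and the sample listing are invented for illustration, and it assumes the procps `ps -eo pid,stat,comm` output format.

```python
import subprocess


def uninterruptible(ps_output):
    """Return (pid, stat, command) for each process whose state starts with D."""
    stuck = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, stat, command = fields
        if stat.startswith("D"):  # D = uninterruptible (usually I/O) sleep
            stuck.append((int(pid), stat, command))
    return stuck


def current_ps_output():
    """Capture a live process listing (assumes procps ps is available)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical listing, shaped like the situation described in this comment.
SAMPLE = """\
  PID STAT COMMAND
  812 DL   X
  901 S    bash
 1450 D    vmware
 1502 R    ps
"""

print(uninterruptible(SAMPLE))
```

On a frozen box, `uninterruptible(current_ps_output())` run from an ssh login would list whatever is stuck at that moment, which may help compare freezes between machines.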

Comment 19 Tod Morrison 2004-03-09 05:36:37 UTC

I rolled back gcc, binutils, and glibc and recompiled the kernel, xfree, and nvidia and still had problems. Most recently I removed driverloader (www.linuxant.com) and have been using an orinoco pccard for wireless access. Since doing so I have not had any lockups... So my next plan is to try to roll things forward again and see if I have any problems...

Anyway, I'm curious if anyone else having problems also has an unusual config (i.e. driverloader)... The driverloader list doesn't have anyone else reporting problems, so this could still be a combination type of problem, and I'm still hesitant to point any fingers. My guess at the moment is that it may be some kind of weird PCI interrupt problem; what is to blame is still a question.

Comment 20 Andrew Bevitt 2004-04-15 08:06:05 UTC

What's happening guys? How did the rolling forward go?

Comment 21 Tod Morrison 2004-04-15 17:20:08 UTC

Still have lockups, but they're infrequent enough that I'm just living with it. I've switched from driverloader to ndiswrapper and upgraded to the 5336-r1 nvidia driver. Also, I'm using a 2.6.5-gentoo kernel...

Anyway, no real change...

Comment 22 Dale K Dicks 2004-04-15 17:38:08 UTC

I have not had a lockup since I got rid of devfsd.

Comment 23 Paul Kronenwetter 2004-04-15 17:42:37 UTC

Still happening for me too, but mine are relatively frequent.  I needed to reboot the VM about 4 times in 8 hours today...  2.4.25.  Will try with 2.4.26 after compilation completes.

Comment 24 Andrew Bevitt 2004-04-15 18:16:41 UTC

I'll agree with Dale, I'm using udev and haven't experienced one lockup with it _yet_, but I've been going for a month or so now...

However, for those of you who are still having difficulties, have you tried the different AGP drivers, e.g. using the kernel to control AGP versus using the nvidia-kernel driver to control AGP? Do the results vary? (See the info on the NvAGP option in XF86Config.)

Comment 25 Dennis Muhlestein 2004-04-27 13:19:54 UTC

I've had this problem ever since upgrading to the 2.6.x kernel series.  I've tried a bunch of versions of X and am now using xorg-x11.

Right now I'm doing two things that are interesting.

1) I tried switching back off the nvidia-kernel.  I switched Driver to "nv" instead of "nvidia" in xorg.conf and then ran "opengl-update xorg-x11".  The interesting thing is that now it crashes every time I try to boot up gnome, specifically when there is a lot of disk I/O.  That is not using the nvidia stuff as far as I can tell, and it crashes way more often.  The effect is the same though: I can ssh in and see X taking 99% cpu, kill, reboot etc.

2) I switched to udev and it does the exact same thing.

I've switched back to nvidia while still using udev.  I hope things are stable like others have suggested... Time will tell.

Comment 26 Andrew Bevitt 2004-04-27 16:51:54 UTC

OK I just ran some tests from my machine here...

I copied about 500M worth to an NFS share, then back again.
I moved the 500M worth to the NFS share, then back again...

From the command prompt, no problems whatsoever; from Xorg running under the "nv" driver, no problems whatsoever. But the biggest pain is when using "nvidia": copying the files to the server works fine, copying them back causes a lockup, and moving in either direction causes a lockup.

Anyone else see this?

Comment 27 Dennis Muhlestein 2004-04-29 12:40:17 UTC

I see the same type of behavior.  X seems to lock when some disk activity starts.  It doesn't have to be as much as 500M and it is quite random.

By the way, after switching to udev and back to nvidia-kernel, it just happened again. The machine had been running since my last post.

So.. switching to udev did NOT solve the problem for me.

Comment 28 Dennis Muhlestein 2004-05-04 08:06:32 UTC

I recompiled my kernel without agpgart support and then
changed my xorg.conf file to use:

Option "NvAGP" "1" # - use nvidia's agp drivers

I used the system about 1/2 a day and then had another lockup.

Before, I had been using the agpgart drivers in the kernel.
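
To work through the AGP variants systematically, the Device section for each NvAGP setting could be generated rather than edited by hand. This is a sketch under assumptions: the meanings of the values 0 to 3 are given as I remember them from the NVIDIA driver README, so check the README shipped with your driver version, and the identifier "nvidia-card" and the function name are placeholders.

```python
# NvAGP values as recalled from the NVIDIA README (assumption, please verify).
NVAGP_MODES = {
    0: "AGP disabled",
    1: "NVIDIA's internal AGP support",
    2: "kernel agpgart",
    3: "any AGP support (agpgart first, then NVIDIA's)",
}


def device_section(mode, identifier="nvidia-card"):
    """Render an XF86Config/xorg.conf Device section for one NvAGP setting."""
    if mode not in NVAGP_MODES:
        raise ValueError("NvAGP must be one of %s" % sorted(NVAGP_MODES))
    return "\n".join([
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver     "nvidia"',
        f'    Option     "NvAGP" "{mode}"  # {NVAGP_MODES[mode]}',
        'EndSection',
    ])


print(device_section(1))
```

Note that setting 1 only makes sense when agpgart is not loaded, as in the test described in this comment, and setting 2 needs agpgart support in the kernel.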

Comment 29 Torbjörn Svensson 2004-05-05 11:57:05 UTC

Just got an X lockup and oops while my comp was idle (just running xscreensaver).
Hope this will help.

From /var/log/syslog:
May  4 04:39:54 tux Badness in pci_find_subsys at drivers/pci/search.c:167
May  4 04:39:54 tux Call Trace:
May  4 04:39:54 tux [<c0275610>] pci_find_subsys+0x110/0x120
May  4 04:39:54 tux [<c027564f>] pci_find_device+0x2f/0x40
May  4 04:39:54 tux [<c0275408>] pci_find_slot+0x28/0x50
May  4 04:39:54 tux [<e0bcb37f>] os_pci_init_handle+0x35/0x62 [nvidia]
May  4 04:39:54 tux [<e0be51ff>] _nv001243rm+0x1f/0x24 [nvidia]
May  4 04:39:54 tux [<e0d2bab5>] _nv000816rm+0x2f5/0x384 [nvidia]
May  4 04:39:54 tux [<e0c942cc>] _nv003801rm+0xd8/0x100 [nvidia]
May  4 04:39:54 tux [<e0d2b5ef>] _nv000809rm+0x2f/0x34 [nvidia]
May  4 04:39:54 tux [<e0cc34e8>] _nv003606rm+0xe4/0x114 [nvidia]
May  4 04:39:54 tux [<e0cc2ee3>] _nv003564rm+0x513/0x908 [nvidia]
May  4 04:39:54 tux [<e0bfdc07>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May  4 04:39:54 tux [<e0cff4a3>] _nv001476rm+0x1d3/0x45c [nvidia]
May  4 04:39:54 tux [<e0be7d3a>] _nv000896rm+0x4a/0x64 [nvidia]
May  4 04:39:54 tux [<e0be9554>] rm_isr_bh+0xc/0x10 [nvidia]
May  4 04:39:54 tux [<e0bc8afa>] nv_kern_isr_bh+0x11/0x15 [nvidia]
May  4 04:39:54 tux [<c0125f35>] tasklet_action+0x65/0xc0
May  4 04:39:54 tux [<c0125c87>] do_softirq+0xc7/0xd0
May  4 04:39:54 tux [<c0109d38>] do_IRQ+0x138/0x190
May  4 04:39:54 tux [<c0107ee8>] common_interrupt+0x18/0x20
May  4 04:39:54 tux
May  4 04:39:54 tux Badness in pci_find_subsys at drivers/pci/search.c:167
May  4 04:39:54 tux Call Trace:
May  4 04:39:54 tux [<c0275610>] pci_find_subsys+0x110/0x120
May  4 04:39:54 tux [<c027564f>] pci_find_device+0x2f/0x40
May  4 04:39:54 tux [<c0275408>] pci_find_slot+0x28/0x50
May  4 04:39:54 tux [<e0bcb37f>] os_pci_init_handle+0x35/0x62 [nvidia]
May  4 04:39:54 tux [<e0cb1fff>] _nv001613rm+0x6f/0x7c [nvidia]
May  4 04:39:54 tux [<e0be51ff>] _nv001243rm+0x1f/0x24 [nvidia]
May  4 04:39:54 tux [<e0c963fd>] _nv003797rm+0xa9/0x128 [nvidia]
May  4 04:39:54 tux [<e0d02e41>] _nv001490rm+0x55/0xe4 [nvidia]
May  4 04:39:54 tux [<e0d2baf4>] _nv000816rm+0x334/0x384 [nvidia]
May  4 04:39:54 tux [<e0c942cc>] _nv003801rm+0xd8/0x100 [nvidia]
May  4 04:39:54 tux [<e0d2b5ef>] _nv000809rm+0x2f/0x34 [nvidia]
May  4 04:39:54 tux [<e0cc34e8>] _nv003606rm+0xe4/0x114 [nvidia]
May  4 04:39:54 tux [<e0cc2ee3>] _nv003564rm+0x513/0x908 [nvidia]
May  4 04:39:54 tux [<e0bfdc07>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May  4 04:39:54 tux [<e0cff4a3>] _nv001476rm+0x1d3/0x45c [nvidia]
May  4 04:39:54 tux [<e0be7d3a>] _nv000896rm+0x4a/0x64 [nvidia]
May  4 04:39:54 tux [<e0be9554>] rm_isr_bh+0xc/0x10 [nvidia]
May  4 04:39:54 tux [<e0bc8afa>] nv_kern_isr_bh+0x11/0x15 [nvidia]
May  4 04:39:54 tux [<c0125f35>] tasklet_action+0x65/0xc0
May  4 04:39:54 tux [<c0125c87>] do_softirq+0xc7/0xd0
May  4 04:39:54 tux [<c0109d38>] do_IRQ+0x138/0x190
May  4 04:39:54 tux [<c0107ee8>] common_interrupt+0x18/0x20 


# lspci
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:09.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
0000:00:0c.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 43)
0000:00:0d.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
0000:00:0d.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
0000:00:0e.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
0000:00:0e.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
0000:00:0f.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 01)
0000:00:10.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a4)


# emerge -pv nvidia-kernel nvidia-glx xfree
[ebuild   R   ] media-video/nvidia-kernel-1.0.5336-r2   0 kB 
[ebuild   R   ] media-video/nvidia-glx-1.0.5336-r2   0 kB 
[ebuild   R   ] x11-base/xfree-4.3.0-r5  -3dfx +3dnow -bindist -cjk -debug -doc -ipv6 +mmx +nls +pam -sdk +sse -static +truetype +xml2  16,984 kB


# emerge info
Portage 2.0.50-r6 (default-x86-2004.0, gcc-3.3.3, glibc-2.3.3_pre20040420-r0, 2.6.5-gentoo-r1)
=================================================================
System uname: 2.6.5-gentoo-r1 i686 AMD Athlon(TM) XP 1800+
Gentoo Base System version 1.4.10
distcc 2.14 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-mcpu=athlon-xp -O2 -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-mcpu=athlon-xp -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow X alsa apm avi berkdb crypt cups dvd encode foomaticdb gdbm gif gnome gtk gtk2 imlib java jpeg kde libg++ libwww mad mikmod mmx motif mpeg ncurses nls oggvorbis opengl oss pam pdflib perl png python qt quicktime readline sdl slang spell sse ssl svga tcltk tcpd tetex truetype x86 xml xml2 xmms xv zlib"

Comment 30 Andrew Bevitt 2004-05-10 20:44:50 UTC

Please test nvidia-kernel-5336-r3 (preferably with a 2.6.6 kernel, but that is not essential) and report back.

Comment 31 Dennis Muhlestein 2004-05-11 21:21:52 UTC

I upgraded to 2.6.6 development-sources and nvidia 5336-r3

Problem is same as it was before.  All I have to do to get it to lock up is run glxgears.  Of course, it still locks up other times too.

Comment 32 Dennis Muhlestein 2004-05-11 21:26:16 UTC

I noticed in comment #29 above part of the syslog.

My syslog had the same line:

--snip--
May 11 16:01:30 [kernel] Badness in pci_find_subsys at drivers/pci/search.c:167
--snip--

But I didn't have all the other info.
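
To compare these warnings across machines, the syslog could be summarized with a small parser that pulls out each "Badness in" line and the call-trace frames that follow it. This is an illustrative sketch only; the regular expressions are written against the trace format pasted in comment #29, the function name is made up, and other kernels may format traces differently.

```python
import re

WARNING = re.compile(r"Badness in (\S+) at (\S+):(\d+)")
FRAME = re.compile(
    r"\[<([0-9a-f]{8})>\] (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+( \[(\w+)\])?"
)


def summarize(log_text):
    """Group call-trace frames under the 'Badness in' warning that precedes them."""
    warnings = []
    current = None
    for line in log_text.splitlines():
        warning = WARNING.search(line)
        if warning:
            current = {
                "function": warning.group(1),
                "file": warning.group(2),
                "line": int(warning.group(3)),
                "frames": [],
            }
            warnings.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            # Frames without a [module] tag belong to the kernel proper.
            current["frames"].append((frame.group(2), frame.group(4) or "kernel"))
    return warnings


# First lines of the trace from comment #29.
SAMPLE = """\
May  4 04:39:54 tux Badness in pci_find_subsys at drivers/pci/search.c:167
May  4 04:39:54 tux Call Trace:
May  4 04:39:54 tux [<c0275610>] pci_find_subsys+0x110/0x120
May  4 04:39:54 tux [<e0bcb37f>] os_pci_init_handle+0x35/0x62 [nvidia]
"""

for entry in summarize(SAMPLE):
    print(entry["function"], entry["file"], entry["line"], entry["frames"])
```

A log that contains only the "Badness" line, as here, would yield an entry with an empty frame list, which makes the difference from comment #29 easy to spot.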

Comment 33 Torbjörn Svensson 2004-05-12 00:35:26 UTC

I have tried to get this to happen again, but it looks like it was a one-timer. I will reply again if I get it once more.
Thanks for your support!

Comment 34 Andrew Bevitt 2004-05-21 08:21:24 UTC

Specifically, if you are experiencing problems similar to what's shown in comment #29, try out the following; but if you are having lockups, try it anyway.

Please download http://dev.gentoo.org/~cyfred/nvidia-kernel-1.0.5336-r3.tar.bz2 and extract it to your $PORTDIR_OVERLAY/media-video/ directory. Then try the ebuild provided in this tarball.

It patches the kernel API driver so that proper kernel function calls are actually used, not deprecated ones... I have notified nvidia of this; will see what they say.

Comment 35 Torbjörn Svensson 2004-05-24 13:29:20 UTC

I used the tar-ball from comment #34 and just got this:

May 24 21:58:49 azoff kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
May 24 21:58:49 azoff kernel: Call Trace:
May 24 21:58:49 azoff kernel:  [<c0275610>] pci_find_subsys+0x110/0x120
May 24 21:58:49 azoff kernel:  [<c027564f>] pci_find_device+0x2f/0x40
May 24 21:58:49 azoff kernel:  [<c0275408>] pci_find_slot+0x28/0x50
May 24 21:58:49 azoff kernel:  [<e0e963f2>] os_pci_init_handle+0x39/0x68 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d2a85f>] _nv001243rm+0x1f/0x24 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e71115>] _nv000816rm+0x2f5/0x384 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0dd992c>] _nv003801rm+0xd8/0x100 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e70c4f>] _nv000809rm+0x2f/0x34 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e08b48>] _nv003606rm+0xe4/0x114 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e087f9>] _nv003564rm+0x7c9/0x908 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d43267>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e44b03>] _nv001476rm+0x1d3/0x45c [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d2d39a>] _nv000896rm+0x4a/0x64 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d2ebb4>] rm_isr_bh+0xc/0x10 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e93b0b>] nv_kern_isr_bh+0xf/0x13 [nvidia]
May 24 21:58:49 azoff kernel:  [<c0125f35>] tasklet_action+0x65/0xc0
May 24 21:58:49 azoff kernel:  [<c0125c87>] do_softirq+0xc7/0xd0
May 24 21:58:49 azoff kernel:  [<c0109d38>] do_IRQ+0x138/0x190
May 24 21:58:49 azoff kernel:  [<c0107ee8>] common_interrupt+0x18/0x20
May 24 21:58:49 azoff kernel: 
May 24 21:58:49 azoff kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
May 24 21:58:49 azoff kernel: Call Trace:
May 24 21:58:49 azoff kernel:  [<c0275610>] pci_find_subsys+0x110/0x120
May 24 21:58:49 azoff kernel:  [<c027564f>] pci_find_device+0x2f/0x40
May 24 21:58:49 azoff kernel:  [<c0275408>] pci_find_slot+0x28/0x50
May 24 21:58:49 azoff kernel:  [<e0e963f2>] os_pci_init_handle+0x39/0x68 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d2a85f>] _nv001243rm+0x1f/0x24 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0ddba5d>] _nv003797rm+0xa9/0x128 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e484a1>] _nv001490rm+0x55/0xe4 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e71154>] _nv000816rm+0x334/0x384 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0dd992c>] _nv003801rm+0xd8/0x100 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e70c4f>] _nv000809rm+0x2f/0x34 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e08b48>] _nv003606rm+0xe4/0x114 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e087f9>] _nv003564rm+0x7c9/0x908 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d43267>] _nv004046rm+0x3a3/0x3b0 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e44b03>] _nv001476rm+0x1d3/0x45c [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d2d39a>] _nv000896rm+0x4a/0x64 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0d2ebb4>] rm_isr_bh+0xc/0x10 [nvidia]
May 24 21:58:49 azoff kernel:  [<e0e93b0b>] nv_kern_isr_bh+0xf/0x13 [nvidia]
May 24 21:58:49 azoff kernel:  [<c0125f35>] tasklet_action+0x65/0xc0
May 24 21:58:49 azoff kernel:  [<c0125c87>] do_softirq+0xc7/0xd0
May 24 21:58:49 azoff kernel:  [<c0109d38>] do_IRQ+0x138/0x190
May 24 21:58:49 azoff kernel:  [<c0107ee8>] common_interrupt+0x18/0x20
May 24 21:58:49 azoff kernel:


It's the same setup and the same machine as in my last post.
Could it be the gentoo (2.6.5-gentoo-r1) kernel that's fscking the nvidia module? The uptime is 20 days and 13 hours.
If I get this once again with this kernel I will try the vanilla sources.
Thanks for working on gentoo!

Comment 36 Andrew Bevitt 2004-05-24 18:12:16 UTC

Yep, that's nvidia adding a depend... I'm sorry guys, but I'm waiting for word from nvidia on scheduling control in their drivers.

@ Torbjorn : what chipset do you have for your AGP bus? Post the output of lspci as well please.

Comment 37 Paul Kronenwetter 2004-05-24 18:20:32 UTC

I just received a lockup today while VMWare was open and I was emerging wine.  I've since jumped to XOrg-X11 so we'll see what tomorrow brings.

Comment 38 Torbjörn Svensson 2004-05-25 00:23:16 UTC

I am using agpgart in the kernel. This lockup came when I had some aterms and tvtime running and was compiling. I have done this _many_ times without any lockups.
Here's the lspci:

# /sbin/lspci 
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:09.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
0000:00:09.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
0000:00:0c.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 43)
0000:00:0d.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
0000:00:0d.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
0000:00:0e.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
0000:00:0e.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
0000:00:0f.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 01)
0000:00:10.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a4)

# uname -a
Linux tux 2.6.5-gentoo-r1 #2 SMP Wed Apr 28 19:06:48 CEST 2004 i686 AMD Athlon(TM) XP 1800+ AuthenticAMD GNU/Linux

# X -version  

This is a pre-release version of XFree86, and is not supported in any
way.  Bugs may be reported to XFree86@XFree86.Org and patches submitted
to fixes@XFree86.Org.  Before reporting bugs in pre-release versions,
please check the latest version in the XFree86 CVS repository
(http://www.XFree86.Org/cvs).

XFree86 Version 4.3.0.1
Release Date: 15 August 2003
X Protocol Version 11, Revision 0, Release 6.6
Build Operating System: Linux 2.6.5-gentoo-r1 i686 [ELF] 
Build Date: 28 April 2004
        Before reporting problems, check http://www.XFree86.Org/
        to make sure that you have the latest version.
Module Loader present


*  media-video/nvidia-kernel
      Latest version available: 1.0.5336-r3
      Latest version installed: 1.0.5336-r3
      Size of downloaded files: 6,661 kB
      Homepage:    http://www.nvidia.com/
      Description: Linux kernel module for the NVIDIA's X driver

I can't find anything else that you need to know. Tell me if so and I will put it here :)

Comment 39 Andrew Bevitt 2004-06-02 16:25:31 UTC

Bug 51524 mentions needing a large PSU to stop instability in the AGP bus.

Torbjorn, what is the current output power of your PSU?

Comment 40 Torbjörn Svensson 2004-06-03 02:39:44 UTC

I don't think that's my problem. It's a GFS2 card (2xAGP) and I don't think it needs that much power. It says 340W on it, but how do I get the current? :)

Comment 41 Andrew Bevitt 2004-06-12 00:14:21 UTC

@Torbjorn : you're running SMP though, aren't you?
Power consumption in that case is going to be "more"; still, GFS2 does sort of make me think that it might not be the whole story...

Can you post: lspci -xxx -s 0:0 (you'll need to be root)...

Comment 42 Torbjörn Svensson 2004-06-12 08:09:24 UTC

I wrote that wrong, it's a 4xAGP port. Anyway, here comes the output.
If there is anything else you need, just tell me :)

# lspci -xxx -s 0:0
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00: 06 11 99 30 06 00 30 22 00 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 7f 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 18 88 80 82 45 01 00 18 24 88 80 82 44 00 00
50: 16 f4 69 ea 20 05 20 20 e0 ee 10 20 20 20 20 20
60: aa aa 00 a0 e6 99 c0 1e 68 ed 54 70 c1 68 00 00
70: 80 c8 00 01 00 01 10 00 01 00 00 00 00 00 00 03
80: 0f 00 00 00 c0 00 00 00 02 00 70 1d 00 00 00 00
90: 16 f4 69 ea 07 1c f1 0b 21 ff 00 00 21 23 74 00
a0: 02 c0 20 00 07 02 00 1f 04 01 00 00 2f 08 04 9a
b0: 7f 9a 18 00 40 00 00 00 80 00 00 00 00 00 00 84
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 44 99 99 99 00 44 00 00
f0: 00 20 00 00 00 94 94 00 00 00 00 00 00 00 00 00
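
For readers who want to check a dump like this themselves, the bytes can be decoded with a few lines of Python. This is a sketch based on my reading of the standard PCI configuration header (vendor ID at offset 0x00, device ID at 0x02, capability pointer at 0x34) and of the AGP 2.0 capability layout (capability ID 0x02, command register 8 bytes into the capability, data rate in bits 0-2, AGP enable in bit 8). Treat the register layout as an assumption to verify against the specs; the function names are invented.

```python
def parse_lspci_hex(dump):
    """Place each 'offset: bytes...' row of lspci -xxx output into a 256-byte buffer."""
    config = bytearray(256)
    for line in dump.strip().splitlines():
        offset_text, _, byte_text = line.partition(":")
        offset = int(offset_text, 16)
        row = bytes(int(value, 16) for value in byte_text.split())
        config[offset:offset + len(row)] = row
    return bytes(config)


def header_ids(config):
    """Vendor and device ID are little-endian 16-bit fields at 0x00 and 0x02."""
    vendor = int.from_bytes(config[0x00:0x02], "little")
    device = int.from_bytes(config[0x02:0x04], "little")
    return vendor, device


AGP_CAPABILITY_ID = 0x02  # assumption: AGP capability ID per the PCI capability list


def agp_command(config):
    """Walk the capability list and decode the AGP command register, if present."""
    pointer = config[0x34]
    seen = set()
    while pointer and pointer not in seen:
        seen.add(pointer)  # guard against a looping capability list
        if config[pointer] == AGP_CAPABILITY_ID:
            command = int.from_bytes(config[pointer + 8:pointer + 12], "little")
            rate = {1: "1x", 2: "2x", 4: "4x"}.get(command & 0x7, "unknown")
            return {"enabled": bool(command & 0x100), "rate": rate}
        pointer = config[pointer + 1]
    return None


# Three rows copied from the dump above (rows not listed stay zero-filled).
SAMPLE = """\
00: 06 11 99 30 06 00 30 22 00 00 00 06 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 04 01 00 00 2f 08 04 9a
"""

config = parse_lspci_hex(SAMPLE)
vendor, device = header_ids(config)
print(hex(vendor), hex(device), agp_command(config))
```

Under those assumptions the dump decodes to vendor 0x1106 and device 0x3099, with AGP enabled at 4x, which at least agrees with the "4xAGP port" correction in this comment.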

Comment 43 Andrew Bevitt 2004-06-15 17:39:14 UTC

Torbjorn: Try a large PSU if at all possible.

The above looks normal though, so I'm thinking it's either hardware or the actual drivers, of which only one can be tested... :)

BTW there's a new -r4 driver in the tree which would be worth testing out as well.

Comment 44 Andrew Bevitt 2004-07-01 00:57:05 UTC

There are new drivers out, 6106; see bug #55714.

Comment 45 Andrew Bevitt 2004-07-20 17:11:08 UTC

This is something in nvidia's code, so we can't do much about it.

In summary:
1) Use the newest 6106 drivers, as they at least don't oops
2) Bug nvidia about it
3) Try a larger PSU