When using Option "NvAGP" "1" in XF86Config, xfree starts without AGP. Without it, it tries to use AGPGART and locks the system with colorful ASCII art on screen or a black screen; switching the console is impossible. However, SysRq+k works and then consoles can be switched (alternative: ssh?). X runs with 100% CPU. killall -9 X kills it. Starting xfree again succeeds with AGP and everything!

Reproducible: Always

Steps to Reproduce:
1. startx

Actual Results:
black screen/ASCII-art screen and lockup.

Expected Results:
xfree starting

From dmesg:
nvidia: no version for "struct_module" found: kernel tainted.
nvidia: no version magic, tainting kernel.
nvidia: module license 'NVIDIA' taints kernel.
0: nvidia: loading NVIDIA Linux x86 NVIDIA Kernel Module 1.0-5336 Wed Jan 14 18:29:26 PST 2004
agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
--- "killall -9 X" here
agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode

From lspci:
00:00.0 Host bridge: Silicon Integrated Systems [SiS]: Unknown device 0746 (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation: Unknown device 0312 (rev a1)

From emerge info:
Portage 2.0.49-r21 (default-x86-1.4, gcc-3.2.3, glibc-2.3.2-r9, 2.6.2)
=================================================================
System uname: 2.6.2 i686 AMD Athlon(tm) XP 2400+
Gentoo Base System version 1.4.3.10
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -march=athlon-xp -mmmx -m3dnow -fomit-frame-pointer -fforce-addr -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O3 -march=athlon-xp -mmmx -m3dnow -fomit-frame-pointer -fforce-addr -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="3dnow X Xaw3d acpi acpi4linux alsa avi berkdb bonobo cdr crypt cscope cups curl dga directfb dvd encode esd fbcon ffmpeg foomaticdb gdbm gif glut gnome gstreamer gtk gtk2 gtkhtml guile imlib java jikes joystick jpeg libg++ libwww ltsp mad maildir mikmod mmx motif mozilla moznocompose moznoirc moznomail mpeg mpeg4 ncurses nls nvidia offensive oggvorbis openal opengl oss pam pdflib perl pic png ppds python readline sdl slang spell sqlite sse ssl tcpd tetex tiff transcode truetype unicode usb v4l vim-with-x x86 xinerama xml2 xmms xv xvid zlib"
Oh, and BTW, the results are independent of the 2.6 kernel version and of the nvidia driver version. It seems that X and AGPGART are locking each other while registering AGP devices. If anybody knows how to get useful additional debug info, please please tell me! Maybe this is a kernel bug and not an xfree bug?

The system also sometimes locks when I run "shutdown -h now" from an xterm, again with full CPU load. However, sending SysRq+k kills X and the system then shuts down normally.
These messages accompany the locks (sometimes):

Feb 15 20:21:45 [kernel] 0: nvidia: trying to map 0xbfe4d000 to kernel space, but we're in an interrupt or holding a spinlock
Feb 15 20:23:10 [kernel] 0: nvidia: trying to map 0xbfe43c00 to kernel space, but we're in an interrupt or holding a spinlock
Feb 15 20:24:34 [kernel] 0: nvidia: trying to map 0xbfc59000 to kernel space, but we're in an interrupt or holding a spinlock
Feb 16 00:05:23 [kernel] 0: nvidia: trying to map 0xbfad4800 to kernel space, but we're in an interrupt or holding a spinlock
Feb 16 00:10:31 [kernel] 0: nvidia: trying to map 0xbfac6800 to kernel space, but we're in an interrupt or holding a spinlock
Feb 16 00:11:21 [kernel] 0: nvidia: trying to map 0xbf7a7f00 to kernel space, but we're in an interrupt or holding a spinlock
Feb 16 00:19:19 [kernel] 0: nvidia: trying to map 0xbf7a7f00 to kernel space, but we're in an interrupt or holding a spinlock
Feb 16 00:33:28 [kernel] 0: nvidia: trying to map 0xbf7a7f00 to kernel space, but we're in an interrupt or holding a spinlock
Feb 16 00:45:34 [kernel] 0: nvidia: trying to map 0xbd9a3f00 to kernel space, but we're in an interrupt or holding a spinlock
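
In case it helps anyone collecting the same data: a rough Python sketch for pulling the timestamps and addresses out of warning lines like the ones quoted above, so repeated addresses stand out. The regular expression assumes the exact log layout shown in this comment (syslog-ng style "[kernel]" tag); the function name and the SAMPLE list are made up for illustration and are not part of any driver or kernel tool.

```python
import re
from collections import Counter

# Matches the nvidia "trying to map" warnings as they appear in this report.
# The line layout is an assumption; other syslog formats will need changes.
MAP_WARNING = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \[kernel\] \d+: nvidia: "
    r"trying to map (?P<addr>0x[0-9a-fA-F]+) to kernel space"
)


def summarize_map_warnings(lines):
    """Return (list of timestamps, Counter of mapped addresses)."""
    stamps = []
    addresses = Counter()
    for line in lines:
        match = MAP_WARNING.match(line.strip())
        if match:
            stamps.append(match.group("stamp"))
            addresses[match.group("addr")] += 1
    return stamps, addresses


# Hypothetical sample: three warnings copied from this report plus one
# unrelated agpgart line that must be ignored.
SAMPLE = [
    "Feb 15 20:21:45 [kernel] 0: nvidia: trying to map 0xbfe4d000 to kernel space, but we're in an interrupt or holding a spinlock",
    "Feb 16 00:11:21 [kernel] 0: nvidia: trying to map 0xbf7a7f00 to kernel space, but we're in an interrupt or holding a spinlock",
    "Feb 16 00:19:19 [kernel] 0: nvidia: trying to map 0xbf7a7f00 to kernel space, but we're in an interrupt or holding a spinlock",
    "agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode",
]

if __name__ == "__main__":
    stamps, addresses = summarize_map_warnings(SAMPLE)
    print(len(stamps), "warnings")
    for addr, count in addresses.most_common():
        print(addr, count)
```

To run it against a real log, pass an open file object instead of SAMPLE; the log path depends on the local syslog setup.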
This patch by Oliver Schoett solved the problem: http://lkml.org/lkml/2004/2/22/102 The patch is for the SiS 648 but also worked for my SiS 746. I applied the patch manually against vanilla 2.6.4-rc1. If the patch is not in the gentoo-kernel, maybe it should be added?
reopening bug to give to the kernel team
I just found that this patch also seems to be in the love-sources, so it is already tested a bit: http://forums.gentoo.org/viewtopic.php?t=140586&highlight=sis+lovesources BTW, I posted a link to this bug to the lkml.
There is a bug open for this on bugzilla.kernel.org: http://bugzilla.kernel.org/show_bug.cgi?id=2327 A more extensive patch is proposed there.
Bugfix is implemented in vanilla 2.6.5.
The changes in 2.6.5 apply only to the SiS 648, *NOT* to the SiS 746. I wrote a patch against vanilla 2.6.5 to fix this and posted it here: http://bugzilla.kernel.org/show_bug.cgi?id=2327
This is something that should be pushed upstream. Please file a bug at http://bugme.osdl.org.
I already contacted Dave Jones on LKML and this is under heavy development. Although it is broken for the SiS 746 in 2.6.5, it seems to be fixed in the current 2.6.6 rc. http://testing.lkml.org/slashdot.php?mid=465593