ACPI was extensively modified very recently to change how it assigns and handles interrupts in the 2.6 kernel. While I've had problems with it on Intel-based chipsets and CPUs, it seems to be especially a problem for nForce-based chipsets and nVidia graphics drivers. See the forum discussion here:

http://forums.gentoo.org/viewtopic.php?t=77378&start=0&postdays=0&postorder=asc&highlight=

That thread links to new patches for nvnet and nvidia-kernel, including patches for 2.6.0-test4. The links are:

http://www.minion.de/ - these are the nvidia-kernel patches
http://holarse.wue.de/index.php?content=treiber_nvidia_26x - I think these are the nvnet patches. I can't tell because (a) it's in German and (b) the site is participating in the European "No Software Patents" protest.

If these patches could be made a part of the respective ebuilds, it would be appreciated. Obviously, the ebuilds would have to be smart enough to know when to include the patches and when not to.

The nvidia-kernel ebuild should probably also contain a message to the effect that: "If you are trying the 2.6.0 kernel and you experience trouble with the nvidia-kernel drivers, try adding 'pci=noacpi' to your boot options. If you continue to have problems, try adding 'acpi=off'. When all else fails, re-compile your kernel without ACPI and without APIC support."

Reproducible: Always

Steps to Reproduce:

Actual Results:
The problems experienced range from X not starting at all to hard system freezes, depending on which options are tried. Different people get different results.

Other than adding the noted patches, this 'problem' won't be resolved until the kernel team fixes/finishes the new ACPI code. One of the obstacles to finishing that code appears to be bugs found in some Via chipsets, for example. Sorry, I didn't note which ones.
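The "smart enough" check asked for above might look roughly like the following. This is only a sketch, written in Python rather than ebuild syntax; the function name `wants_26_patches` and the cut-off at 2.5 are assumptions made for illustration, not what the nvidia-kernel or nvnet ebuilds actually do.

```python
import re


def wants_26_patches(kernel_release: str) -> bool:
    """Return True if the 2.6-series driver patches should be applied.

    Assumption (hypothetical rule): any kernel at or above 2.5 is
    treated as needing the 2.6 patches; older kernels are left alone.
    """
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {kernel_release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 5)


if __name__ == "__main__":
    import platform

    release = platform.release()
    print(release, wants_26_patches(release))
```

A real ebuild would check the version of the kernel sources it builds against rather than the running kernel, so the `platform.release()` call is only there to make the sketch runnable.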
If you update the ebuilds, you may want to leave this bug 'open' just so that people can easily find it. I tried to include reasonable search terms in the summary.
You might want to look at this bug: http://bugzilla.kernel.org/show_bug.cgi?id=1120 and follow the instructions there to provide ACPI info dumps for Len Brown to look at. Hopefully he'll be able to sort out the ACPI issues. You should file bugs like this on the kernel bugzilla, btw; posting it as a Gentoo bug is unlikely to have an impact, as it isn't a Gentoo bug.
To clarify: I don't expect the problems with ACPI to be fixed here. I realize that's a job for the kernel folk. What I really want to request is that the nvidia-kernel and nvnet ebuilds be updated to apply the 2.6.0-test4 patches. This should be assigned to whomever is maintaining those ebuilds. Sorry for my confusion. Thanks to Jonathan for pointing this out to me. :-/
nvidia-kernel: The 2.6-test3-bk5+ patch is in, so it will compile OK. Looking through the message, it seems there's a new version of the NVIDIA patch. CC'ing Martin, I assume you could bump this for us? Martin, the other thing the reporter was trying to get through to us was that he wanted an "if you get problems, disable ACPI" warning added to the ebuild.

nvnet: CC'ing Dean Bailey, who appeared most on the ChangeLog. Please reassign if this isn't your area, Dean...

wranglers: Why did this bounce to us?
This thing is confusing - Guy, I am prob on crack, but please tell me next time the 2003-08-18 patch we got is old, and that the newer 2003-08-23 patch fixes irq stuff. I however cannot see a major diff between 0818 and the 0823 versions, so not updating. What else should I check ?
Martin, as far as I can tell, the only difference between the two is the renaming of some variables in nvidia-kernel to match some variable name changes in the 2.6.0-test4 kernel. All I really know is I no longer get kernel trace messages regarding the nvidia graphics chip on one machine. I still get them on another.

FWIW, I think this is going to be a 'sticky, messy' area for a while, especially for nForce chipset folk. My work machine has an Intel865 chipset in it. With 2.6.0-test4-bk3, the APIC, IO-APIC stuff seems to be working almost correctly. My home machine with the nForce chipset and an 8/02 ASUS BIOS simply sucks. I can't even compile in ACPI at all. The machine will boot if I do, but will freeze dead sooner or later every single time. I just downloaded FreeDOS in order to try to upgrade my BIOS but ...

One of the more serious problems I'm having with the ASUS ACPI is that it insists on assigning the nForce audio to the same interrupt as the built-in GeForce 2 graphics core. I can't play music and run anything 3D at all. Yet, on 2.4.20, this was not a problem. [shrug]

Right now, I'm running a version of 2.6.0-test4 with no ACPI compiled in. If I want to listen to music, I don't do anything else on the computer. Otherwise, I just leave music off.

But really, the 8/23 patch is definitely more stable on both my machines than just the 8/18 patch. I can only tell you what (sort of) works.

Best regards,
Guy
Just be a bit careful with the new patches since the 5 Sept. 2003 patch for the Nvidia 1.0-3123 driver/Linux 2.6 deletes its own Makefile. (I was working on an ebuild for it; my PNY Geforce 3 Ti 200 does not tolerate the 4xxx drivers.) I have notified Christian Zander (minion.de) of this, and await his response.
Bah; I'm being silly (he's following the format of the others, yet only showing the 4496 README). Unfortunately, the 3123's Makefile.nvidia (what Gentoo prefers for userpriv/sandbox issues) creates invalid modules. I'll look into that.
Further information: Even with all patches up to and including 8/23, the nvidia.o driver might still not work with 2.6.0-test4-mm6 on some systems. It definitely does _not_ work on nforce chipsets.

From here, it looks like two different things are 'broken':

1. The nvidia drivers need a serious overhaul to work properly with the 2.6 kernel.
2. The ACPI module since > 2.6.0-test3-bk? sucks for nforce chipsets.

I don't consider this to be a kernel.org problem, however. I consider this to be a problem with NVidia Corp. IE: They need to either drop the proprietary BS or do a much better job of keeping the drivers up to date.

Workaround: Go to the 2.4.20 kernel. This avoids some problems created in later 2.4 kernels by the back-porting of 2.6 code.

Related bugs include #28061 & #28072.

FWIW: I'm now running my home machine {ASUS A7N266-VM/AA mobo} with the latest BIOS {1007 as of 8/xx/03} using 2.4.20 with no hiccups whatsoever.

For those of you annoyed by the deletion of previous {other kernel} versions of the nvidia.o driver, do the following immediately after your emerge of nvidia-kernel:

# touch /lib/modules/2.6.0-test4-mm6/video/nvidia.o

This changes the driver's modification time from the one set when Make.nvidia installed it. Subsequent emerges of nvidia-kernel will then leave the 'touched' driver(s) alone. You will need to do this for each kernel version for which you also do an 'emerge nvidia-kernel'.
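Since the touch has to be repeated for every kernel version, the workaround above could be scripted. The following is a minimal sketch, assuming the drivers live at `<modules_root>/<kernel>/video/nvidia.o` as in the command above; the helper name `touch_installed_drivers` and its parameters are made up for this example.

```python
import os
from pathlib import Path


def touch_installed_drivers(modules_root="/lib/modules", name="nvidia.o"):
    """Set the modification time of every installed driver copy to now.

    Looks for <modules_root>/<kernel>/video/<name> (layout assumed from
    the touch command in this comment) and returns the paths it touched.
    """
    touched = []
    for module in sorted(Path(modules_root).glob(f"*/video/{name}")):
        os.utime(module, None)  # None means "use the current time"
        touched.append(module)
    return touched


if __name__ == "__main__":
    for path in touch_installed_drivers():
        print("touched", path)
```

Files under /lib/modules are normally owned by root, so this would have to be run as root, just like the touch command itself.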
Have a look at bug #28061, it's an -mm6 issue: http://marc.theaimsgroup.com/?l=linux-kernel&m=106277599818082&w=2
Hi Martin,

Thanks for the pointer regarding -mm6. It saved me from wasting any more time on it. I've tested both the -r2 and -r3 on my problem system using kernel -test4-bk3. -r3 is certainly much, much better. As far as the nvidia-kernel ebuilds are concerned, my feeling is that you can close this bug out. As usual, you're A#1 in my book. :-)

I have additional issues, but they appear to be Nvidia specific and/or kernel.org specific. ACPI is definitely a problem with the nForce chipset using the built-in GeForce graphics core. I strongly suspect that putting an independent nvidia graphics card in the AGP slot will ease my nvidia problems a great deal, but I haven't yet had a chance to test that. I will check this out as time permits.

Even with your latest -r3 ebuild, I'm still getting kernel messages. I only started getting them sometime after 2.5.75. Anyway, if you could give me advice on whether to report this and to whom, I would appreciate it. I get more than enough kernel messages to overflow the dmesg buffer.
They are all similar to:

Badness in pci_find_subsys at drivers/pci/search.c:132
Call Trace:
 [<c0229525>] pci_find_subsys+0xe5/0xf0
 [<c022955f>] pci_find_device+0x2f/0x40
 [<c0229418>] pci_find_slot+0x28/0x50
 [<f8b7d05f>] os_pci_init_handle+0x3a/0x67 [nvidia]
 [<f8b8f91f>] __nvsym00057+0x1f/0x24 [nvidia]
 [<f8c63bdb>] __nvsym04236+0x1f/0x24 [nvidia]
 [<f8c3bf1d>] __nvsym03984+0x215/0x1d88 [nvidia]
 [<f8c3da5b>] __nvsym03984+0x1d53/0x1d88 [nvidia]
 [<c01386a3>] __rmqueue+0xd3/0x130
 [<c0138989>] buffered_rmqueue+0xc9/0x170
 [<c0138abd>] __alloc_pages+0x8d/0x330
 [<c0142afa>] do_anonymous_page+0x12a/0x220
 [<c0143142>] handle_mm_fault+0xe2/0x180
 [<c0124906>] update_process_times+0x46/0x50
 [<c012477d>] update_wall_time+0xd/0x40
 [<c011917a>] default_wake_function+0x2a/0x30
 [<f8c3f05c>] __nvsym03993+0x15cc/0x15d8 [nvidia]
 [<f8c3bc56>] __nvsym03986+0x21a/0x2cc [nvidia]
 [<f8c4171d>] __nvsym04031+0xc5/0x418 [nvidia]
 [<f8c41610>] __nvsym04015+0x68/0xb0 [nvidia]
 [<f8c36ddd>] __nvsym00610+0x85/0x954 [nvidia]
 [<c0138d95>] __get_free_pages+0x35/0x40
 [<c0309d0e>] sock_aio_read+0xce/0xe0
 [<c01508f9>] do_sync_read+0x89/0xc0
 [<f8c6682a>] __nvsym00688+0x16a/0x338 [nvidia]
 [<f8b92029>] __nvsym00827+0xd/0x1c [nvidia]
 [<f8b936c4>] rm_isr_bh+0xc/0x10 [nvidia]
 [<c01208c6>] tasklet_action+0x46/0x70
 [<c01206d5>] do_softirq+0xa5/0xb0
 [<c010b966>] do_IRQ+0x116/0x160
 [<c0109d18>] common_interrupt+0x18/0x20
 [<c010937c>] system_call+0x4/0x2c

If I have ACPI support compiled into the kernel, I get the following results:

{no boot options}
My kdm logon dialog comes up. I can logon. System freezes dead within 30 seconds. The freezes appear to be associated with simultaneous graphic and audio activity.

{pci=noacpi pci=biosirq}
My kdm logon dialog comes up. I can logon. System is relatively stable. I can freeze the system by playing xmms and then starting up an OpenGL based screen saver.

{ACPI is not compiled in}
This is the most stable 2.6.0 configuration for me. Sound quality is a function of what other activity is going on on this system.

{2.4.20 kernel with acpi enabled}
No sound glitches at all. Graphics are slowed down enough not to impact sound quality.

Advice?
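With traces this long overflowing the dmesg buffer, it may help to boil each one down before reporting it. The sketch below counts how many frames of a trace belong to each module; the regular expression is an assumption based only on the frame format shown in this comment, and `summarise_trace` is a made-up helper name.

```python
import re
from collections import Counter

# One frame looks like: [<f8b7d05f>] os_pci_init_handle+0x3a/0x67 [nvidia]
# The trailing "[module]" tag is absent for frames in the core kernel.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarise_trace(text):
    """Count call-trace frames per owner ("kernel" when there is no tag)."""
    counts = Counter()
    for _symbol, module in FRAME.findall(text):
        counts[module or "kernel"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg | python summarise.py
    for owner, frames in summarise_trace(sys.stdin.read()).most_common():
        print(f"{owner}: {frames} frames")
```

The per-module counts only show which code the trace passed through; they say nothing about where the fault lies.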
You will have to check LKML, as I am not so familiar with all the issues behind the nforce2 problems. The basics are:

1) nforce has acpi irq routing issues, besides other things.
2) pci=noacpi in 2.6 at least did not work (and still does not if a specific patch is not applied).
3) The reports differ quite a bit: some say it works OK with APM, others not. Some say pci=noacpi works, others not.
4) I think I did see an acpi patch for 2.4, but I'm not sure if it's in 23_pre, or even in 2.6.

Further, can you specify 'pci=foo pci=bar', or shouldn't it be 'pci=foo,bar'?

Have a look at:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106217840031426&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=106095354216873&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=106072199910689&w=2

With this one maybe having a temporary solution:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106146433525024&w=2

Which is all I can find with a quick search.
Hi Martin,

I see what you're getting at with APIC. And in fact, I had left out APIC support at one point. With the -beta4-bk3 release, I've gotten much more reasonable messages regarding APIC than I used to. My 'cat /proc/interrupts' now looks very reasonable, in fact. I think what I'll do is build a few new versions of the -beta4-bk3 kernel with and without different combinations of APIC & ACPI support.

Re: 'pci=noacpi' working and not working. Both are true. I've experienced both. It seemed to depend on which version of the 2.6.0 kernel _and_ which APIC / ACPI options are compiled in. Currently -beta4-bk3 appears to respect the 'pci=noacpi' boot option. At least, that's my current experience.

'pci=foo pci=bar' versus 'pci=foo,bar'. The first way is the only way I've seen suggested. I imagine that both are valid. Boot options are space delimited and it is permitted to have multiple options. Since 'foo' and 'bar' are, in this case, separate flags, I would not expect a problem. In the case of 'pci=foo pci=nofoo', I would expect the last occurrence of 'foo' in 'pci' to be the active flag. IE: 'nofoo'.

For reference, this is my machine at work:

# cat /proc/interrupts
           CPU0
  0:   18360138    IO-APIC-edge  timer
  1:      14295    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     542038    IO-APIC-edge  i8042
 14:     123485    IO-APIC-edge  ide0
 15:         20    IO-APIC-edge  ide1
 16:    1613151   IO-APIC-level  uhci-hcd, uhci-hcd, nvidia
 17:     388290   IO-APIC-level  Intel ICH5
 18:     110644   IO-APIC-level  ide2, uhci-hcd, eth0
 19:          0   IO-APIC-level  uhci-hcd
NMI:          0
LOC:   18361276
ERR:          0
MIS:          1
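On the 'pci=foo pci=bar' versus 'pci=foo,bar' question discussed above, the sketch below shows that the two spellings carry the same flags once the command line is split on spaces and the values on commas. It does not model the kernel's own parser, so whether a particular kernel treats both forms identically remains an assumption; `pci_flags` is a made-up helper name.

```python
def pci_flags(cmdline):
    """Collect every flag passed through pci=... options, in order.

    Handles both repeated options ("pci=foo pci=bar") and a single
    comma-separated option ("pci=foo,bar").
    """
    flags = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pci" and sep:
            flags.extend(part for part in value.split(",") if part)
    return flags


if __name__ == "__main__":
    # /proc/cmdline holds the options the running kernel was booted with.
    with open("/proc/cmdline") as handle:
        print(pci_flags(handle.read()))
```

Running it on a booted system shows which pci flags were actually passed, which is a quick way to confirm that a boot loader edit took effect.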
Rather use test4-bk7 (8 is a bit flaky), or bk9 if out, as they have fixes for the acpi/apic/lapic kernel params ...
Hi Martin,

Thanks. -test5 just became available, so I guess I'll play with that one. ;-) I'll check the -r3 nvidia-kernel ebuild on this system both with the built-in GeForce 2 graphics and with a cheap Nvidia card in the AGP slot. Will let you know the results later. :-)
re: -beta5 - For reference: ASUS A7N266-VM/AA mobo. APIC and ACPI compiled in kernel

# cat /proc/interrupts
           CPU0
  0:     419818          XT-PIC  timer
  1:        645    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 11:      13052    IO-APIC-edge  NVidia nForce, nvidia
 12:       6290    IO-APIC-edge  i8042
 14:       6957    IO-APIC-edge  ide0
 15:         42    IO-APIC-edge  ide1
 18:       3419   IO-APIC-level  ide2, ide3, eth0
 21:          0   IO-APIC-level  ohci-hcd, ohci-hcd
NMI:          0
LOC:     419612
ERR:          0
MIS:          0

dmesg:

Badness in pci_find_subsys at drivers/pci/search.c:132
Call Trace:
 [<c0229615>] pci_find_subsys+0xe5/0xf0
 [<c022964f>] pci_find_device+0x2f/0x40
 [<c0229508>] pci_find_slot+0x28/0x50
 [<f8b7f05f>] os_pci_init_handle+0x3a/0x67 [nvidia]
 [<f8b95f6b>] __nvsym00822+0x4fb/0xd60 [nvidia]
 [<f8b9191f>] __nvsym00057+0x1f/0x24 [nvidia]
 [<f8c65bdb>] __nvsym04236+0x1f/0x24 [nvidia]
 [<f8c3df1d>] __nvsym03984+0x215/0x1d88 [nvidia]
 [<f8b92cba>] __nvsym00257+0x12/0x18 [nvidia]
 [<f8cb2f59>] __nvsym01638+0x151/0x15c [nvidia]
 [<f8c8c046>] __nvsym00995+0x142/0x15c [nvidia]
 [<f8c2fb05>] __nvsym03871+0xa9/0x114 [nvidia]
 [<f8c24587>] __nvsym03751+0x85f/0x8a4 [nvidia]
 [<f8bc490a>] __nvsym00830+0x12/0x18 [nvidia]
 [<f8bac480>] __nvsym00805+0x2c/0x228 [nvidia]
 [<f8b92cba>] __nvsym00257+0x12/0x18 [nvidia]
 [<f8c689eb>] __nvsym00688+0x32b/0x338 [nvidia]
 [<f8b940fd>] __nvsym00727+0x31/0x38 [nvidia]
 [<f8b9402f>] __nvsym00827+0x13/0x1c [nvidia]
 [<f8b956c4>] rm_isr_bh+0xc/0x10 [nvidia]
 [<c0120b56>] tasklet_action+0x46/0x70
 [<c02d3c48>] serio_interrupt+0x58/0x60
 [<c011940a>] default_wake_function+0x2a/0x30
 [<c0119441>] __wake_up_common+0x31/0x50
 [<f8c4105c>] __nvsym03993+0x15cc/0x15d8 [nvidia]
 [<f8c3dc56>] __nvsym03986+0x21a/0x2cc [nvidia]
 [<f8c4371d>] __nvsym04031+0xc5/0x418 [nvidia]
 [<f8c43610>] __nvsym04015+0x68/0xb0 [nvidia]
 [<f8c38ddd>] __nvsym00610+0x85/0x954 [nvidia]
 [<c0138d75>] __get_free_pages+0x35/0x40
 [<c030a1ae>] sock_aio_read+0xce/0xe0
 [<c01507f9>] do_sync_read+0x89/0xc0
 [<f8c6882a>] __nvsym00688+0x16a/0x338 [nvidia]
 [<f8b94029>] __nvsym00827+0xd/0x1c [nvidia]
 [<f8b956c4>] rm_isr_bh+0xc/0x10 [nvidia]
 [<c0120b56>] tasklet_action+0x46/0x70
 [<c0120965>] do_softirq+0xa5/0xb0
 [<c010ba26>] do_IRQ+0x116/0x160
 [<c0109da8>] common_interrupt+0x18/0x20

Clean boot.
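The /proc/interrupts listing above has the nForce audio and the nvidia driver on the same IRQ. Spotting such sharing can be automated with something like the sketch below. It assumes a single-CPU layout like the one shown (one count column, then the controller type, then a comma-separated device list); `shared_irqs` is a made-up helper name.

```python
def shared_irqs(text):
    """Map IRQ label -> device list for lines naming more than one device."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 3:
            continue  # header line, or counters such as NMI/LOC/ERR/MIS
        # Assumption: single CPU, so fields are count, controller, devices.
        devices = [name.strip() for name in " ".join(fields[2:]).split(",")]
        if len(devices) > 1:
            shared[label.strip()] = devices
    return shared


if __name__ == "__main__":
    with open("/proc/interrupts") as handle:
        for irq, devices in shared_irqs(handle.read()).items():
            print(f"IRQ {irq}: {', '.join(devices)}")
```

On a multi-CPU machine there is one count column per CPU, so the field offsets would need adjusting.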
Well, things are definitely improved. I can still freeze this system solid by playing xmms and any GL based screensaver. But I just had my mind blown totally. When I killed xmms before my system could completely freeze up, my configure desktop dialog box closed as well. Now I'm wondering if KDE 3.1.3 could be involved in this mess too. I'll try putting in a graphics card tomorrow into the AGP slot. See if that makes a difference. Martin, if you're keeping this open just for me, you don't need to. :-) It's definitely appreciated, but you've already fixed the nvidia-kernel ebuild. On the other hand, if you want me to try anything related to these issues, I'm certainly game. BTW - I'm really impressed with what the kernel crew have done with the APIC and ACPI stuff. While I would not understand any of the code if it was shown to me, I certainly can see the results. The -beta5 kernel 'feels' really good at the moment. I just wish I understood more on how to use the debugging tools better. Now, if I could just have my favorite GL based screen savers on screen while enjoying my OGG / MP3 collection ... at the same time. Best regards and much thanks!
Hi Guy. Sorry for not being able to help more, as it's a bit out of my league, and I am generally swamped =) Anyhow, I'll close this now. If there are fixes on the nvidia-kernel side, just let me know.