Bug 27412 - kernel-2.6.0-test4 nvidia-kernel acpi bad interactions and new patches test4 nvidia nvnet kernel
Summary: kernel-2.6.0-test4 nvidia-kernel acpi bad interactions and new patches test4 ...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Highest trivial
Assignee: x86-kernel@gentoo.org (DEPRECATED)
URL:
Whiteboard:
Keywords:
Depends on: 28072
Blocks:
Reported: 2003-08-27 04:25 UTC by Guy
Modified: 2003-09-14 07:16 UTC (History)
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Guy 2003-08-27 04:25:46 UTC
ACPI has been extensively modified very recently to change how it assigns/deals
with interrupts in the 2.6 kernel. While I've had problems with it on Intel
based chip sets and cpus, it seems to be especially a problem for nForce based
chipsets and nVidia graphics drivers.

See the forum discussion here:
http://forums.gentoo.org/viewtopic.php?t=77378&start=0&postdays=0&postorder=asc&highlight=

There are links in the forum to new patches for nvnet and nvidia-kernel in the
above thread. These include patches for 2.6.0-test4. The links are:

http://www.minion.de/ - these are the nvidia-kernel patches
http://holarse.wue.de/index.php?content=treiber_nvidia_26x - I think these are
the nvnet patches. I can't tell because (A) it's in German and (B) the site is
participating in the European "No Software Patents" protest.

If these patches could be made a part of the respective ebuilds, it would be
appreciated.

Obviously, the ebuilds would have to be smart enough to know when to include the
patches and when to not include them. The nvidia-kernel ebuild should probably
also contain a message to the effect that:

"If you are trying the 2.6.0 kernel and you experience trouble with the
nvidia-kernel drivers, try adding 'pci=noacpi' to your boot options. If you
continue to have problems, try adding 'acpi=off'. When all else fails,
re-compile your kernel without ACPI and without APIC support."
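
A minimal sketch of the kind of version check such an ebuild could make, written as plain shell. The helper names `needs_acpi_warning` and `acpi_warning_for` are hypothetical, and matching on a leading "2.6." is an assumption for illustration; none of this is taken from the actual nvidia-kernel ebuild.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the given kernel
# version string belongs to the 2.6 series, fail otherwise.
needs_acpi_warning() {
    case "$1" in
        2.6.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: print a short form of the warning only when the
# kernel version calls for it; print nothing for other kernels.
acpi_warning_for() {
    if needs_acpi_warning "$1"; then
        echo "Kernel $1: if nvidia-kernel misbehaves, try 'pci=noacpi', then 'acpi=off'."
    fi
}

# Example call with a fixed version string.
acpi_warning_for "2.6.0-test4"
```

In a real ebuild the version would come from whatever variable the ebuild already uses for the target kernel, and the same check could decide whether the 2.6 patches get applied.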

Reproducible: Always
Steps to Reproduce:

Actual Results:  
The nature of problems experienced range from X not starting at all to hard
system freezes. This depends on the various options tried. Different people have
different results.


Other than adding the noted patches, this 'problem' won't be resolved until the
kernel team fixes/finishes the new ACPI code. One of the problems with finishing
this kernel module appears to be bugs found in some Via chipsets for example.
Sorry, I didn't note which ones.

If you update the ebuilds, you may want to leave this bug 'open' just so that
people can easily find it. I tried to include reasonable search terms in the
summary.
Comment 1 Jonathan Heaney 2003-08-27 18:16:23 UTC
You might want to look at this bug

http://bugzilla.kernel.org/show_bug.cgi?id=1120

And follow the instructions there to provide ACPI info dumps for Len Brown to look at; hopefully he'll be able to sort out the ACPI issues.

You should file bugs like this on the kernel bugzilla btw; posting it as a Gentoo bug is unlikely to have an impact, as it isn't a Gentoo bug.
Comment 2 Guy 2003-08-28 14:44:30 UTC
To clarify: I don't expect the problems with ACPI to be fixed here. I realize that's a job for the kernel folk.

What I really want to request is that the nvidia-kernel and nvnet ebuilds be updated to apply the 2.6.0-test4 patches. This should be assigned to whoever is maintaining those ebuilds.

Sorry for my confusion. Thanks to Jonathan for pointing this out to me.

:-/
Comment 3 Tim Yamin (RETIRED) gentoo-dev 2003-08-28 14:55:49 UTC
nvidia-kernel: The 2.6-test3-bk5+ patch is in so it will compile OK. Looking through the message it seems there's a new version of the NVIDIA patch. CC'ing Martin, I assume you could bump this for us? Martin, the other thing the reporter was trying to get through to us was that he wanted an "if you get problems disable ACPI" warning added to the ebuild.

nvnet: CC'ing Dean Bailey, who appeared most on the ChangeLog. Please reassign if this isn't your area Dean...

wranglers: Why did this bounce to us?
Comment 4 Martin Schlemmer (RETIRED) gentoo-dev 2003-09-01 11:27:31 UTC
This thing is confusing - Guy, I am prob on crack, but please tell me next time
the 2003-08-18 patch we got is old, and that the newer 2003-08-23 patch fixes
irq stuff.

I however cannot see a major diff between 0818 and the 0823 versions, so not
updating.  What else should I check ?
Comment 5 Guy 2003-09-04 19:48:45 UTC
Martin, as far as I can tell, the only difference between the two is the renaming of some variables in nvidia-kernel to match some variable name changes in the 2.6.0-test4 kernel. All I really know is I no longer get kernel trace messages regarding the nvidia graphics chip on one machine. I still get them on another.

FWIW, I think this is going to be a 'sticky, messy' area for a while, especially for nForce chipset folk.

My work machine has an Intel865 chipset in it. With 2.6.0-test4-bk3, the APIC, IO-APIC stuff seems to be working almost correctly.

My home machine with the nForce chipset and an 8/02 ASUS bios simply sucks. I can't even compile in ACPI at all. The machine will boot if I do, but will freeze dead sooner or later every single time. I just downloaded FreeDOS in order to try to upgrade my BIOS but ... One of the more serious problems I'm having with the ASUS ACPI is that it insists on assigning the nForce audio to the same interrupt as the built in GeForce 2 graphics core. I can't play music and run anything 3D at all. Yet, on 2.4.20, this was not a problem. [shrug]

Right now, I'm running a version 2.6.0-test4 with no ACPI compiled in. If I want to listen to music, I don't do anything else on the computer. Otherwise, I just leave music off.

But really, the 8/23 patch is definitely more stable on both my machines than just the 8/18 patch. I can only tell you what (sort of) works.

Best regards,
Guy
Comment 6 Samuel Greenfeld 2003-09-06 09:50:48 UTC
Just be a bit careful with the new patches since the 5 Sept. 2003 patch for the Nvidia 1.0-3123 driver/Linux 2.6 deletes its own Makefile.  (I was working on an ebuild for it; my PNY Geforce 3 Ti 200 does not tolerate the 4xxx drivers.)

I have notified Christian Zander (minion.de) of this, and await his response.
Comment 7 Samuel Greenfeld 2003-09-06 10:27:21 UTC
Bah; I'm being silly (he's following the format of the others, yet only showing the 4496 README).

Unfortunately, the 3123's Makefile.nvidia (what Gentoo prefers for userpriv/sandbox issues) creates invalid modules.  I'll look into that.
Comment 8 Guy 2003-09-06 18:46:20 UTC
Further information:

Even with all patches up to and including 8/23, the nvidia.o driver might still not work with 2.6.0-test4-mm6 on some systems. It definitely does _not_ work on nforce chipsets.

From here, it looks like two different things are 'broken':

1. The nvidia drivers need a serious overhaul to work properly with the 2.6 kernel.

2. The ACPI module since > 2.6.0-test3-bk? sucks for nforce chipsets. I don't consider this to be a kernel.org problem however. I consider this to be a problem with NVidia Corp. IE: They need to either drop the proprietary BS or do a much better job of keeping the drivers up to date.

Work around: Go to the 2.4.20 kernel. This avoids some problems created in later 2.4 kernels by the back-porting of 2.6 code.

Related bugs include # 28061 & # 28072

FWIW: I'm now running my home machine {ASUS A7N266-VM/AA mobo} with the latest BIOS {1007 as of 8/xx/03} using 2.4.20 with no hiccups whatsoever.

For those of you annoyed by the deleting of previous {other kernel} versions of the nvidia.o driver do the following immediately after your emerge of nvidia-kernel:

# touch /lib/modules/2.6.0-test4-mm6/video/nvidia.o

This will change the driver's modification time, which was set when Make.nvidia installed it. Subsequent emerges of nvidia-kernel will then leave the 'touched' driver(s) alone. You will need to do this for each version of the kernel for which you also do an 'emerge nvidia-kernel'.
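
For several installed kernels, the same step can be looped. This is only a sketch: the function name `touch_nvidia_modules` is hypothetical, and it assumes every kernel keeps the driver at `video/nvidia.o` under its modules directory, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: refresh the modification time of every installed
# nvidia.o found under a modules directory (default: /lib/modules).
touch_nvidia_modules() {
    modules_root="${1:-/lib/modules}"
    for module in "$modules_root"/*/video/nvidia.o; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -f "$module" ] || continue
        touch "$module" && echo "touched $module"
    done
}
```

Run `touch_nvidia_modules` as root right after each 'emerge nvidia-kernel'. Passing a different directory as the first argument is handy for trying it out on a scratch tree first.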
Comment 9 Martin Schlemmer (RETIRED) gentoo-dev 2003-09-07 07:58:22 UTC
Have a look at bug #28061, it's an mm6 issue:

  http://marc.theaimsgroup.com/?l=linux-kernel&m=106277599818082&w=2

Comment 10 Guy 2003-09-07 18:10:08 UTC
Hi Martin,

Thanx for the pointer regarding -mm6. It saved me from wasting any more time on it.

I've tested both the -r2 and -r3 on my problem system using kernel -test4-bk3. -r3 is certainly much, much better. My feelings as far as the nvidia-kernel ebuilds are concerned are that you can close this bug out. As usual, you're A#1 in my book. :-)

I have other additional issues but they appear to be Nvidia specific and/or kernel org specific. ACPI is definitely a problem with the nForce chipset using the built-in GeForce graphics core. I suspect strongly, but haven't yet had a chance to test putting an independent nvidia graphics card in the AGP slot, that doing so will ease my nvidia problems a great deal. I will check this out as time permits.

Even with your latest -r3 ebuild, I'm still getting kernel messages. I only started getting them sometime after 2.5.75. Anyway, if you could give me advice on whether to report this, and to whom, I would appreciate it.

I get more than enough kernel messages to overflow the dmesg buffer. They are all similar to:

Badness in pci_find_subsys at drivers/pci/search.c:132
Call Trace:
 [<c0229525>] pci_find_subsys+0xe5/0xf0
 [<c022955f>] pci_find_device+0x2f/0x40
 [<c0229418>] pci_find_slot+0x28/0x50
 [<f8b7d05f>] os_pci_init_handle+0x3a/0x67 [nvidia]
 [<f8b8f91f>] __nvsym00057+0x1f/0x24 [nvidia]
 [<f8c63bdb>] __nvsym04236+0x1f/0x24 [nvidia]
 [<f8c3bf1d>] __nvsym03984+0x215/0x1d88 [nvidia]
 [<f8c3da5b>] __nvsym03984+0x1d53/0x1d88 [nvidia]
 [<c01386a3>] __rmqueue+0xd3/0x130
 [<c0138989>] buffered_rmqueue+0xc9/0x170
 [<c0138abd>] __alloc_pages+0x8d/0x330
 [<c0142afa>] do_anonymous_page+0x12a/0x220
 [<c0143142>] handle_mm_fault+0xe2/0x180
 [<c0124906>] update_process_times+0x46/0x50
 [<c012477d>] update_wall_time+0xd/0x40
 [<c011917a>] default_wake_function+0x2a/0x30
 [<f8c3f05c>] __nvsym03993+0x15cc/0x15d8 [nvidia]
 [<f8c3bc56>] __nvsym03986+0x21a/0x2cc [nvidia]
 [<f8c4171d>] __nvsym04031+0xc5/0x418 [nvidia]
 [<f8c41610>] __nvsym04015+0x68/0xb0 [nvidia]
 [<f8c36ddd>] __nvsym00610+0x85/0x954 [nvidia]
 [<c0138d95>] __get_free_pages+0x35/0x40
 [<c0309d0e>] sock_aio_read+0xce/0xe0
 [<c01508f9>] do_sync_read+0x89/0xc0
 [<f8c6682a>] __nvsym00688+0x16a/0x338 [nvidia]
 [<f8b92029>] __nvsym00827+0xd/0x1c [nvidia]
 [<f8b936c4>] rm_isr_bh+0xc/0x10 [nvidia]
 [<c01208c6>] tasklet_action+0x46/0x70
 [<c01206d5>] do_softirq+0xa5/0xb0
 [<c010b966>] do_IRQ+0x116/0x160
 [<c0109d18>] common_interrupt+0x18/0x20
 [<c010937c>] system_call+0x4/0x2c
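
Since the dmesg buffer overflows, a rough tally is easier to compare between kernels than the raw traces. A small sketch, assuming the log text is piped in on stdin; the function name `summarise_badness` is hypothetical, and the counts only cover whatever is still in the buffer.

```shell
#!/bin/sh
# Hypothetical helper: summarise kernel log text read from stdin.
# Prints the number of "Badness in" reports and the number of
# call-trace frames attributed to the [nvidia] module.
summarise_badness() {
    awk '
        /Badness in /  { reports++ }
        /\[nvidia\]/   { frames++ }
        END { printf "reports=%d nvidia_frames=%d\n", reports, frames }
    '
}
```

Typical use would be `dmesg | summarise_badness`, run once per kernel or patch revision being compared.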


If I have ACPI support compiled into the kernel, I get the following results:

{no boot options}
My kdm logon dialog comes up. I can logon. System freezes dead within 30 seconds. The freezes appear to be associated with simultaneous graphic and audio activity.

{pci=noacpi pci=biosirq}

My kdm logon dialog comes up. I can logon. System is relatively stable. I can freeze the system by playing xmms and then starting up an OpenGL based screen saver.

{ACPI is not compiled in}
This is the most stable 2.6.0 configuration for me. Sound quality is a function of what other activity is going on on this system.

{2.4.20 kernel with acpi enabled}
No sound glitches at all. Graphics are slowed down enough not to impact sound quality.

Advice?
Comment 11 Martin Schlemmer (RETIRED) gentoo-dev 2003-09-07 19:03:55 UTC
You will have to check LKML, as I am not so familiar with everything
behind the nforce2 issues.  The basics are thus:

1)  nforce has acpi irq routing issues, besides other things.

2)  pci=noacpi in 2.6 at least did not work (and still does not if
    a specific patch is not applied)

3)  The reports differ a fair bit, as some say they work ok with APM, and
    others not.  Some say pci=noacpi works, others not.

4)  I think I did see an acpi patch for 2.4, but not sure if it's in 23_pre,
    or even in 2.6

Further, can you specify 'pci=foo pci=bar', or shouldn't it be 'pci=foo,bar' ?

Have a look at:

  http://marc.theaimsgroup.com/?l=linux-kernel&m=106217840031426&w=2
  http://marc.theaimsgroup.com/?l=linux-kernel&m=106095354216873&w=2
  http://marc.theaimsgroup.com/?l=linux-kernel&m=106072199910689&w=2

With this one having a temp solution maybe:

  http://marc.theaimsgroup.com/?l=linux-kernel&m=106146433525024&w=2

Which is all I can find with a quick search.
Comment 12 Guy 2003-09-08 13:07:56 UTC
Hi Martin,

I see what you're getting at with APIC. And in fact, I had left out APIC support at one point. With the -test4-bk3 release, I've gotten much more reasonable messages regarding APIC than I used to. My 'cat /proc/interrupts' now looks very reasonable in fact.

I think what I'll do is build a few new versions of the -test4-bk3 kernel with and without different combinations of APIC & ACPI support.

Re: 'pci=noacpi' working and not working. Both are true. I've experienced both. It seemed to depend on which version of the 2.6.0 kernel _and_ which APIC / ACPI options are compiled in. Currently -test4-bk3 appears to respect the 'pci=noacpi' boot option. At least, that's my current experience.

'pci=foo pci=bar' versus 'pci=foo,bar'. The first way is the only way I've seen suggested. I imagine that both are valid. Boot options are space delimited and it is permitted to have multiple options. Since 'foo' and 'bar' are, in this case, separate flags, I would not expect a problem. In the case 'pci=foo pci=nofoo', I would expect the last occurrence of 'foo' in 'pci' to be the active flag, i.e. 'nofoo'.
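
To see which pci flags were actually passed, whichever spelling was used, something like the following sketch can help. The function name `pci_flags` is hypothetical. It only lists what is on the command line and says nothing about which flags a given kernel honours or how it resolves conflicting ones.

```shell
#!/bin/sh
# Hypothetical helper: list every flag passed through pci= on a kernel
# command line, one per line, in order of appearance. Handles both the
# repeated form (pci=foo pci=bar) and the comma form (pci=foo,bar).
pci_flags() {
    printf '%s\n' "$1" |
        tr ' \t' '\n\n' |        # one boot option per line
        sed -n 's/^pci=//p' |    # keep only pci= options, drop the prefix
        tr ',' '\n'              # split comma-separated flags
}

# Example call with a fixed command line.
pci_flags "root=/dev/hda3 pci=noacpi pci=biosirq"
```

On a running system the input would be the booted command line, for example `pci_flags "$(cat /proc/cmdline)"`.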

For reference, this is my machine at work:

# cat /proc/interrupts
           CPU0
  0:   18360138    IO-APIC-edge  timer
  1:      14295    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     542038    IO-APIC-edge  i8042
 14:     123485    IO-APIC-edge  ide0
 15:         20    IO-APIC-edge  ide1
 16:    1613151   IO-APIC-level  uhci-hcd, uhci-hcd, nvidia
 17:     388290   IO-APIC-level  Intel ICH5
 18:     110644   IO-APIC-level  ide2, uhci-hcd, eth0
 19:          0   IO-APIC-level  uhci-hcd
NMI:          0
LOC:   18361276
ERR:          0
MIS:          1
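
Shared interrupt lines are the interesting part of this output. A rough sketch for pulling them out, assuming a single CPU column as above and that shared devices are separated by commas; the function name `shared_irqs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/interrupts-style text from stdin and
# print the IRQ number and device list for every numbered line whose
# device column names more than one device (detected by a comma).
# Assumes one CPU column: IRQ, count, controller type, then devices.
shared_irqs() {
    awk '
        /^ *[0-9]+:/ && /,/ {
            irq = $1
            sub(/:$/, "", irq)
            devices = ""
            for (i = 4; i <= NF; i++)
                devices = devices (i > 4 ? " " : "") $i
            print irq ": " devices
        }
    '
}
```

Typical use would be `shared_irqs < /proc/interrupts`. On a machine with more than one CPU column the field offset of the device names would need adjusting.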
Comment 13 Martin Schlemmer (RETIRED) gentoo-dev 2003-09-08 13:43:35 UTC
Rather use test4-bk7 (8 is a bit flaky), or bk9 if out, as they have fixes
for the acpi/apic/lapic kernel params ...
Comment 14 Guy 2003-09-08 18:11:19 UTC
Hi Martin,

Thanks. -test5 just became available, so I guess I'll play with that one. ;-)

I'll check the -r3 nvidia-kernel ebuild on this system both with the built-in GeForce 2 graphics and a cheap Nvidia card in the AGP slot. Will let you know the results later.

:-)
Comment 15 Guy 2003-09-08 19:08:49 UTC
re: -test5 - For reference:

ASUS A7N266-VM/AA mobo. APIC and ACPI compiled in kernel

# cat /proc/interrupts
           CPU0
  0:     419818          XT-PIC  timer
  1:        645    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 11:      13052    IO-APIC-edge  NVidia nForce, nvidia
 12:       6290    IO-APIC-edge  i8042
 14:       6957    IO-APIC-edge  ide0
 15:         42    IO-APIC-edge  ide1
 18:       3419   IO-APIC-level  ide2, ide3, eth0
 21:          0   IO-APIC-level  ohci-hcd, ohci-hcd
NMI:          0
LOC:     419612
ERR:          0
MIS:          0

dmesg:

Badness in pci_find_subsys at drivers/pci/search.c:132
Call Trace:
 [<c0229615>] pci_find_subsys+0xe5/0xf0
 [<c022964f>] pci_find_device+0x2f/0x40
 [<c0229508>] pci_find_slot+0x28/0x50
 [<f8b7f05f>] os_pci_init_handle+0x3a/0x67 [nvidia]
 [<f8b95f6b>] __nvsym00822+0x4fb/0xd60 [nvidia]
 [<f8b9191f>] __nvsym00057+0x1f/0x24 [nvidia]
 [<f8c65bdb>] __nvsym04236+0x1f/0x24 [nvidia]
 [<f8c3df1d>] __nvsym03984+0x215/0x1d88 [nvidia]
 [<f8b92cba>] __nvsym00257+0x12/0x18 [nvidia]
 [<f8cb2f59>] __nvsym01638+0x151/0x15c [nvidia]
 [<f8c8c046>] __nvsym00995+0x142/0x15c [nvidia]
 [<f8c2fb05>] __nvsym03871+0xa9/0x114 [nvidia]
 [<f8c24587>] __nvsym03751+0x85f/0x8a4 [nvidia]
 [<f8bc490a>] __nvsym00830+0x12/0x18 [nvidia]
 [<f8bac480>] __nvsym00805+0x2c/0x228 [nvidia]
 [<f8b92cba>] __nvsym00257+0x12/0x18 [nvidia]
 [<f8c689eb>] __nvsym00688+0x32b/0x338 [nvidia]
 [<f8b940fd>] __nvsym00727+0x31/0x38 [nvidia]
 [<f8b9402f>] __nvsym00827+0x13/0x1c [nvidia]
 [<f8b956c4>] rm_isr_bh+0xc/0x10 [nvidia]
 [<c0120b56>] tasklet_action+0x46/0x70
 [<c02d3c48>] serio_interrupt+0x58/0x60
 [<c011940a>] default_wake_function+0x2a/0x30
 [<c0119441>] __wake_up_common+0x31/0x50
 [<f8c4105c>] __nvsym03993+0x15cc/0x15d8 [nvidia]
 [<f8c3dc56>] __nvsym03986+0x21a/0x2cc [nvidia]
 [<f8c4371d>] __nvsym04031+0xc5/0x418 [nvidia]
 [<f8c43610>] __nvsym04015+0x68/0xb0 [nvidia]
 [<f8c38ddd>] __nvsym00610+0x85/0x954 [nvidia]
 [<c0138d75>] __get_free_pages+0x35/0x40
 [<c030a1ae>] sock_aio_read+0xce/0xe0
 [<c01507f9>] do_sync_read+0x89/0xc0
 [<f8c6882a>] __nvsym00688+0x16a/0x338 [nvidia]
 [<f8b94029>] __nvsym00827+0xd/0x1c [nvidia]
 [<f8b956c4>] rm_isr_bh+0xc/0x10 [nvidia]
 [<c0120b56>] tasklet_action+0x46/0x70
 [<c0120965>] do_softirq+0xa5/0xb0
 [<c010ba26>] do_IRQ+0x116/0x160
 [<c0109da8>] common_interrupt+0x18/0x20

Clean boot.
Comment 16 Guy 2003-09-08 20:03:33 UTC
Well, things are definitely improved.

I can still freeze this system solid by playing xmms and any GL based screensaver. But I just had my mind blown totally. When I killed xmms before my system could completely freeze up, my configure desktop dialog box closed as well. Now I'm wondering if KDE 3.1.3 could be involved in this mess too.

I'll try putting in a graphics card tomorrow into the AGP slot. See if that makes a difference. 

Martin, if you're keeping this open just for me, you don't need to. :-) It's definitely appreciated, but you've already fixed the nvidia-kernel ebuild. On the other hand, if you want me to try anything related to these issues, I'm certainly game.

BTW - I'm really impressed with what the kernel crew have done with the APIC and ACPI stuff. While I would not understand any of the code if it was shown to me, I certainly can see the results. The -test5 kernel 'feels' really good at the moment. I just wish I understood better how to use the debugging tools.

Now, if I could just have my favorite GL based screen savers on screen while enjoying my OGG / MP3 collection ... at the same time.

Best regards and much thanks!
Comment 17 Martin Schlemmer (RETIRED) gentoo-dev 2003-09-14 07:16:38 UTC
Hi Guy.  Sorry for not being able to help more, as it's a bit out of my league,
and I am generally swamped =)  Anyhow, I'll close this now; if there are fixes
on the nvidia-kernel side, just let me know.