158928 – forcedeth kernel panic

Status:	RESOLVED FIXED

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All Other

Importance:	High normal
Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers

Reported:	2006-12-23 09:45 UTC by David Carlos Manuelda
Modified:	2007-01-11 19:38 UTC
CC List:	0 users

Attachments
Kernel panic image (panic.jpg, 874.27 KB, image/jpeg) 2006-12-23 19:23 UTC, David Carlos Manuelda
emerge --info (eminfo.txt, 8.81 KB, text/plain) 2006-12-25 16:44 UTC, David Carlos Manuelda

Description David Carlos Manuelda 2006-12-23 09:45:33 UTC

Hi, I have had this issue for at least 2 months: my system suddenly freezes after some period of time (I could not tell how long). I thought it was a beryl issue, but lately I have been stopping xdm when I am away from the computer, and on a tty I saw what the problem was.

I get a kernel panic related to forcedeth, and the backtrace is always the same when it happens. The only way I have to reproduce it is to keep the forcedeth eth interface active (for example with mldonkey) and wait long enough.

I will post the backtraces (as a smartphone photograph attachment) soon (I am not at home right now).

When I post them, please tell me whether this is a Gentoo issue or an upstream issue (in which case I'll file a bug in the kernel bug system).

Thanks.

Comment 1 David Carlos Manuelda 2006-12-23 09:47:54 UTC

This happens to me with kernels up to and including the last stable one (2.6.1?-r4; the ? is a digit I don't remember right now) on my amd64 system.

Comment 2 Jakub Moc (RETIRED) gentoo-dev 2006-12-23 09:58:23 UTC

1/ Please, don't restrict bugs without any reason.

2/ Unless you can provide some information, like relevant part of /var/log/messages, the kernel panic output or whatever else relevant, we really can't guess.

Comment 3 David Carlos Manuelda 2006-12-23 19:10:55 UTC

Yes, I know you can't guess. I said I would attach it, and I will do it now.

The following is the kernel panic and backtrace I get:

Comment 4 David Carlos Manuelda 2006-12-23 19:23:37 UTC

Created attachment 104666
Kernel panic image

This is the kernel panic I can see.

Comment 5 Daniel Drake (RETIRED) gentoo-dev 2006-12-25 15:25:54 UTC

Please post "emerge --info" to every bug that you file.

Is this reproducible on 2.6.19 without any binary modules loaded? If so please post a new photo from there.

Comment 6 David Carlos Manuelda 2006-12-25 16:44:32 UTC

Created attachment 104722
emerge --info

Comment 7 David Carlos Manuelda 2006-12-25 16:47:15 UTC

OK. As I don't know how long I have to leave my computer running to get this panic, I'll try it tonight after removing the only binary module I have (the nvidia driver). If I still get it, I'll post a new image.

Note that I've seen this panic at least two times that I remember, and the backtrace is very similar (if not the same).

Comment 8 Daniel Drake (RETIRED) gentoo-dev 2006-12-25 17:07:08 UTC

Make sure nvidia has not been loaded *at all* since that boot, and remember to upgrade to 2.6.19.
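
One way to check the "not loaded at all since that boot" condition is the kernel taint value, which stays set until reboot even if the module is later unloaded. The following is a minimal sketch, not something posted in this bug: the function names are made up for illustration, and it assumes the standard /proc/sys/kernel/tainted file, where bit 0 means a proprietary module has been loaded.

```python
from pathlib import Path

# Bit 0 of the kernel taint value: a proprietary module was loaded at some
# point since boot. The flag is not cleared when the module is unloaded.
TAINT_PROPRIETARY_MODULE = 1 << 0


def proprietary_taint(tainted_value: int) -> bool:
    """Return True if the taint value has the proprietary-module bit set."""
    return bool(tainted_value & TAINT_PROPRIETARY_MODULE)


def read_tainted(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint value (hypothetical helper; Linux only)."""
    return int(Path(path).read_text().strip())
```

On a Linux machine, `proprietary_taint(read_tainted())` returning False would suggest that no proprietary module such as nvidia has been loaded since boot.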

Comment 9 David Carlos Manuelda 2006-12-26 14:23:43 UTC

I booted my system with the new ~amd64 2.6.19-r2 kernel and loaded *only* the modules built from the kernel config.

The system has been up for 18 hours now without this panic (I'll wait another 12 hours), and it seems to be unaffected this way.

I noticed that this kernel produces more warnings while compiling, and spca5xx no longer compiles against it (should I file a bug for this?)

One more question:
  - Maybe the latest nvidia beta driver (which is the one I am using) is causing the problems? (But that would be weird: what does nvidia have to do with TCP handling?)

Comment 10 Daniel Drake (RETIRED) gentoo-dev 2006-12-26 15:09:56 UTC

Maybe your problem is solved in 2.6.19, or it could be related to those modules. nvidia is not related to tcp, but any kernel module can screw with any kernel memory and nvidia has been known to do this in the past.

Out-of-kernel package regressions are tracked on bug 156669.

Comment 11 David Carlos Manuelda 2006-12-27 07:56:26 UTC

I finally modprobed the nvidia module and started the xdm script to use my computer. I used beryl for a bit without hangs (especially when running kaffeine).

After this, I stopped the xdm script and left my computer on a tty, and I didn't see any panic this time (possibly fixed in 2.6.19).

All of this was with mldonkey using the forcedeth device (eth0).

I'll run one more test before closing this bug: I'll leave xdm running while I sleep and see whether the computer hangs (as it did before).

One more question: if all goes well, what would be the best resolution for closing this bug, so that others know that 2.6.19 possibly fixes this issue?

Comment 12 David Carlos Manuelda 2006-12-28 08:43:54 UTC

I've been running the computer for 2 days (and a few hours) without the issues I had with 2.6.18-r5.

I noticed that 2.6.19-r2 is *even more stable* than 2.6.18-r5 (beryl no longer hangs the computer, as it sometimes did with 2.6.18-r5), so I suggest stabilizing this new kernel :)

Comment 13 David Carlos Manuelda 2006-12-28 11:40:18 UTC

As a last note for users who may have the same problem, I found this in the kernel.org changelog:

[PATCH] forcedeth: Disable INTx when enabling MSI in forcedeth
    
    At least some nforce cards continue to send legacy interrupts when MSI
    is enabled, and these interrupts are treated as unhandled by the
    kernel. This patch disables legacy interrupts explicitly when enabling
    MSI mode.
    
    The correct fix is to change the MSI infrastructure to disable legacy
    interrupts when enabling MSI, but this is potentially risky if the
    device isn't PCI-2.3 or is quirky, so the correct fix is going into
    mainline, while patches like this one go into -stable.
    
    Legend has it that it is most correct to disable legacy interrupts
    before enabling MSI, but the mainline patch does it in the other
    order, and this patch is "obviously" the same as mainline.
    
    Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
    Cc: Jeff Garzik <jeff@garzik.org>
    Cc: Greg KH <gregkh@suse.de>
    Signed-off-by: Chris Wright <chrisw@sous-sol.org>
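
To illustrate what "disabling legacy interrupts" means at the register level: PCI 2.3 defines an Interrupt Disable bit (bit 10, value 0x400) in the 16-bit command register at config-space offset 0x04, and Linux names it PCI_COMMAND_INTX_DISABLE. The sketch below only models that bit manipulation on a plain integer. It is not the forcedeth patch itself, and the helper names are made up for illustration.

```python
# Offset of the 16-bit command register in PCI configuration space.
PCI_COMMAND = 0x04
# Bit 10 of the command register: when set, the device must not assert INTx.
PCI_COMMAND_INTX_DISABLE = 0x400


def disable_intx(command: int) -> int:
    """Return the command register value with legacy INTx disabled."""
    return (command | PCI_COMMAND_INTX_DISABLE) & 0xFFFF


def intx_disabled(command: int) -> bool:
    """Return True if the INTx disable bit is set in the given value."""
    return bool(command & PCI_COMMAND_INTX_DISABLE)
```

A driver would read the command register, set this bit, and write the value back around the point where it enables MSI. The changelog entry above notes that the ordering of those two steps is itself debated.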

Comment 14 David Carlos Manuelda 2007-01-05 17:24:56 UTC

Hi again, I've discovered the trigger which allows this panic to occur and also a better way to reproduce it [for someone using forcedeth and amd64 (I don't know whether it happens on x86 too)].

The problem wasn't the nvidia binary module, nor Xorg, nor beryl. The problem is caused by either the /etc/init.d/vmware script or vmware-modules.

I couldn't reproduce it before because, while I was testing, I had no need to use vmware. I used it once this issue seemed fixed and hit the same panic [which is why I reopened this bug].


Reproduce: (Things inside "[" and "]" are, I think, optional.)

1.- Have an internet connection through forcedeth-controlled hardware that is always doing something (mldonkey, for example)
2.- Have vmware[-server] installed and configured to use NAT (all options at their defaults)
3.- Start an X session [with nvidia]
4.- Run as root "/etc/init.d/vmware start" (which also starts xinetd)
5.- [Use a vmware machine for a bit]
6.- Run as root "/etc/init.d/vmware stop" (which does not stop xinetd, which it had started; leaving xinetd running probably causes trouble. I didn't test whether stopping xinetd as well prevents this panic)
7.- (As you have forcedeth active) wait long enough (probably less than 24-48h).

Please, I need someone else to test this bug.

Comment 15 Daniel Drake (RETIRED) gentoo-dev 2007-01-11 19:07:19 UTC

vmware is also a closed-source binary module. Is this still unreproducible without any binary modules loaded?
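
A quick way to see which of the suspect modules are loaded right now is to scan /proc/modules, where the first field of each line is the module name. This is a minimal sketch under assumptions: the function names are hypothetical, and the module names "vmmon" and "vmnet" are assumed to be what vmware-modules installs. It only shows what is currently loaded, not what was loaded earlier in the same boot.

```python
# Module names assumed for illustration; adjust to what is actually installed.
SUSPECT_MODULES = frozenset({"nvidia", "vmmon", "vmnet"})


def loaded_modules(proc_modules_text: str) -> set:
    """Extract module names (first field of each line) from /proc/modules text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def suspects_loaded(proc_modules_text: str, suspects=SUSPECT_MODULES) -> list:
    """Return the sorted suspect modules that appear in the given text."""
    return sorted(loaded_modules(proc_modules_text) & set(suspects))
```

For example, passing the contents of /proc/modules to `suspects_loaded` on a machine where only forcedeth and vmnet are loaded would return `['vmnet']`.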

Comment 16 David Carlos Manuelda 2007-01-11 19:38:25 UTC

Yes, I can't reproduce it without any binary modules, and with the latest kernel it seems that I can't reproduce it either. So I'm marking it as fixed for now.