Hi, I have been having this issue for at least 2 months: my system suddenly freezes after some period of time (I could not tell how long). I thought it was a beryl issue, but lately I have been stopping xdm when I am not at the computer, and in a tty I saw what the problem was. I get a kernel panic related to forcedeth, and the backtrace is always the same when it happens. The only way I have to reproduce it is to keep the forcedeth eth device active (for example with mldonkey) and wait long enough. I will post the backtraces (as a smartphone photograph attachment) soon, as I am not at home right now. Once I post them, please tell me whether this is a Gentoo issue or an upstream issue (in which case I'll file a bug in the kernel bug tracker). Thanks.
This happens to me with kernels <= the latest stable one (2.6.1?-r4; ? = I don't remember the exact number right now) on my amd64 system.
1/ Please don't restrict bugs without any reason. 2/ Unless you can provide some information, like the relevant part of /var/log/messages, the kernel panic output, or whatever else is relevant, we really can't guess.
Yes, I know you can't guess; I said I would attach it and I am doing it now. The following is the kernel panic and backtrace I get:
Created attachment 104666: Kernel panic image. This is the kernel panic I can see.
Please post "emerge --info" to every bug that you file. Is this reproducible on 2.6.19 without any binary modules loaded? If so please post a new photo from there.
Created attachment 104722: emerge --info
Ok, as I don't know how long I have to leave my computer running before it panics, I'll try it tonight after removing the only binary module I have (the nvidia driver). If I still get the panic, I'll post a new image. Note that I've seen this panic at least twice that I remember, and the backtrace is very similar (if not the same).
Make sure nvidia has not been loaded *at all* since that boot, and remember to upgrade to 2.6.19.
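
One way to double-check the "not loaded at all since that boot" condition is the kernel taint mask: bit 0 is set when a proprietary-licensed module is loaded and stays set until reboot, even if the module is later unloaded. The following is only a minimal sketch, assuming the running kernel exposes the mask at /proc/sys/kernel/tainted; the function names are made up for illustration.

```python
# Bit 0 of the kernel taint mask ("P") is set once a proprietary module
# has been loaded, and it is not cleared when the module is unloaded.
TAINT_PROPRIETARY_MODULE = 1 << 0


def proprietary_module_was_loaded(tainted_text):
    """Return True if the taint mask has the proprietary-module bit set.

    `tainted_text` is the raw content of the taint file, e.g. "1\n".
    """
    return bool(int(tainted_text.strip()) & TAINT_PROPRIETARY_MODULE)


def read_taint_mask(path="/proc/sys/kernel/tainted"):
    """Read the raw taint mask text (the path is an assumption about this system)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        mask = read_taint_mask()
    except OSError:
        print("taint mask is not readable on this system")
    else:
        if proprietary_module_was_loaded(mask):
            print("a proprietary module has been loaded since boot")
        else:
            print("no proprietary module has been loaded since boot")
```

If the check reports a proprietary module, a reboot is needed before the test run counts as "clean".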
I booted my system with the new ~amd64 2.6.19-r2 and loaded *only* the modules you can get from the kernel config. The system has been up for 18 hours now without this panic (I'll wait another 12 hours), so it seems to be unaffected this way. I noticed that I get more warnings while compiling this kernel, and spca5xx no longer compiles against it (should I file a bug for this?). One more question: could the latest nvidia beta driver (which is the one I am using) be causing problems? (That would be weird, though: what does nvidia have to do with TCP handling?)
Maybe your problem is solved in 2.6.19, or it could be related to those modules. nvidia is not related to TCP, but any kernel module can screw with any kernel memory, and nvidia has been known to do this in the past. Out-of-kernel package regressions are tracked in bug 156669.
I finally modprobed the nvidia module and started the xdm script to use my computer. I used beryl for a bit without hangs (especially when running kaffeine). After this, I stopped the xdm script and left my computer in a tty, and I haven't seen any panic so far (possibly fixed in 2.6.19), always with mldonkey using the forcedeth device (eth0). I'll run one more test before closing this bug: I'll leave xdm running while I'm sleeping and see whether the computer hangs (as it did before). One more question: if all goes well, what would be the best resolution for closing this, so that others know that 2.6.19 possibly fixes this issue?
I've been running the computer for 2 days (and a few hours) without the issues I had with 2.6.18-r5. I noticed that 2.6.19-r2 is *even more stable* than 2.6.18-r5 (beryl no longer hangs the computer as it sometimes did with 2.6.18-r5), so I suggest stabilizing this new kernel :)
As a last note for other users who may have the same problem, I found this in the kernel.org changelog:
[PATCH] forcedeth: Disable INTx when enabling MSI in forcedeth
At least some nforce cards continue to send legacy interrupts when MSI is enabled, and these interrupts are treated as unhandled by the kernel. This patch disables legacy interrupts explicitly when enabling MSI mode.
The correct fix is to change the MSI infrastructure to disable legacy interrupts when enabling MSI, but this is potentially risky if the device isn't PCI-2.3 or is quirky, so the correct fix is going into mainline, while patches like this one go into -stable.
Legend has it that it is most correct to disable legacy interrupts before enabling MSI, but the mainline patch does it in the other order, and this patch is "obviously" the same as mainline.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
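
For anyone wondering what "disabling INTx" amounts to: on PCI 2.3 devices the 16-bit command register has an Interrupt Disable bit (bit 10, 0x400), and setting it stops the device from asserting its legacy interrupt line. The snippet below only illustrates that bit manipulation on a plain integer. It is not the forcedeth patch and it does not touch any hardware; the function names are invented for this example.

```python
# Interrupt Disable is bit 10 of the 16-bit PCI command register (PCI 2.3).
PCI_COMMAND_INTX_DISABLE = 0x400


def disable_intx(command):
    """Return the command register value with legacy INTx disabled."""
    return (command | PCI_COMMAND_INTX_DISABLE) & 0xFFFF


def intx_disabled(command):
    """Return True if the Interrupt Disable bit is set in `command`."""
    return bool(command & PCI_COMMAND_INTX_DISABLE)


if __name__ == "__main__":
    # Hypothetical starting value: I/O space, memory space and bus mastering on.
    before = 0x0007
    after = disable_intx(before)
    print(hex(before), intx_disabled(before))
    print(hex(after), intx_disabled(after))
```

The other bits of the register are left as they were, which is why the update is a read-modify-write rather than a plain write.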
Hi again, I've discovered the trigger that allows this panic to occur, and also a better way to reproduce it (for someone using forcedeth on amd64; I don't know if it happens on x86 too). The problem wasn't the nvidia binary module, nor Xorg, nor beryl. The problem is caused by either the /etc/init.d/vmware script or vmware-modules. I couldn't reproduce it before because, while I was testing, I had no need to use vmware; I used it once the issue seemed fixed and hit the same panic again, which is why I reopened this bug.
Reproduce (things inside "[" and "]" are, I think, optional):
1.- Have an internet connection through forcedeth-controlled hardware that is always doing something (mldonkey, for example).
2.- Have vmware[-server] installed and configured to use NAT (all options at their defaults).
3.- Start an X session [with nvidia].
4.- Run as root "/etc/init.d/vmware start" (which also starts xinetd).
5.- [Use a vmware machine for a bit.]
6.- Run as root "/etc/init.d/vmware stop" (which does not stop what had been started (xinetd) and leaves xinetd running, which probably causes trouble; I didn't test whether stopping xinetd too prevents this panic).
7.- (As you have forcedeth active) wait long enough (probably less than 24-48h).
Please, I need someone else to test this bug.
vmware is also a closed-source binary module. Is this still unreproducible without any binary modules loaded?
Yes, I can't reproduce it without any binary modules, and with the latest kernel it seems that I can't reproduce it either. So I am marking it as fixed for now.