With the sky2 driver in gentoo-sources 2.6.15 there are no problems at all. With the updated sky2 drivers in 2.6.16 there are serious problems: with version 1.1 the network doesn't work at all (the module loads but cannot even ping the router). With the new version 1.2 included in gentoo-sources-2.6.16-r6 the module loads but the network is extremely slow (even for local connections). For example, an ssh session to a locally networked machine shows 1-second delays on each keypress, and a file transfer is limited to about 25kb/sec. After switching back to kernel 2.6.15 and the earlier sky2 driver, there are no problems.
Created attachment 86019: dmesg output
dmesg from gentoo-sources-2.6.16-r6 (r5 has a sky2 that doesn't compile properly).
Please post /proc/interrupts from a bad kernel.
From 2.6.16-r6 (bad kernel):

           CPU0
  0:      49149    IO-APIC-edge  timer
  1:          8    IO-APIC-edge  i8042
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:       1887    IO-APIC-edge  ide0
 15:        328    IO-APIC-edge  ide1
 16:          3   IO-APIC-level  ehci_hcd:usb1
 17:        603   IO-APIC-level  ohci_hcd:usb2
 18:          1   IO-APIC-level  sky2
 19:          0   IO-APIC-level  EMU10K1
NMI:          0
LOC:      49059
ERR:          0
MIS:          0

From 2.6.15-r1 (good kernel):

           CPU0
  0:    1076284    IO-APIC-edge  timer
  1:         10    IO-APIC-edge  i8042
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:      58963    IO-APIC-edge  ide0
 15:       9562    IO-APIC-edge  ide1
 16:          3   IO-APIC-level  ehci_hcd:usb1
 17:      56143   IO-APIC-level  ohci_hcd:usb2
 18:      14320   IO-APIC-level  sky2
 19:        800   IO-APIC-level  EMU10K1
 20:      83220   IO-APIC-level  nvidia
NMI:          0
LOC:    1076203
ERR:          0
MIS:          0
Please reproduce this without the nvidia binary driver loaded, and post a new dmesg. It's very unlikely that nvidia would have an effect, but the sky2 developer will be reluctant to look at this unless the kernel is not tainted.
The same problem exists with or without the nvidia driver loaded. In the above output for the bad kernel I had re-tested without loading X11 at all and had the same issue.
Created attachment 86029: Clean boot with untainted drivers
dmesg output as requested, without the nvidia driver.
Some further information. From lspci -v:

03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
        Subsystem: Micro-Star International Co., Ltd. Marvell 88E8053 Gigabit Ethernet Controller (MSI)
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at fdbfc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at bc00 [size=256]
        [virtual] Expansion ROM at fda00000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting

Also, on rebooting 2.6.16-r6 I've received some kernel panics. I cannot copy and paste the information but have recorded this:

Unable to handle kernel NULL pointer dereference at virtual address 00000428
EIP is at sky2_poll+0x1da/0x870 [sky2]
Call Trace: [truncated]
 ktime_get
 net_rx_action
 __do_softirq
 do_IRQ
 common_interrupt
 default_idle
 cpu_idle
 start_kernel
 unknown_bootoption
Kernel panic - not syncing: Fatal exception in interrupt.
Stephen, Barry has reported a sky2 issue in this bug, which sounds the same as João Oliveirinha's issue in bug #131274 (recall that he's the guy with edge triggered interrupts). However, Barry has level triggered interrupts. You can find the /proc/interrupts info in comment #3 and dmesg output in attachment #86029.

sky2 v1.2 addr 0xfdbfc000 irq 18 Yukon-EC (0xb6) rev 1

v0.15 (from 2.6.15): working
v1.1 (from 2.6.17-rc1 or so): broken, can't ping anything
v1.2 (from 2.6.17-rc3): working, but very very slowly

What other info can we provide? Would it help if we walked through the post-0.15 changes until we found the bad one, or do you know which patch will have caused the initial regression?
Barry, Please file a separate bug report for the oops in comment #7. Thanks.
Stephen posted a new sky2 version here: http://developer.osdl.org/shemminger/prototypes/sky2-1.3-rc1.tar.bz2
(In reply to comment #10)
> Stephen posted a new sky2 version here:
>
> http://developer.osdl.org/shemminger/prototypes/sky2-1.3-rc1.tar.bz2

I haven't had time to do full testing, but so far this seems to have fixed the problem: it cleanly compiles against 2.6.16-r6 and the network delays seem to have disappeared.
While the obvious problems seem to have disappeared, this seems a little odd:

With 2.6.15 (never any problems), pinging the router:

PING 10.0.0.138 (10.0.0.138) 56(84) bytes of data.
64 bytes from 10.0.0.138: icmp_seq=1 ttl=64 time=0.604 ms
64 bytes from 10.0.0.138: icmp_seq=2 ttl=64 time=0.467 ms
64 bytes from 10.0.0.138: icmp_seq=3 ttl=64 time=0.452 ms
64 bytes from 10.0.0.138: icmp_seq=4 ttl=64 time=0.467 ms
64 bytes from 10.0.0.138: icmp_seq=5 ttl=64 time=0.471 ms
64 bytes from 10.0.0.138: icmp_seq=6 ttl=64 time=0.474 ms
64 bytes from 10.0.0.138: icmp_seq=7 ttl=64 time=0.476 ms
64 bytes from 10.0.0.138: icmp_seq=8 ttl=64 time=0.466 ms
64 bytes from 10.0.0.138: icmp_seq=9 ttl=64 time=0.483 ms
64 bytes from 10.0.0.138: icmp_seq=10 ttl=64 time=0.465 ms

--- 10.0.0.138 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8998ms
rtt min/avg/max/mdev = 0.452/0.482/0.604/0.046 ms

With 2.6.16-r6 with the new sky2 driver:

PING 10.0.0.138 (10.0.0.138) 56(84) bytes of data.
64 bytes from 10.0.0.138: icmp_seq=1 ttl=64 time=132 ms
64 bytes from 10.0.0.138: icmp_seq=2 ttl=64 time=33.8 ms
64 bytes from 10.0.0.138: icmp_seq=3 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=4 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=5 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=6 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=7 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=8 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=9 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=10 ttl=64 time=32.9 ms

--- 10.0.0.138 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8998ms
rtt min/avg/max/mdev = 32.994/43.080/132.977/29.967 ms
Further testing reveals this isn't fixed after all. A local file transfer is limited to about 600kb/sec, e.g. with a local rsync:

sky2 v1.3rc-1:
livecd-i686-installer-2006.0.iso
    21004288   2%  624.00kB/s    0:18:56

sky2 v0.12:
livecd-i686-installer-2006.0.iso
   205881344  28%    9.86MB/s    0:00:51

I also noticed that in /proc/interrupts, sky2 remains on '1' with the new driver, while with the old it increases.
Ok, so the interrupts aren't getting delivered, as Stephen suspected. Which 2.6.15 kernels do you have available?
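For anyone wanting to check the same symptom on their own machine, here is a minimal sketch of how the sky2 counter in /proc/interrupts could be compared between two snapshots. The function names (irq_count, counter_is_stuck, watch) are made up for illustration and are not part of any existing tool; the sample data is trimmed from the outputs posted in comment #3.

```python
import time

def irq_count(text, device):
    """Sum the per-CPU interrupt counts for every IRQ line that names `device`.

    `text` is the content of /proc/interrupts. Returns None if the device
    is not listed at all.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total, found = 0, False
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        # Everything after the per-CPU counters: controller type, then the
        # device names (comma-separated when an IRQ is shared).
        names = [name.strip(",") for name in fields[1 + ncpus:]]
        if device in names:
            total += sum(int(c) for c in fields[1:1 + ncpus])
            found = True
    return total if found else None

def counter_is_stuck(before, after, device):
    """True if the device's interrupt count did not change between snapshots."""
    return irq_count(before, device) == irq_count(after, device)

def watch(device="sky2", interval=5.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart (generate traffic meanwhile)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return counter_is_stuck(before, after, device)

# Trimmed samples taken from the outputs posted earlier in this bug.
BAD = """\
           CPU0
 17:        603   IO-APIC-level  ohci_hcd:usb2
 18:          1   IO-APIC-level  sky2
NMI:          0
"""
GOOD = """\
           CPU0
 17:      56143   IO-APIC-level  ohci_hcd:usb2
 18:      14320   IO-APIC-level  sky2
NMI:          0
"""
```

Calling watch("sky2") while a ping or file transfer is running should return False on a working driver; a True result would match the stuck counter described above.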
I've been using gentoo-sources-2.6.15-r1 (the previous stable x86 release) and never had a problem with any kernel until the updated sky2 driver appeared in the 2.6.16 releases.
Created attachment 86445: sky2-v1.3-rc1 for 2.6.15-gentoo-r1
Please apply this to your 2.6.15-gentoo-r1 kernel and see what happens then.
Patch applied to 2.6.15-r1. The exact same problem occurs: network throughput drops to a maximum of ~600kb/sec, ping latencies increase, and the sky2 value in /proc/interrupts does not increase.
More testing: I copied sky2.c and sky2.h from the vanilla 2.6.16 sources (version 0.15) onto gentoo-sources-2.6.16-r6. Absolutely fine:

# ping -c5 10.0.0.4
PING 10.0.0.4 (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4: icmp_seq=1 ttl=64 time=0.312 ms
64 bytes from 10.0.0.4: icmp_seq=2 ttl=64 time=0.315 ms
64 bytes from 10.0.0.4: icmp_seq=3 ttl=64 time=0.307 ms
64 bytes from 10.0.0.4: icmp_seq=4 ttl=64 time=0.304 ms
64 bytes from 10.0.0.4: icmp_seq=5 ttl=64 time=0.304 ms

rsync:
livecd-i686-installer-2006.0.iso
   295993344  40%    9.65MB/s    0:00:43
Does it make any difference if you use the "disable_msi" parameter on the 2.6.15 kernel patched with 1.3-rc1?
You need to retest with the 1.3 version of the driver, and with this patch posted yesterday: http://www.spinics.net/lists/netdev/msg04377.html The patch fixes a problem that could cause the interrupt mask to get set to disable all status interrupts.
Having applied this patch to 1.3-rc1 against 2.6.16-gentoo-r7, things are looking a lot better: ping latencies are back to normal and I can transfer locally at around 9MB/sec again (performance just seems slightly down on v0.15, where I was getting around 10MB/sec). The sky2 value in /proc/interrupts is also increasing.
Thanks for testing. Fixed in gentoo-sources-2.6.16-r8 (genpatches-2.6.16-10)