Bug 132056 - Extremely slow network with sky2 driver
Summary: Extremely slow network with sky2 driver
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-02 13:46 UTC by Barry Shilliday
Modified: 2006-05-21 08:35 UTC
1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg output (sky2.txt,15.11 KB, text/plain)
2006-05-02 13:48 UTC, Barry Shilliday
Clean boot with untainted drivers (dmesg2.txt,15.11 KB, text/plain)
2006-05-02 16:02 UTC, Barry Shilliday
sky2-v1.3-rc1 for 2.6.15-gentoo-r1 (sky2-v1.3-rc1.patch,71.20 KB, patch)
2006-05-08 14:19 UTC, Daniel Drake (RETIRED)

Description Barry Shilliday 2006-05-02 13:46:28 UTC
With the sky2 driver in gentoo-sources-2.6.15 there are no problems at all. With the updated sky2 drivers in 2.6.16 there are serious problems: with version 1.1 the network doesn't work at all (the module loads but cannot even ping the router).

With the new version 1.2 included in gentoo-sources-2.6.16-r6, the module loads but the network is extremely slow (even for local connections). For example, an ssh session to a machine on the local network has a 1-second delay on each keypress. A file transfer is limited to about 25kb/sec.

After switching back to kernel 2.6.15 and the earlier sky2 driver, there are no problems.
Comment 1 Barry Shilliday 2006-05-02 13:48:22 UTC
Created attachment 86019
dmesg output

dmesg from gentoo-sources-2.6.16-r6 (r5 has a sky2 that doesn't compile properly).
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2006-05-02 14:54:40 UTC
Please post /proc/interrupts from a bad kernel.
Comment 3 Barry Shilliday 2006-05-02 15:24:59 UTC
From 2.6.16-r6 (bad kernel):

           CPU0
  0:      49149    IO-APIC-edge  timer
  1:          8    IO-APIC-edge  i8042
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:       1887    IO-APIC-edge  ide0
 15:        328    IO-APIC-edge  ide1
 16:          3   IO-APIC-level  ehci_hcd:usb1
 17:        603   IO-APIC-level  ohci_hcd:usb2
 18:          1   IO-APIC-level  sky2
 19:          0   IO-APIC-level  EMU10K1
NMI:          0
LOC:      49059
ERR:          0
MIS:          0

From 2.6.15-r1 (good kernel):

           CPU0
  0:    1076284    IO-APIC-edge  timer
  1:         10    IO-APIC-edge  i8042
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:      58963    IO-APIC-edge  ide0
 15:       9562    IO-APIC-edge  ide1
 16:          3   IO-APIC-level  ehci_hcd:usb1
 17:      56143   IO-APIC-level  ohci_hcd:usb2
 18:      14320   IO-APIC-level  sky2
 19:        800   IO-APIC-level  EMU10K1
 20:      83220   IO-APIC-level  nvidia
NMI:          0
LOC:    1076203
ERR:          0
MIS:          0
Comment 4 Daniel Drake (RETIRED) gentoo-dev 2006-05-02 15:45:49 UTC
Please reproduce this without the nvidia binary driver loaded, and post a new dmesg. It's very unlikely that nvidia would have an effect, but the sky2 developer will be reluctant to look at this unless the kernel is not tainted.
Comment 5 Barry Shilliday 2006-05-02 15:52:48 UTC
The same problem exists with or without the nvidia driver loaded. The bad-kernel output above was taken after re-testing without loading X11 at all, and the issue was the same.
Comment 6 Barry Shilliday 2006-05-02 16:02:58 UTC
Created attachment 86029
Clean boot with untainted drivers

dmesg output as requested without nvidia driver.
Comment 7 Barry Shilliday 2006-05-02 16:07:37 UTC
Some further information.

From lspci -v:

03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
        Subsystem: Micro-Star International Co., Ltd. Marvell 88E8053 Gigabit Ethernet Controller (MSI)
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at fdbfc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at bc00 [size=256]
        [virtual] Expansion ROM at fda00000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting

Also, after rebooting into 2.6.16-r6 I've had some kernel panics. I cannot copy and paste the output, but I noted down the following:

Unable to handle kernel NULL pointer dereference at virtual address 00000428
EIP is at sky2_poll+0x1da/0x870 [sky2]

Call Trace: [truncated]
ktime_get
net_rx_action
__do_softirq
do_IRQ
common_interrupt
default_idle
cpu_idle
start_kernel
unknown_bootoption

Kernel panic - not syncing: Fatal exception in interrupt.
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2006-05-02 16:13:46 UTC
Stephen,

Barry has reported a sky2 issue in this bug which sounds the same as João Oliveirinha's issue in bug #131274 (recall that he's the guy with edge-triggered interrupts).

However, Barry has level-triggered interrupts. You can find the /proc/interrupts info in comment #3 and dmesg output in attachment #86029.

sky2 v1.2 addr 0xfdbfc000 irq 18 Yukon-EC (0xb6) rev 1

v0.15 (from 2.6.15): working
v1.1 (from 2.6.17-rc1 or so): broken, can't ping anything
v1.2 (from 2.6.17-rc3): working, but very very slowly

What other info can we provide? Would it help if we walked through the post-0.15 changes until we found the bad one, or do you know which patch will have caused the initial regression?
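Walking through the post-0.15 changes is a bisection over an ordered patch list. A minimal sketch, assuming a hypothetical `is_bad(n)` test that stands in for building sky2 with the first n patches applied, booting it and checking the network; the patch names are placeholders, not the real sky2 changelog:

```python
from typing import Callable, Sequence


def first_bad_patch(patches: Sequence[str], is_bad: Callable[[int], bool]) -> str:
    """Binary-search for the first patch that breaks the driver.

    is_bad(n) is a hypothetical test harness: build sky2 with patches[:n]
    applied on top of v0.15, boot it, and return True if the network is
    broken or slow. Assumes is_bad(0) is False (v0.15 works) and
    is_bad(len(patches)) is True (the newest driver is broken).
    """
    lo, hi = 0, len(patches)  # invariant: patches[:lo] good, patches[:hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    # patches[:hi - 1] is good and patches[:hi] is bad, so patch hi-1 is the culprit
    return patches[hi - 1]


# Example: 16 placeholder patches where the 10th introduces the regression
patches = [f"patch-{i:02d}" for i in range(1, 17)]
print(first_bad_patch(patches, lambda n: n >= 10))  # prints patch-10
```

With 16 patches this needs four test builds rather than up to sixteen; `git bisect` automates the same search when the driver history is available in a git tree.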
Comment 10 Daniel Drake (RETIRED) gentoo-dev 2006-05-02 16:14:33 UTC
Barry,

Please file a separate bug report for the oops in comment #7. Thanks.
Comment 11 Daniel Drake (RETIRED) gentoo-dev 2006-05-03 14:17:38 UTC
Stephen posted a new sky2 version here:

http://developer.osdl.org/shemminger/prototypes/sky2-1.3-rc1.tar.bz2
Comment 12 Barry Shilliday 2006-05-04 02:25:40 UTC
(In reply to comment #10)
> Stephen posted a new sky2 version here:
> 
> http://developer.osdl.org/shemminger/prototypes/sky2-1.3-rc1.tar.bz2
> 

I haven't had time to do full testing, but so far this seems to have fixed the problem: it cleanly compiles against 2.6.16-r6 and the network delays seem to have disappeared.
Comment 13 Barry Shilliday 2006-05-04 15:18:03 UTC
While the obvious problems seem to have disappeared, this seems a little odd:

With 2.6.15 (never any problems) - pinging the router:

PING 10.0.0.138 (10.0.0.138) 56(84) bytes of data.
64 bytes from 10.0.0.138: icmp_seq=1 ttl=64 time=0.604 ms
64 bytes from 10.0.0.138: icmp_seq=2 ttl=64 time=0.467 ms
64 bytes from 10.0.0.138: icmp_seq=3 ttl=64 time=0.452 ms
64 bytes from 10.0.0.138: icmp_seq=4 ttl=64 time=0.467 ms
64 bytes from 10.0.0.138: icmp_seq=5 ttl=64 time=0.471 ms
64 bytes from 10.0.0.138: icmp_seq=6 ttl=64 time=0.474 ms
64 bytes from 10.0.0.138: icmp_seq=7 ttl=64 time=0.476 ms
64 bytes from 10.0.0.138: icmp_seq=8 ttl=64 time=0.466 ms
64 bytes from 10.0.0.138: icmp_seq=9 ttl=64 time=0.483 ms
64 bytes from 10.0.0.138: icmp_seq=10 ttl=64 time=0.465 ms

--- 10.0.0.138 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8998ms
rtt min/avg/max/mdev = 0.452/0.482/0.604/0.046 ms

With 2.6.16-r6 with new sky2 driver:

PING 10.0.0.138 (10.0.0.138) 56(84) bytes of data.
64 bytes from 10.0.0.138: icmp_seq=1 ttl=64 time=132 ms
64 bytes from 10.0.0.138: icmp_seq=2 ttl=64 time=33.8 ms
64 bytes from 10.0.0.138: icmp_seq=3 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=4 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=5 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=6 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=7 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=8 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=9 ttl=64 time=32.9 ms
64 bytes from 10.0.0.138: icmp_seq=10 ttl=64 time=32.9 ms

--- 10.0.0.138 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8998ms
rtt min/avg/max/mdev = 32.994/43.080/132.977/29.967 ms
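The summary line shows how one slow first reply skews the statistics: nine replies sit near 33 ms, yet avg is 43 ms and mdev is about 30 ms. A minimal sketch of the calculation, assuming ping's mdev is the population standard deviation sqrt(mean(x^2) - mean(x)^2) as computed by iputils ping. Fed the rounded per-packet times printed above (ping shows fewer digits for times over 10 ms), it gives avg 42.9 ms and mdev about 29.7 ms, close to but not exactly the 43.080 and 29.967 that ping derives from its unrounded timings; dropping the first reply brings avg down to 33.0 ms.

```python
import math


def rtt_summary(rtts_ms):
    """Return (min, avg, max, mdev) in ms; mdev is the population std deviation."""
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mean_sq = sum(x * x for x in rtts_ms) / n
    # clamp at 0 to guard against tiny negative values from float rounding
    mdev = math.sqrt(max(mean_sq - avg * avg, 0.0))
    return min(rtts_ms), avg, max(rtts_ms), mdev


# Rounded per-packet times from the 2.6.16-r6 + new sky2 run above
bad = [132, 33.8] + [32.9] * 8
print("min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f ms" % rtt_summary(bad))
print("avg without first reply: %.3f ms" % rtt_summary(bad[1:])[1])
```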
Comment 14 Barry Shilliday 2006-05-06 02:13:50 UTC
Further testing reveals this isn't fixed after all. A local file transfer is limited to about 600kb/sec, e.g. with a local rsync:

sky2 v1.3-rc1:

livecd-i686-installer-2006.0.iso
    21004288   2%  624.00kB/s    0:18:56

sky2 v0.12:

livecd-i686-installer-2006.0.iso
   205881344  28%    9.86MB/s    0:00:51


I also noticed that in /proc/interrupts the sky2 count stays at 1 with the new driver, while with the old driver it keeps increasing.
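A stuck counter is the key symptom: if the sky2 line in /proc/interrupts does not grow while traffic is flowing, the NIC's interrupts are not reaching the CPU. A minimal sketch that samples the file twice and reports the delta for one device; the helper names and the 5-second default interval are illustrative, and the parser assumes the 2.6.16-era layout shown in comment #3 (IRQ number, one count per CPU, a single-token controller type such as IO-APIC-level, then device names). Newer kernels print the type column differently, so this parser would need adjusting there.

```python
import time


def irq_count(text: str, device: str) -> int:
    """Sum the per-CPU counts on /proc/interrupts lines that name `device`.

    Assumes a CPU header row, then "IRQ: count... type device[, device...]"
    with a single-token type column, as in the 2.6.16 output above.
    """
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 [CPU1 ...]
    total = 0
    for line in lines[1:]:
        fields = line.split()
        # skip NMI/LOC/ERR/MIS summary rows and anything too short
        if len(fields) < ncpu + 3 or not fields[0].rstrip(":").isdigit():
            continue
        names = " ".join(fields[ncpu + 2:]).replace(",", " ").split()
        if device in names:
            total += sum(int(c) for c in fields[1:ncpu + 1])
    return total


def irq_delta(device: str, interval: float = 5.0,
              path: str = "/proc/interrupts") -> int:
    """Return how many interrupts `device` received during `interval` seconds."""
    with open(path) as f:
        before = irq_count(f.read(), device)
    time.sleep(interval)
    with open(path) as f:
        after = irq_count(f.read(), device)
    return after - before
```

For example, calling `irq_delta("sky2")` while pinging the router should return a positive number with a healthy driver; the stuck counter described above would show up as 0.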
Comment 15 Daniel Drake (RETIRED) gentoo-dev 2006-05-08 13:51:37 UTC
OK, so the interrupts aren't getting delivered, as Stephen suspected. Which 2.6.15 kernels do you have available?
Comment 16 Barry Shilliday 2006-05-08 13:57:58 UTC
I've been using gentoo-sources-2.6.15-r1 (the previous stable x86 release) and never had a problem with any kernel until the updated sky2 driver appeared in the 2.6.16 releases.
Comment 17 Daniel Drake (RETIRED) gentoo-dev 2006-05-08 14:19:20 UTC
Created attachment 86445
sky2-v1.3-rc1 for 2.6.15-gentoo-r1

Please apply this to your 2.6.15-gentoo-r1 kernel and see what happens then.
Comment 18 Barry Shilliday 2006-05-09 03:02:40 UTC
Patch applied to 2.6.15-r1. Exactly the same problem occurs: the network slows down to a maximum of ~600kb/sec, ping latencies go up and the value in /proc/interrupts does not increase.
Comment 19 Barry Shilliday 2006-05-09 03:22:53 UTC
More testing:

I copied sky2.c and sky2.h from the vanilla 2.6.16 sources (version 0.15) onto gentoo-sources-2.6.16-r6. Absolutely fine:

# ping -c5 10.0.0.4
PING 10.0.0.4 (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4: icmp_seq=1 ttl=64 time=0.312 ms
64 bytes from 10.0.0.4: icmp_seq=2 ttl=64 time=0.315 ms
64 bytes from 10.0.0.4: icmp_seq=3 ttl=64 time=0.307 ms
64 bytes from 10.0.0.4: icmp_seq=4 ttl=64 time=0.304 ms
64 bytes from 10.0.0.4: icmp_seq=5 ttl=64 time=0.304 ms

rsync:

livecd-i686-installer-2006.0.iso
   295993344  40%    9.65MB/s    0:00:43 
Comment 20 Daniel Drake (RETIRED) gentoo-dev 2006-05-10 08:28:40 UTC
Does it make any difference if you use the "disable_msi" parameter on the 2.6.15 kernel patched with 1.3-rc1?
Comment 21 Stephen Hemminger 2006-05-10 10:03:01 UTC
You need to retest with the 1.3 version of the driver, and with this
patch posted yesterday.

http://www.spinics.net/lists/netdev/msg04377.html

The patch fixes a problem that could cause the interrupt mask to be
set so that all status interrupts were disabled.
Comment 22 Barry Shilliday 2006-05-10 11:57:09 UTC
Having applied this patch to 1.3-rc1 against 2.6.16-gentoo-r7, things are looking a lot better: ping latencies are back to normal and I can transfer locally at around 9MB/sec again (performance seems slightly down compared with v0.15, where I was getting around 10MB/sec). The sky2 value in /proc/interrupts is also increasing.
Comment 23 Daniel Drake (RETIRED) gentoo-dev 2006-05-21 08:35:59 UTC
Thanks for testing. Fixed in gentoo-sources-2.6.16-r8 (genpatches-2.6.16-10)