Bug 271498 - sky2 kernel driver crash "tx timeout" 2.6.29-gentoo-r5
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2009-05-28 07:07 UTC by Rand Aijala
Modified: 2009-09-09 18:15 UTC
Description Rand Aijala 2009-05-28 07:07:41 UTC
dmesg output:

[ 4624.689098] ------------[ cut here ]------------
[ 4624.689102] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xe7/0x148()
[ 4624.689104] Hardware name: System Product Name
[ 4624.689106] NETDEV WATCHDOG: eth0 (sky2): transmit timed out
[ 4624.689107] Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss bonding fuse uinput vboxnetflt vboxdrv w83627ehf hwmon_vid coretemp hwmon snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep nvidia(P) snd_pcm snd_timer snd sky2 joydev soundcore snd_page_alloc scsi_wait_scan
[ 4624.689128] Pid: 0, comm: swapper Tainted: P           2.6.29-gentoo-r5 #1
[ 4624.689130] Call Trace:
[ 4624.689131]  <IRQ>  [<ffffffff81050a18>] warn_slowpath+0xd3/0xf2
[ 4624.689140]  [<ffffffff81508e1b>] ? _spin_unlock_irqrestore+0x3a/0x3c
[ 4624.689143]  [<ffffffff8104567c>] ? task_rq_unlock+0xc/0xe
[ 4624.689144]  [<ffffffff8104b0fb>] ? try_to_wake_up+0x1d3/0x1e5
[ 4624.689146]  [<ffffffff81508e12>] ? _spin_unlock_irqrestore+0x31/0x3c
[ 4624.689148]  [<ffffffff8104b11a>] ? default_wake_function+0xd/0xf
[ 4624.689150]  [<ffffffff810d62af>] ? pollwake+0x4b/0x52
[ 4624.689151]  [<ffffffff8104b10d>] ? default_wake_function+0x0/0xf
[ 4624.689153]  [<ffffffff81508e1b>] ? _spin_unlock_irqrestore+0x3a/0x3c
[ 4624.689155]  [<ffffffff8106a68f>] ? clocksource_read+0x7/0x9
[ 4624.689156]  [<ffffffff8106acdd>] ? getnstimeofday+0x5a/0xbb
[ 4624.689158]  [<ffffffff81508aab>] ? _spin_unlock+0x2f/0x3a
[ 4624.689160]  [<ffffffff8143569b>] ? __netif_tx_unlock+0x14/0x16
[ 4624.689161]  [<ffffffff814356eb>] ? netif_tx_lock+0x4e/0x67
[ 4624.689162]  [<ffffffff8143576e>] ? dev_watchdog+0x0/0x148
[ 4624.689164]  [<ffffffff81435855>] dev_watchdog+0xe7/0x148
[ 4624.689165]  [<ffffffff8106a68f>] ? clocksource_read+0x7/0x9
[ 4624.689167]  [<ffffffff8106acdd>] ? getnstimeofday+0x5a/0xbb
[ 4624.689169]  [<ffffffff81059666>] run_timer_softirq+0x119/0x18f
[ 4624.689171]  [<ffffffff81055725>] __do_softirq+0x77/0x11f
[ 4624.689173]  [<ffffffff810272ec>] call_softirq+0x1c/0x28
[ 4624.689175]  [<ffffffff81028461>] do_softirq+0x34/0x77
[ 4624.689177]  [<ffffffff81055659>] irq_exit+0x3f/0x94
[ 4624.689179]  [<ffffffff81034402>] smp_apic_timer_interrupt+0x77/0x84
[ 4624.689181]  [<ffffffff81026d23>] apic_timer_interrupt+0x13/0x20
[ 4624.689182]  <EOI>  [<ffffffff813025e0>] ? acpi_idle_enter_bm+0x21f/0x257
[ 4624.689186]  [<ffffffff813025d6>] ? acpi_idle_enter_bm+0x215/0x257
[ 4624.689188]  [<ffffffff81406cee>] ? cpuidle_idle_call+0x73/0xad
[ 4624.689190]  [<ffffffff81025046>] ? cpu_idle+0x52/0x9f
[ 4624.689193]  [<ffffffff815033a6>] ? start_secondary+0x18a/0x18e
[ 4624.689194] ---[ end trace 23e45fa83d1a5448 ]---
[ 4624.689195] sky2 eth0: tx timeout
[ 4624.689199] sky2 eth0: transmit ring 265 .. 225 report=265 done=265
[ 4624.689207] sky2 eth0: disabling interface
[ 4624.694483] sky2 eth0: enabling interface
[ 4627.697335] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

hardware:

04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)

This happens at random whenever there is a large amount of local network activity (rsync, scp, ftp, etc.). It always happens during the first significant network transfer after boot. After the driver is reloaded and the interface restarted, it may work normally for a while.

Reproducible: Always

Steps to Reproduce:
1. boot gentoo
2. start net.eth0
3. scp a large file, rsync to local portage mirror, or any significant network traffic
4. wait for transfer to stall
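
To confirm that a stall in step 4 corresponds to a driver reset, the kernel log can be scanned for the sky2 "tx timeout" messages shown above. The following is only a minimal sketch: the function name find_tx_timeouts and the regular expressions are assumptions based on the log lines in this report, not part of the driver or any existing tool.

```python
import re

# Matches a kernel timestamp such as "[ 4624.689195]" followed by a sky2 message.
SKY2_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+sky2 (\w+): (.*)$")
# Matches the ring-state line the driver prints right after a timeout.
RING_LINE = re.compile(r"transmit ring (\d+) \.\. (\d+) report=(\d+) done=(\d+)")


def find_tx_timeouts(log_text):
    """Return one dict per 'tx timeout' event found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = SKY2_LINE.search(line)
        if not match:
            continue
        timestamp, iface, message = match.groups()
        if message.strip() == "tx timeout":
            events.append({"time": float(timestamp), "iface": iface, "ring": None})
        elif events and events[-1]["ring"] is None:
            # Attach the ring numbers to the most recent timeout, if present.
            ring = RING_LINE.search(message)
            if ring:
                events[-1]["ring"] = tuple(int(v) for v in ring.groups())
    return events
```

It could be fed the saved output of dmesg after a stalled transfer; each returned entry carries the timestamp, the interface name, and the four numbers from the "transmit ring" line, if one followed.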
Comment 1 Andrew Gaffney (RETIRED) gentoo-dev 2009-05-28 13:25:45 UTC
I think this one belongs to you guys.
Comment 2 Rand Aijala 2009-05-28 14:37:48 UTC
I apologize, I screwed up setting the product/component when submitting this.

I should add that this problem has persisted since the 2.6.27 kernel.
Comment 3 Mike Pagano gentoo-dev 2009-06-04 01:06:48 UTC
Possibly related reports of this being fixed with a firmware update:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/83009

Have you researched firmware updates, or do you know what version you are running?
Comment 4 Rand Aijala 2009-06-04 03:11:55 UTC
(In reply to comment #3)
> Possibly related reports of this being fixed with a firmware update:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/83009
> 
> Have you researched firmware updates, or do you know what version you are running?
> 

I've been googling all afternoon and have not found any firmware downloads available for my 88E8056. Maybe I'm blind and dumb, but I can't find any firmware downloads at all on Marvell's site.
Comment 5 Mike Pagano gentoo-dev 2009-08-07 16:31:58 UTC
Some users have determined that moving to firmware 2.2 or newer resolves this issue.

Your motherboard vendor has to supply you with the updated firmware.
Comment 6 Mike Pagano gentoo-dev 2009-08-09 23:34:32 UTC
Let us know what happens after you've located and installed the new firmware.
Comment 7 Denis Cheong 2009-08-31 13:39:38 UTC
(In reply to comment #6)
> Let us know what happens after you've located and installed the new firmware
 
Installing new firmware into an on-board BIOS is a non-trivial exercise, and actually requires specialised BIOS-modification software that is not generally available.

For what it's worth, I have also been having a very difficult time with the same issue, with an almost identical crash trace. I have now disabled jumbo frames (MTU was previously set to 9000), and this has stabilised the system somewhat: it now runs for more than 30 minutes without crashing, but more testing is required.

To Rand Aijala, the original poster - can you confirm if you had jumbo frames enabled, and if so, whether turning jumbo frames off makes any difference?  Other experts - can anybody confirm whether jumbo frames may be the cause of this issue?
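
For anyone wanting to check whether jumbo frames are in use on an affected interface, here is a rough sketch that reads the MTU from sysfs. It assumes a Linux system exposing /sys/class/net/<iface>/mtu and treats anything above the conventional 1500-byte Ethernet MTU as "jumbo"; the helper names read_mtu, is_jumbo and disable_jumbo_command are made up for illustration.

```python
from pathlib import Path

# Conventional Ethernet MTU; anything larger is treated as jumbo in this sketch.
STANDARD_MTU = 1500


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Read the current MTU of iface from sysfs (Linux only)."""
    return int(Path(sysfs_root, iface, "mtu").read_text().strip())


def is_jumbo(mtu):
    """Return True if mtu exceeds the standard 1500-byte Ethernet MTU."""
    return mtu > STANDARD_MTU


def disable_jumbo_command(iface):
    """Build (but do not run) an iproute2 command that resets the MTU to 1500."""
    return ["ip", "link", "set", "dev", iface, "mtu", str(STANDARD_MTU)]
```

For example, is_jumbo(read_mtu("eth0")) would be true with the MTU of 9000 mentioned above. The command list returned by disable_jumbo_command would need to be run as root, and the setting does not persist across reboots unless the network configuration is changed as well.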

My motherboard is an Asus P6T Deluxe V2 (Core i7 - very recent system running latest available Asus BIOS).  I am currently running 2.6.30 but was running 2.6.29 previously with the same behaviour.

My dmesg output:
Aug 31 22:22:13 dino [  330.539234] ------------[ cut here ]------------
Aug 31 22:22:13 dino [  330.539244] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x122/0x1ca()
Aug 31 22:22:13 dino [  330.539247] Hardware name: System Product Name
Aug 31 22:22:13 dino [  330.539249] NETDEV WATCHDOG: eth0 (sky2): transmit timed out
Aug 31 22:22:13 dino [  330.539251] Modules linked in: snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss ipv6 bridge stp vboxnetadp vboxnetflt vboxdrv snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus pcspkr snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd serio_raw joydev nvidia(P) iTCO_wdt iTCO_vendor_support i2c_i801 sky2 i2c_core asus_atk0110 tg3 e1000 dm_bbr sl811_hcd ohci_hcd uhci_hcd ehci_hcd sx8 scsi_wait_scan b1 kernelcapi
Aug 31 22:22:13 dino [  330.539294] Pid: 0, comm: swapper Tainted: P           2.6.30-sabayon-r1 #1
Aug 31 22:22:13 dino [  330.539297] Call Trace:
Aug 31 22:22:13 dino [  330.539299]  <IRQ>  [<ffffffff8090a518>] ? dev_watchdog+0x122/0x1ca
Aug 31 22:22:13 dino [  330.539310]  [<ffffffff8023f75f>] warn_slowpath_common+0x77/0xa4
Aug 31 22:22:13 dino [  330.539315]  [<ffffffff8023f801>] warn_slowpath_fmt+0x64/0x66
Aug 31 22:22:13 dino [  330.539320]  [<ffffffff80238346>] ? default_wake_function+0xd/0xf
Aug 31 22:22:13 dino [  330.539325]  [<ffffffff8022fe23>] ? __wake_up_common+0x46/0x76
Aug 31 22:22:13 dino [  330.539329]  [<ffffffff8023154a>] ? __wake_up+0x43/0x50
Aug 31 22:22:13 dino [  330.539333]  [<ffffffff808f6f31>] ? netdev_drivername+0x43/0x4a
Aug 31 22:22:13 dino [  330.539337]  [<ffffffff8090a518>] dev_watchdog+0x122/0x1ca
Aug 31 22:22:13 dino [  330.539342]  [<ffffffff8024832e>] ? cascade+0x68/0x81
Aug 31 22:22:13 dino [  330.539347]  [<ffffffff8090a3f6>] ? dev_watchdog+0x0/0x1ca
Aug 31 22:22:13 dino [  330.539351]  [<ffffffff8024854a>] run_timer_softirq+0x157/0x1c6
Aug 31 22:22:13 dino [  330.539356]  [<ffffffff80255b5b>] ? ktime_get_ts+0x49/0x4e
Aug 31 22:22:13 dino [  330.539361]  [<ffffffff8025bccf>] ? clockevents_program_event+0x73/0x7c
Aug 31 22:22:13 dino [  330.539365]  [<ffffffff802446b5>] __do_softirq+0xa7/0x166
Aug 31 22:22:13 dino [  330.539369]  [<ffffffff8020c03c>] call_softirq+0x1c/0x28
Aug 31 22:22:13 dino [  330.539373]  [<ffffffff8020d88c>] do_softirq+0x34/0x72
Aug 31 22:22:13 dino [  330.539376]  [<ffffffff802443bb>] irq_exit+0x3f/0x79
Aug 31 22:22:13 dino [  330.539382]  [<ffffffff8021e019>] smp_apic_timer_interrupt+0x88/0x96
Aug 31 22:22:13 dino [  330.539388]  [<ffffffff8020ba53>] apic_timer_interrupt+0x13/0x20
Aug 31 22:22:13 dino [  330.539390]  <EOI>  [<ffffffff80211d32>] ? mwait_idle+0xb4/0xeb
Aug 31 22:22:13 dino [  330.539398]  [<ffffffff802565a1>] ? atomic_notifier_call_chain+0x13/0x15
Aug 31 22:22:13 dino [  330.539402]  [<ffffffff8020a2e2>] ? cpu_idle+0x52/0x93
Aug 31 22:22:13 dino [  330.539407]  [<ffffffff809ba442>] ? start_secondary+0x195/0x19a
Aug 31 22:22:13 dino [  330.539410] ---[ end trace 43e1fc3556e7ae6b ]---
Aug 31 22:22:13 dino [  330.539413] sky2 eth0: tx timeout
Aug 31 22:22:13 dino [  330.539418] sky2 eth0: transmit ring 402 .. 362 report=402 done=402
Aug 31 22:22:13 dino [  330.539425] sky2 eth0: disabling interface
Aug 31 22:22:13 dino [  330.544048] sky2 eth0: enabling interface
Aug 31 22:22:14 dino ifplugd(eth0)[7110]: Link beat lost.
Aug 31 22:22:16 dino [  333.284355] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
Aug 31 22:22:16 dino ifplugd(eth0)[7110]: Link beat detected.
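
The trace shows the driver disabling the interface and the link coming back a few seconds later. As a rough way to measure how long each reset takes, the sketch below pairs each "disabling interface" message with the next "Link is up" message for the same interface. The function name link_outages and the regular expression are assumptions derived from the log format in this report.

```python
import re

# Kernel timestamp in brackets, then "sky2 <iface>: <message>".
# A syslog prefix before the bracket (date, hostname) is ignored.
TIMESTAMPED = re.compile(r"\[\s*(\d+\.\d+)\]\s+sky2 (\w+): (.*)$")


def link_outages(log_text):
    """Return (iface, seconds) pairs from 'disabling interface' to the next 'Link is up'."""
    down_since = {}
    outages = []
    for line in log_text.splitlines():
        match = TIMESTAMPED.search(line)
        if not match:
            continue
        stamp = float(match.group(1))
        iface, message = match.group(2), match.group(3)
        if message.startswith("disabling interface"):
            down_since[iface] = stamp
        elif message.startswith("Link is up") and iface in down_since:
            outages.append((iface, stamp - down_since.pop(iface)))
    return outages
```

Applied to the log above, this would report a single outage on eth0 of roughly 2.7 seconds (330.539425 to 333.284355).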
Comment 8 Denis Cheong 2009-09-06 05:49:27 UTC
This definitely seems to be caused by jumbo frames in my case: the system in question has run stably for 4 days with jumbo frames disabled, and re-enabling them caused it to crash within 30 seconds, as described above.

Either the Linux driver for this card has a bug with jumbo frames, or the combination of the firmware and the Linux driver is causing the crashes when jumbo frames are enabled.
Comment 9 George Kadianakis (RETIRED) gentoo-dev 2009-09-08 13:18:12 UTC
(In reply to comment #8)
> This definitely seems to be caused in my instance by jumbo frames, as the
> system in question has run stably for 4 days with jumbo frames disabled,
> re-enabling them caused it to crash within 30 seconds as previously described
> above.  
> 
> Either the Linux driver for this card has a bug with jumbo frames, or the
> combination of the firmware and the Linux driver is causing the crashes when
> jumbo frames are enabled.
> 

Interesting, thanks for the report. Do you have an 88E8056 Ethernet controller like Rand does?

Rand, could you try disabling jumbo frames and see if the problem persists?

Thanks!

Comment 10 Denis Cheong 2009-09-09 11:42:10 UTC
(In reply to comment #9)
> Interesting, thanks for the report. Do you have a 88E8056 Ethernet controller
> like Rand does?

Doubly-Affirmative.

04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)

(Dual ethernet ports on an Asus P6T Deluxe V2 motherboard)
Comment 11 Rand Aijala 2009-09-09 18:15:06 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Interesting, thanks for the report. Do you have a 88E8056 Ethernet controller
> > like Rand does?
> 
> Doubly-Affirmative.
> 
> 04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E
> Gigabit Ethernet Controller (rev 12)
> 06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E
> Gigabit Ethernet Controller (rev 12)
> 
> (Dual ethernet ports on an Asus P6T Deluxe V2 motherboard)
> 

Mine is the Asus P6T Deluxe.  I'll try disabling jumbo frames again and see if it helps.