Bug 299718 - kernel 2.6.31-gentoo-r6 module mii/sis190 - SIS 190 ethernet card transmit queue fails under heavy throughput
Summary: kernel 2.6.31-gentoo-r6 module mii/sis190 - SIS 190 ethernet card transmit qu...
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system (show other bugs)
Hardware: x86 Linux
Importance: High major
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard: linux-2.6.31,linux-2.6.32
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-05 10:09 UTC by MrFluffy
Modified: 2010-02-03 14:04 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Description MrFluffy 2010-01-05 10:09:29 UTC
I searched for sis190, as I have been following this bug through various kernel releases, but either I'm driving the search wrong or it's no longer among the hot topics. Apologies if it's a duplicate of an existing one.

uname -a:
Linux mybox 2.6.31-gentoo-r6 #1 SMP Thu Dec 31 02:05:18 CET 2009 i686 Intel(R) Celeron(R) CPU E1400 @ 2.00GHz GenuineIntel GNU/Linux

Under heavy, sustained Ethernet usage the network card goes away. The error in dmesg is:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x101/0x190()
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (sis190): transmit queue 0 timed out
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 usbhid usb_storage snd_hda_codec_realtek snd_hda_intel snd_hda_codec ohci_hcd ehci_hcd snd_pcm ssb usbcore snd_timer pcmcia rtc_cmos snd rtc_core pcspkr pcmcia_core rtc_lib sis190 shpchp snd_page_alloc sg mii sis_agp pci_hotplug agpgart
Pid: 0, comm: swapper Not tainted 2.6.31-gentoo-r6 #1
Call Trace:
 [<c012f6c9>] warn_slowpath_common+0x60/0x90
 [<c012f72d>] warn_slowpath_fmt+0x24/0x27
 [<c043220c>] dev_watchdog+0x101/0x190
 [<c013db90>] ? insert_work+0x78/0x81
 [<c013e1f5>] ? __queue_work+0x26/0x2b
 [<c043210b>] ? dev_watchdog+0x0/0x190
 [<c0137a4c>] run_timer_softirq+0x124/0x17c
 [<c013438b>] __do_softirq+0xac/0x14e
 [<c01342df>] ? __do_softirq+0x0/0x14e
 <IRQ>  [<c014a1db>] ? tick_periodic+0x6c/0x6e
 [<c0133efc>] ? irq_exit+0x29/0x2b
 [<c0112776>] ? smp_apic_timer_interrupt+0x6f/0x7d
 [<c0103236>] ? apic_timer_interrupt+0x2a/0x30
 [<c0108079>] ? mwait_idle+0x8a/0xc3
 [<c0149b29>] ? clockevents_register_device+0x99/0x9d
 [<c0101b46>] ? cpu_idle+0x3f/0x55
 [<c048ce37>] ? start_secondary+0x19e/0x1a3
---[ end trace 8663259883d76652 ]---

The card is onboard on my headless MythTV server box, which is equipped with multiple PVR cards. There are no PCI slots left to insert an external Ethernet card as a temporary fix, and a spare USB-based NIC I tried doesn't have sufficient throughput for the job (i.e. more than 10 Mb/s but less than 100 Mb/s is required).

This was also present in 2.6.29-r5, 2.6.30-r6 and older kernels. The behaviour has changed: in earlier kernels the Ethernet card would go deaf and simply lock up (I had a shell script to watch for that and do a network restart), whereas in the current release the watchdog does its job correctly, but the box loses Ethernet for 10-15 seconds while it happens.
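For readers unfamiliar with the trace: the WARNING is raised by the kernel's netdev watchdog, dev_watchdog() in net/sched/sch_generic.c. Roughly, a timer periodically checks each TX queue; if a queue is stopped and its last transmit is older than the device's watchdog timeout, it logs the "transmit queue N timed out" message and calls the driver's tx_timeout handler, which is the recovery step seen here. The Python model below is a simplified sketch under those assumptions: class and field names are illustrative rather than the kernel's own structures, and a 32-bit tick counter stands in for jiffies.

```python
from dataclasses import dataclass, field
from typing import List

JIFFIES_MASK = (1 << 32) - 1  # assumption: 32-bit jiffies, as on this i686 box


def time_after(a: int, b: int) -> bool:
    """Wrap-safe "a is later than b" for a 32-bit tick counter.

    Mirrors the kernel macro time_after(a, b), i.e. (long)(b - a) < 0.
    """
    return ((b - a) & JIFFIES_MASK) >= (1 << 31)


@dataclass
class TxQueue:
    stopped: bool     # the driver has stopped the queue (e.g. TX ring full)
    trans_start: int  # tick count of the last transmit on this queue


@dataclass
class NetDev:
    name: str
    driver: str
    watchdog_timeo: int  # timeout in ticks
    queues: List[TxQueue] = field(default_factory=list)
    warned: bool = False  # models WARN_ONCE: only the first timeout prints a trace
    resets: int = 0       # stands in for calls to the driver's tx_timeout hook


def dev_watchdog(dev: NetDev, jiffies: int) -> List[str]:
    """One watchdog tick: flag a stopped queue that has been idle too long."""
    messages: List[str] = []
    for i, q in enumerate(dev.queues):
        deadline = (q.trans_start + dev.watchdog_timeo) & JIFFIES_MASK
        if q.stopped and time_after(jiffies, deadline):
            if not dev.warned:
                messages.append(
                    f"NETDEV WATCHDOG: {dev.name} ({dev.driver}): "
                    f"transmit queue {i} timed out"
                )
                dev.warned = True
            dev.resets += 1  # the real kernel calls the driver's ndo_tx_timeout here
            break
    return messages
```

In kernels of this era the message appears to be emitted via WARN_ONCE, which in this model means the recovery runs on every timeout but the full trace prints only once per boot. That may be why later outages show link messages but no further traces.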

Upstream of the box is a Cisco Catalyst. I've tried moving the same Ethernet cable and switch port to another device and putting the same throughput onto that box using NFS, and the infrastructure seems to be fine.
I'm not certain what additional info would help you and what would just swamp this bug report, so I will await someone's input before adding to the noise.
I'm flagging it as major because it doesn't work as it should and I can't find an easy workaround short of buying a different motherboard.

Reproducible: Always

Steps to Reproduce:
1. Fire up mythtv remote front end and watch live tv / pull large files from nfs server
2. Wait for ethernet to lock up
3. Wait for the box to recover; steps 1-3 repeat approx. every 30 minutes of sustained >10 Mb/s throughput

Actual Results:  
The watchdog message appears in the dmesg output, and Ethernet goes away, then comes back a short time later.

Expected Results:  
The box continues to respond on the network despite heavier throughput.
Comment 1 MrFluffy 2010-01-05 10:24:57 UTC
# emerge --info
Portage 2.1.6.13 (default/linux/x86/10.0/server, gcc-4.3.2, glibc-2.9_p20081201-r2, 2.6.31-gentoo-r6 i686)
=================================================================
System uname: Linux-2.6.31-gentoo-r6-i686-Intel-R-_Celeron-R-_CPU_E1400_@_2.00GHz-with-glibc2.0
Timestamp of tree: Wed, 30 Dec 2009 23:30:01 +0000
app-shells/bash:     3.2_p39
dev-lang/python:     2.5.4-r2
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.13, 2.63
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LDFLAGS="-Wl,-O1"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa apache2 avahi berkdb bindist branding bzip2 cli cracklib crypt curl cxx dbus dri exif expat ffmpeg fortran gdbm gnome gpm gtk hal iconv ipv6 ldap libextractor modules mudflap mysql ncurses nfs nls nptl nptlonly openmp pam pcre perl pppd python qt3 readline reflection session snmp spl ssl svg sysfs taglib tcpd truetype unicode usb v4l vcd vorbis wma x86 xine xml xorg xv xvid xvmc zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="sis"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Comment 2 Mike Pagano gentoo-dev 2010-01-06 00:31:58 UTC
Does this occur with gentoo-sources-2.6.32-r1?
Comment 3 Mike Pagano gentoo-dev 2010-01-06 00:32:25 UTC
Maybe similar
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/472097
Comment 4 MrFluffy 2010-01-06 14:12:31 UTC
I unmasked and emerged 2.6.32-gentoo-r1 to test. The referenced bug report specifically mentions the skge module, which isn't present in this case.

uname now reports:
Linux mybox 2.6.32-gentoo-r1 #1 SMP Wed Jan 6 12:55:59 CET 2010 i686 Intel(R) Celeron(R) CPU E1400 @ 2.00GHz GenuineIntel GNU/Linux

It has frozen badly once and dumped a trace into dmesg during a short test. My beta testers are settling down for an afternoon of kids' TV while we're snowed in, so I will tail the logs and report how many times my name is uttered in vain over the period.

Trace from dmesg:

WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x101/0x190()
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (sis190): transmit queue 0 timed out
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 usbhid tuner_simple tuner_types tuner msp3400 saa7115 snd_hda_codec_realtek ivtv i2c_algo_bit snd_hda_intel cx2341x v4l2_common snd_hda_codec videodev v4l1_compat tveeprom ohci_hcd rtc_cmos ssb rtc_core pcspkr rtc_lib snd_pcm ehci_hcd pcmcia sg sis190 snd_timer i2c_core pcmcia_core usbcore shpchp snd sis_agp snd_page_alloc agpgart mii pci_hotplug
Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r1 #1
Call Trace:
 [<c0131dbd>] warn_slowpath_common+0x60/0x90
 [<c0131e21>] warn_slowpath_fmt+0x24/0x27
 [<c044a8ac>] dev_watchdog+0x101/0x190
 [<c0336010>] ? blk_rq_timed_out_timer+0xd3/0xdb
 [<c013c774>] run_timer_softirq+0x16b/0x1eb
 [<c044a7ab>] ? dev_watchdog+0x0/0x190
 [<c0137087>] __do_softirq+0xac/0x151
 [<c0136fdb>] ? __do_softirq+0x0/0x151
 <IRQ>  [<c014f42b>] ? tick_periodic+0x6c/0x6e
 [<c0136c7b>] ? irq_exit+0x29/0x2b
 [<c0113805>] ? smp_apic_timer_interrupt+0x6f/0x7d
 [<c01032d6>] ? apic_timer_interrupt+0x2a/0x30
 [<c0108012>] ? mwait_idle+0x7d/0x88
 [<c0101b5d>] ? cpu_idle+0x3f/0x56
 [<c04988af>] ? rest_init+0x53/0x55
 [<c0664807>] ? start_kernel+0x2b5/0x2ba
 [<c0664091>] ? i386_start_kernel+0x91/0x96
---[ end trace 0c332a5f34cd2185 ]---

Since then, the following messages have appeared during shorter sync losses, but no further traces. I have found that these are triggered when my script sees that the interface has gone down (detected by non-response from a few hosts on the local subnet) and restarts it via the net.eth0 init script. If I disable my watcher script in cron, the card just goes deaf and requires a manual restart.
These log entries (taken from dmesg) suggest the infrastructure is not the culprit, since on restart the card always negotiates a 100 Mbps full-duplex connection with the switch:

eth0: mii ext = 0000.
eth0: mii lpa=41e1 adv=01e1 exp=0001.
eth0: link on 100 Mbps Full Duplex mode.
eth0: no IPv6 routers present
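The lpa= and adv= values are the raw MII link-partner-ability and advertisement registers. Decoding them supports the point above: 0x01e1 advertises 10 and 100 Mbps at half and full duplex (plus the IEEE 802.3 selector bit), and 0x41e1 is the same set plus the link-partner acknowledge bit, so the best common mode is 100 Mbps full duplex, matching the "link on" line. A small Python decoder, assuming the standard bit layout from include/linux/mii.h; it is a simplified sketch that ignores the 100BASE-T4 and pause bits, not the sis190 driver's own logic:

```python
import re
from typing import Tuple

# MII advertisement / link-partner ability bits (ADVERTISE_* in include/linux/mii.h).
ADVERTISE_10HALF = 0x0020
ADVERTISE_10FULL = 0x0040
ADVERTISE_100HALF = 0x0080
ADVERTISE_100FULL = 0x0100
LPA_LPACK = 0x4000  # link partner acknowledged our advertisement

# Best mode first: higher speed, then full duplex.
MODES = [
    (ADVERTISE_100FULL, "100 Mbps Full Duplex"),
    (ADVERTISE_100HALF, "100 Mbps Half Duplex"),
    (ADVERTISE_10FULL, "10 Mbps Full Duplex"),
    (ADVERTISE_10HALF, "10 Mbps Half Duplex"),
]


def parse_mii_line(line: str) -> Tuple[int, int]:
    """Extract the hex lpa= and adv= register values from a dmesg line."""
    m = re.search(r"lpa=([0-9a-fA-F]+) adv=([0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError(f"no lpa/adv values in: {line!r}")
    return int(m.group(1), 16), int(m.group(2), 16)


def negotiated_mode(lpa: int, adv: int) -> str:
    """Pick the best mode that both ends advertise."""
    common = lpa & adv
    for bit, label in MODES:
        if common & bit:
            return label
    return "no common mode"
```

For example, negotiated_mode(*parse_mii_line("eth0: mii lpa=41e1 adv=01e1 exp=0001.")) yields "100 Mbps Full Duplex".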

The core problem seems to be that the Ethernet card goes deaf under sustained load; it's only my script's recovery that makes this a pause rather than a proper outage.
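The watcher described above (ping a few local hosts from cron and restart net.eth0 if none of them answer) could look roughly like the Python sketch below. The actual script was not posted, so the host addresses, the `ping -c 1 -W 2` probe and the function names are all assumptions; only the net.eth0 init script comes from the report. Probe and restart are passed in as functions so the logic can be checked without touching the network.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical addresses standing in for "a few hosts on the local subnet".
HOSTS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]


def ping(host: str) -> bool:
    """Send one ICMP echo with a 2-second timeout; True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_interface() -> None:
    """Restart eth0 through the Gentoo init script mentioned in the report."""
    subprocess.run(["/etc/init.d/net.eth0", "restart"], check=False)


def check_and_recover(
    hosts: Sequence[str],
    probe: Callable[[str], bool] = ping,
    restart: Callable[[], None] = restart_interface,
) -> bool:
    """Restart the interface only if *every* host fails to answer.

    Returns True if a restart was triggered.
    """
    if any(probe(h) for h in hosts):
        return False
    restart()
    return True
```

A cron job would import this module and call check_and_recover(HOSTS) every minute or so. Requiring all hosts to fail, rather than any single one, avoids bouncing the interface just because one host happens to be switched off.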

Comment 5 MrFluffy 2010-01-06 15:57:52 UTC
Update: the problem is MUCH worse under 2.6.32. I've just had to reboot back to 2.6.29-gentoo-r5, as this box is our sole source of television feed.
The people using it said the network issue occurs approx. every 2 minutes under 2.6.32-gentoo-r1.

To be clear, this is the gigabit version of the network card, not the Fast Ethernet version, although my switch port is only capable of 100 Mbps full duplex.

Is there anything more I can add, or is this something to take up with the sis190 driver maintainer on the mainline kernel lists? From googling the problem, I have a feeling it's more of a core Linux issue than a Gentoo-specific one.
Comment 6 George Kadianakis (RETIRED) gentoo-dev 2010-01-11 22:21:00 UTC
(In reply to comment #5)
> Is there anything more I can add, or will this be something to take up with the
> sis190 driver maintainer in the mainstream kernel lists? I have a feeling from
> googling the problem its more of a core linux issue than a gentoo specific
> variant.
> 

Personally, I think that reporting this issue upstream is the best current course of action.
Do go ahead ;)
Comment 7 Mike Pagano gentoo-dev 2010-02-03 14:04:57 UTC
Please file upstream at bugzilla.kernel.org and post the link back here and we will follow the upstream bug.