Bug 142085 - kernel bug in 2.6.17-gentoo-r4: xirc2ps_cs do_reset uses msleep
Summary: kernel bug in 2.6.17-gentoo-r4: xirc2ps_cs do_reset uses msleep
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Daniel Drake (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-29 07:31 UTC by Jörg Ahrens
Modified: 2006-09-09 19:37 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
patch against 2.6.17-gentoo-r4 (xirc2ps_cs_watchdog.diff,1.39 KB, patch)
2006-07-29 07:34 UTC, Jörg Ahrens
corrected patch against 2.6.17-gentoo-r4 (xirc2ps_cs_watchdog_take2.diff,1.95 KB, patch)
2006-08-14 15:24 UTC, Jörg Ahrens
patch against 2.6.17-gentoo-r4 (xirc2ps_cs_watchdog_take4.diff,1.81 KB, patch)
2006-08-18 16:33 UTC, Jörg Ahrens

Description Jörg Ahrens 2006-07-29 07:31:51 UTC
I am using a Xircom CEM33 PCMCIA NIC which has occasional hardware problems. If the netdev watchdog detects a transmit timeout, do_reset is called, and it calls msleep. Because dev_watchdog runs from the timer softirq and holds a spinlock, this msleep is fatal:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out
scheduling while atomic: sh/0x00000100/27135
 [<c02c30e6>] schedule+0x536/0x610
 [<c01314df>] __do_IRQ+0x8f/0xa0
 [<c01314c4>] __do_IRQ+0x74/0xa0
 [<c02c3855>] schedule_timeout+0x45/0xa0
 [<c011c7f0>] process_timeout+0x0/0x10
 [<c4c51bdf>] hardreset+0x5f/0x70 [xirc2ps_cs]
 [<c011ca66>] msleep+0x26/0x30
 [<c4c51c22>] do_reset+0x32/0x3a0 [xirc2ps_cs]
 [<c4c5170c>] do_tx_timeout+0x2c/0x80 [xirc2ps_cs]
 [<c02811c0>] dev_watchdog+0x0/0x90
 [<c0281245>] dev_watchdog+0x85/0x90
 [<c011c554>] run_timer_softirq+0xb4/0x180
 [<c0118cb2>] __do_softirq+0x42/0xa0
 [<c0118d36>] do_softirq+0x26/0x30
 [<c010421f>] do_IRQ+0x1f/0x30
 [<c0102afa>] common_interrupt+0x1a/0x20

I looked at other NIC drivers and created a patch using schedule_work, which solves this problem.
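The bug and the schedule_work fix can be illustrated with a small user-space model; this is a sketch, not kernel code. Assumptions: a counter stands in for the atomic state dev_watchdog is in while holding its lock, the hypothetical model_msleep() records a "scheduling while atomic" violation instead of sleeping, and model_schedule_work() / model_run_workqueue() are a one-slot stand-in for schedule_work() and the kernel thread that later runs queued work. All names are illustrative, and the reset is reduced to a sleep plus a counter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the atomic state held inside dev_watchdog. */
static int atomic_depth = 0;
static int sleep_violations = 0;
static int resets_done = 0;

/* Models msleep(): legal only when nothing atomic is in progress. */
static void model_msleep(unsigned int ms)
{
    (void)ms; /* no real delay in this model */
    if (atomic_depth > 0) {
        sleep_violations++;
        fprintf(stderr, "scheduling while atomic\n");
    }
}

/* Models do_reset(): the card reset sleeps while the hardware settles. */
static void model_do_reset(void)
{
    model_msleep(10); /* arbitrary delay, not the driver's value */
    resets_done++;
}

/* Minimal work item: a function pointer plus a pending flag. */
struct model_work {
    void (*func)(void);
    int pending;
};

static struct model_work *queued = NULL; /* one-slot queue, enough here */

/* Like schedule_work(): only queues, never sleeps. Returns 0 if already pending. */
static int model_schedule_work(struct model_work *w)
{
    if (w->pending)
        return 0;
    w->pending = 1;
    queued = w;
    return 1;
}

/* Runs the queued item later, with no lock held (process context). */
static void model_run_workqueue(void)
{
    struct model_work *w = queued;
    if (w == NULL)
        return;
    queued = NULL;
    w->pending = 0;
    w->func();
}

static struct model_work reset_work = { model_do_reset, 0 };

/* Original behaviour: reset directly from the timeout hook. */
static void tx_timeout_direct(void) { model_do_reset(); }

/* Patched behaviour: defer the reset to the work queue. */
static void tx_timeout_deferred(void) { model_schedule_work(&reset_work); }

/* Models dev_watchdog(): calls the driver's hook while atomic. */
static void model_watchdog(void (*tx_timeout)(void))
{
    atomic_depth++;
    tx_timeout();
    atomic_depth--;
}
```

With model_watchdog(tx_timeout_direct) the violation counter goes up, mirroring the trace above; with model_watchdog(tx_timeout_deferred) nothing sleeps while atomic, and the reset only happens once model_run_workqueue() runs it afterwards.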

I don't think this is caused by any Gentoo patch, as I found the same buggy code in the vanilla 2.6.15 kernel.
Comment 1 Jörg Ahrens 2006-07-29 07:34:35 UTC
Created attachment 93003
patch against 2.6.17-gentoo-r4
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2006-08-13 02:49:47 UTC
Looks good, but the work structure should be initialised during xirc2ps_probe rather than when the interface is brought up. Can you correct the patch?
Comment 3 Daniel Drake (RETIRED) gentoo-dev 2006-08-13 02:50:34 UTC
Also, the ugly (void (*)(void *)) casts are not needed.
Comment 4 Jörg Ahrens 2006-08-14 15:24:12 UTC
Created attachment 94275
corrected patch against 2.6.17-gentoo-r4 

Of course, we could get rid of the #ifdef HAVE_TX_TIMEOUT too.
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2006-08-18 03:34:18 UTC
Why have you added the extra changes?

do_tx_timeout is not used unless HAVE_TX_TIMEOUT is defined, so there is no need to provide the "alternate version". As it is static, the compiler will remove it if HAVE_TX_TIMEOUT is not defined (but it always will be). Just provide xirc2ps_tx_timeout_task in a similar fashion.

Also, tx_errors should be incremented immediately, not when the workqueue gets around to scheduling the task.
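A hypothetical user-space model of the ordering requested here: the timeout hook bumps the error counter at once and only queues the reset, with a single flag standing in for the pending work item. One reason the order matters: schedule_work() does not queue an item that is already pending, so if the increment lived in the task, two timeouts arriving before the task ran would be counted only once. The struct and function names are illustrative, not the driver's.

```c
#include <assert.h>

/* Hypothetical model of the device statistics and deferred-work state. */
struct model_stats {
    unsigned long tx_errors;
};

struct model_dev {
    struct model_stats stats;
    int reset_pending; /* stands in for a queued, not yet run, work item */
    int resets;
};

/* Called from the watchdog (atomic context): count the error now,
 * queue the reset for later. */
static void model_tx_timeout(struct model_dev *dev)
{
    dev->stats.tx_errors++; /* visible immediately, even if the reset is slow */
    dev->reset_pending = 1; /* analogous to schedule_work(); no sleeping here */
}

/* Runs later in process context, where sleeping during a reset is allowed. */
static void model_tx_timeout_task(struct model_dev *dev)
{
    if (!dev->reset_pending)
        return;
    dev->reset_pending = 0;
    dev->resets++; /* the real task would reset the card here */
}
```

Two calls to model_tx_timeout() before the task runs leave tx_errors at 2 but produce a single reset, which is the behaviour the review asks for.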
Comment 6 Jörg Ahrens 2006-08-18 16:33:33 UTC
Created attachment 94565
patch against 2.6.17-gentoo-r4 

You are right, the additional conditional stuff is crap.
The forward declaration of xirc2ps_tx_timeout_task is necessary because INIT_WORK
moved into xirc2ps_probe, which comes before the function's definition.
I changed the argument of xirc2ps_tx_timeout_task to what it is supposed to be
(void *) to avoid a compile warning at INIT_WORK.
As a consequence, I added the local variable dev in xirc2ps_tx_timeout_task
to avoid two type casts.
I hope it's OK now.
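The pieces described in this comment can be sketched in user space, under these assumptions: a work item whose function takes void * (matching the three-argument INIT_WORK(work, func, data) of pre-2.6.20 kernels), a forward declaration because the probe routine initialises the work before the task function is defined, and a local dev variable so data is converted once instead of being cast at each use. MODEL_INIT_WORK, struct model_dev and the other names are hypothetical stand-ins, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of a pre-2.6.20 work item: the function takes void *. */
struct model_work {
    void (*func)(void *);
    void *data;
};

/* Mimics the assignments done by the three-argument INIT_WORK(work, func, data). */
#define MODEL_INIT_WORK(w, f, d) \
    do { (w)->func = (f); (w)->data = (d); } while (0)

/* Hypothetical per-device state. */
struct model_dev {
    struct model_work tx_timeout_task;
    int resets;
};

/* Forward declaration: model_probe() refers to the task before it is defined. */
static void model_tx_timeout_task(void *data);

/* Initialise the work item once, at probe time, not each time the interface opens. */
static void model_probe(struct model_dev *dev)
{
    dev->resets = 0;
    /* The signature already matches void (*)(void *), so no cast is needed. */
    MODEL_INIT_WORK(&dev->tx_timeout_task, model_tx_timeout_task, dev);
}

static void model_tx_timeout_task(void *data)
{
    /* One local variable instead of casting data at every use. */
    struct model_dev *dev = data;

    assert(dev != NULL);
    dev->resets++; /* the real task would reset the card here */
}
```

Fixing the signature is preferable to casting: a cast silences the compiler, but calling a function through a pointer of a different function type is undefined behaviour in C.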
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2006-08-20 13:52:04 UTC
thanks, sent upstream
Comment 8 Daniel Drake (RETIRED) gentoo-dev 2006-08-29 19:55:39 UTC
This is now merged into Linus' tree for 2.6.18-rc6 or 2.6.18 (depending on which comes next!)

Comment 9 Daniel Drake (RETIRED) gentoo-dev 2006-09-09 19:37:41 UTC
Fixed in gentoo-sources-2.6.17-r8 (genpatches-2.6.17-10). Thanks for the patch.