Bug 107734 - Adaptec starfire module panics on 2.6.13 till 2.6.14_rc2
Summary: Adaptec starfire module panics on 2.6.13 till 2.6.14_rc2
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Daniel Drake (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-30 13:18 UTC by hvjunk
Modified: 2005-10-13 03:52 UTC

Attachments
Patch from Herbert Xu fixing the panics from starfire module (patch-starfire, 681 bytes, patch)
2005-10-01 06:56 UTC, hvjunk

Description hvjunk 2005-09-30 13:18:59 UTC
Being looked into upstream, but would appreciate a temporary fix in
Gentoo-sources :)

From: Andrew Morton <akpm@osdl.org>
To: Hendrik Visage <hvjunk@gmail.com>
Cc: linux-net@vger.kernel.org, linux-kernel@vger.kernel.org, ionut@badula.org,
   Jeff Garzik <jgarzik@pobox.com>
Subject: Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
Message-Id: <20050930104046.4685e975.akpm@osdl.org>
In-Reply-To: <d93f04c70509300901s3836b8afw4792d16c589b4fc4@mail.gmail.com>
References: <d93f04c70509292036x269df799y7b51c5be9c3356d6@mail.gmail.com>
	<20050929211649.69eaddee.akpm@osdl.org>
	<d93f04c70509300901s3836b8afw4792d16c589b4fc4@mail.gmail.com>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; i386-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, hits=0 required=5 tests=
X-Spam-Checker-Version: SpamAssassin 2.63-osdl_revision__1.45__
X-MIMEDefang-Filter: osdl$Revision: 1.118 $
X-Scanned-By: MIMEDefang 2.36

Hendrik Visage <hvjunk@gmail.com> wrote:
>
> On 9/30/05, Andrew Morton <akpm@osdl.org> wrote:
> 
> > The starfire changes in 2.6.12->2.6.13 look fairly innocuous.  Need that
> > trace, please.
> 
> See attached :)
> 

It helps, thanks.


> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at net/core/dev.c:1099
> invalid operand: 0000 [1] PREEMPT 
> CPU 0 
> Modules linked in: nvidia nfsd exportfs lockd sunrpc rfcomm l2cap hci_usb bluetooth starfire mii snd_ac97_bus soundcore snd_page_alloc forcedeth i2c_nforce2 dm_mirror dm_mod sbp2 ohci1394 ieee1394 ohci_hcd uhci_hcd usb_storage usbhid ehci_hcd usbcore
> Pid: 11252, comm: nfsd Tainted: P      2.6.14-rc2 #3
> RIP: 0010:[<ffffffff802cc7ed>] <ffffffff802cc7ed>{skb_checksum_help+157}
> RSP: 0000:ffff81003a0bd998  EFLAGS: 00010246
> RAX: ffff81003ff01624 RBX: ffff81003ca7f180 RCX: 00000000b7e42194
> RDX: 00000000b7e42194 RSI: ffff81003ff01624 RDI: ffff81003b026080
> RBP: ffff81003a0bd9b8 R08: 0000000000000000 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff81003ca7f180 R15: ffff81003d462218
> FS:  00002aaaaade6ae0(0000) GS:ffffffff804fe800(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002aaaaaac2000 CR3: 000000003d5a2000 CR4: 00000000000006e0
> Process nfsd (pid: 11252, threadinfo ffff81003a0bc000, task ffff81003e0ed0c0)
> Stack: ffffffff804cd720 ffff81003d462000 ffff81003d4623e0 ffff81003ca7f180 
>        ffff81003a0bda08 ffffffff88104944 ffff81003d462218 000000013a2a8600 
>        ffff81003d462000 ffff81003d462000 
> Call Trace:<ffffffff88104944>{:starfire:start_tx+164}
>        <ffffffff802db0fc>{qdisc_restart+268}
>        <ffffffff802ccad0>{dev_queue_xmit+288}
>        <ffffffff802d29b0>{neigh_resolve_output+672}
>        <ffffffff802ebb27>{ip_finish_output+455}
>        <ffffffff802ec5ff>{ip_fragment+863}
>        <ffffffff802eb960>{ip_finish_output+0}
>        <ffffffff802eca6c>{ip_output+108}


yep, there's something wrong with the skb which starfire fed into
skb_checksum_help().

	offset = skb->tail - skb->h.raw;
	if (offset <= 0)
		BUG();

And that's a post-2.6.12 driver change.  You can probably work around
it by deleting the #define ZEROCOPY line.
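
The check quoted above can be modelled outside the kernel. What follows is a minimal user-space sketch, not kernel code: `struct model_skb` and `model_would_bug()` are hypothetical names, and only the two pointers the check reads are represented. It shows that the condition fires whenever the transport header pointer sits at or beyond the end of the linear data.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff fields the check reads. */
struct model_skb {
	unsigned char *tail;   /* models skb->tail: end of the linear data */
	unsigned char *h_raw;  /* models skb->h.raw: transport header pointer */
};

/*
 * Mirrors the quoted test: returns 1 where the kernel would reach BUG(),
 * 0 where the checksum helper would carry on.
 */
static int model_would_bug(const struct model_skb *skb)
{
	ptrdiff_t offset = skb->tail - skb->h_raw;

	return offset <= 0;
}
```

A buffer whose header pointer lies before `tail` passes the check; one whose header pointer equals `tail` does not.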

Reproducible: Always
Steps to Reproduce:
1. Build a kernel with the starfire driver.
2. Send NFS traffic out through the starfire interface on a post-2.6.12 x86_64 kernel.

Actual Results:  
Kernel panics :)

Expected Results:  
Kernel works ;^P

It's actually worse with pre-empt turned off :(
Comment 1 hvjunk 2005-10-01 06:56:19 UTC
Created attachment 69632 [details, diff]
Patch from Herbert Xu fixing the panics from starfire module

To: Hendrik Visage <hvjunk@gmail.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-net@vger.kernel.org,
	linux-kernel@vger.kernel.org, ionut@badula.org,
	Jeff Garzik <jgarzik@pobox.com>, netdev@vger.kernel.org
Subject: Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
Message-ID: <20050930223915.GA17562@gondor.apana.org.au>
References: <d93f04c70509292036x269df799y7b51c5be9c3356d6@mail.gmail.com>
	<20050929211649.69eaddee.akpm@osdl.org>
	<d93f04c70509300901s3836b8afw4792d16c589b4fc4@mail.gmail.com>
	<20050930104046.4685e975.akpm@osdl.org>
	<d93f04c70509301310y4bde1189wbcaef40124af6766@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="fdj2RfSjLxBAspz7"
Content-Disposition: inline
In-Reply-To: <d93f04c70509301310y4bde1189wbcaef40124af6766@mail.gmail.com>
User-Agent: Mutt/1.5.9i
From: Herbert Xu <herbert@gondor.apana.org.au>


On Fri, Sep 30, 2005 at 08:10:59PM +0000, Hendrik Visage wrote:
>
> Anycase, here is a non-PREEMPT traceback. What makes this one
> interesting, is that
> in the preempt case, I had to push the NFS output to get the panic, but the
> non-preempt case attached, sorta just happened, ie. when the clients
> just checked on the server's status :(

You must never call skb_checksum_help unless the packet is meant to
be checksummed by the hardware.  So starfire is the guilty party here.

This patch makes it do the check and also check for errors from
skb_checksum_help.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
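
As a rough illustration of the rule Herbert describes, here is a user-space sketch. It is not the attached patch and not the starfire source. Every name with a `model_` prefix is hypothetical, and `wants_sw_fallback` stands for whatever driver-specific condition makes the driver fall back to a software checksum, which this thread does not spell out. The sketch calls the checksum helper only for packets marked for hardware checksumming and treats a helper failure as a dropped packet.

```c
/* Hypothetical model of the per-packet checksum state. */
enum model_ip_summed { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_HW };

struct model_pkt {
	enum model_ip_summed ip_summed; /* who is expected to checksum it */
	int wants_sw_fallback;          /* assumed driver-specific condition */
};

enum model_tx_result { MODEL_TX_SENT, MODEL_TX_DROPPED };

/* Software checksum helper: returns 0 on success, nonzero on failure. */
typedef int (*model_csum_fn)(struct model_pkt *pkt);

static int model_csum_calls; /* counts helper invocations for the demo */

static int model_csum_ok(struct model_pkt *pkt)
{
	(void)pkt;
	model_csum_calls++;
	return 0;
}

static int model_csum_fail(struct model_pkt *pkt)
{
	(void)pkt;
	model_csum_calls++;
	return -1;
}

static enum model_tx_result model_start_tx(struct model_pkt *pkt,
					   model_csum_fn csum_help)
{
	/* Guard: never touch the helper unless hardware checksumming was requested. */
	if (pkt->ip_summed == MODEL_CHECKSUM_HW && pkt->wants_sw_fallback) {
		/* Check the helper's result instead of ignoring it. */
		if (csum_help(pkt) != 0)
			return MODEL_TX_DROPPED;
		/* The checksum is now filled in by software. */
		pkt->ip_summed = MODEL_CHECKSUM_NONE;
	}
	return MODEL_TX_SENT;
}
```

With this guard, a packet whose `ip_summed` is `MODEL_CHECKSUM_NONE` never reaches the helper, which is the case that triggered the panic in the traces above.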
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2005-10-10 08:53:05 UTC
Here is the patch which was merged into Linus' tree:
http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=67974231d4354fe26aaa39a3153b5c0945b94858;hp=32fa2bfcf882f8901ca206e33b0d8975cc8e89a2

Any chance you could test it instead of the one you posted and confirm it solves
the problem?
Comment 3 hvjunk 2005-10-10 09:28:47 UTC
I've been testing that version successfully, but would still like the fix in
2.6.13, please, while we wait for 2.6.14.
Comment 4 Daniel Drake (RETIRED) gentoo-dev 2005-10-10 14:05:46 UTC
Yep - the only reason I asked that was because we try and stick to backporting
patches from Linus' tree only
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2005-10-13 03:52:15 UTC
Fixed in gentoo-sources-2.6.13-r3 (genpatches-2.6.13-6)