Bug 124715 - kernel crashes when tg3 is used
Summary: kernel crashes when tg3 is used
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://bugzilla.kernel.org/show_bug.c...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-02 15:47 UTC by Konstantin Agouros
Modified: 2006-04-22 13:01 UTC

See Also:
Package list:
Runtime testing required: ---


Description Konstantin Agouros 2006-03-02 15:47:10 UTC
A machine running gentoo-sources-2.6.15-r1 crashed. Here is the crash dump:

[17372366.136000] Call Trace:
[17372366.136000]  [<f8839624>] tg3_rx+0x184/0x358 [tg3]
[17372366.136000]  [<f883986e>] tg3_poll+0x76/0x129 [tg3]
[17372366.136000]  [<c028ab1d>] net_rx_action+0x69/0xf0
[17372366.136000]  [<c011a43d>] __do_softirq+0x55/0xbd
[17372366.136000]  [<c011a4d2>] do_softirq+0x2d/0x31
[17372366.136000]  [<c0104447>] do_IRQ+0x47/0x4f
[17372366.136000]  [<c0102e52>] common_interrupt+0x1a/0x20
[17372366.136000]  [<c010088c>] default_idle+0x0/0x55
[17372366.136000]  [<c01008b8>] default_idle+0x2c/0x55
[17372366.136000]  [<c010094f>] cpu_idle+0x5a/0x6f
[17372366.136000]  [<c03dc795>] start_kernel+0x14d/0x14f
[17372366.136000] Code: 83 7c 24 20 00 74 1e 0f b6 43 6d c7 83 30 01 00 00 01 0
[17372366.136000]  <0>Kernel panic - not syncing: Fatal exception in interrupt
[17372366.328000]
Comment 1 Konstantin Agouros 2006-03-02 15:53:18 UTC
Here is the missing part of the crash message from the console:


[17372366.136000] SMP
[17372366.136000] Modules linked in: af_packet autofs4 parport_pc lp parport md3
[17372366.136000] CPU:    0
[17372366.136000] EIP:    0060:[<c028596b>]    Not tainted VLI
[17372366.136000] EFLAGS: 00010246   (2.6.15-gentoo-r1)
[17372366.136000] EIP is at __alloc_skb+0xc5/0x130
[17372366.136000] eax: d98a8e80   ebx: d703d180   ecx: 00000000   edx: d98a8e00
[17372366.136000] esi: 00000080   edi: d703d200   ebp: 00000020   esp: c03dbf00
[17372366.136000] ds: 007b   es: 007b   ss: 0068
[17372366.136000] Process swapper (pid: 0, threadinfo=c03da000 task=c0347b20)
[17372366.136000] Stack: f7b947a8 00010000 f5a19e80 00000122 00000042 f8839624
[17372366.136000]        00000000 f7f703e0 f6daa440 00000000 01230000 00000122
[17372366.136000]        f7f70380 f7f70000 c03dbf74 f883986e f7f70380 00000040
[17372366.136000] Call Trace:
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2006-03-04 04:43:42 UTC
Please post "emerge --info" output to every bug that you file.

How often does the crash occur? Is there a way to reliably reproduce it?
Comment 3 Konstantin Agouros 2006-03-04 04:46:25 UTC
It happened after roughly two days of running. We fell back to 2.4.32, since this is our central fileserver.

Since we don't know what was happening beforehand (although I suspect a backup had just started), we can't say whether we can reproduce this.

A bit of extra info (run under the 2.4.32 kernel that is running now):
ethtool -i eth0
driver: tg3
version: 3.26
firmware-version: 
bus-info: 01:00.0

In case this is any help.
Comment 4 Daniel Drake (RETIRED) gentoo-dev 2006-03-04 05:36:57 UTC
I can report the oops upstream, but it probably wouldn't get much attention without more details on its reproducibility. We would also need to demonstrate that the latest development kernel (currently vanilla-sources-2.6.16_rc5) is affected.

Would you be able to perform further testing on that kernel?
Comment 5 Konstantin Agouros 2006-03-04 05:39:21 UTC
If it weren't our production server, I would be. But since this is a mission-critical system, I can't experiment with it. Unfortunately, I don't have another box with the same NIC in it on which we could try running gentoo-2.6.15-r1.

Have there been any changes to this particular driver since 2.6.15-r1?
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2006-03-04 05:47:49 UTC
Yes, but it's hard to say whether they would affect your problem. It is also hard to say whether this problem would reappear in days, months, or even years, even on 2.6.15. If it is *that* rare there is always a chance that 2.4.32 is also affected.

I will file a report upstream, but first I need a little more information. You can safely get this info while running 2.4.32. Please follow this sequence:

emerge -n --oneshot gdb
cd /usr/src/linux-2.6.15-gentoo-r1
rm drivers/net/tg3.o
make CONFIG_DEBUG_INFO=y drivers/net/tg3.o
gdb drivers/net/tg3.o
(at gdb prompt:) list *tg3_rx+0x184

Please paste the gdb output here.
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2006-03-04 05:49:47 UTC
Also, you are missing the very start of the oops report. Here's an example of the kind of thing you'd expect at the top:

Unable to handle kernel paging request at virtual address 40000010
 printing eip:
c022d0b9
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: etc

Also, please post "emerge --info" output.
Comment 8 Konstantin Agouros 2006-03-04 05:59:11 UTC
First, the gdb output:

(gdb) list *tg3_rx+0x184
0x3624 is in tg3_rx (skbuff.h:314).
309     extern struct sk_buff *__alloc_skb(unsigned int size,
310                                        gfp_t priority, int fclone);
311     static inline struct sk_buff *alloc_skb(unsigned int size,
312                                             gfp_t priority)
313     {
314             return __alloc_skb(size, priority, 0);
315     }
316
317     static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
318                                                    gfp_t priority)
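The gdb answer also pins down where tg3_rx sits inside tg3.o. The trace frame `tg3_rx+0x184/0x358` means 0x184 bytes into a function that is 0x358 bytes long, and gdb places that point at 0x3624, so tg3_rx starts at 0x3624 - 0x184 = 0x34a0 in the object file. The sketch below does that arithmetic and, assuming the module's .text was loaded contiguously with the same layout gdb shows for tg3.o (an assumption the report does not confirm), derives the address at which the module was loaded. The regular expression and the `parse_frame` helper are illustrative only, not part of any kernel tool.

```python
import re

# Frame format in the 2.6 oops above: [<address>] symbol+0xOFFSET/0xSIZE [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[^\s+]+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_frame(line):
    """Return (runtime_address, symbol, offset, size) for one Call Trace line."""
    m = FRAME_RE.search(line)
    if m is None:
        raise ValueError("not a call-trace frame: %r" % line)
    return (int(m.group("addr"), 16), m.group("sym"),
            int(m.group("off"), 16), int(m.group("size"), 16))

frame = "[17372366.136000]  [<f8839624>] tg3_rx+0x184/0x358 [tg3]"
addr, sym, off, size = parse_frame(frame)

# gdb reported that tg3_rx+0x184 is at 0x3624 inside tg3.o.
gdb_addr = 0x3624
func_start_in_object = gdb_addr - off   # where tg3_rx begins in tg3.o
func_start_at_runtime = addr - off      # where tg3_rx began in kernel memory
# ASSUMPTION: .text of the loaded module has the same layout gdb shows for tg3.o.
load_base = addr - gdb_addr

print(hex(func_start_in_object), hex(func_start_at_runtime), hex(load_base))
# prints: 0x34a0 0xf88394a0 0xf8836000
```

Under the same assumption, the tg3_poll frame at f883986e corresponds to 0x386e in tg3.o. Asking gdb for `list *tg3_poll+0x76` answers the same question without relying on the load-base assumption at all.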


And the emerge --info output:

Portage 2.0.54 (default-linux/x86/2006.0, gcc-3.4.5, glibc-2.3.5-r2, 2.4.32 i686)
=================================================================
System uname: 2.4.32 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz
Gentoo Base System version 1.6.14
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [disabled]
dev-lang/python:     2.3.5-r2, 2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=pentium3 -fprefetch-loop-arrays -funroll-loops -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /opt/tomcat/conf /usr/kde/2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium3 -fprefetch-loop-arrays -funroll-loops -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/export/netshare/portagetmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X acl apache2 apm arts audiofile avi berkdb bitmap-fonts bzip2 crypt cups curl eds emboss encode esd ethereal expat fam foomaticdb fortran gd gdbm gif glut gnome gpm gstreamer gtk2 idn imap imlib ipv6 java jpeg junit kde lcms ldap libg++ libwww mad mbox mikmod mmx mng motif mp3 mpeg ncurses nls nptl ogg opengl oss pam pcre pdflib perl png postgres python qt quicktime readline samba sdl slang snmp spell sse ssl svga tcpd tetex tiff truetype truetype-fonts type1-fonts udev usb vorbis xml xml2 xmms xv zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, MAKEOPTS, PORTDIR_OVERLAY
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2006-03-04 16:39:25 UTC
Is it possible to see the very start of the oops report?
Comment 10 Konstantin Agouros 2006-03-05 01:56:02 UTC
This is all I found in the buffer:

This is crom.netage.de (Linux i686 2.4.32) 08:12:33

crom login: Oops: 0000
CPU:    0
EIP:    0010:[<c0117466>]    Not tainted
EFLAGS: 00010206
eax: 00000013   ebx: 1f800000   ecx: c0324554   edx: 00003268
esi: 00000000   edi: ee5b6000   ebp: 00000011   esp: ee5b7d68
ds: 0018   es: 0018   ss: 0018
Process irc (pid: 2561, stackpage=ee5b7000)
Stack: c02dd5c3 1f800163 ee5b7dac 00000001 f0fd0018 d7c20018 ffffff10 f557069c
       00000010 00000286 f46c49a0 00030001 00000286 00000001 f46c499c f46c4000
       f07c8811 f07c8000 c02158af f46c4000 0000001d 00020001 f07c8000 00000000
Call Trace:    [<c02158af>] [<c028a673>] [<c02803e1>] [<c0116f50>] [<c01073b0>]
  [<c0263ee5>] [<c0263f77>] [<c0139049>] [<c02640e6>] [<c028b686>] [<c02ac2ba>]
  [<c02601bb>] [<c026031b>] [<c0141470>] [<c01072bf>]

Code: 8b 9c ab 00 00 00 c0 c7 04 24 d9 d5 2d c0 89 5c 24 04 e8 83
Comment 11 Daniel Drake (RETIRED) gentoo-dev 2006-03-05 04:32:51 UTC
That looks like a totally separate oops - one that occurred under 2.4.32. To make any sense of it you need to run it through ksymoops (you can find this in portage).
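ksymoops needs the symbol table (System.map) of the kernel that actually produced the oops, here the running 2.4.32 kernel rather than the 2.6.15 source tree. At its core, turning a raw address such as `c02158af` into `symbol+offset` is a nearest-preceding-symbol lookup; the real tool does considerably more. Below is a minimal Python sketch of just that lookup. The table is a toy: its start addresses were derived from the 2.6.15 trace earlier in this report as runtime address minus offset (for example 0xc028ab1d - 0x69 = 0xc028aab4 for net_rx_action). The `SYMBOLS` table and `resolve` helper are hypothetical; a real run would load the 2.4.32 System.map instead.

```python
import bisect

# Toy symbol table of (start_address, name). NOT a real System.map: the starts
# were computed from the 2.6.15 trace in this report as address - offset.
SYMBOLS = sorted([
    (0xc0104400, "do_IRQ"),
    (0xc011a3e8, "__do_softirq"),
    (0xc011a4a5, "do_softirq"),
    (0xc028aab4, "net_rx_action"),
])
STARTS = [start for start, _ in SYMBOLS]

def resolve(addr):
    """Map a raw address to (symbol, offset) using the nearest preceding symbol."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = SYMBOLS[i]
    return name, addr - start

for raw in (0xc028ab1d, 0xc011a43d, 0xc011a4d2):
    name, off = resolve(raw)
    print("%08x -> %s+0x%x" % (raw, name, off))
# prints:
# c028ab1d -> net_rx_action+0x69
# c011a43d -> __do_softirq+0x55
# c011a4d2 -> do_softirq+0x2d
```

The derived starts are self-consistent: the trace gives __do_softirq a size of 0xbd, and 0xc011a3e8 + 0xbd = 0xc011a4a5, exactly where do_softirq begins.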
Comment 12 Konstantin Agouros 2006-03-05 07:30:09 UTC
Here is the complete oops:

[17372366.136000] Oops: 0003 [#1]
[17372366.136000] SMP
[17372366.136000] Modules linked in: af_packet autofs4 parport_pc lp parport md3
[17372366.136000] CPU:    0
[17372366.136000] EIP:    0060:[<c028596b>]    Not tainted VLI
[17372366.136000] EFLAGS: 00010246   (2.6.15-gentoo-r1)
[17372366.136000] EIP is at __alloc_skb+0xc5/0x130
[17372366.136000] eax: d98a8e80   ebx: d703d180   ecx: 00000000   edx: d98a8e00
[17372366.136000] esi: 00000080   edi: d703d200   ebp: 00000020   esp: c03dbf00
[17372366.136000] ds: 007b   es: 007b   ss: 0068
[17372366.136000] Process swapper (pid: 0, threadinfo=c03da000 task=c0347b20)
[17372366.136000] Stack: f7b947a8 00010000 f5a19e80 00000122 00000042 f8839624
[17372366.136000]        00000000 f7f703e0 f6daa440 00000000 01230000 00000122
[17372366.136000]        f7f70380 f7f70000 c03dbf74 f883986e f7f70380 00000040
[17372366.136000] Call Trace:
[17372366.136000]  [<f8839624>] tg3_rx+0x184/0x358 [tg3]
[17372366.136000]  [<f883986e>] tg3_poll+0x76/0x129 [tg3]
[17372366.136000]  [<c028ab1d>] net_rx_action+0x69/0xf0
[17372366.136000]  [<c011a43d>] __do_softirq+0x55/0xbd
[17372366.136000]  [<c011a4d2>] do_softirq+0x2d/0x31
[17372366.136000]  [<c0104447>] do_IRQ+0x47/0x4f
[17372366.136000]  [<c0102e52>] common_interrupt+0x1a/0x20
[17372366.136000]  [<c010088c>] default_idle+0x0/0x55
[17372366.136000]  [<c01008b8>] default_idle+0x2c/0x55
[17372366.136000]  [<c010094f>] cpu_idle+0x5a/0x6f
[17372366.136000]  [<c03dc795>] start_kernel+0x14d/0x14f
[17372366.136000] Code: 83 7c 24 20 00 74 1e 0f b6 43 6d c7 83 30 01 00 00 01 0
[17372366.136000]  <0>Kernel panic - not syncing: Fatal exception in interrupt
[17372366.328000]
Comment 13 Daniel Drake (RETIRED) gentoo-dev 2006-03-05 07:54:46 UTC
Thanks, that is looking better. You are still missing a few lines from the very top though. I'll paste the sample again for reference:

Unable to handle kernel paging request at virtual address 40000010
 printing eip:
c022d0b9
*pde = 00000000
Oops: 0000 [#1] <--- your log starts here
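The four-digit value after "Oops:" is the error code of the exception that brought the kernel down. Assuming it came from a page fault (which the missing "Unable to handle kernel paging request" line would normally confirm), its low three bits follow the x86 page-fault error-code layout: bit 0 distinguishes a protection violation from a not-present page, bit 1 a write from a read, and bit 2 user mode from kernel mode. A small decoder sketch; the helper name `decode_pf_error` is illustrative only:

```python
def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code.

    ASSUMPTION: the oops was raised by a page fault; for other exceptions
    the number after "Oops:" does not have this meaning.
    """
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }

# 0000 is the sample header above; 0003 is the value in the 2.6.15 report.
for code in (0x0000, 0x0003):
    print("Oops: %04x ->" % code, decode_pf_error(code))
```

Under that assumption, the sample's 0000 is a kernel-mode read of a non-present page, while the 0003 from the 2.6.15 crash would be a kernel-mode write that hit a present page but violated its protection.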
Comment 14 Konstantin Agouros 2006-03-05 07:56:25 UTC
I am afraid that's all the serial console gave me.

Konstantin
Comment 15 Daniel Drake (RETIRED) gentoo-dev 2006-03-05 08:12:35 UTC
Filed a bug upstream. I'm not sure if anything can be done without further testing on your side. Either way, thanks for the report.
Comment 16 Daniel Drake (RETIRED) gentoo-dev 2006-03-10 05:51:27 UTC
Another question. The crash message shows that you have a "md3" module loaded. Where has this come from?
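One explanation the thread does not raise, offered here only as a hypothesis: the serial console capture may have cut lines off at about 80 columns, in which case "md3" could be the beginning of a longer module name, with the rest of the module list lost. Two details of the captured output are consistent with that: the "Modules linked in" line is exactly 80 characters long, and the "Code:" line ends in a lone hex digit, which cannot be a complete byte. A quick check in Python:

```python
# Lines copied verbatim from the serial-console capture in this report.
modules_line = ("[17372366.136000] Modules linked in: "
                "af_packet autofs4 parport_pc lp parport md3")
code_line = ("[17372366.136000] Code: 83 7c 24 20 00 74 1e 0f b6 43 6d c7 "
             "83 30 01 00 00 01 0")

print(len(modules_line))  # prints 80

# A "Code:" dump lists whole bytes (two hex digits each); a trailing
# single digit means the line was cut off mid-byte.
last_token = code_line.split()[-1]
print(last_token, len(last_token) == 2)  # prints: 0 False
```

If this hypothesis holds, a capture with a wider console line (or the kernel log itself, if it survived) would show the full module list.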
Comment 17 Daniel Drake (RETIRED) gentoo-dev 2006-04-22 13:01:32 UTC
The upstream bug was marked invalid because it is not clear where md3 comes from.