Bug 122994 - madwifi-ng (or wlan_*?) modules instability
Summary: madwifi-ng (or wlan_*?) modules instability
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: High major
Assignee: Mobile Herd (OBSOLETE)
Reported: 2006-02-16 01:42 UTC by Andreas Ntaflos
Modified: 2006-04-04 17:25 UTC
Description Andreas Ntaflos 2006-02-16 01:42:42 UTC
Current setup: gentoo-sources-2.6.15-r4, madwifi-driver-0.1443.20060207, madwifi-tools-0.1443.20060207, wpa_supplicant-0.5.1, udev-084, baselayout-1.12.0_pre15. Wireless LAN using WPA-PSK.

Tried: various combinations of gentoo-sources-2.6.15-r[1234] with all versions of the new madwifi-ng packages, same (non-)results.

Here is the gist of what I posted in a thread (see references) in the forums about stability issues with the madwifi (or wlan_*) modules. I think it describes the issues well enough, considering that it's not that trivial to reproduce the problems:

I seem to be having some problems with the stability of the madwifi modules. When using madwifi-driver-0.1401.20060117 with gentoo-sources-2.6.15-r1 I thought everything was stable, and it really seemed that way. Unfortunately I had to reinstall Gentoo on my laptop (the ext3 filesystem was failing, apparently), and I think the problems started from that point on.

I don't know what causes the madwifi module to crash, but a crash is easy to recognize: dmesg shows a lot of stack trace messages. After such a crash the whole system becomes quite flaky: aclocal segfaulting, firefox segfaulting, segfaults when using ANY open file dialog, Azureus crashing and burning, emerge hanging, etc. Sometimes the whole X window system freezes hard as well. I have to reboot and everything is fine again, until the next crash of the madwifi modules (or whatever it exactly is). The system is, however, very stable and runs nicely when I use the wired interface and don't load the madwifi modules.

Here's the strange thing: the problems manifest themselves apparently only when using Azureus, the popular and nice Java bittorrent client (http://azureus.sf.net). At least one other user (Da Fox in the forums) experiences exactly that, too. It's somehow like this: start Azureus, load a torrent or more, wait some time. Next thing that happens is Azureus quitting with a segfault and an error message from the VM. Then dmesg shows lots of the usual messages that appear when something kernel-related crashed (stack trace, call trace, modules linked in, etc). After that everything seems to go to hell, as described above.

This does not, repeat not, happen when connecting via the wired interface. I have Azureus running for hours now, using the wired interface, everything's stable.

I had two ideas why that instability manifests itself when using Azureus: the first was about the amount of data transferred via the wireless interface, but that is not it; I downloaded several Gentoo installation ISOs over HTTP and all was well. The second idea is that the number of concurrent TCP connections, which can be quite high using such a P2P application, is triggering the crash.

A third idea would try to connect the Java VM with the madwifi modules, which seems somehow absurd :)

Unfortunately I can't test or reproduce any of this at the moment since I have no wireless LAN available right now. I just thought it would be a good idea to mention this problem to more knowledgeable people and hope for some constructive input on how to debug this. Also, testing and debugging such an issue is not very healthy for the system itself :)

What I am going to do, as soon as I can connect to a WLAN again, is lower the max. number of TCP connections in Azureus to 1 and then increase it steadily after some time of running stable. Another thing to test is to disable WPA and use an unencrypted link; maybe the whole thing is somehow related to WPA.
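
The concurrent-connection idea could also be tested without Azureus. Below is a minimal sketch, assuming some TCP service is reachable over the wireless link; the function names `open_connections` and `ramp`, the step sizes and the hold time are all made up for illustration and are not part of any madwifi or Azureus tooling.

```python
import socket
import time


def open_connections(host, port, count, hold_seconds=5.0, timeout=3.0):
    """Open up to `count` TCP connections at once and return how many succeeded."""
    sockets = []
    try:
        for _ in range(count):
            try:
                sockets.append(socket.create_connection((host, port), timeout=timeout))
            except OSError:
                # Refused, timed out or unreachable: skip it and try the next one.
                continue
        # Keep the connections open for a while so they really are concurrent.
        time.sleep(hold_seconds)
        return len(sockets)
    finally:
        for s in sockets:
            s.close()


def ramp(host, port, steps=(1, 2, 4, 8, 16, 32, 64), hold_seconds=5.0):
    """Raise the connection count step by step and record each result."""
    results = []
    for count in steps:
        results.append((count, open_connections(host, port, count, hold_seconds)))
    return results
```

Running something like `ramp("192.168.0.1", 80)` (address and port are placeholders) while watching dmesg would show at which step, if any, the trouble starts. This only exercises connection setup, not the traffic pattern of a real bittorrent client, so a clean run would not rule the driver out.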

I know this must sound very weird and unreasonable, particularly because I can't find any mention of such problems in the official madwifi.org tickets or mailing lists. But it's a fact that something causes the madwifi or wlan_* modules to crash. Unfortunately I can't tell which is the first module to go down since the dmesg buffer is completely filled after Azureus quits hard.
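
If the kernel messages are also written to a file by the system logger (an assumption about the setup), the first trace could be found there even after the dmesg buffer has wrapped. A rough sketch follows; the marker strings are common in kernel crash output, but the list is a guess and certainly not exhaustive, and `first_trace` is just an illustrative name.

```python
# Strings that commonly appear at the start of kernel crash output.
# The selection is an assumption, not a complete list.
TRACE_MARKERS = ("Oops", "BUG:", "Call Trace:", "Unable to handle kernel")


def first_trace(log_text, markers=TRACE_MARKERS):
    """Return (line_number, line) for the first line containing a marker, or None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in markers):
            return number, line.strip()
    return None
```

Feeding it the contents of the saved log gives the line number of the earliest match, and the lines right after that one are the ones that should name the module involved.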

Hopefully I've come to the right place (or should I have posted this to the madwifi-lists first?).

References:
http://forums.gentoo.org/viewtopic-t-408550-postdays-0-postorder-asc-start-125.html
On that page: 
http://forums.gentoo.org/viewtopic-p-3054620.html#3054620
http://forums.gentoo.org/viewtopic-p-3103582.html#3103582
http://forums.gentoo.org/viewtopic-p-3108624.html#3108624
http://forums.gentoo.org/viewtopic-p-3108924.html#3108924
Comment 1 Henrik Brix Andersen 2006-02-16 02:15:43 UTC
You forgot to attach the output of `emerge --info` and without the actual error messages from your dmesg there's really nothing we can do to help...

Please reopen once you've provided the missing information.
Comment 2 Andreas Ntaflos 2006-02-18 06:48:00 UTC
(In reply to comment #1)
> You forgot to attach the output of `emerge --info` and without the actual error
> messages from your dmesg there's really nothing we can do to help...
> 
> Please reopen once you've provided the missing information.
> 

I am sorry for the missing information. I was able to reproduce the errors, more or less: I started Azureus, loaded a torrent and waited a few minutes until Azureus crashed (the error message is at http://daffit.meownz.info/madwifi/azureus_error.txt, just for the sake of completeness). It didn't crash the madwifi modules directly, but it made the wlan interface drop the connection. Then, upon unloading the madwifi modules with

rmmod ath_pci ath_rate_sample ath_hal wlan_scan_sta wlan_tkip wlan

something crashed: http://daffit.meownz.info/madwifi/dmesg_madwifi.txt

As you can see I reloaded the modules a few times since booting the laptop, but never with Azureus running. But after Azureus crashed unloading the modules resulted in what you can see in the dmesg.

It definitely has something to do with Azureus and the wireless connection. Yesterday I ran Azureus and although the connection was stable (as stable as it can get with that Netgear POS router) Firefox segfaulted regularly when opening new tabs.

Is there any other information I could provide?

emerge --info:
Portage 2.1_pre4-r1 (default-linux/x86/2005.1, gcc-3.4.5, glibc-2.3.6-r2, 2.6.15-gentoo-r4 i686)
=================================================================
System uname: 2.6.15-gentoo-r4 i686 Intel(R) Pentium(R) M processor 1700MHz
Gentoo Base System version 1.12.0_pre15
dev-lang/python:     2.3.5-r2, 2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=pentium-m -mtune=pentium-m -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/splash /etc/terminfo /etc/texmf/web2c /etc/env.d"
CXXFLAGS="-march=pentium-m -mtune=pentium-m -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.switch.ch/ftp/mirror/gentoo http://gentoo.inode.at ftp://gentoo.inode.at/source"
LANG="english"
LC_ALL="en_IE.utf8"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X alsa apm bash-completion browserplugin cdr crypt cups doc dri dvd dvdread ethereal firefox gcj gif gnome gphoto2 gtk gtk2 i8x0 jpeg latex logrotate madwifi mmx mp3 ncurses nls nptl nptlonly offensive opengl oss pcmcia pda pdf perl png ppds python radeon samba scanner sse sse2 ssl tetex truetype truetype-fonts udev unicode usb userlocales wifi win32codecs xcomposite xinerama xml xosd xprint xv xvid elibc_glibc kernel_linux userland_GNU"
Unset:  ASFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, LDFLAGS, LINGUAS
Comment 3 Henrik Brix Andersen 2006-02-18 09:33:32 UTC
If both Azureus and firefox crash and segfault on your system, I think there are bigger problems than your wireless network drivers.
Comment 4 Andreas Ntaflos 2006-02-19 09:29:26 UTC
(In reply to comment #3)
> If both Azureus and firefox crash and segfault on your system, I think there
> are bigger problems than your wireless network drivers.
> 

I was afraid you were going to say that. 

However, any problems only occur when I use both a wireless connection and Azureus in combination. When I don't use Azureus on a WLAN everything's fine. When I use Azureus on a wired connection everything's fine, too.

Firefox and other programs only segfault *after* the wlan modules crashed. The wlan modules, in turn, only crash after Azureus died. There has to be a relation somewhere, somehow. I don't get it.

I take it you don't have any problems running Azureus (2.3.0.4 or 2.3.0.6) on a WPA-PSK protected WLAN?

Do you have any idea how I could debug this a little further?
Comment 5 Henrik Brix Andersen 2006-02-19 09:52:59 UTC
I've never used Azureus or any other bittorrent client. Betelgeuse, any idea about this?

Please report this upstream at the madwifi bug tracker and paste the URL here.


Comment 6 Ryan Hill (RETIRED) gentoo-dev 2006-02-19 14:03:43 UTC
have you tried using the Sun Java VM? sun-jdk or sun-jre-bin?  blackdown is notoriously bad with azureus, though why that should leak into kernelspace i don't know.

also try taking -fforce-addr -frename-registers -fprefetch-loop-arrays out of your C[XX]FLAGS.  the first two shouldn't matter but the third is broken with gcc 3.4.
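
For reference, a small sketch of what removing those flags from a flag string amounts to. `strip_flags` is a hypothetical helper for illustration only, not something Portage provides; in practice the CFLAGS and CXXFLAGS lines in make.conf would simply be edited by hand.

```python
def strip_flags(flags, unwanted):
    """Return the flag string with every flag listed in `unwanted` removed."""
    drop = set(unwanted)
    return " ".join(flag for flag in flags.split() if flag not in drop)


# CFLAGS as posted in the emerge --info output above.
CFLAGS = ("-march=pentium-m -mtune=pentium-m -O2 -pipe -fomit-frame-pointer "
          "-fforce-addr -frename-registers -fprefetch-loop-arrays")
SUSPECT = ("-fforce-addr", "-frename-registers", "-fprefetch-loop-arrays")

print(strip_flags(CFLAGS, SUSPECT))
```

The same call works for CXXFLAGS, since both variables carry the same value in that output.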
Comment 7 Henrik Brix Andersen 2006-02-20 02:48:11 UTC
NEEDINFO
Comment 8 Andreas Ntaflos 2006-02-20 17:13:38 UTC
I tested this again with sun-jdk 1.4 from portage, but saw no improvement. I upgraded to sun-jdk 1.5 (didn't set it as the default system VM of course) and Azureus ran quite stably for a longer time, so that's definitely an improvement.

However, this whole thing is still pretty weird and I am a little lost here. 

Using the wireless drivers really seems to make the whole system unstable. An example I can reproduce very reliably is this: boot, start the wireless interface, associate, connect, start Firefox, go to www.torrentspy.com and see Firefox crash with an error message like "Error: Object 'drawingArea' does not have windowed ancestor". This happens on other sites as well (www.cineplexx.at for example). And again: when I connect via the wire none of this happens (I have tested this over and over again now).

I know this all sounds much like nonsense and ineptitude on my side, but I swear all of this only happens when I load and use the madwifi wireless drivers.

I will try and report this at madwifi.org and hopefully I can provide some more information there. As it is now I can't really use my wireless interface which is ... unfortunate on a laptop.
Comment 9 Andreas Ntaflos 2006-02-20 17:19:37 UTC
Oh and I forgot to mention I recompiled the kernel as well as the madwifi stuff without the CFLAGS mentioned in comment #6 but it didn't seem to make any difference. As soon as I have the time I'll recompile the rest of system and world without these flags (but I doubt it will make any difference).
Comment 10 Andreas Ntaflos 2006-04-04 17:22:42 UTC
Just for the record, it *seems* to me now, after a few days of testing, that my problems are gone when using the latest madwifi-drivers and -tools (0.1485.20060325), still with a 2.6.15 kernel. No instability, no segfaults, no reboots needed, nothing. Maybe it was the kernel corruption bug mentioned elsewhere (http://madwifi.org/ticket/408) that was fixed as of 0.1485.20060312, manifesting itself in the odd way described above.

Hopefully that's the end of that. Close bug?
Comment 11 Henrik Brix Andersen 2006-04-04 17:25:28 UTC
Ok, glad that it works for you now. Thank you for reporting back.
Comment 12 Henrik Brix Andersen 2006-04-04 17:25:48 UTC
Closing as WORKSFORME.