I am running a small server at home, but since the last big system update it runs out of memory every two weeks. To identify the process that is eating the memory, I killed all processes until only the 6 terminal gettys were left, but /usr/bin/free shows that the memory is still not released. So I activated the kernel memory leak detection tool (kmemleak), and it shows (if my understanding is correct) that the memory is lost in the module dahdi_hfcs.

Reproducible: Always

Steps to Reproduce:
1. Load the module dahdi_hfcs
2. Use e.g. asterisk to use the dahdi subsystem

Expected Results:
Memory should be freed ;)

Portage 2.1.11.60 (hardened/linux/amd64, gcc-4.6.3, glibc-2.16.0, 3.8.6-gentoo x86_64)
=================================================================
System uname: Linux-3.8.6-gentoo-x86_64-Intel-R-_Atom-TM-_CPU_330_@_1.60GHz-with-gentoo-2.2
KiB Mem:     3572368 total,   1914452 free
KiB Swap:    8388604 total,   8388604 free
Timestamp of tree: Sun, 07 Apr 2013 12:45:01 +0000
ld GNU ld (GNU Binutils) 2.23.1
distcc 3.1 x86_64-pc-linux-gnu [disabled]
ccache version 3.1.9 [enabled]
app-shells/bash:          4.2_p42
dev-lang/python:          2.7.3-r3, 3.2.3-r2
dev-util/ccache:          3.1.9
dev-util/cmake:           2.8.10.2-r1
dev-util/pkgconfig:       0.28
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.6
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.11.6, 1.12.6, 1.13.1
sys-devel/binutils:       2.23.1
sys-devel/gcc:            4.5.4, 4.6.3, 4.7.2-r1
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.2
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.8 (virtual/os-headers)
sys-libs/glibc:           2.16.0
Repositories: gentoo my-local-ebuilds AzP voip
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=core2 -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/fax /usr/share/easy-rsa /usr/share/gnupg/qualified.txt /var/spool/fax/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe -march=core2 -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--quiet-build=n"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache collision-protect config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://de-mirror.org/gentoo http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /var/lib/layman/AzP /var/lib/layman/voip"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="acl acpi alsa amd64 berkdb bzip2 cli cracklib crypt cups cxx dbus dri exif gdbm gpm gsm h323 hal hardened iconv ipv6 jpeg jpeg2k jpg justify laptop logrotate mmx mmxext modules mp3 mudflap multilib ncurses nls noinfo nptl nptlonly ogg openmp openvpn pam pax_kernel pcre png readline samba session smp span speex sqlite sse sse2 ssl ssse3 subversion svg tcpd unicode urandom vorbis zlib"
ABI_X86="64"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dbd deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif so speling status unique_id userdir usertrack vhost_alias cgi proxy proxy_balancer proxy_connect proxy_ftp proxy_http"
CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump"
CAMERAS="ptp2"
COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog"
ELIBC="glibc"
GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx"
INPUT_DEVICES="keyboard mouse synaptics"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer"
LINGUAS="de en"
MISDN_CARDS="hfcpci"
OFFICE_IMPLEMENTATION="libreoffice"
PHP_TARGETS="php5-3"
PYTHON_SINGLE_TARGET="python2_7"
PYTHON_TARGETS="python2_7 python3_2"
QEMU_SOFTMMU_TARGETS="i386 x86_64"
QEMU_USER_TARGETS="i386 x86_64"
RUBY_TARGETS="ruby18 ruby19"
USERLAND="GNU"
VIDEO_CARDS="vesa nv"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON

Some lines of /sys/kernel/debug/kmemleak:

unreferenced object 0xffff8800a7328c00 (size 256):
  comm "hardirq", pid 0, jiffies 4294962837 (age 147.096s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8134f746>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810c9442>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff810cba68>] kmem_cache_alloc+0x96/0xa2
    [<ffffffff812c7d00>] build_skb+0x31/0xb2
    [<ffffffff812cb354>] __netdev_alloc_skb+0x5d/0xaf
    [<ffffffffa0205684>] 0xffffffffa0205684
    [<ffffffffa02059c6>] 0xffffffffa02059c6
    [<ffffffff81085fca>] handle_irq_event_percpu+0x2a/0x125
    [<ffffffff810860f9>] handle_irq_event+0x34/0x53
    [<ffffffff810886a4>] handle_fasteoi_irq+0x73/0xa6
    [<ffffffff8100374e>] handle_irq+0x126/0x130
    [<ffffffff81003364>] do_IRQ+0x48/0xa0
    [<ffffffff8136106a>] ret_from_intr+0x0/0xe
    [<ffffffff812ab729>] cpuidle_enter_tk+0x10/0x13
    [<ffffffff812ab432>] cpuidle_enter_state+0xf/0x38
    [<ffffffff812ab500>] cpuidle_idle_call+0xa5/0xc4
unreferenced object 0xffff8800a007b300 (size 256):
  comm "hardirq", pid 0, jiffies 4294965337 (age 137.096s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8134f746>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810c9442>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff810cba68>] kmem_cache_alloc+0x96/0xa2
    [<ffffffff812c7d00>] build_skb+0x31/0xb2
    [<ffffffff812cb354>] __netdev_alloc_skb+0x5d/0xaf
    [<ffffffffa0205684>] 0xffffffffa0205684
    [<ffffffffa02059c6>] 0xffffffffa02059c6
    [<ffffffff81085fca>] handle_irq_event_percpu+0x2a/0x125
    [<ffffffff810860f9>] handle_irq_event+0x34/0x53
    [<ffffffff810886a4>] handle_fasteoi_irq+0x73/0xa6
    [<ffffffff8100374e>] handle_irq+0x126/0x130
    [<ffffffff81003364>] do_IRQ+0x48/0xa0
    [<ffffffff8136106a>] ret_from_intr+0x0/0xe
    [<ffffffff812ab729>] cpuidle_enter_tk+0x10/0x13
    [<ffffffff812ab432>] cpuidle_enter_state+0xf/0x38
    [<ffffffff812ab500>] cpuidle_idle_call+0xa5/0xc4
unreferenced object 0xffff8800d71cbe00 (size 256):
  comm "hardirq", pid 0, jiffies 4294967838 (age 127.116s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8134f746>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810c9442>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff810cba68>] kmem_cache_alloc+0x96/0xa2
    [<ffffffff812c7d00>] build_skb+0x31/0xb2
    [<ffffffff812cb354>] __netdev_alloc_skb+0x5d/0xaf
    [<ffffffffa0205684>] 0xffffffffa0205684
    [<ffffffffa02059c6>] 0xffffffffa02059c6
    [<ffffffff81085fca>] handle_irq_event_percpu+0x2a/0x125
    [<ffffffff810860f9>] handle_irq_event+0x34/0x53
    [<ffffffff810886a4>] handle_fasteoi_irq+0x73/0xa6
    [<ffffffff8100374e>] handle_irq+0x126/0x130
    [<ffffffff81003364>] do_IRQ+0x48/0xa0
    [<ffffffff8136106a>] ret_from_intr+0x0/0xe
    [<ffffffff812ab729>] cpuidle_enter_tk+0x10/0x13
    [<ffffffff812ab432>] cpuidle_enter_state+0xf/0x38
    [<ffffffff812ab500>] cpuidle_idle_call+0xa5/0xc4

root # cat /proc/modules | grep 0xffffffffa0205
dahdi_hfcs 13944 6 - Live 0xffffffffa0205000 (O)

root # asterisk -V
Asterisk 11.2.1
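The grep on the address prefix works here, but it can miss frames that lie further into the module. A slightly more robust check, sketched below under assumptions, tests whether a backtrace address falls into the load range [base, base + size) that /proc/modules reports. `find_module` and `struct mod_hit` are hypothetical helper names, and the field order (name, size, refcount, dependencies, state, base address) is taken from the line shown in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: which module an address falls into. */
struct mod_hit {
    char name[64];
    unsigned long long offset;   /* address minus module base */
};

/*
 * Scan text in /proc/modules format, one module per line:
 *   name size refcount deps state base [taint]
 * Return 1 and fill *hit if addr lies in [base, base + size), else 0.
 */
static int find_module(const char *modules_text, unsigned long long addr,
                       struct mod_hit *hit)
{
    const char *line = modules_text;

    assert(hit != NULL);
    while (line && *line) {
        char name[64];
        unsigned long long size, base;

        /* %*s skips refcount, dependency list and state. */
        if (sscanf(line, "%63s %llu %*s %*s %*s %llx",
                   name, &size, &base) == 3 &&
            addr >= base && addr - base < size) {
            strcpy(hit->name, name);
            hit->offset = addr - base;
            return 1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;              /* step past the newline */
    }
    return 0;
}
```

Fed the dahdi_hfcs line above and the address 0xffffffffa0205684 from the backtrace, the function reports dahdi_hfcs at offset 0x684.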
This driver is part of 98-non-digium-hardware-and-oslec.diff, which means Digium will not provide support for it. I will ask Oliver Jaksch to comment on what your options are and which drivers you can use. I note that you have not yet provided lspci -nnv output for the adapter board in question; I suspect he will need that in order to help you further. The only "fix" I can offer you right now is to remove the patchset in question, which I suspect is not what you want.
(In reply to comment #1)
> I will ask Oliver Jaksch to provide
> commentary on what your options are, and what drivers you can use.

Thanks for CC'ing me. This behavior is unknown to me: I'm running Gentoo stable, and since I committed dahdi_hfcs everything has been working as expected and has been stable for me. No oopses or anything. What I see in his setup is that he's running unstable; maybe another component is playing a role in this backtrace? If really necessary, please remove dahdi_hfcs from 98-non-digium-hardware-and-oslec.diff for the sake of security and stability, as there is no news on <code.google.com/p/zaphfc/source/browse/branches> or on <https://sourceforge.net/projects/dahdi-hfcs/>. I can live with this and my local portage for good ;)
Created attachment 344792 [details] emerge --info
Created attachment 344794 [details] lspci of my isdn card
Created attachment 344812 [details] My ISDN Card
I need the patch to use my low budget ISDN card (Longshine LCS-8051A). I will downgrade to the latest stable kernel in portage (3.7.10) and check if this error is still there.
The main PCI IDs of your adapters look the same. Oliver, could it be a false positive?
(In reply to comment #7)
> The main PCI IDs of your adapters look the same. Oliver, could it be a false
> positive?

I think so. Ever since I diff'ed the dahdi_hfcs patch, this kernel module has been in use in my SOHO environment without problems. Just as I was writing these lines I noticed that my Gentoo is still running kernel 3.6.11. I'll upgrade to 3.7.10 and re-test...

@S. Lorsbach: If I understand correctly, your kernel crashes at the moment you start asterisk (which in turn activates the dahdi subsystem)? What about 'cat /proc/interrupts'? Mine shows...

            CPU0       CPU1       CPU2       CPU3
  20: 1524977766          1        270     137078   IO-APIC-fasteoi   dahdi_hfcs

...so no shared interrupt for the ISDN card. Have you ever tried any of the dahdi-tools, i.e. dahdi_speed, dahdi_test and dahdi_tool? Any tragic occurrence?
Hi,

neither Asterisk nor my system crashes. After about two weeks of uptime the server runs out of memory. I noticed this because other services on the same box were getting slower and slower. I saw that swap was heavily used, so I shut down every process. But the memory (RAM) was not freed as it should have been, only the swap. In the end I had only the 6 console gettys running, but the free command (-/+ buffers/cache) showed only ~100MB of free memory. For comparison, when I start the box with all daemons running there are ~2.7GB of free memory:

root # free -h
             total       used       free     shared    buffers     cached
Mem:          3.4G       1.1G       2.3G         0B        87M       385M
-/+ buffers/cache:       684M       2.7G
Swap:         8.0G         0B       8.0G

So I decided to activate kmemleak in the kernel config, and after a restart I saw messages like the following:

unreferenced object 0xffff8800d2e80700 (size 256):
  comm "hardirq", pid 0, jiffies 4294903510 (age 675.996s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8135125f>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810c9d0d>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff810cba5e>] kmem_cache_alloc+0x92/0x9e
    [<ffffffff812c9ae1>] build_skb+0x31/0xb2
    [<ffffffff812cc582>] __netdev_alloc_skb+0x5a/0xa4
    [<ffffffffa01fb236>] 0xffffffffa01fb236
    [<ffffffffa01fc720>] 0xffffffffa01fc720
    [<ffffffff8108600f>] handle_irq_event_percpu+0x2b/0x127
    [<ffffffff8108613f>] handle_irq_event+0x34/0x51
    [<ffffffff810886fa>] handle_fasteoi_irq+0x77/0xac
    [<ffffffff810036a5>] handle_irq+0x121/0x12f
    [<ffffffff810032d0>] do_IRQ+0x48/0xa0
    [<ffffffff8136276a>] ret_from_intr+0x0/0xe
    [<ffffffff812acd3c>] cpuidle_enter_tk+0x10/0x14
    [<ffffffff812aca42>] cpuidle_enter_state+0xf/0x38
    [<ffffffff812acb00>] cpuidle_idle_call+0x95/0xc3

And if I look at the address, I see that the memory must have been allocated by dahdi_hfcs:

# cat /proc/modules | grep 0xffffffffa01fb
dahdi_hfcs 14024 6 - Live 0xffffffffa01fb000 (O)

In the last few days I've done some downgrades of the kernel and the dahdi drivers. Starting with kernel (gentoo-sources) 3.8.6, I went through 3.7.10, 3.6.11 and 3.4.34 down to 3.3.1; on every kernel release I also changed the dahdi drivers, going from 2.6.2 down to 2.5.0.2-r4 (to get the old 2.5 series running on newer kernels I removed the drivers for the Digium cards). After every change I rebooted, checked the output of kmemleak, and saw entries like the ones I reported first. So far I have not found any combination for which kmemleak does not report memory leaks. It is possible that kmemleak reports false positives, but then I have to ask: where does the memory go? Are there any other tools that could be used? For now I've switched back to gentoo-sources-3.8.6 and dahdi-drivers 2.6.2. The server runs fine apart from the manual reboots needed every 14 days ;)

root # dahdi_speed
Count: 877730

root # dahdi_test
Opened pseudo dahdi interface, measuring accuracy...
99.998% 99.988% 99.993% 99.995% 99.995% 99.995% 99.998% 99.989%
99.995% 99.992% 99.995% 99.995% 99.995% 99.995% 99.995% 99.995%
99.998% 99.990% 99.994% 99.995% 99.995% 99.995% 99.995% 99.995%
^C
--- Results after 24 passes ---
Best: 99.998% -- Worst: 99.988% -- Average: 99.994347%
Cummulative Accuracy (not per pass): 99.994

root # cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
  19:    1512207          0          0          0   IO-APIC-fasteoi   dahdi_hfcs
I agree with the assessment; however, that stack trace is NOT going to make things easy for us. S. Lorsbach: since you've taught me a few new tricks in that post, I'm going to have to assume that your knowledge of kernel-level work (and your understanding of the Linux kernel) is significantly better than that of anybody else who has commented on this thread. From what I can tell, we will have to look at the source file and manually audit every use of netdev_alloc_skb() and every code path through it, as well as what happens to those skbs.
I upgraded to 3.7.10 and, by the way, slimmed the entire server down by half (removing unnecessary packages which had accumulated over the last years). The system is still fast and stable. Apart from ~net-libs/libpri-1.4.14 and ~net-misc/dahdi-2.6.2, I'm using stable ebuilds only. No incidents or memory leaks with dahdi_hfcs since upgrading to 3.7.10 a few days ago. Meanwhile I checked the Gentoo systems of some of my customers (which are NOT using TDM or multiport cards, but the same or similar cheap HFC-S cards as ours), and there is nothing but silence in all logs. I'm sorry, but I can't reproduce your problem...
Oliver is unable to reproduce the problem in question. High usage of RAM, without use of swap, is normal linux behaviour and not a fault.
Back again ;)

@Oliver: Did you enable kmemleak in the config of your kernel? You will only get messages about memory leaks if the kmemleak option under kernel hacking is enabled.

In the last few days I produced a cleaner backtrace: all memory leaks come from the function hfc_frame_arrived (line 1203, hfcs/base.c). I also think the function is not correct, because an skb struct is allocated, but the data which is put into the allocated skb structure is never freed and never used(?). If you look at line 1283, where the skb gets filled: incoming frames are handled directly by dahdi, so why copy the data into the skb struct? In the else branch the frame is also retrieved and put into the skb, but it is the same thing: the data never gets freed.

If I remove the skb allocation and use only a temporary data buffer, the memory leaks are gone and asterisk still works fine. I think the same could be achieved by calling dev_kfree_skb(skb) after the if-statement. But this is also only a workaround. I think if there are incoming frames, something should happen with them ;)
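The leak described above comes down to an ownership rule: code that allocates an skb must either hand it to a consumer that frees it or free it itself on every path. The sketch below is a userspace model of that rule, not the driver code. `mock_skb`, `mock_alloc_skb`, `mock_free_skb` and `frame_arrived` are hypothetical stand-ins for the skb, dev_alloc_skb(), dev_kfree_skb() and hfc_frame_arrived(), and `live_skbs` counts buffers that were allocated but not yet freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: just a data buffer. */
struct mock_skb {
    unsigned char *data;
    size_t len;
};

static int live_skbs;            /* allocated but not yet freed */

static struct mock_skb *mock_alloc_skb(size_t len)
{
    struct mock_skb *skb;

    assert(len > 0);
    skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    skb->data = calloc(1, len);
    if (!skb->data) {
        free(skb);
        return NULL;
    }
    skb->len = len;
    live_skbs++;
    return skb;
}

static void mock_free_skb(struct mock_skb *skb)
{
    assert(skb != NULL);
    free(skb->data);
    free(skb);
    live_skbs--;
}

/*
 * Model of the receive path: copy one frame from the FIFO into an skb.
 * If a consumer is attached, the payload is copied to it. In both cases
 * the skb is released before returning, so no buffer stays unreferenced.
 * Returns the number of bytes delivered to the consumer.
 */
static size_t frame_arrived(const unsigned char *fifo, size_t frame_size,
                            int open_by_consumer, unsigned char *consumer_buf)
{
    struct mock_skb *skb = mock_alloc_skb(frame_size);
    size_t delivered = 0;

    if (!skb)
        return 0;                /* allocation failed: drop the frame */

    memcpy(skb->data, fifo, frame_size);
    if (open_by_consumer) {
        memcpy(consumer_buf, skb->data, skb->len);
        delivered = skb->len;
    }
    mock_free_skb(skb);          /* the release suggested in the comment above */
    return delivered;
}
```

After any number of calls, `live_skbs` is back at zero whether or not a consumer was attached. In the real driver the corresponding step would be the dev_kfree_skb(skb) mentioned above; whether the skb is needed at all is a separate question.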
Created attachment 350138 [details, diff] My Workaround Patch which throws away incoming frames if the device is not open by dahdi.
(In reply to S. Lorsbach from comment #13)
> Back again ;)
>
> @Oliver: Did you enable kmemleak in the config of your kernel? You will only
> get messages about memory leaks if the kmemleak option in kernel hacking is
> enabled.

Funny, I was thinking of you and your problem with dahdi_hfcs these days, and of what had happened in the meantime... So welcome back, Stephan :) And yes, I have CONFIG_HAVE_DEBUG_KMEMLEAK=y in the .config of 3.8.13-gentoo, which I did not set explicitly; maybe it is a default, or it depends on another option I don't know of.

> [...]
> But this is also a workaround. I think if there are incoming frames
> something should happen with them ;)

Maybe these frames are ISDN data arriving on the D channel (see <http://en.wikipedia.org/wiki/D_channel> or the more detailed German article at <http://de.wikipedia.org/wiki/D-Kanal>)? Anyway, since our last post I have switched to 3.8.13-gentoo, as mentioned above, and my HFC is still working flawlessly. I have absolutely no idea what's wrong on your side. Maybe it's worth trying a BN2S0 or BN4S0, as these are supported by dahdi natively, and you get some extra ISDN ports to play with :)
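A possible reason why the reports do not show up on the second system: CONFIG_HAVE_DEBUG_KMEMLEAK only records that the architecture supports kmemleak, while the detector itself is built only when CONFIG_DEBUG_KMEMLEAK=y is set. The following is a minimal sketch of a check on the text of a kernel .config; `kmemleak_enabled` is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the given .config text contains a line that is exactly
 * "CONFIG_DEBUG_KMEMLEAK=y", else 0. Lines such as
 * "CONFIG_HAVE_DEBUG_KMEMLEAK=y" or
 * "# CONFIG_DEBUG_KMEMLEAK is not set" do not count.
 */
static int kmemleak_enabled(const char *config_text)
{
    static const char key[] = "CONFIG_DEBUG_KMEMLEAK=y";
    const size_t key_len = sizeof(key) - 1;
    const char *line = config_text;

    assert(config_text != NULL);
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len == key_len && strncmp(line, key, key_len) == 0)
            return 1;
        if (!end)
            break;
        line = end + 1;          /* continue with the next line */
    }
    return 0;
}
```

With only the HAVE_ symbol present, the function returns 0, and /sys/kernel/debug/kmemleak would not be expected to exist on such a kernel.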
Hi,

The SKB stuff probably comes from the fact that DAHDI drivers really are network drivers.

About your patch (I assume you did test that it actually works): this looks like a rather crazy patch. I'm not sure it's the way to go, but I don't know enough about DAHDI to say for sure. You're basically replacing an SKB (which carries lots of meta information, deals with checksums and so on) with a plain kmalloc()ed block. Does this really solve the problem? And what about moving the allocation so that it only happens when we need it? Also, since you're about to receive data into the kmalloc()ed area, is the memset() call to clear the memory required? Isn't there a kzalloc() that does both of these in one?

In principle I don't see anything obviously wrong with this patch. I can also confirm that this definitely looks like a memory leak in the unpatched driver. I pretty much don't see how this could fail to manifest in one way or another (see below).

Looking at *just* the patch, I think that the *intent* may have been something like:

    skb = dev_alloc_skb(frame_size - 3);
    hfc_fifo_get_frame(&card->chans[D].rx,
                       skb_put(skb, frame_size - 3),
                       frame_size - 3) == -1)
    if (chan->open_by_dahdi)
        memcpy(skb_get???(skb, frame_size - 3), frame_size - 3);

However, note that one call to hfc_fifo_get_frame() has frame_size - 1 and the other has frame_size - 3. This *might* somehow be the reason for the memory leak, if one code path leaks and the other does not. So my *gut* says that the -3 one is the wrong one, but then the provided patch should still give the same result ...

What does skb_put do anyway? Doesn't it somehow dispatch the frame to the networking stack for handling there? I haven't worked with this stuff for years, so my knowledge is definitely rusty, but unless skb_put does some kind of magic with the skb to dispatch it somewhere, we are most likely looking at a memory leak.
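On the skb_put() question: in the kernel it only extends the used data area of the buffer by the given length and returns a pointer to the start of the newly added bytes. It does not hand the skb to the networking stack. The userspace model below illustrates just that bookkeeping; `mini_skb` and `mini_skb_put` are hypothetical stand-ins, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the skb data bookkeeping. */
struct mini_skb {
    unsigned char buf[64];   /* backing storage */
    size_t len;              /* bytes currently in use */
};

/*
 * Model of skb_put(): reserve n more bytes at the tail and return a
 * pointer to them, or NULL if they do not fit. (The kernel treats an
 * overrun as a bug instead of returning NULL.) Nothing is dispatched.
 */
static unsigned char *mini_skb_put(struct mini_skb *skb, size_t n)
{
    unsigned char *tail;

    assert(skb != NULL);
    if (n > sizeof(skb->buf) - skb->len)
        return NULL;
    tail = skb->buf + skb->len;   /* start of the new region */
    skb->len += n;
    return tail;
}
```

So after hfc_fifo_get_frame() has written through the pointer returned by skb_put(), the skb still belongs to the caller, which has to pass it on or free it.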