I have an NFS server with 512MB of RAM. When it boots, memory usage is around 100MB and the rest is used for caching and buffering. However, after a few days the NFS rpc.mountd process has grown to the point where no RAM is left for caching. rpc.mountd consumes most of the RAM and forces memory to be swapped. After a week, I have over 1GB of swap in use and rpc.mountd's virtual memory usage keeps growing. Restarting the nfs daemon once in a while fixes it before the kernel runs out of memory and kills every process, but this really looks like a severe leak. My memory usage for the last month looks like a sawtooth with weekly peaks over 1.5GB (see URL).

Here are the options I use:

(/etc/exports)
/usr/portage 192.168.0.0/16(rw,sync,no_root_squash,no_subtree_check)

(/etc/fstab)
server:/usr/portage /usr/portage nfs async,soft,intr,rw,lock,rsize=8192,wsize=8192 0 0

I got the leak with 1.0.10 and 1.0.12 as well, which is why I tried the latest unstable after seeing in the changelog that it fixed a(nother) leak.

Reproducible: Always

Steps to Reproduce:
1. /etc/init.d/nfs start
2. Wait a day or two to let it consume most of the RAM and start swapping
3. Wait over a week and the kernel runs out of memory and crashes

Actual Results:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22037 root 16 0 385m 312m 480 S 0.0 64.9 0:02.89 rpc.mountd
(after a day)

Expected Results:
should never need to swap on this system

Portage 2.1.2.2 (default-linux/amd64/2006.1/server, gcc-4.1.1, glibc-2.5-r0, 2.6.19-gentoo-r7 x86_64)
=================================================================
System uname: 2.6.19-gentoo-r7 x86_64 AMD Turion(tm) 64 Mobile Technology ML-30
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 25 Apr 2007 09:50:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.15-r1
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.19.2-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe -msse3 -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache1-php5/ext-active/ /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=k8 -O2 -pipe -msse3 -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="-bk"
FEATURES="distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.ucsb.edu/pub/mirrors/linux/gentoo/"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.ca.gentoo.org/gentoo-portage"
USE="acpi amd64 apache2 bash-completion berkdb bitmap-fonts bzip2 calendar cdr clamav cli cracklib crypt dbus doc ffmpeg fortran gcj gdbm gif gpm hal iconv isdnlog ldap libg++ lm_sensors lzo mailwrapper midi mime mysql ncurses nls nptl nptlonly pam pcre perl ppds pppd python readline reflection sasl session snmp sockets spell spl ssl symlink tcpd truetype truetype-fonts type1-fonts unicode usb vcd videos xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU"
Unset: CTARGET, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Just to give a better idea, this is the memory usage ~84 hours after restarting nfs. Note that the server is completely idle and the second most memory-consuming process is apache2.

top - 00:23:11 up 14 days, 13:27, 1 user, load average: 0.07, 0.12, 0.09
Tasks: 100 total, 1 running, 99 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 493068k total, 487016k used, 6052k free, 39208k buffers
Swap: 1992016k total, 682540k used, 1309476k free, 69320k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22037 root 15 0 930m 303m 472 S 0.0 63.1 0:07.18 rpc.mountd
14905 apache 15 0 97200 9596 3552 S 0.0 1.9 0:02.02 apache2

All the server does is centralize the portage tree for the LAN over NFS and serve a single diskless client. There is no sudden memory increase, though; it's a very steady climb, which makes me think some recurring event allocates memory that is never freed.
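One way to check whether the growth really is steady rather than bursty is to sample the resident size of rpc.mountd at intervals and fit a line through the samples. The sketch below is only an illustration: the function names are made up for this example, and it assumes the samples come from the VmRSS line of /proc/<pid>/status on Linux.

```python
import re


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kb_per_hour(samples):
    """Least-squares slope of (hours_since_start, rss_kb) samples.

    A roughly constant slope across several windows points to a steady leak;
    a slope that jumps between windows points to bursty allocation instead.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den
```

In practice the status text would be read from /proc/<pid>/status for the rpc.mountd PID every few minutes, with each value stored alongside the elapsed time.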
I can confirm the bug on my server: rpc.mountd is now taking 1.5 G of memory:

root 8039 0.1 76.5 1591236 1588468 ? Ss Apr28 1:14 /usr/sbin/rpc.mountd

I dug around a bit for information and found the following thread on the nfs-utils mailing list:
http://sourceforge.net/mailarchive/message.php?msg_id=1174248657.21998.6.camel%40blackwidow.nbk
which cites this bug in the Debian bug tracker:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=413661

From these two links it appears that the main cause of the leak is libblkid, which is part of e2fsprogs. nfs-utils 1.0.12 is the first version of nfs-utils to use this library, which has a serious memory leak. According to the Debian bug report, the leak is fixed in e2fsprogs 1.40 (Gentoo only has 1.39). I therefore suggest that the maintainers of e2fsprogs be notified of this bug.
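For anyone comparing numbers across machines, the ps line above can be read mechanically. The following is a small sketch, assuming the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND, with VSZ and RSS in kB); the helper names are invented for this example.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` row into the fields relevant to memory usage."""
    # The command may contain spaces, so split into at most 11 fields.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a complete ps aux row")
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "mem_percent": float(fields[3]),
        "vsz_kb": int(fields[4]),
        "rss_kb": int(fields[5]),
        "command": fields[10],
    }


def kb_to_gib(kb):
    """Convert kB (as reported by ps) to GiB."""
    return kb / (1024 * 1024)


row = parse_ps_aux_line(
    "root 8039 0.1 76.5 1591236 1588468 ? Ss Apr28 1:14 /usr/sbin/rpc.mountd"
)
print(row["command"], round(kb_to_gib(row["rss_kb"]), 2), "GiB resident")
```

Applied to the row quoted above, the resident size comes out at roughly 1.5 GiB, which matches the figure reported in the comment.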
Nice find, that makes sense. Could it be fixed in unstable e2fsprogs-1.39-r2?

*e2fsprogs-1.39-r2 (24 Mar 2007)

24 Mar 2007; Mike Frysinger <vapier@gentoo.org>
+files/e2fsprogs-1.39-blkid-memleak.patch, +e2fsprogs-1.39-r2.ebuild:
Grab fix from upstream for blkid memleak #171844 by Andrej Filipcic

However, while there might be a leak there, there must be another problem because I experienced a leak with nfs-utils-1.0.10 as well.

I believe I found the solution to my problem, but I don't see how it could be responsible. I was using the ~amd64 gentoo-sources-2.6.19-r7 and just upgraded to the new amd64 gentoo-sources-2.6.20-r7, and the bug vanished. I know the kernel is responsible for some parts of NFS, but I don't see how it could stop a userland process from leaking. Any idea?

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6093 root 16 0 8124 420 316 S 0.0 0.1 0:00.24 rpc.mountd

total used free shared buffers cached
Mem: 494660 488500 6160 0 17796 395948

Its memory usage has remained that low and stable for almost a day now. The server is much more responsive with 400MB of cache! :)
(In reply to comment #3)
> Nice find, that makes sense. Could it be fixed in unstable e2fsprogs-1.39-r2?
>
> *e2fsprogs-1.39-r2 (24 Mar 2007)
>
> 24 Mar 2007; Mike Frysinger <vapier@gentoo.org>
> +files/e2fsprogs-1.39-blkid-memleak.patch, +e2fsprogs-1.39-r2.ebuild:
> Grab fix from upstream for blkid memleak #171844 by Andrej Filipcic
>

Upgrading to e2fsprogs-1.39-r2 fixed the problem for me. Don't know what to say about your solution with the kernel. Anyway I think that this bug should be marked as a duplicate of bug #171844 and e2fsprogs-1.39-r2 should be made stable as soon as possible
Upgrading to the ~ masked e2fsprogs doesn't help me. I've tried the following:

On latest stable e2fsprogs:
nfs-utils-1.0.12
nfs-utils-1.0.12-r1
nfs-utils-1.0.12-r3

On e2fsprogs-1.39-r2:
nfs-utils-1.0.12-r1
nfs-utils-1.0.12-r3

It's still leaking. I left the last combo of e2fsprogs-1.39-r2 and nfs-utils-1.0.12-r1 running since yesterday afternoon, and have just checked: it's using around 11% of memory, with only one machine connected to around 4 shares. I'm running the 2.6.15-gentoo-r7 kernel.

Emerge --info:

Portage 2.1.2.4 (default-linux/x86/2006.0, gcc-3.4.6, glibc-2.5-r1, 2.6.15-gentoo-r7 i686)
=================================================================
System uname: 2.6.15-gentoo-r7 i686 AMD Athlon(tm)
Gentoo Base System release 1.12.9
Timestamp of tree: Mon, 30 Apr 2007 19:00:10 +0000
ccache version 2.4 [enabled]
dev-java/java-config: 1.3.7, 2.0.31-r5
dev-lang/python: 2.4.4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.4-r6
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.17
sys-devel/gcc-config: 1.3.14
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -mtune=athlon -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -mtune=athlon -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="candy ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://ftp.citylink.co.nz/gentoo"
LINGUAS="en_GB"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="apache2 apm authdaemond berkdb bitmap-fonts cli cracklib crypt dri eds emboss encode fam foomaticdb fortran gdbm gif gpm gstreamer iconv imlib isdnlog jpeg libg++ libwww mad midi mikmod motif mp3 mpeg ncurses nls nptl nptlonly ogg opengl pam pcre perl png pppd python qt3 qt4 quicktime readline reflection samba sasl sdl session spell spl ssl tcpd truetype truetype-fonts type1-fonts unicode vorbis x86 xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en_GB" USERLAND="GNU" VIDEO_CARDS="s3 s3virge nv nvidia vesa vga"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
*** This bug has been marked as a duplicate of bug 171844 ***