Bug#: 176063
Status: RESOLVED
Resolution: DUPLICATE of bug 171844
Assigned To: Network Filesystems <net-fs@gentoo.org>
Reporter: Martin Bailey <martin@pcalpha.com>



Description:   Opened: 2007-04-26 04:00 0000
I have an NFS server with 512MB of RAM.
When it boots, the memory usage is around 100MB and the rest is used for
caching and buffering.
However, after a few days the NFS rpc.mountd process increases its memory
consumption to the point where the RAM doesn't cache anything anymore.
rpc.mountd consumes most of the RAM and causes memory to be swapped. After a
week, I have over 1GB of swap and rpc.mountd's virtual memory usage keeps
growing. Restarting the nfs daemon once in a while fixes it before the kernel
runs out of memory and kills every process, but this really looks like a severe
leak. My memory usage for the last month looks like a sawtooth with weekly
peaks over 1.5GB (see URL).

Here are the options I use :
(/etc/exports) /usr/portage 192.168.0.0/16(rw,sync,no_root_squash,no_subtree_check)
(/etc/fstab) server:/usr/portage /usr/portage nfs async,soft,intr,rw,lock,rsize=8192,wsize=8192 0 0

I got the leak with nfs-utils 1.0.10 and 1.0.12 as well, which is why I tried
the latest unstable version after seeing in the changelog that it fixed
a(nother) leak.

Reproducible: Always

Steps to Reproduce:
1. /etc/init.d/nfs start
2. Wait a day or two to let it consume most of the RAM and start swapping
3. Wait over a week and the kernel runs out of memory and crashes
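
Rather than waiting for the box to start swapping, the growth in step 2 can be
logged directly. The following is only a minimal sketch, assuming a Linux
/proc filesystem; the function names are made up for illustration and are not
part of nfs-utils. It reads the VmSize and VmRSS fields from
/proc/<pid>/status, which are reported in kB.

```python
import re


def parse_status_kib(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_process_memory(pid):
    """Read the current memory figures for one process (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_status_kib(handle.read())
```

Calling read_process_memory() with the rpc.mountd PID from cron every hour and
appending the result to a file would give a growth curve without having to
keep top open.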

Actual Results:  
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
22037 root      16   0  385m 312m  480 S  0.0 64.9   0:02.89 rpc.mountd 

after a day

Expected Results:  
should never need to swap on this system

Portage 2.1.2.2 (default-linux/amd64/2006.1/server, gcc-4.1.1, glibc-2.5-r0,
2.6.19-gentoo-r7 x86_64)
=================================================================
System uname: 2.6.19-gentoo-r7 x86_64 AMD Turion(tm) 64 Mobile Technology ML-30
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 25 Apr 2007 09:50:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632)
[disabled]
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.15-r1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.19.2-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe -msse3 -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache1-php5/ext-active/
/etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/
/etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo
/etc/texmf/web2c"
CXXFLAGS="-march=k8 -O2 -pipe -msse3 -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="-bk"
FEATURES="distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.ucsb.edu/pub/mirrors/linux/gentoo/"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress
--force --whole-file --delete --delete-after --stats --timeout=180
--exclude=/distfiles --exclude=/local --exclude=/packages
--filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.ca.gentoo.org/gentoo-portage"
USE="acpi amd64 apache2 bash-completion berkdb bitmap-fonts bzip2 calendar cdr
clamav cli cracklib crypt dbus doc ffmpeg fortran gcj gdbm gif gpm hal iconv
isdnlog ldap libg++ lm_sensors lzo mailwrapper midi mime mysql ncurses nls nptl
nptlonly pam pcre perl ppds pppd python readline reflection sasl session snmp
sockets spell spl ssl symlink tcpd truetype truetype-fonts type1-fonts unicode
usb vcd videos xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem
bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel
intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file
hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route
share shm softvol" ELIBC="glibc" KERNEL="linux" LCD_DEVICES="bayrad cfontz
cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU"
Unset:  CTARGET, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS,
PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

------- Comment #1 From Martin Bailey 2007-04-28 07:35:55 0000 -------
Just to give a better idea, this is the memory usage ~84 hours after restarting
nfs. Note how the server is completely idle and the second most
memory-consuming process is apache2.

top - 00:23:11 up 14 days, 13:27,  1 user,  load average: 0.07, 0.12, 0.09
Tasks: 100 total,   1 running,  99 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    493068k total,   487016k used,     6052k free,    39208k buffers
Swap:  1992016k total,   682540k used,  1309476k free,    69320k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
22037 root      15   0  930m 303m  472 S  0.0 63.1   0:07.18 rpc.mountd         
14905 apache    15   0 97200 9596 3552 S  0.0  1.9   0:02.02 apache2

All it does is centralize the portage tree for the LAN over NFS and serve a
single diskless client. However, there's no sudden memory increase; it's a
steady climb, which makes me think memory allocated on some recurring event is
not being freed.
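
To put a number on that steady climb, here is a rough sketch using the two
VIRT readings for PID 22037 posted in this report: 385m "after a day" and 930m
at ~84 hours. Treating "after a day" as 24 hours is an assumption, and the
function names are invented for illustration. VIRT is used rather than RES
because RES is capped by the 512MB of physical RAM.

```python
# (hours since nfs restart, VIRT in MiB) taken from the top output in this
# report; the 24-hour figure is an assumed reading of "after a day".
SAMPLES = [(24, 385), (84, 930)]


def growth_rate_mib_per_hour(samples):
    """Average growth between the first and last sample, in MiB per hour."""
    (first_hour, first_mib), (last_hour, last_mib) = samples[0], samples[-1]
    return (last_mib - first_mib) / (last_hour - first_hour)


def project_mib(samples, at_hour):
    """Linearly extrapolate VIRT from the last sample to a later hour."""
    last_hour, last_mib = samples[-1]
    return last_mib + growth_rate_mib_per_hour(samples) * (at_hour - last_hour)
```

With these two points the rate comes out at about 9 MiB per hour, and
project_mib(SAMPLES, 168) gives roughly 1.7 GiB after a week, which is in the
same range as the weekly peaks over 1.5GB mentioned in the description.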

------- Comment #2 From sbriglie@gmail.com 2007-04-29 13:29:50 0000 -------
I can confirm the bug on my server: rpc.mountd is now taking 1.5 G of memory:
root      8039  0.1 76.5 1591236 1588468 ?     Ss   Apr28   1:14
/usr/sbin/rpc.mountd

I dug around a bit for information and found the following thread on
the nfs-utils mailing list:
http://sourceforge.net/mailarchive/message.php?msg_id=1174248657.21998.6.camel%40blackwidow.nbk

which cites this bug on debian bugzilla:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=413661

From these two links it appears that the main cause of the leak is
libblkid, which is part of e2fsprogs. nfs-utils 1.0.12 is the first version of
nfs-utils to make use of this library, which has a serious memory leak. It
appears from the Debian bug report that the leak is fixed in e2fsprogs
1.40 (Gentoo only has 1.39). I therefore suggest that the maintainers of
e2fsprogs be notified of this bug.
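
One way to check whether a running rpc.mountd is actually linked against
libblkid is to look at its memory map. This is a minimal sketch assuming the
usual Linux /proc/<pid>/maps layout (address, perms, offset, dev, inode,
pathname); the function names are hypothetical.

```python
def mapped_files(maps_text):
    """Return the set of file paths that appear in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        # Anonymous mappings have no sixth field; [stack], [heap] etc. are
        # pseudo-paths that do not start with a slash.
        if len(parts) == 6 and parts[5].startswith("/"):
            paths.add(parts[5].strip())
    return paths


def uses_libblkid(maps_text):
    """True if any mapped file name contains 'libblkid'."""
    return any("libblkid" in path for path in mapped_files(maps_text))


def process_uses_libblkid(pid):
    """Check a live process by PID (reading its maps usually needs root)."""
    with open(f"/proc/{pid}/maps") as handle:
        return uses_libblkid(handle.read())
```

If this returns False for a leaking rpc.mountd, the libblkid leak cannot be
the whole story for that machine.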

------- Comment #3 From Martin Bailey 2007-04-29 17:57:45 0000 -------
Nice find, that makes sense. Could it be fixed in unstable e2fsprogs-1.39-r2?

*e2fsprogs-1.39-r2 (24 Mar 2007)

  24 Mar 2007; Mike Frysinger <vapier@gentoo.org>
  +files/e2fsprogs-1.39-blkid-memleak.patch, +e2fsprogs-1.39-r2.ebuild:
  Grab fix from upstream for blkid memleak #171844 by Andrej Filipcic

However, while there might be a leak there, there must be another problem
because I experienced a leak with nfs-utils-1.0.10 as well. I believe I found
the solution to my problem but I don't see how it may be responsible for it.

I was using the ~amd64 gentoo-sources-2.6.19r7 and just upgraded to the new
amd64 gentoo-sources-2.6.20r7 and the bug vanished. I know the kernel is
responsible for some parts of NFS, but I don't see how it could stop a userland
process from leaking. Any idea?

 PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
6093 root   16   0  8124  420  316 S  0.0  0.1   0:00.24 rpc.mountd

             total       used       free     shared    buffers     cached
Mem:        494660     488500       6160          0      17796     395948

Its memory usage has remained that low and stable for almost a day now. The
server is much more responsive with 400MB of cache! :)

------- Comment #4 From sbriglie@gmail.com 2007-04-30 09:20:26 0000 -------
(In reply to comment #3)
> Nice find, that makes sense. Could it be fixed in unstable e2fsprogs-1.39-r2?
> 
> *e2fsprogs-1.39-r2 (24 Mar 2007)
> 
>   24 Mar 2007; Mike Frysinger <vapier@gentoo.org>
>   +files/e2fsprogs-1.39-blkid-memleak.patch, +e2fsprogs-1.39-r2.ebuild:
>   Grab fix from upstream for blkid memleak #171844 by Andrej Filipcic
>

Upgrading to e2fsprogs-1.39-r2 fixed the problem for me. Don't know what to say
about your solution with the kernel.
Anyway, I think that this bug should be marked as a duplicate of bug #171844 and
e2fsprogs-1.39-r2 should be made stable as soon as possible.

------- Comment #5 From VinnieNZ 2007-05-01 00:29:27 0000 -------
Upgrading to the ~ masked e2fsprogs doesn't help me.

I've tried the following:
On latest stable e2fsprogs:
nfs-utils-1.0.12
nfs-utils-1.0.12-r1
nfs-utils-1.0.12-r3

On e2fsprogs-1.39-r2:
nfs-utils-1.0.12-r1
nfs-utils-1.0.12-r3


It's still leaking. I left the last combo of e2fsprogs-1.39-r2 and
nfs-utils-1.0.12-r1 running since yesterday afternoon, and have just checked:
it's using around 11% of memory, with only one machine connected to around 4
shares.

I'm running 2.6.15-gentoo-r7 kernel.



Emerge --info:

Portage 2.1.2.4 (default-linux/x86/2006.0, gcc-3.4.6, glibc-2.5-r1,
2.6.15-gentoo-r7 i686)
=================================================================
System uname: 2.6.15-gentoo-r7 i686 AMD Athlon(tm)
Gentoo Base System release 1.12.9
Timestamp of tree: Mon, 30 Apr 2007 19:00:10 +0000
ccache version 2.4 [enabled]
dev-java/java-config: 1.3.7, 2.0.31-r5
dev-lang/python:     2.4.4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.4-r6
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.17
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -mtune=athlon -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf
/etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -mtune=athlon -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="candy ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://ftp.citylink.co.nz/gentoo"
LINGUAS="en_GB"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress
--force --whole-file --delete --delete-after --stats --timeout=180
--exclude=/distfiles --exclude=/local --exclude=/packages
--filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="apache2 apm authdaemond berkdb bitmap-fonts cli cracklib crypt dri eds
emboss encode fam foomaticdb fortran gdbm gif gpm gstreamer iconv imlib isdnlog
jpeg libg++ libwww mad midi mikmod motif mp3 mpeg ncurses nls nptl nptlonly ogg
opengl pam pcre perl png pppd python qt3 qt4 quicktime readline reflection
samba sasl sdl session spell spl ssl tcpd truetype truetype-fonts type1-fonts
unicode vorbis x86 xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp
atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968
fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx
via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop
empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi
null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="evdev
keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780
lb216 lcdm001 mtxorb ncurses text" LINGUAS="en_GB" USERLAND="GNU"
VIDEO_CARDS="s3 s3virge nv nvidia vesa vga"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS,
MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

------- Comment #6 From SpanKY 2007-05-02 18:42:54 0000 -------

*** This bug has been marked as a duplicate of bug 171844 ***
