I am seeing a slow memory leak in the kernel. I am using gentoo-sources 2.6.11-r6, but I also observed it in 2.6.11-r4. Over the course of several days, the amount of available memory on the server in question (free plus buffers and cache) gradually decreases, at a rate of about 150MB per day (the system has 2GB of RAM in total). The working set of processes stays the same throughout, at 50-150MB (depending on whether you count VSZ or RSS). Nothing shows up in dmesg except for a couple of one-time lockd and nfs messages (the system uses two remote filesystems). The local filesystems are ReiserFS on a 3Ware 7500-4 controller, and the NIC is an Intel E100.
# free
             total       used       free     shared    buffers     cached
Mem:       2076180    2024068      52112          0     166760      93200
-/+ buffers/cache:    1764108     312072
Swap:      1028152         56    1028096
# cat /proc/meminfo
MemTotal: 2076180 kB
MemFree: 63080 kB
Buffers: 158776 kB
Cached: 91664 kB
SwapCached: 4 kB
Active: 1055244 kB
Inactive: 874660 kB
HighTotal: 1179072 kB
HighFree: 640 kB
LowTotal: 897108 kB
LowFree: 62440 kB
SwapTotal: 1028152 kB
SwapFree: 1028096 kB
Dirty: 768 kB
Writeback: 0 kB
Mapped: 12648 kB
Slab: 69872 kB
CommitLimit: 2066240 kB
Committed_AS: 26316 kB
PageTables: 1492 kB
VmallocTotal: 114680 kB
VmallocUsed: 4700 kB
VmallocChunk: 109784 kB
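The "available" figure used above (free plus buffers and cache) is what free prints on its "-/+ buffers/cache" line. A minimal Python sketch of that calculation from /proc/meminfo, assuming the 2.6-era "Key: value kB" format; the function names are illustrative and not part of any existing tool:

```python
"""Sketch: derive free(1)'s "-/+ buffers/cache" figures from /proc/meminfo.

Assumes the 2.6-era meminfo format ("Key:   value kB"). Newer kernels also
report MemAvailable, which this deliberately ignores.
"""
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Return a {field: value} mapping; most values are in kB."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def buffers_cache_view(info: dict[str, int]) -> tuple[int, int]:
    """Return (used, available) in kB, counting buffers and page cache as free."""
    available = info["MemFree"] + info["Buffers"] + info["Cached"]
    used = info["MemTotal"] - available
    return used, available


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Linux only: report the live values of the running system.
    with open("/proc/meminfo") as f:
        used, available = buffers_cache_view(parse_meminfo(f.read()))
    print(f"used: {used} kB, available: {available} kB")
```

Fed the meminfo snapshot above, it reports 313520 kB available; the free output, which is a separate snapshot, shows 312072 kB.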
# lsmod
Module                  Size  Used by
nfs                    91180  2
lockd                  58920  2 nfs
sunrpc                125764  5 nfs,lockd
e100                   31872  0
mii                     4352  1 e100
0000:00:00.0 Host bridge: Intel Corp. E7500 Memory Controller Hub (rev 03)
0000:00:00.1 Class ff00: Intel Corp. E7500/E7501 Host RASUM Controller (rev 03)
0000:00:02.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 03)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corp. 82801CA LPC Interface Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corp. 82801CA Ultra ATA Storage Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus Controller (rev 02)
0000:01:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 03)
0000:01:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 03)
0000:01:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 03)
0000:01:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 03)
0000:03:01.0 RAID bus controller: 3ware Inc 3ware Inc 3ware 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
0000:04:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:04:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)
0000:04:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)
I would be happy to provide any additional information. As it stands, I have to reboot about once a week to reclaim the memory; otherwise the system thrashes itself to death.
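To put a number on the leak rate instead of eyeballing free once a day, one option is to log available memory periodically and fit a straight line through the samples. A minimal sketch, assuming samples are (hours since boot, available kB) pairs collected by some hypothetical logging job; the helper names are made up for illustration:

```python
"""Sketch: estimate a leak rate from periodic samples of available memory.

Assumes (hours_since_boot, available_kB) pairs gathered by a hypothetical
logging job, e.g. one recording MemFree + Buffers + Cached every hour.
"""


def leak_rate_kb_per_day(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of available memory, in kB/day.

    A negative result means available memory is shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var * 24.0  # slope is in kB/hour; scale to kB/day


def days_until_exhausted(current_kb: int, rate_kb_per_day: float) -> float:
    """Days until available memory reaches zero at a constant loss rate."""
    if rate_kb_per_day >= 0:
        return float("inf")  # not shrinking, so never exhausted
    return current_kb / -rate_kb_per_day
```

A least-squares fit is less sensitive to the noise from cache growth and shrinkage than comparing two single readings. At the reported ~150MB/day (about 153600 kB/day), the 312072 kB available in the free output above would last roughly two more days if the rate stayed constant.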
Steps to Reproduce:
# emerge info
Portage 188.8.131.52 (default-linux/x86/2005.0, gcc-3.3.5-20050130,
glibc-184.108.40.20641102-r1, 2.6.11-gentoo-r6 i686)
System uname: 2.6.11-gentoo-r6 i686 Intel(R) Xeon(TM) CPU 2.00GHz
Gentoo Base System version 1.4.16
Python: dev-lang/python-2.3.4-r1 [2.3.4 (#1, Mar 28 2005, 01:06:34)]
sys-apps/sandbox: [Not Present]
sys-devel/autoconf: 2.59-r6, 2.13
sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4
CFLAGS="-O3 -mcpu=i686 -fomit-frame-pointer -fstack-protector"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config
/usr/local/clockspeed/etc /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -mcpu=i686 -fomit-frame-pointer -fstack-protector"
FEATURES="autoaddcvs autoconfig ccache collision-protect digest distlocks
notitles sandbox sfperms strict userpriv usersandbox"
USE="x86 alsa apache2 apm berkdb bitmap-fonts crypt emacs emboss encode ethereal
fortran gdbm gif gtk2 imlib ipv6 jpeg libg++ libwww mp3 mysql ncurses nls pam
perl png python readline skey snmp spell ssl tcpd truetype-fonts type1-fonts
xml2 zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS
Please test vanilla-sources-2.6.12_rc3
I have booted vanilla-sources-2.6.12_rc3, and it still appears to be leaking, possibly worse than before. I am down to just over 1GB of free memory after two days of uptime. Anything else?
The next suggestion would be to mail the Linux kernel mailing list, as you have already done. Provide any information they ask for, and reopen this bug once you have found a solution to the problem.
I have been running vanilla-sources-2.6.12_rc3 (with one small patch to track
page ownership) for almost 10 days now, and no leaks are showing up. The only
kernels I can conclusively state leaked memory are the gentoo-sources series,
specifically 2.6.11-r4 and -r6.
I presumably have the same problem with practically all kernel versions of
gentoo-sources and hardened-sources, at least since 2.6.8* (I haven't tried
earlier ones yet), on an amd64 machine with both 32-bit and 64-bit
kernels/installations. I am wondering why nobody else seems to have this
problem. Unfortunately, it is hard to reproduce, and the computer has to run
for a rather long time before the problem happens (I tried many kernel
configurations, and sometimes I thought the problem had vanished, but then all
of a sudden it was back).
In my case, the memory usually fills up when compiling C++ projects. For
example, a complete KDE compile will often not succeed without some random
processes being killed (usually the compiler tasks themselves are killed, so
that the emerge ends during "make" with "internal error: killed").
Surprisingly, increasing the swap space seems to have no influence at all: in
one test a task was killed after only 30 minutes of uptime, even with an
additional 16 GB swapfile (although the kernel swapped like crazy). [For a
while I suspected a thermal hardware problem, but this does not seem to be the
case either, since "nicing" the processes and limiting the CPU frequency, while
also opening the case and adding extra cooling, had no influence. Moreover,
the problem reproduces too consistently to be a hardware problem.]
So, Bruce, maybe compiling KDE several times would help you provoke or speed up
the problem? (Make sure that no compiler cache is used: if ccache is installed,
rename /usr/bin/ccache. IIRC, just removing ccache from the FEATURES list was
not enough.)
Maybe this bug is a duplicate of bug 58969 (at least my comments above seem
to be related to that bug). Please see my comments there.
I have now also observed the problem with vanilla-sources (I tested 2.6.12_rc5
and used genkernel --udev without changing anything in the default
If you can reproduce it on 2.6.12-rc5, then it is an upstream issue, not one
caused by Gentoo's kernel patches.
Read Bruce's discussion and gather some information about your problem:
Then write your own report to the Linux kernel mailing list.
(In reply to comment #7)
> If you can reproduce it on 2.6.12-rc5 then it is an upstream issue, not one
> caused by gentoo's kernel patches.
Yes, it is not caused by the *kernel* patches. But the problem only happens
with the Gentoo-compiled kernel: when I boot my SuSE system and chroot into
the Gentoo partition, there seem to be no problems (it *might* be a
coincidence, but I retried several times, successfully compiling the "usual
And today I observed something even stranger: I copied the kernel built from
gentoo-sources-2.6.9-r14 from an old backup, and it also worked! However,
after recompiling the *same* version (well, almost: I recompiled 2.6.9-r9
because the other one is no longer in the portage tree) using /proc/config.gz
from the running 2.6.9-r14 kernel (and genkernel), I got a kernel that
exhibits the memory leak again!
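To confirm that the old and the rebuilt kernel really share a configuration, the two configs can be diffed option by option. A minimal Python sketch, assuming one side is a saved copy of /proc/config.gz and the other a plain .config; the function names are hypothetical:

```python
"""Sketch: list the options that differ between two kernel .config files.

Either file may be gzip-compressed (e.g. a saved copy of /proc/config.gz).
"""
import gzip


def load_config(path: str) -> dict[str, str]:
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    opener = gzip.open if path.endswith(".gz") else open
    suffix = " is not set"
    options = {}
    with opener(path, "rt") as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("# CONFIG_") and line.endswith(suffix):
                options[line[2:-len(suffix)]] = "n"
            elif line.startswith("CONFIG_") and "=" in line:
                name, value = line.split("=", 1)
                options[name] = value
    return options


def diff_configs(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {option: (value_in_a, value_in_b)} for every differing option."""
    missing = "<absent>"
    return {
        name: (a.get(name, missing), b.get(name, missing))
        for name in sorted(a.keys() | b.keys())
        if a.get(name, missing) != b.get(name, missing)
    }
```

For example, diff_configs(load_config("config-2.6.9-r14.gz"), load_config(".config")) (file names hypothetical) returns only the options whose values differ; an empty result means the two configurations match. Note that this only compares kernel options, not out-of-tree modules built against the kernel.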
I really have no idea how this is possible (but I tried both kernels several
times, and the "old" 2.6.9-r14 always worked while the "newly compiled"
2.6.9-r9 always failed). My only idea is that my toolchain produces a broken
kernel which nevertheless works perfectly except for this memory leak, but
this does not sound very likely to me.
I am currently re-bootstrapping my toolchain (using only the most stable
versions with no optimization) and will then recompile the kernel. When I find
something new, I will let you know (but I am very busy these days, so it might
take some time).
Just for the record: no difference with the current stable toolchain.
It's very unlikely - nothing in userspace can directly cause a kernel memory
leak (but then again, you haven't actually posted any numbers, so it might not
be the kernel that is leaking...)
It's not a fair comparison with suse unless you are running exactly the same
kernel on both. Are you?
There is also no point in playing with old kernels like 2.6.9. Reproduce it on
the current development version and provide some numbers to the kernel
developers. That's the only way this will get solved.
Created attachment 60657
Output of free, /proc/meminfo, /proc/slabinfo
This is the output after many "emerge"s, when the system is almost swapped to
death for no apparent reason.
You need to post this to the Linux kernel list like Bruce did.
Somehow my additional comment seems to have been lost, so I am reposting it
(sorry if it now appears twice).
(In reply to comment #12)
> You need to post this to the Linux kernel list like Bruce did.
I understand what you mean, but as I wrote, the SuSE kernel and the old Gentoo
kernel (built from practically the same sources with the same .config) seem to
work, whereas a kernel freshly compiled under Gentoo does not. So the cause is
probably not in gentoo-/vanilla-sources themselves but in their interplay with
Gentoo; to me it is completely mysterious. But if there are no other ideas,
maybe I will write to the kernel list anyway.
(In reply to comment #10)
> It's very unlikely - nothing in userspace can directly cause a kernel memory
> leak (but then again, you haven't actually posted any numbers, so it might
> not be the kernel that is leaking...)
I mentioned the toolchain because the only explanation I can see for the
different behaviour is that something is wrong with the compilation process
itself. But even after re-bootstrapping the toolchain (i.e. re-emerging
linux-headers, gcc, binutils and glibc sufficiently often), a freshly compiled
kernel does not work (and I tried several kernel versions - older and
Concerning the missing data: there are actually two effects which I believe
have the same cause, but I might be wrong:
1. The only effect I can provoke on demand: when compiling certain .cc files
with MAKEOPTS="-j2" and optimizing C*FLAGS, compilation usually dies with
"internal error: killed" (or sometimes processes of other users are killed
instead).
2. The other effect happens only after compiling many (~100 or more) .cc
projects: the system slows down dramatically with lots of hard disk access and
is often practically dead (response time for a keypress maybe minutes). The
output in comment #11 is from such a situation.
When the system is in the state described in 2. but not yet dead, effect 1.
happens much more often; that's why I believe they are actually the same
problem.
> It's not a fair comparison with suse unless you are running exactly the same
> kernel on both. Are you?
I did not want to compare; I simply have no explanation. SuSE's kernel and the
old Gentoo kernel (which I have now lost due to a stupid mistake) were the
only "working" kernels that did not show effect 1. Instead, they start
swapping at about the same point during compilation where the newly compiled
kernels (older and newer versions) would usually start killing random
processes.
Regardless of which distro you see a leak on, if the latest unmodified
development kernel (vanilla-sources-2.6.12_rc5) is leaking, then it is a kernel
bug. It may be triggered by a scenario present in Gentoo that is not present
in SUSE, but no user space program should be able to make the kernel leak (and
if one can, then it's a kernel bug). If a big leak can be triggered from user
space, it is usually regarded as a DoS (denial of service) attack, because a
standard user account can easily bring down the box.
I found the main cause: the nvidia-kernel module. The problem also occurred
without X, which is why I had not thought of this cause, but I had the nvidia
module listed in /etc/modules.autoload.d and my scripts had always compiled the
module. The earlier Gentoo and SuSE kernels of course used different
nvidia-kernel versions, which explains the different behaviour.
With nvidia-kernel-1.0.7664 the reproducible part of the problem has vanished.
Some memory still seems to disappear, but currently I have no time for further
investigation (and it seems hopeless anyway, since the loss is too slow for
systematic experiments).
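Because the culprit turned out to be a binary module, it can help to check whether a kernel is tainted before blaming it for a leak: the kernel sets bit 0 of /proc/sys/kernel/tainted once a proprietary module such as nvidia has been loaded, and the flag stays set until reboot. A minimal Python sketch that checks only that bit (the helper names are hypothetical):

```python
"""Sketch: check whether a proprietary (non-GPL) module has tainted the kernel.

Bit 0 of /proc/sys/kernel/tainted is set once a proprietary module is
loaded. The other taint bits are intentionally not decoded here.
"""

PROPRIETARY_MODULE = 1 << 0


def is_proprietary_tainted(taint_value: int) -> bool:
    """True if the 'proprietary module loaded' taint bit is set."""
    return bool(taint_value & PROPRIETARY_MODULE)


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the kernel taint mask as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

On a live system, is_proprietary_tainted(read_taint()) returning True means a leak report should first be reproduced without the binary module loaded.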