I am currently running gentoo-sources 2.6.25-r7, and since some point during my use of 2.6.23 or 2.6.24 (I don't remember which), I've been experiencing what appears to be a kernel memory leak. Over the course of a week or two, memory usage slowly but steadily climbs until my machine starts swapping constantly, and it can take over a minute just to get UI responses so I can reboot it. Keep in mind that I have 4GiB of RAM, which I acquired as part of a (still unstarted) plan to experiment with virtualization, so I suspect a more typical machine would have to reboot after less than a week.

Using top and/or the -/+ buffers/cache line from free, I can confirm that memory usage remains at the 3GiB+ mark even when I've killed off everything except the bare minimum (/usr/bin/init, six copies of agetty, a copy of /usr/bin/login, a copy of /bin/sh, and one or two other processes like udevd which I don't know how to safely kill).

Here are the first five entries in the output of slabtop:

        OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
    15459451 15459430  99%    0.10K 417823       37   1671292K sysfs_dir_cache
    10821639 10821604  99%    0.05K 161517       67    646068K size-32
       58021    52384  90%    0.23K   3413       17     13652K dentry
       48870    46212  94%    0.12K   1629       30      6516K buffer_head
       47285    45109  95%    0.77K   9457        5     37828K ext3_inode_cache

I'm willing to help however I can, but I don't know where to go from here, and most of my programming experience is in Python, bash scripting, and PHP, so I can't really poke around in the kernel source.

Reproducible: Always
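The slabtop numbers above are internally consistent, and they show where the memory goes: OBJS equals SLABS times OBJ/SLAB, and CACHE SIZE equals SLABS times the memory per slab. A minimal Python sketch, assuming one 4 KiB page per slab for each of these caches (an assumption that happens to match every row; /proc/slabinfo's pagesperslab field gives the real value), checks each row and totals the two biggest caches:

```python
# Rows copied from the slabtop output above:
# (name, OBJS, SLABS, OBJ/SLAB, CACHE SIZE in KiB)
ROWS = [
    ("sysfs_dir_cache", 15459451, 417823, 37, 1671292),
    ("size-32", 10821639, 161517, 67, 646068),
    ("dentry", 58021, 3413, 17, 13652),
    ("buffer_head", 48870, 1629, 30, 6516),
    ("ext3_inode_cache", 47285, 9457, 5, 37828),
]

KIB_PER_SLAB = 4  # assumption: 4 KiB pages (x86/amd64), one page per slab


def check_rows(rows, kib_per_slab=KIB_PER_SLAB):
    """Verify each row's arithmetic and return the total cache size in KiB."""
    total = 0
    for name, objs, slabs, per_slab, size_kib in rows:
        assert slabs * per_slab == objs, name             # OBJS = SLABS * OBJ/SLAB
        assert slabs * kib_per_slab == size_kib, name     # CACHE SIZE = SLABS * slab size
        total += size_kib
    return total


# sysfs_dir_cache + size-32 only
leaky = check_rows(ROWS[:2])
print(f"{leaky} KiB = {leaky / 1024**2:.2f} GiB")  # prints "2317360 KiB = 2.21 GiB"
```

So those two caches alone hold about 2.2 GiB, which accounts for most of the 3GiB+ that stays in use after everything else is killed.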
Oh, I forgot to mention: I did Google around for a while, but beyond enabling CONFIG_DEBUG_SLAB_LEAK and discovering slabtop, I wasn't able to find anything helpful. I did discover kmemleak, but the newest patch is for 2.6.20-rc1, and I didn't want to try getting it to apply while I still had safer options, given that I have to perform my usual day-to-day activities on this machine.
Wow, that is a huge amount of memory, especially for sysfs_dir_cache. Please post your kernel config, as some driver that you use is the most likely culprit. Also, as a debugging strategy: start with your bare minimum of services and use slabtop to make sure memory usage is not increasing. Then turn services on one by one while keeping an eye on slabtop, and see when usage starts to climb.
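The service-by-service strategy above can be semi-automated by diffing /proc/slabinfo between samples instead of eyeballing slabtop. This is a minimal Python sketch, assuming the slabinfo 2.x text format (a "slabinfo - version" banner, a "#" header line, then one line per cache starting with name, active_objs, num_objs) and root access to /proc/slabinfo. The function names are hypothetical helpers, not part of slabtop or procps.

```python
import time


def parse_slabinfo(text):
    """Map cache name -> active object count from /proc/slabinfo (version 2.x) text."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip blank lines, the version banner and the column header
        fields = line.split()
        counts[fields[0]] = int(fields[1])  # fields[1] is <active_objs>
    return counts


def growth(before, after, threshold=0):
    """Return (delta, name) pairs for caches that grew by more than threshold, largest first."""
    deltas = {name: count - before.get(name, 0) for name, count in after.items()}
    return sorted(((d, n) for n, d in deltas.items() if d > threshold), reverse=True)


def read_counts(path="/proc/slabinfo"):
    with open(path) as f:  # reading /proc/slabinfo normally requires root
        return parse_slabinfo(f.read())


def watch(interval=3.0, rounds=10, top=5, path="/proc/slabinfo"):
    """Every interval seconds, print the fastest-growing caches since the previous sample."""
    prev = read_counts(path)
    for _ in range(rounds):
        time.sleep(interval)
        cur = read_counts(path)
        for delta, name in growth(prev, cur)[:top]:
            print(f"{name:20s} +{delta}")
        print("---")
        prev = cur
```

Running watch(interval=3, rounds=20) as root after starting each service would show whether sysfs_dir_cache or size-32 starts growing steadily again.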
Created attachment 164156: Kernel config, fresh from /proc/config.gz

In addition to this kernel config, I also use the nVidia binary drivers, LIRC, gspca, and the zaptel driver, though the problem has been occurring for a while and the zaptel driver was only added recently.
Oops, sorry about that. I'm used to Bugzilla setups that autodetect the MIME type. You'll have to gunzip it manually.
I've identified one of the triggers for the problem: when I killed sanebuttonsd (from kscannerbuttons in my local overlay), the leak stopped. However, I know it isn't the only one, because the leak was happening before I added sanebuttonsd, so something I killed before sanebuttonsd must also be triggering it. (On the plus side, I now know that sanebuttonsd is a major contributor, accounting for exactly 111 leaked sysfs_dir_cache structures per slabtop update interval.) I have next to no experience with C and C++, but I'll see if I can find time to look inside sanebuttonsd in the next few days. Given how precisely consistent the leak rate is, I suspect that whatever system call is leaking (or is poorly designed, though I hope not, because that's a lot harder to get fixed) is being called from inside a polling loop.
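A constant per-interval delta is what a fixed-period polling loop would produce, and it also lets the leak be projected forward. A minimal Python sketch, assuming the 111 objects were measured at slabtop's default 3-second refresh (the interval actually used isn't stated) and the 37 objects per 4 KiB slab shown for sysfs_dir_cache in the opening report:

```python
OBJS_PER_INTERVAL = 111  # sysfs_dir_cache objects leaked per slabtop update
INTERVAL_S = 3           # assumption: slabtop's default refresh delay, in seconds
OBJS_PER_SLAB = 37       # OBJ/SLAB for sysfs_dir_cache in the slabtop output
KIB_PER_SLAB = 4         # assumption: one 4 KiB page per slab


def interval_deltas(samples):
    """Per-interval growth from successive active-object counts."""
    return [b - a for a, b in zip(samples, samples[1:])]


def leak_kib_per_day(per_interval=OBJS_PER_INTERVAL, interval_s=INTERVAL_S,
                     per_slab=OBJS_PER_SLAB, kib_per_slab=KIB_PER_SLAB):
    """Slab memory (KiB) consumed per day at a constant leak rate."""
    objs_per_day = per_interval * (86400 / interval_s)
    return objs_per_day / per_slab * kib_per_slab


rate = leak_kib_per_day()
print(f"{rate / 1024:.1f} MiB/day")                  # prints "337.5 MiB/day"
print(f"{3 * 1024**2 / rate:.1f} days to 3 GiB")     # prints "9.1 days to 3 GiB"
```

Under those assumptions, sanebuttonsd alone would pin roughly 3 GiB in about nine days, which is in line with the "week or two" before the machine starts swapping.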
Oh, I forgot to mention. I also tried building my kernel with SLUB instead of SLAB a few weeks ago and there was no change in behaviour.
It looks like a related problem was reported against scanbuttond: http://www.uwsg.iu.edu/hypermail/linux/kernel/0708.2/2879.html. I wonder if you are both using the same scanner; apparently it doesn't affect all USB scanners, because Andrew Morton failed to reproduce this bug with his scanner.
No clue what the other guy is using, but I'm using a Canon CanoScan LiDE20 flatbed (plustek driver). Of course, there's just as much chance that it's something else that differs between our setups and Andrew Morton's; definitely one of the more annoying parts of computing. My summer vacation just ended, so I'm not sure how long it'll take, but I'll see if I can find time to poke around in the scanbuttond source code at some point.
Have you seen this happen with gentoo-sources-2.6.27?
I usually wait for gentoo-sources to go stable first, so I'm still on 2.6.25-r7. Also, I disabled sanebuttonsd because I've needed long runtimes without memory leaking. I'll try to clear some time in the next week or two to test it out.
So you are running scanbuttond as well? Does the leak stop if you stop running scanbuttond?
I'm not running either at the moment and haven't been since my last reboot. (50 days ago) I value uptime a lot more than scanner buttons.
(In reply to comment #10)
> I usually wait for gentoo-sources to go stable first, so I'm still on
> 2.6.25-r7. Also, I disabled sanebuttonsd because I've needed long runtimes
> without memory leaking.
>
> I'll try to clear some time in the next week or two to test it out.

Could you paste your emerge --info or say what arch you are using?

gources-2.6.entoo-s26-r3 is marked as stable under x86 and amd64. Please see if you can reproduce the bug with this kernel.
(In reply to comment #13)
> gources-2.6.entoo-s26-r3 is marked as stable under x86 and amd64. Please see if
> you can reproduce the bug with this kernel.

That did not come out right for some reason: gentoo-sources-2.6.26-r3
I'm on amd64 stable, and 2.6.26 has been stable for a little while now. However, I mismanaged my time and I'm currently rushing to hand in my assignments and study for exams, so the absolute earliest I can set aside time to configure a new kernel and reboot my system is December 19th, possibly as late as January 1st. I'll leave the e-mail notification of your request in my inbox as a TODO note and get to it then.
Please reopen when you have time to test the latest kernel, which will be 2.6.28 very soon, or 2.6.29-rc1 in about two weeks' time.