Summary: | gentoo-sources 2.6.25-r7 (and many earlier versions) appear to leak sysfs_dir_cache and size-32 structures | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Stephan Sokolow <only_bugzilla_automail.era.ssokolow> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED NEEDINFO | ||
Severity: | major | ||
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | Kernel config, fresh from /proc/config.gz |
Description
Stephan Sokolow
2008-08-31 00:44:44 UTC
Oh, I forgot to mention. I did google around for a while, but beyond enabling CONFIG_DEBUG_SLAB_LEAK and discovering slabtop, I wasn't really able to find anything helpful. I did discover kmemleak, but the newest patch is for 2.6.20-rc1 and I didn't want to try getting it to apply while I still had safer options. (Given that I have to perform my usual day-to-day activities on this thing.)

Wow, that is a huge amount of memory, especially for sysfs_dir_cache.

Please post your kernel config, as some driver that you use is the most likely culprit.

Also, for a debugging strategy: Start with your bare minimum of services and use slabtop to make sure memory is not increasing. Then start turning services on one by one while keeping an eye on slabtop, and see when usage starts to climb.

Created attachment 164156
Kernel config, fresh from /proc/config.gz

In addition to this kernel config, I also have the nVidia binary drivers, LIRC, gspca, and the zaptel driver... though the problem has been occurring for a while and the zaptel driver was only added recently.
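
The strategy suggested above (watch the slab counts, then enable services one at a time) could be scripted roughly as follows. This is only a sketch, assuming the usual /proc/slabinfo layout where the first column is the cache name and the second is the active object count. The function names, the default interval, and the sample count are made up for illustration. The cache names are the ones from this report; a SLUB kernel may name the small allocation cache differently, and reading /proc/slabinfo may require root.

```python
import time


def parse_slabinfo(text):
    """Map each slab cache name to its active object count."""
    counts = {}
    for line in text.splitlines():
        # Skip the version banner and the column-header comment line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # Assumed layout: <name> <active_objs> <num_objs> ...
        counts[fields[0]] = int(fields[1])
    return counts


def growth(before, after, caches):
    """Return the change in active objects for each cache of interest."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in caches}


def watch(caches=("sysfs_dir_cache", "size-32"), interval=3.0, samples=10,
          path="/proc/slabinfo"):
    """Print per-interval growth for the given caches (hypothetical helper)."""
    with open(path) as f:
        previous = parse_slabinfo(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            current = parse_slabinfo(f.read())
        print(growth(previous, current, caches))
        previous = current
```

Calling `watch()` once with the minimal set of services running, and again after each service is started, should show which service makes the per-interval growth go from zero to a steady positive number.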
Oops. Sorry about that. I'm used to bugzilla setups which autodetect the mimetype. You'll have to manually gunzip it.

I've identified one of the triggers for the problem. When I killed sanebuttonsd (from kscannerbuttons in my local overlay), the leak stopped. However, I know it wasn't the only one because the leak was going on before I added sanebuttonsd, so something I killed before sanebuttonsd must also be triggering the leak. (On the plus side, at least I know that sanebuttonsd is a major contributor to the problem, accounting for exactly 111 leaked sysfs_dir_cache structures per slabtop update interval.)

I may have next to no experience with C and C++, but I'll see if I can find time to take a look inside sanebuttonsd some time in the next few days. Given how consistently precise the leak rate is, I suspect whatever system call is leaking (or poorly designed, but I hope not because that's a lot harder to get fixed) is being called from inside a polling loop.

Oh, I forgot to mention. I also tried building my kernel with SLUB instead of SLAB a few weeks ago and there was no change in behaviour.

Looks like a related problem was reported against scanbuttond...
http://www.uwsg.iu.edu/hypermail/linux/kernel/0708.2/2879.html

Wonder if you guys are using the same scanner; apparently it's not all USB scanners, because Andrew Morton failed to reproduce this bug with his scanner.

No clue what the other guy's using, but I'm using a Canon CanoScan LiDE20 flatbed. (plustek driver)

Of course, there's just as much chance that it's something else which differs between our setups and Andrew Morton's. Definitely one of the more annoying parts of computing technology.

My summer vacation just ended, so I'm not sure how long it'll take me, but I'll see if I can find time to poke around in the scanbuttond source code at some point.

Have you seen this happen with gentoo-sources-2.6.27?

I usually wait for gentoo-sources to go stable first, so I'm still on 2.6.25-r7. Also, I disabled sanebuttonsd because I've needed long runtimes without memory leaking.

I'll try to clear some time in the next week or two to test it out.

So you are running scanbuttond as well? Does the leak stop if you stop running scanbuttond?

I'm not running either at the moment and haven't been since my last reboot. (50 days ago) I value uptime a lot more than scanner buttons.

(In reply to comment #10)
> I usually wait for gentoo-sources to go stable first, so I'm still on
> 2.6.25-r7. Also, I disabled sanebuttonsd because I've needed long runtimes
> without memory leaking.
>
> I'll try to clear some time in the next week or two to test it out.
>

Could you paste your emerge --info or say what arch you are using?

gources-2.6.entoo-s26-r3 is marked as stable under x86 and amd64. Please see if you can reproduce the bug with this kernel.

(In reply to comment #13)
> (In reply to comment #10)
> > I usually wait for gentoo-sources to go stable first, so I'm still on
> > 2.6.25-r7. Also, I disabled sanebuttonsd because I've needed long runtimes
> > without memory leaking.
> >
> > I'll try to clear some time in the next week or two to test it out.
> >
>
> Could you paste your emerge --info or say what arch you are using?
>
> gources-2.6.entoo-s26-r3 is marked as stable under x86 and amd64.Please see if
> you can reproduce the bug with this kernel.
>

That did not come out right for some reason: gentoo-sources-2.6.26-r3

I'm on amd64 stable and 2.6.26 has been stable for a little while now, but I messed up my time management and I'm currently rushing to get my assignments in and my materials studied in prep for exams, so the absolute earliest I can allocate time to configure a new kernel and reboot my system is December 19th... possibly as late as January 1st.

I'll leave the e-mail notification of your request in my inbox as a TODO note and get to it then.

Please reopen when you have time to test the latest kernel, which will be 2.6.28 very soon, or 2.6.29-rc1 in about 2 weeks' time.