| Summary: | sys-libs/glibc: nscd constantly crashes with memory errors | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Loren Bandiera <lorenb> |
| Component: | [OLD] Core system | Assignee: | Gentoo Toolchain Maintainers <toolchain> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | duncanphilipnorman, ellingsw+20942, ian_milligan, jkt, lance, m.debruijne, mark, mjinks, mlspamcb, romans.heimanis, sbriesen, zephyrus.271, zima |
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | AMD64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
| Attachments: | patch to fix memory usage in nscd | ||
|
Description
Loren Bandiera
2008-05-22 13:27:05 UTC
please retest with glibc-2.8

I installed sys-libs/glibc-2.8_p20080602 but it still crashes. Not always at 10mins now, sometimes faster or slightly longer (longest time in my test was 12mins; shortest was 2mins). Still get the same error:

    nscd: mem.c:392: gc: Assertion `off_alloc == off_allocend' failed.

This might be a glibc bug, because I had this problem on gentoo a few years ago, since then I moved to archlinux where I have the same problem. I have reported a bug there, http://bugs.archlinux.org/task/11165.

Sorry - pressed enter when the cursor was on the button rather than in the input text. My version of glibc is 2.8 - but looking around for the bug on google reveals this happens on debian and suse too, which makes me think that glibc should look at their code more closely.

I've put unscd into my portage overlay which might be of interest: http://svn.hurikhan.ath.cx/gentoo/trunk/sys-libs/unscd/ It's a redesigned implementation immune to most issues causing the nscd bugs. I'm currently test-running it on my machines.

and it's in the portage tree now

I have installed glibc sys-libs/glibc-2.8_p20080602-r1 on about 10 machines, everywhere nscd crashes.

Two days ago I updated glibc on a number of machines, from version 2.6.1 to version 2.8_p20080602-r1. The next morning I found that many of the updated machines (but not quite all) had crashed nscd processes. Nothing in the log, so I restarted the crashed instances in the foreground (nscd -d). Since then, I've recorded four more crashes. The exact error has varied but one instance looked just like the debug output captured above; the error does always come from mem.c; and immediately before it there has always been a "remove" line referring to the host's own address, either BYADDR or BYNAME. I'll paste the last two lines of each case here:

    9191: remove GETHOSTBYNAME entry "filthy"
    nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.

    25961: remove GETHOSTBYNAME entry "problems-test"
    nscd: mem.c:392: gc: Assertion `off_alloc == off_allocend' failed.

    12740: remove GETHOSTBYNAME entry "time"
    nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.

    12592: remove GETHOSTBYADDR entry "10.135.119.10"
    nscd: mem.c:310: gc: Assertion `off_alloc <= db->head->first_free' failed.

I tried searching the glibc bugzilla for some sign that this has already been reported, but didn't turn anything up.

you might want to look into switching to unscd ...

Same problem here. Profile: hardened/linux/amd64/2008.0/server glibc-2.8_p20080602-r1

We're hitting the same issue at the OSL with glibc-2.8 + hardened. I did some digging around and it appears that Fedora [1] & Ubuntu [2] have a patch that appears to fix it. I haven't had a chance to test it myself but I can do that in the next few days. Worst case we'll switch to unscd but in the meantime I'd like to at least give this a shot. I'll post a patch if it does indeed fix the problem.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=430324
[2] https://bugs.launchpad.net/ubuntu/intrepid/+source/glibc/+bug/256157

Created attachment 193324
patch to fix memory usage in nscd
The previous patch incorporates the following memory fixes:

http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00147.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00148.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00149.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00150.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00151.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00311.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00313.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00314.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00316.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00318.html
http://sources.redhat.com/ml/glibc-cvs/2008-q2/msg00320.html

I have tested this patch against glibc-2.8_p20080602-r1 and it appears to fix the memory issues we were having with nscd.

I also used to experience frequent random nscd crashes with glibc-2.8_p20080602. The crashes used to happen in less than one hour from (re)starting nscd. With the new glibc-2.10.1 the problem seems to be fixed for me. No crash has happened since the upgrade, which was more than a week ago.

For what it's worth, I'm seeing the crashes with glibc-2.10.1, on a rather fresh ~amd64 system, gcc-4.4. Every time I connect to a network, any network, through either ethernet or wifi, nscd crashes right away. If I restart it manually it sticks. It gets old though. Denis.

on my system I lately had an nscd that had consumed nearly 300MB of memory... seems quite a bit for a laptop.... I'll try unscd

nscd seems to behave itself in 2.11.2 |
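For anyone collecting more of these crashes from `nscd -d` output, pairing each assertion failure with the last "remove" line before it (as was done by hand above) could be scripted roughly as follows. This is only a sketch: the two regular expressions are modelled on the excerpts quoted in this report, and the function name `find_crashes` and the `SAMPLE` text are made up for illustration. Other nscd builds may format their debug output differently.

```python
import re

# Patterns modelled on the excerpts quoted in this report (an assumption:
# other nscd versions may log these lines in a different shape).
REMOVE_RE = re.compile(r'^(?P<pid>\d+): remove (?P<kind>\S+) entry "(?P<key>[^"]*)"')
ASSERT_RE = re.compile(
    r"^nscd: (?P<file>[\w.]+):(?P<line>\d+): (?P<func>\w+): "
    r"Assertion `(?P<expr>.*)' failed\.$"
)


def find_crashes(lines):
    """Pair each assertion failure with the most recent 'remove' line before it."""
    crashes = []
    last_remove = None
    for raw in lines:
        line = raw.strip()
        m = REMOVE_RE.match(line)
        if m:
            last_remove = m.groupdict()
            continue
        m = ASSERT_RE.match(line)
        if m:
            crashes.append({"assertion": m.groupdict(), "last_remove": last_remove})
            last_remove = None  # the next crash needs its own 'remove' line
    return crashes


# Hypothetical input: two of the excerpts from this report, one per line.
SAMPLE = """\
9191: remove GETHOSTBYNAME entry "filthy"
nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.
12592: remove GETHOSTBYADDR entry "10.135.119.10"
nscd: mem.c:310: gc: Assertion `off_alloc <= db->head->first_free' failed.
"""

if __name__ == "__main__":
    for crash in find_crashes(SAMPLE.splitlines()):
        a, r = crash["assertion"], crash["last_remove"]
        where = f'{r["kind"]} "{r["key"]}"' if r else "no remove line seen"
        print(f'{a["file"]}:{a["line"]} ({a["expr"]}) after remove: {where}')
```

Feeding it a saved capture of the debug output (for example via `open(path)`) instead of `SAMPLE` would give a quick tally of which `mem.c` assertions fire and which cache entries were being removed just beforehand.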