Bug 52522 - distcc causes kernel panic on remote host
Summary: distcc causes kernel panic on remote host
Status: RESOLVED DUPLICATE of bug 62739
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High critical
Assignee: Lisa Seelye (RETIRED)
URL: http://seclists.org/lists/linux-kerne...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-30 21:55 UTC by Duke
Modified: 2005-07-17 13:06 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
distcc patch for Linux kernel 2.4.26 (linux.patch,1.71 KB, patch)
2004-06-28 14:41 UTC, Duke

Description Duke 2004-05-30 21:55:46 UTC
I'm a little low on details, but this is a known issue with kernel 2.4.26 and is discussed on the lkml found at the link I included.  I thought it may be of interest to the gentoo team.

When using distcc, one of the remote machines that are being used to compile code may encounter a kernel panic.  It's strange, however.  I've got two 2.4.26 machines doing all the compiling for a slower machine, and one crashes while the other does not.  The first line of the error is

kernel BUG at page_alloc.c:98

It's easily reproducible - also discussed on the lkml.  I'd suggest checking the details there, because I can reproduce the error on one machine, while I cannot on the other.  Basically all I have to do is compile something with distcc, and within minutes (or less) I get a kernel panic.

Sorry I haven't included much info.  It seems to be a known kernel issue and so not entirely related to Gentoo.  My intent was only to notify Gentoo developers of this problem so they can decide on an appropriate action to take, whether that is marking distcc as unstable or issuing a patch to (all?) the kernel sources.  I hope this is acceptable.

This may be related to Bug #51689 - I see tg3 mentioned in that report and in the lkml posts.  Though they're talking different kernel versions.


Reproducible: Always
Steps to Reproduce:
Comment 1 Lisa Seelye (RETIRED) gentoo-dev 2004-06-04 14:48:14 UTC
Martin, just a heads up in case you don't read lkml.
Comment 2 Martin Pool 2004-06-04 16:51:59 UTC
Thanks Lisa

I did hear of that in passing but had not noticed it was in 2.4.26.

All I can suggest is that you try different options in distcc to try to localize it: mmap on and off, compression on and off, corks on and off, and so on.
Comment 3 Duke 2004-06-28 14:41:45 UTC
Created attachment 34357
distcc patch for Linux kernel 2.4.26

The patch wouldn't apply using the patch program, so I had to apply it
manually.
Comment 4 Duke 2004-06-28 14:43:36 UTC
Comment on attachment 34357
distcc patch for Linux kernel 2.4.26

This patch was on the lkml thread.  I gave it a try this weekend on a couple of
machines, and they've been compiling for another machine for the last couple of
days without crashing.
Comment 5 Martin Pool 2004-07-05 19:31:36 UTC
Based on the kernel list thread, I think you can avoid this bug by just doing

export DISTCC_MMAP=0 DISTCC_SENDFILE=0 

on all the affected machines before starting distcc and distccd. (e.g. put it in /etc/profile.local or somewhere like that.)
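
A minimal sketch of that workaround as a shell snippet. The two variable names are the ones given above; the `env | grep` check is only an illustrative way to confirm that the settings are exported before distcc or distccd is started, and is not part of distcc itself.

```shell
#!/bin/sh
# Disable distcc's mmap and sendfile code paths.
# Variable names are taken from the comment above.
export DISTCC_MMAP=0 DISTCC_SENDFILE=0

# List the distcc-related variables that child processes will inherit.
env | grep '^DISTCC_' | sort
```

The settings only affect processes started from a shell that has them exported, so a distccd that is already running would presumably need to be restarted from such an environment.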
Comment 6 Lisa Seelye (RETIRED) gentoo-dev 2004-08-16 08:31:07 UTC
Still an issue here?  Or was this fixed in a newer version of distcc? (I don't know what version the OP was using.)
Comment 7 Harald Leiner 2004-08-17 05:33:17 UTC
I'm new to Bugzilla / bug reporting so please bear with me.

I've tried the workaround from #5, by adding the vars to both /etc/env.d/02distcc and /etc/conf.d/distccd.

I currently use sys-devel/distcc-2.13-r1.

With 2.4.25_pre7-gss-r9 the workaround helped; at least I had no kernel panics for ~7 days. I did some "emerge -vu world"s on a 2nd machine using distcc during this time.

After updating the kernel to 2.4.25_pre7-gss-r11 I get the kernel panic again, even though the workaround is still in place.

The affected machine also uses the tg3 module that the OP mentioned, for the onboard Broadcom NetXtreme BCM5702X Gigabit NIC.

My current CFLAGS are:
CFLAGS="-O3 -march=i686 -mcpu=athlon-xp -funroll-loops -pipe -fomit-frame-pointer"

Would it be helpful to recompile distcc with less aggressive CFLAGS?
If so, what else should I recompile...?

Any more info I could possibly provide?
Comment 8 Duke 2004-08-17 17:44:20 UTC
The problem isn't with distcc and how it's compiled.  The problem is with the kernel.

Even if you had some poorly designed program, it shouldn't cause a kernel panic.

I'm in the process of updating to Linux kernel 2.4.27; I'll give distcc a run and see how the new kernel does.
Comment 9 Martin Pool 2004-08-17 17:52:07 UTC
Yes, it's a kernel bug.  

http://distcc.samba.org/faq.html#gbe-panic

Some people have reported that the patch linked from there fixes it.  I think it's in 2.4.27 but I haven't checked.  Confirmation either way from people seeing the bug would be welcome.  (I don't see it on my machines.)
Comment 10 Harald Leiner 2004-08-18 09:22:31 UTC
Hmm, should have looked beyond bugzilla.

Thanks for the pointers and clarifying things for me.

Will go and try to patch 2.4.25_pre7-gss-r11.

The only thing that astounds me is that 2.4.25_pre7-gss-r11 panics while 2.4.25_pre7-gss-r9 does not.
I guess there shouldn't be too much difference between them.
But ignore me, I'm no programmer, and have no real insight into kernels besides basic theoretical stuff.
Comment 11 Dirk-Lüder Kreie 2004-09-03 12:18:08 UTC
Try the ebuild/patch from bug 62739
Comment 12 Lisa Seelye (RETIRED) gentoo-dev 2004-09-04 15:42:31 UTC
see bug 62739 ... nothing to see here

*** This bug has been marked as a duplicate of 62739 ***