Bug 64657 - Can MAKEOPTS="-j... be reduced when distcc fails?
Status: RESOLVED WONTFIX
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Third-Party Tools
Hardware: x86 Linux
Importance: High enhancement
Assignee: Portage team
Reported: 2004-09-19 09:41 UTC by chris-gentoo
Modified: 2005-07-24 02:00 UTC

Description chris-gentoo 2004-09-19 09:41:55 UTC
I use distcc to spread compiling over two very fast PCs, but both of them have little memory. Setting MAKEOPTS="-j4" makes for very fast distributed building, but if a build fails to make use of distcc (e.g. mythtv), then memory gets exhausted and compiling slows to a crawl while the swap gets thrashed.
For example, tonight I tried to compile the latest mythtv, kicked it off and then left the computers on for several hours, came back and discovered (to my frustration) almost no progress had been made.

Is it possible to get portage to somehow drop the value of -j4 in these situations?

Reproducible: Always
Steps to Reproduce:
1. Install and configure distcc on multiple hosts
2. Configure portage for a -jN appropriate to the number of nodes
3. Start a build that doesn't make use of distcc


Actual Results:  
Multiple (N) concurrent gcc processes will run on the local machine, causing 
poor performance as they compete for the CPU and memory. 

Expected Results:  
The number of concurrent processes should be dropped.
Comment 1 Nicholas Jones (RETIRED) gentoo-dev 2004-10-09 14:51:02 UTC
There is no way to detect failure. You could use -l instead of -j.
Check the manpage.
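
As a rough illustration of the suggestion above, a make.conf setting that combines both flags might look like the following. The values are illustrative assumptions, not a recommendation from this bug; a suitable load limit depends on the local machine.

```shell
# /etc/make.conf fragment (values are illustrative assumptions)
# -j4: allow up to four parallel jobs
# -l2: do not start another job while other jobs are running
#      and the load average is 2 or higher
MAKEOPTS="-j4 -l2"
```

With this setting make still fans out to four jobs while the load stays low, for example when distcc is shipping the work to other hosts, and holds back new jobs once the local load average reaches the limit.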
Comment 2 Peter Renchen 2005-02-22 05:30:47 UTC
In reply to Nicholas Jones: this is not a matter of failure detection, it is a matter of CPU/memory power.

Running a compile with FEATURES="distcc" MAKEOPTS="-j10" distributed over 6 distcc servers leads to a very fast compile. If distributed compiling of the program you want to build was turned off by the program maintainer, your compile will be very, very slow, because distcc now starts the compile locally with the -j10 option!
Now you have 10 C++ jobs on your (e.g. weak) local machine, not 10 C++ jobs distributed over the 6 servers of your distcc farm!
This is a big difference. Your load gets high, your swap space fills up, and 98% or more of your processes are in a waiting state... I bet you'd finish the compile faster by writing the code on paper ;-)

In my humble opinion, when maintainers turn off distributed compiling in their ebuilds, either they or, better, portage should detect this and automatically reduce MAKEOPTS to -j1 or -j2 for this ebuild.

There is a big difference between running compile jobs with -j1 and with -j10 on one and the same machine.

I read the manpages for make, distcc, gcc and make.conf; there is no -I option which reduces the number of compile jobs in case of a fallback...

Thia, Peter.
Comment 3 Marius Mauch (RETIRED) gentoo-dev 2005-02-22 08:10:41 UTC
"...compile was turned off by the program maintainer..."

Turned off how?
Comment 4 Yang Zhao 2005-06-09 16:30:41 UTC
(In reply to comment #3)
> "...compile was turned off by the program maintainer..."
> 
> Turned off how?

I think what Peter means are packages that disable parallel compiling, such as
Mozilla.

Those packages override MAKEOPTS with -j1, so this is a non-issue.
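
A minimal sketch of how such an override typically looks in an ebuild is shown below. It is not taken from any specific package; `emake` and `die` are helpers that Portage provides to ebuilds, so the fragment only runs inside that environment.

```shell
# Sketch of src_compile for a package that cannot build in parallel.
# emake and die come from Portage; this is not copied from a real ebuild.
src_compile() {
    # emake passes MAKEOPTS to make first; when make sees several -j
    # options, the last one is effective, so this forces a serial build.
    emake -j1 || die "emake failed"
}
```

Note that this only covers packages that disable parallel builds altogether, not packages that keep parallel make but bypass distcc.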

@Marius: The option you want is -l (small L), not -I (big i)
Comment 5 Lionel Bouton 2005-07-07 07:33:58 UTC
There's a special case where the parallel compile isn't disabled but distcc is:
the gcc compilation does it by overriding $CC.

I have a 5 system compile farm, MAKEOPTS=-j8 and I'm currently waiting on a 96M
Xen domain-0 to finish its gcc build in SWAP...

I was considering proposing to make the problematic ebuilds override MAKEOPTS
themselves but in fact there's no way to set a "good for everyone" -jX value.
I'm not sure -lY is a good solution. Doesn't each distcc client count as a
process waiting for I/O? If you want to distribute the load in roughly the same
way as '-jX' allows, you'll probably set Y not far from X. But waiting on TCP
I/O isn't nearly as bad as waiting on disk I/O. So you'll end up having bad
performance when distcc isn't available too.

I'm currently testing with '-l3' instead of '-j8' in order to test the actual
behaviour with distcc.

If '-lY' doesn't work, one first step would be to define a "NODISTCC_MAKEOPTS"
alternative that would be configurable in make.conf and usable by
"distcc-disabling" ebuilds.
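
To make the NODISTCC_MAKEOPTS idea concrete, here is a small sketch. The variable exists only as a proposal in this comment, not in Portage, and the helper name `effective_makeopts` and the sample values are assumptions made for illustration.

```shell
# Sketch only: NODISTCC_MAKEOPTS is the variable proposed in this comment,
# not an existing Portage setting. Values are illustrative.
MAKEOPTS="-j8"
NODISTCC_MAKEOPTS="-j2"

# effective_makeopts DISTCC_USABLE
# Prints the make options a build should use. DISTCC_USABLE is "yes"
# when the ebuild lets distcc do the compiling, anything else otherwise.
effective_makeopts() {
    if [ "$1" = "yes" ]; then
        echo "$MAKEOPTS"
    else
        # Fall back to MAKEOPTS when the proposed variable is unset
        echo "${NODISTCC_MAKEOPTS:-$MAKEOPTS}"
    fi
}
```

A distcc-disabling ebuild would then call something like `effective_makeopts no` instead of hard-coding its own -jX value, which keeps the choice in the user's hands.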

One other (better?) way could be to integrate distcc more closely with emerge:
emerge would parse the /etc/distcc/hosts file, attempt a connection (either tcp
or ssh depending on the type of entry) to each host and, for each host that
fails, dynamically subtract the number of jobs that host supports according to
/etc/distcc/hosts from the number of parallel jobs allowed by MAKEOPTS (with a
minimum).
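
The host-probing idea can be sketched as a standalone shell function. This is only an illustration under several assumptions: it handles plain TCP entries (with the limit and port in either order), skips ssh-style entries and global flags, does not handle IPv6 literals, assumes distcc's default of four jobs per host when no limit is given, and assumes an `nc` that supports `-z` and `-w`. The names `adjusted_jobs` and `probe_host` are hypothetical.

```shell
# Sketch only: reduce a -j value by the job slots of unreachable distcc hosts.
DEFAULT_PORT=3632   # distcc's usual TCP port
DEFAULT_LIMIT=4     # assumed per-host default when no /LIMIT is given

# Default reachability probe. Assumes an nc that supports -z and -w.
probe_host() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# adjusted_jobs JOBS MIN_JOBS HOSTS_FILE [PROBE]
# Prints JOBS minus the limits of hosts that fail PROBE, never below MIN_JOBS.
adjusted_jobs() {
    njobs=$1; min=$2; file=$3; probe=${4:-probe_host}
    # Strip comments, then let the shell split the rest into entries.
    for entry in $(sed 's/#.*//' "$file"); do
        case $entry in
            --*|+*|*@*|localhost|localhost/*) continue ;;  # flags, ssh, local
        esac
        spec=${entry%%,*}           # drop ",lzo" and similar options
        host=${spec%%[:/]*}         # host name up to the first ':' or '/'
        rest=${spec#"$host"}
        limit=$DEFAULT_LIMIT
        port=$DEFAULT_PORT
        case $rest in */*) limit=${rest#*/}; limit=${limit%%:*} ;; esac
        case $rest in *:*) port=${rest#*:}; port=${port%%/*} ;; esac
        # A host that cannot be reached gives up its job slots.
        "$probe" "$host" "$port" || njobs=$((njobs - limit))
    done
    if [ "$njobs" -lt "$min" ]; then njobs=$min; fi
    echo "$njobs"
}
```

It could be used as `MAKEOPTS="-j$(adjusted_jobs 8 2 /etc/distcc/hosts)"`. Passing the probe as an optional fourth argument keeps the parsing testable without a network; an ssh probe would need a separate function.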
Comment 6 Radek Podgorny 2005-07-22 03:32:27 UTC
All those problems could be solved by zeroconf/rendezvous.
Comment 7 Radek Podgorny 2005-07-22 03:48:46 UTC
...actually, it does not solve -jX problem itself but helps to manage that
automatically. See bug 80219.
Comment 8 SpanKY gentoo-dev 2005-07-22 13:29:15 UTC
i don't think i've seen a case yet where a build failure due to distcc wasn't due
to issues in the build system itself with running in parallel
Comment 9 Radek Podgorny 2005-07-22 15:35:54 UTC
Are you sure you want to close this one? I think the main spirit was not about
distcc crashing. It was about selecting the -j parameter automatically depending
on the number of available build hosts. It's quite insane to run a compile with
"default" (on my farm) -j10 when you're disconnected from the network (it swaps
its ass off)...
Comment 10 SpanKY gentoo-dev 2005-07-22 22:00:55 UTC
auto-adjusting the -j value depending on # of available hosts really isn't
feasible, i don't think

in the original report, where the pc got thrashed because -j4 was used on a
package which didn't support distcc, i say fix the package rather than munging up
portage code
Comment 11 Radek Podgorny 2005-07-23 11:18:30 UTC
Ok, so should I create a new bug for "add portage feature to detect # of
available hosts and adjust the -j value"?
Comment 12 Alec Warner (RETIRED) archtester gentoo-dev Security 2005-07-23 15:33:41 UTC
This portage feature would depend on Zeroconf and friends, which currently means
Howl (thanks to a quick google search). I am not sure how suitable that would
be inside of an ebuild (it would need to be either part of portaged or some
other daemon to track nodes). It almost seems more of a clustering deal than a
part of portage.
Comment 13 SpanKY gentoo-dev 2005-07-24 02:00:31 UTC
no, i already said that trying to auto-adjust the -j value depending on # of
available hosts really isn't feasible and would just add bloat to portage imho