I use distcc to spread compilation over two very fast PCs, both of which have little memory. Setting MAKEOPTS="-j4" makes for very fast distributed building, but if a build fails to make use of distcc (e.g. mythtv), memory gets exhausted and compiling slows to a crawl while swap is thrashed. For example, tonight I tried to compile the latest mythtv: I kicked it off, left the computers on for several hours, and came back to discover (to my frustration) that almost no progress had been made. Is it possible to get portage to somehow drop the value of -j4 in these situations?
Reproducible: Always
Steps to Reproduce:
1. Install and configure distcc on multiple hosts
2. Configure portage for a -jN appropriate to the number of nodes
3. Start a build that doesn't make use of distcc
Actual Results: Multiple (N) concurrent gcc processes will run on the local machine, causing poor performance as they compete for the CPU and memory.
Expected Results: The number of concurrent processes should be dropped.
There is no way to detect failure. You could use -l instead of -j. Check the manpage.
In reply to Nicholas Jones: this is not a matter of failure detection, it is a matter of CPU/memory power. Running a compile with FEATURES="distcc" MAKEOPTS="-j10" distributed across 6 distcc servers leads to a very fast compile. If distributed compiling of the program you want to build was turned off by the program maintainer, your compile will be very, very slow, because the compile now starts locally with the -j10 option! Now you have 10 C++ jobs on your (e.g. weak) local machine, not 10 C++ jobs distributed over the 6 servers of your distcc farm. This is a big difference. Your load gets high, your swap space fills up, and 98% or more of your processes are in a waiting state. I bet you'd finish the compile faster by writing the code on paper ;-) In my humble opinion, when maintainers turn off distributed compiling in their ebuilds, either they or, better, portage should detect this and automatically reduce MAKEOPTS to -j1 or -j2 for this ebuild. There is a big difference between running compile jobs with -j1 and with -j10 on one and the same machine. I read the manpages for make, distcc, gcc and make.conf; there is no -I option which reduces the number of compile jobs in case of a fallback... Thanks, Peter.
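To make the request above concrete, here is a minimal sketch of the kind of adjustment being asked for: capping the -jN value in MAKEOPTS when a build cannot use distcc. This is not existing portage behaviour. The function name `cap_jobs`, the `distcc_usable` flag, and the default of 2 local jobs are all hypothetical, and how the flag would actually be determined is exactly the open question in this bug.

```python
import re

# Matches "-jN" or "-j N" at the start of a whitespace-separated token.
_JOBS = re.compile(r"(?<!\S)-j\s*(\d+)")


def cap_jobs(makeopts: str, distcc_usable: bool, local_jobs: int = 2) -> str:
    """Return makeopts unchanged if distcc is usable; otherwise cap -jN.

    distcc_usable and local_jobs are illustrative inputs (assumptions),
    not settings that portage provides.
    """
    if distcc_usable:
        return makeopts
    # Lower the job count to local_jobs, but never raise a smaller value.
    return _JOBS.sub(
        lambda m: f"-j{min(int(m.group(1)), local_jobs)}", makeopts
    )


if __name__ == "__main__":
    print(cap_jobs("-j10", distcc_usable=False))
    print(cap_jobs("-j10", distcc_usable=True))
```

Other options in the string, such as a -l load limit, pass through untouched, so the cap could be combined with the -l suggestion from the earlier comment.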
"...compile was tuned off by the programmmaintainer..." Turned off how?
(In reply to comment #3)
> "...compile was tuned off by the programmmaintainer..."
>
> Turned off how?
I think what Peter means is packages that disable parallel compiling, such as Mozilla. Those packages override MAKEOPTS with -j1, so this is a non-issue.
@Marius: The option you want is -l (small L), not -I (big i)
There's a special case where parallel compiling isn't disabled but distcc is: the gcc build does this by overriding $CC. I have a 5-system compile farm with MAKEOPTS=-j8, and I'm currently waiting on a 96M Xen domain-0 to finish its gcc build in swap... I was considering proposing that the problematic ebuilds override MAKEOPTS themselves, but in fact there's no way to set a "good for everyone" -jX value. I'm not sure -lY is a good solution. Doesn't each distcc client count as a process waiting for I/O? If you want to distribute the load in roughly the same way as '-jX' allows, you'll probably set Y not far from X. But waiting on TCP I/O isn't nearly as bad as waiting on disk I/O, so you'll end up with bad performance when distcc isn't available too. I'm currently testing with '-l3' instead of '-j8' to see the actual behaviour with distcc. If '-lY' doesn't work, a first step would be to define a "NODISTCC_MAKEOPTS" alternative that would be configurable in make.conf and usable by "distcc-disabling" ebuilds. Another (better?) way could be to integrate distcc more closely with emerge: have emerge parse the /etc/distcc/hosts file, attempt a connection (either tcp or ssh, depending on the type of entry) to each host and, for those that fail, dynamically reduce the number of parallel jobs allowed by MAKEOPTS (down to a minimum) by the number of jobs those hosts would have supplied according to /etc/distcc/hosts.
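The host-probing idea in the last proposal could look roughly like the sketch below. It is an illustration under stated assumptions, not anything emerge does: the function names are hypothetical, only plain TCP entries of the form HOST[:PORT][/LIMIT][,OPTIONS] are handled (ssh entries, option flags and bracketed IPv6 addresses are skipped), 3632 is distcc's usual TCP port, and the per-host limit of 4 used when no /LIMIT is given is an assumption to be checked against the distcc manpage.

```python
import socket

DEFAULT_PORT = 3632   # distcc's usual TCP port
DEFAULT_LIMIT = 4     # assumed job limit for hosts without an explicit /LIMIT


def parse_hosts(text):
    """Return (host, port, limit) for each remote plain-TCP entry."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]            # strip comments
        for entry in line.split():
            if entry.startswith(("-", "+")) or "@" in entry:
                continue                        # flags and ssh entries: not handled here
            entry = entry.split(",", 1)[0]      # drop options such as ",lzo"
            spec, _, limit = entry.partition("/")
            host, _, port = spec.partition(":")
            if host in ("localhost", "127.0.0.1"):
                continue                        # local slots are not probed
            hosts.append((
                host,
                int(port) if port.isdigit() else DEFAULT_PORT,
                int(limit) if limit.isdigit() else DEFAULT_LIMIT,
            ))
    return hosts


def reachable(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def adjusted_jobs(hosts_text, configured_jobs, min_jobs=1, probe=reachable):
    """Subtract the limits of unreachable hosts from configured_jobs."""
    lost = sum(
        limit for host, port, limit in parse_hosts(hosts_text)
        if not probe(host, port)
    )
    return max(min_jobs, configured_jobs - lost)
```

A caller would pass the contents of /etc/distcc/hosts and the N from -jN. Note that a successful TCP connect only shows that something is listening on the port, not that the remote compiler is usable, and probing adds a delay of up to `timeout` per dead host before every build.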
All those problems could be solved by zeroconf/rendezvous.
...actually, it does not solve the -jX problem itself, but it helps to manage that automatically. See bug 80219.
i don't think i've seen a case yet where a build failure with distcc wasn't due to issues in the build system itself with running in parallel
Are you sure you want to close this one? I think the main spirit was not about distcc crashing. It was about selecting the -j parameter automatically depending on the number of available build hosts. It's quite insane to run a compile with the "default" (on my farm) -j10 when you're disconnected from the network (it swaps its ass off)...
auto-adjusting the -j value depending on # of available hosts really isn't feasible, i don't think. in the original report, where the pc got thrashed because -j4 was used on a package which didn't support distcc, i say fix the package rather than munging up portage code
Ok, so should I create a new bug for "add portage feature to detect # of available hosts and adjust the -j value"?
This portage feature would depend on Zeroconf and friends, which currently means Howl (thanks to a quick google search). I am not sure how suitable that would be inside of an ebuild (it would need to be either part of portaged or some other daemon to track nodes). It almost seems more of a clustering deal than a part of portage.
no, i already said that trying to auto-adjust the -j value depending on # of available hosts really isn't feasible and would just add bloat to portage imho