Compiling and downloading software takes a very long time in Gentoo when the hardware (e.g. Celeron 850 / 128 MB) and the connection speed (a 64-256 kbps link from India) are limited. Instead of waiting for a particular compile to finish before downloading the next package, the downloads for packages that will be needed if the current compile succeeds (as most do) should start in the background. To illustrate, gcc, gettext and glibc would be downloaded while binutils is being built.
I'm going to pass this one to carpaski as he has already written the code once and is working on a rewrite to synch up with drobbins' latest code... it worked and various versions are available at http://gentoo.twobit.net/scripts/portage/threaded Nick, get to work! :-D
*** Bug 4553 has been marked as a duplicate of this bug. ***
*** Bug 2832 has been marked as a duplicate of this bug. ***
*** Bug 3944 has been marked as a duplicate of this bug. ***
*** Bug 5868 has been marked as a duplicate of this bug. ***
From my [gentoo-dev] mail:
http://lists.gentoo.org/pipermail/gentoo-dev/2002-August/014047.html

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I've been working on code to make downloading occur continuously during the emerge process. I have new, clean, and working code that needs a bit more testing than I can give it. I'd appreciate some testing. It also includes a couple of subtle enhancements. Try 'emerge -ef world' to see the "(X of Y)" counts.

At present, the output is a little noisy because wget and builds are happening at the same time. I HIGHLY recommend adding "-q" to wget's FETCH_COMMAND in make.globals/make.conf.

Please CC me (carpaski@gentoo.org) with any bugs that you think you come across. PLEASE follow the directions in the README.txt.

http://gentoo.twobit.net/portage/threaded/

It does as follows:

1. Downloads begin and do not pause until they are completed.
2. Merges occur in order as the required files finish downloading.
3. Failed merges will stop the merging of packages at that point, but fetching will continue until all files are downloaded.
4. Failed fetches will be announced, but WILL NOT stop the fetching. Merges will continue until the failed fetch is reached. Fetches will continue until all fetches have completed.
5. Failures in the fetching and/or failure in the merge will be noted when both the fetching and the merging are completed. Only one notice for the fetches is given. Only one notice of the failed merge is given.

Thanks,
-- Nicholas Jones
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
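The five rules in the mail above can be sketched roughly as follows. This is only an illustration and not carpaski's code: run_emerge, fetch and merge are hypothetical names, and fetch/merge stand in for the real Portage steps, returning True on success.

```python
import threading


def run_emerge(packages, fetch, merge):
    """Fetch every package in a background thread while merging in order.

    fetch and merge are hypothetical callables that take a package name
    and return True on success.
    """
    fetched = {pkg: threading.Event() for pkg in packages}
    fetch_ok = {}
    failed_fetches = []

    def fetcher():
        # Rules 1 and 4: keep downloading to the end, whatever happens.
        for pkg in packages:
            fetch_ok[pkg] = fetch(pkg)
            if not fetch_ok[pkg]:
                failed_fetches.append(pkg)
            fetched[pkg].set()

    thread = threading.Thread(target=fetcher)
    thread.start()

    merged = []
    failed_merge = None
    for pkg in packages:
        fetched[pkg].wait()      # Rule 2: merge once the files have arrived.
        if not fetch_ok[pkg]:    # Rule 4: merging stops at the failed fetch.
            break
        if not merge(pkg):       # Rule 3: a failed merge stops merging only.
            failed_merge = pkg
            break
        merged.append(pkg)

    thread.join()                # Fetching always runs to completion.
    # Rule 5: the caller reports failures once, after both have finished.
    return merged, failed_fetches, failed_merge
```

The merge loop never cancels the fetch thread, which is what lets a failed build leave all remaining distfiles downloaded for the next attempt.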
Another note... After trolling the forums I saw that the use of prozilla across mirrors was improving download rates. You might consider trying this as well.
Harmless --fetchonly traceback fixed in -r2
Update to 2.0.24 available
2.0.25 update available. Any problems? Goodness/Badness reports? http://gentoo.twobit.net/portage/threaded/
2.0.26 available. Any commentary on functionality... Things that might be nice... Anything you can think of would be appreciated.
2.0.27 is up.
It appears to work fine though wget and compile output being spit out together doesn't make for good reading :)
worked fine for me, too. Tried with gentoo1.4b stage1. sed '22,23s:"$: -a /tmp/port_log":' /etc/make.global and the output is ok.
It works under Portage 2.0.27, except that one package failed to download which killed the rest of the install process.
Huh? How does this work? I am emerging world and I don't see it downloading anything while it is compiling.
*** Bug 6413 has been marked as a duplicate of this bug. ***
Is there any estimate as to when we can see this functionality merged into the official portage?
Ximian bind # emerge bind-9.2.2_rc1.ebuild
Calculating dependencies ...done!
<<< Fetching daemon started (1 packages)
<<< Fetching (1 of 1) net-dns/bind-9.2.2_rc1.
!!! doebuild: /usr/portage/net-dns/bind/bind-9.2.2_rc1.ebuild not found.
!!! Fetching failed on net-dns/bind-9.2.2_rc1.
<<< Fetching ended.
!!! Fetching failed.
Ximian bind # pwd
/usr/myportage/net-dns/bind
Ximian bind # grep PORTDIR_OVERLAY /etc/make.conf
PORTDIR_OVERLAY="/usr/myportage"
Posted 2.0.35 ... I still need to add PORTDIR_OVERLAY support to it.
*** Bug 5917 has been marked as a duplicate of this bug. ***
*** Bug 8535 has been marked as a duplicate of this bug. ***
Is this stable? Is it planned to be integrated into the "official" version soon? What can we do to help?
Well... This isn't exactly stable. There's a shared memory issue caused by globals in portage, and I'll have to find some time to make a fairly large change to portage before I can get this working properly.
*** Bug 9288 has been marked as a duplicate of this bug. ***
*** Bug 8953 has been marked as a duplicate of this bug. ***
*** Bug 9441 has been marked as a duplicate of this bug. ***
*** Bug 9851 has been marked as a duplicate of this bug. ***
*** Bug 10142 has been marked as a duplicate of this bug. ***
*** Bug 11769 has been marked as a duplicate of this bug. ***
Is there still any development being done to add this feature to portage? If so, what is the status?
*** Bug 12595 has been marked as a duplicate of this bug. ***
*** Bug 13362 has been marked as a duplicate of this bug. ***
If this functionality is not going to be implemented in standard Portage any time soon, might it be a good idea to at least have Portage implement advisory locking, so that multiple instances of emerge will refuse to run? This would be especially helpful on machines administered by several people.
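A minimal sketch of the kind of advisory locking suggested above, assuming a POSIX system. The lock path and both function names are made up for illustration and are not part of Portage.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path, chosen for illustration only.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "emerge-example.lock")


def acquire_instance_lock(path=LOCK_PATH):
    """Return an open lock file, or None if another instance holds the lock."""
    lock_file = open(path, "w")
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


def release_instance_lock(lock_file):
    """Drop the lock; it is also dropped automatically when the process exits."""
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()
```

A second emerge would call acquire_instance_lock() at startup and exit with a message if it gets None back. The lock is advisory, so it only restrains programs that check it.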
Is this going to be added soon? I very much would appreciate that.
*** Bug 14802 has been marked as a duplicate of this bug. ***
If this can't be implemented anytime soon how about at least a fetch before compile option? ie. emerge --fetch_only kde fetches all packages and deps then exit. and another option to emerge --fetchall_then_compile kde ?? It's a good compromise I think.
Try emerge -f kde.
Maybe it could be done this way. The user runs: emerge kde -something. Portage downloads the first package and starts compiling it while it downloads the second package. Let's say it finishes compiling package #12 before the download of package #13 is complete. Normally this would mess up portage, but how about this: portage waits until the download of package #13 is complete and begins compiling it once done, while downloading package #14. If this is done from the same emerge command, it wouldn't mess up portage. I have no idea how to switch between looking at the compiling and looking at the downloading, but that shouldn't be difficult to fix.
*** Bug 18332 has been marked as a duplicate of this bug. ***
*** Bug 18408 has been marked as a duplicate of this bug. ***
*** Bug 18656 has been marked as a duplicate of this bug. ***
I have written a little patch that seems to work fine for me. You can run emerge -fu world in one tty, then emerge -u world in another one. Before compiling, the second emerge will wait until the files are downloaded (if they are being downloaded). And obviously, the first emerge will keep downloading while the other compiles.

Given its simplicity, I'd suggest adding this little functionality to portage. Use the patch at your own risk, though.

Patch for /usr/lib/python2.2/site-packages/portage.py, tested only on portage 2.0.47-r10, follows (please back up your original portage.py, just in case):

13a14,38
> #/kl4rk #TODO: check permissions on the locks directory and on locked files?
>
> lockdir="var/tmp/portage/"
> def lockfile(filename,verbose=1):
>     "Locks on /lockdir/filename, creates the lock file if necessary"
>     #fd=open(root+lockdir+md5.new(filename).hexdigest(),'w')
>     fd=open(root+lockdir+filename,'w')
>     if verbose:
>         try:
>             fcntl.lockf(fd,fcntl.LOCK_EX|fcntl.LOCK_NB)
>         except IOError:
>             print green(" * ")+"Another process is holding a lock on '"+filename+"'"
>             print green(" * ")+"Waiting.. ",
>             sys.stdout.flush()
>             fcntl.lockf(fd,fcntl.LOCK_EX)
>             print "Resuming."
>     else:
>         fcntl.lockf(fd,fcntl.LOCK_EX)
>     return fd
>
> def unlockfile(fd):
>     fcntl.lockf(fd,fcntl.LOCK_UN)
>     fd.close()
> #kl4rk/
>
999a1025
>
1062a1089,1093
>
> #/kl4rk
> myfilefd=lockfile(myfile)
> #kl4rk/
>
1122a1154,1156
> #/kl4rk
> unlockfile(myfilefd)
> #kl4rk/
1235c1269
<
---
>
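For readers on current Python, the same idea as the patch above can be sketched as a context manager. This is a modern restatement and not the patch itself: locked_distfile and its lockdir parameter are hypothetical names, and the green() colouring from portage is left out.

```python
import contextlib
import fcntl
import os
import sys


@contextlib.contextmanager
def locked_distfile(lockdir, filename, verbose=True):
    """Hold an exclusive lock on lockdir/filename for the duration of the block."""
    os.makedirs(lockdir, exist_ok=True)
    with open(os.path.join(lockdir, filename), "w") as fd:
        try:
            # Try without blocking first so that a wait can be announced.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            if verbose:
                print(" * Another process is holding a lock on %r, waiting..."
                      % filename, file=sys.stderr)
            fcntl.lockf(fd, fcntl.LOCK_EX)  # Block until the holder lets go.
        try:
            yield fd
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
```

A fetch would then be wrapped as "with locked_distfile(lockdir, myfile): ...". Note that lockf() locks belong to the process, so they keep two emerge processes apart but not two threads inside one process.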
*** Bug 19999 has been marked as a duplicate of this bug. ***
Josep, That patch of yours works for me. It'd be great to see this added to the official portage asap ;-) Thanks, Stu
Created attachment 11635 [details, diff] patch for /usr/lib/python2.2/site-packages/portage.py (portage 2.0.47-r10) This is a new version of the patch I posted before, but with better coding style and without stupid comments. Just to make it more serious.
*** Bug 15790 has been marked as a duplicate of this bug. ***
Created attachment 13367 [details, diff] updated patch for /usr/lib/python2.2/site-packages/portage.py (portage 2.0.48-r1) Update of the patch I submitted. Should I contact anyone to ask them to merge this into the official portage? Did anyone else try this? I loved to read Stuart's reply :-)
Created attachment 13944 [details, diff] Patch for portage.py that uses environment variables for the lockdir The location of the "lockdir" in the original patchfile is hardcoded to "/var/tmp/portage". This causes it not to work on systems that have changed "PORTAGE_TMPDIR" in their "make.conf". I have updated "lockdir" to take its value from settings["BUILD_PREFIX"], which is "PORTAGE_TMPDIR/portage". This allows continued functionality if "PORTAGE_TMPDIR" or the "portage" directory changes name. The variable "lockdir" has also been moved inside "def lockfile", since the variable "settings" isn't available until after a call to "doebuild" and it doesn't really need global scope as far as I can tell.
Eh, my comment looks horrible, bear with me... my first patch ever ;) -Chris
Yes, Josep Sanjuas's patch works nicely for me, although it's not very user-friendly... Is there no possibility of integrating the call to "emerge -fU @args" into the normal "emerge -U @args"? I would love to test such a patch.
*** Bug 21192 has been marked as a duplicate of this bug. ***
*** Bug 27153 has been marked as a duplicate of this bug. ***
I have made a multi-threaded C program that takes care of this idea. See (http://forums.gentoo.org/viewtopic.php?p=486763#486763). This would perhaps be a good workaround until this is finally integrated into Portage. It depends on being able to run emerge to install a package while using emerge to download one. I'm not sure of whether or not that will b0rk Portage, but it seems to work OK for me, at least until Portage can do it by itself.
*** Bug 28465 has been marked as a duplicate of this bug. ***
*** Bug 19367 has been marked as a duplicate of this bug. ***
*** Bug 29402 has been marked as a duplicate of this bug. ***
*** Bug 29763 has been marked as a duplicate of this bug. ***
*** Bug 36532 has been marked as a duplicate of this bug. ***
*** Bug 36659 has been marked as a duplicate of this bug. ***
This will be coming about in the not-too-distant future. Lockfiles are in place for the DB already and the globals are disappearing rapidly.
*** Bug 38760 has been marked as a duplicate of this bug. ***
*** Bug 40500 has been marked as a duplicate of this bug. ***
*** Bug 47124 has been marked as a duplicate of this bug. ***
Is this enhancement dead or maintained and managed elsewhere?
Can this be implemented? Everything I read suggests that portage really needs this kind of combined merging and fetching to make one's life easier.

At this time I cannot start an emerge -ef world and then, after a few packages have downloaded, safely start an emerge -e world. As it happens, your Internet connection slows down on the very day you run emerge -ef and emerge -e at the same time, your PC turns out to be fast enough, and the -e world catches up with the -ef world. Both touch the same tarball, the md5 is bad, and both fail. To make it all better, emerge --resume decides that the -ef world is what needs to be resumed.

I will be trying the supplied patches once this emerge -e world finishes.

The last important comment was from December 31st last year. What is the current official stance on file locking and parallel merging/fetching (threaded or not)?

For this case it would even help to have an option to tell one emerge to --retry-fetch and the other to --skip-fetch, to signal what to do in the case of a bad md5 (from a file accessed twice during download, or otherwise). One would happily skip over it and keep on fetching; the other would retry until it gets a good md5. I guess --retry-fetch=max_retries should be specifiable for those times when the md5 is just bad, period. Maybe this approach is simpler to implement?

Now I have more reason to hope this -e world finishes soon.
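The retry/skip idea above could look roughly like this. The flags --retry-fetch and --skip-fetch are only this comment's proposal, not real emerge options, and md5_matches, fetch_verified and the download callable are hypothetical names used for the sketch.

```python
import hashlib


def md5_matches(path, expected_md5):
    """Return True if the file at path has the expected md5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5


def fetch_verified(download, path, expected_md5, max_retries=0):
    """Call download(path) until the md5 matches or the retries run out.

    max_retries=0 models the proposed --skip-fetch (one attempt, then
    move on); a positive value models --retry-fetch=max_retries.
    """
    for _attempt in range(max_retries + 1):
        download(path)
        if md5_matches(path, expected_md5):
            return True
    return False
```

Bounding the retries matters for the case mentioned above where the md5 is "just bad, period": an unbounded retry loop would never finish.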
Created attachment 35074 [details, diff] updated patch for /usr/lib/portage/pym/portage.py (portage 2.0.50-r8) This is almost a one-line-patch because newer versions already have the locking/unlocking functions defined. Works for me
Josep, that's a helluva lot cleaner than what I worked up. You're right, it pretty much comes down to a lock and unlock call. We also need to lock /var/log/emerge.log to prevent messages getting mixed during writing. Either way, committing fetch locking shortly. It shouldn't eat your fs, but if it does, tough cookies (and file a comment here) :)

Note this isn't a final solution; a final solution would be jstubb's mergemanager. I'm adding the locking to try and prevent corruption of distfiles (and of the log file when running multiple emerges). If an emerge instance hits an active lock in fetch, it waits for the lock. This is kind of ugly behaviour, since there is no indication of what's happening. I'd rather not dirty up fetch with comments along the lines of 'acquiring lock on %s', 'releasing lock on %s'. I'll leave that to a correct solution.

Either way, I've been testing it with (emerge -f blah &); emerge blah, and the locks are working fine. In CVS, should show in pre14.
*** Bug 71619 has been marked as a duplicate of this bug. ***
Update from cvs: atm, parallelized fetching/compiling is committed under FEATURES="parallel-fetch", and requires FEATURES="distlocks" to be on.
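The dependency between the two features can be stated as a small check. This is only a sketch of the rule in the comment above: parallel_fetch_enabled is a hypothetical helper, and it ignores the "-feature" negation syntax that a real FEATURES value may contain.

```python
def parallel_fetch_enabled(features):
    """Return True if a FEATURES string enables parallel fetching.

    Per the comment above, parallel-fetch only takes effect when
    distlocks is enabled as well.
    """
    enabled = set(features.split())
    return {"parallel-fetch", "distlocks"} <= enabled
```

For example, parallel_fetch_enabled("distlocks parallel-fetch") is true, while "parallel-fetch" on its own is not enough.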
*** Bug 75367 has been marked as a duplicate of this bug. ***
*** Bug 85324 has been marked as a duplicate of this bug. ***
*** Bug 87296 has been marked as a duplicate of this bug. ***
*** Bug 89412 has been marked as a duplicate of this bug. ***
Portage should have a better design (a poor man's proposal).

The actions for each ebuild should be atomic:
- Fetch
- Unpack
- Compile
- Install
- Collision-protect (optional)
- Merge
- Config (or whatever it might look like)

Merging multiple packages means dealing with dependencies. The dependencies should be calculated per action. Examples to explain:
- Fetching: always allowed
- Unpacking: unpack x packages to the queue
- Compiling: compile x packages in parallel, if their dependencies are merged
- Install: always allowed
- Merge: always allowed, or maybe restricted to not run in parallel
- Config: maybe on a separate terminal, so that admins can easily track all the important messages

These atomic actions could be handled in a modular way on multiple queues. Locking and unlocking need to be handled clearly. Examples:
- qt and xorg-x11 could not be compiled together (but could be fetched and unpacked)
- openoffice and gnomemeeting could be compiled in parallel

Advantages:
- distcc would behave far better (even including cases of -j1)
- installing packages on a computer farm would be a picnic
- parallel fetching / merging resolved
- maybe better load balancing of disk access and CPU
- would make Gentoo a favorite distro for compiling reasons

Disadvantages:
- A modular rewrite of portage with great effort is needed (so no earlier than portage 4.0)
- The dependencies of the packages need to be handled while running

New features, pro and con (would be great but need a lot of work and design):
- emerge.log needs a new format to identify the atomic actions, but parallel merges could be tracked better than today
- a new feature "load logger" could maybe be integrated, calculating the GUs used to compile, serving as a base for stats on package compiles (might be tricky to integrate into distcc)
- based on the stats, progress could be estimated.
The plan is to essentially have two "atoms". One is fetch and the other is build. A separate upper limit will be specifiable for both and portage will do as many in parallel as it can based on dependencies.
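A rough sketch of the two-"atom" plan described above, with separate upper limits for fetching and building. It is an assumption-laden illustration, not portage code: schedule, deps, fetch, build, fetch_jobs and build_jobs are all hypothetical names, and deps maps each package to the set of packages that must be built before it.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def schedule(deps, fetch, build, fetch_jobs=1, build_jobs=1):
    """Run fetch(pkg) and build(pkg) for every package; return the build order."""
    fetched, built, build_order = set(), set(), []
    waiting = set(deps)
    with ThreadPoolExecutor(fetch_jobs) as fetch_pool, \
            ThreadPoolExecutor(build_jobs) as build_pool:
        # Fetching has no dependencies, so everything is queued at once.
        fetching = {fetch_pool.submit(fetch, pkg): pkg for pkg in deps}
        building = {}
        while waiting or building:
            if not fetching and not building:
                # Nothing is running, yet some packages can never start.
                raise ValueError("unsatisfiable dependencies: %s" % sorted(waiting))
            done, _ = wait(list(fetching) + list(building),
                           return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # Re-raise any exception from the worker.
                if future in fetching:
                    fetched.add(fetching.pop(future))
                else:
                    pkg = building.pop(future)
                    built.add(pkg)
                    build_order.append(pkg)
            # Start every build whose sources and dependencies are ready.
            for pkg in sorted(waiting):
                if pkg in fetched and deps[pkg] <= built:
                    waiting.remove(pkg)
                    building[build_pool.submit(build, pkg)] = pkg
    return build_order
```

The pool sizes are the two upper limits: the fetch pool never runs more than fetch_jobs downloads, and the build pool never runs more than build_jobs builds, while the readiness check keeps builds in dependency order.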
These plans sound cool, even if there are only two atomic steps. When might it be integrated? Is there already some testing version around that is not mentioned in this bug?
I did have it working at one stage, but the code went in a completely different direction to where portage _was_ going. CVS HEAD currently has a single parallel fetch thread which will be available in the next major portage version. The major version following that will contain many fixes for dependency calculation and will enable proper parallelization of all tasks. Timelines? Next major version will probably go stable in around 6 months. Can't really make any reasonable estimate on the following version at this stage.
Thanks for your info. Of course the 6 months is only an estimate, but I'd be lucky to see this fixed at some point in the future.
*** Bug 96554 has been marked as a duplicate of this bug. ***
*** Bug 97861 has been marked as a duplicate of this bug. ***
Putting a hold on feature requests for portage as they are drowning out the bugs. Most of these features should be available in the next major version of portage. But for the time being, they are just drowning out the major bugs and delaying the next version's progress. Any bugs that contain patches and any bugs for etc-update or dispatch-conf can be reopened. Sorry, I'm just not good enough with bugzilla. ;)
reopening...
released in 2.1_pre*, enabled via FEATURES="distlocks parallel-fetch"
*** Bug 129955 has been marked as a duplicate of this bug. ***
Why isn't parallel-fetch on by default yet?
(In reply to comment #87)
> Why isn't parallel-fetch on by default yet?

Portage currently has no memory of which distfiles it has checksummed, or of their timestamps. Because of this, if you're emerging a large list of packages and need to restart it (for example, if one of the packages fails to build), parallel-fetch causes all the files to be re-checksummed needlessly. The extra I/O overhead could be annoying in some circumstances, so it's disabled by default.
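The missing "memory" described above could be sketched as a small on-disk cache that records each distfile's digest together with its size and timestamp. This is an illustration only and not how portage solved it: ChecksumCache and its JSON file format are assumptions made for the example.

```python
import hashlib
import json
import os


class ChecksumCache:
    """Remember md5 digests of distfiles, keyed by path, size and mtime."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        try:
            with open(cache_path) as handle:
                self.entries = json.load(handle)
        except (OSError, ValueError):
            self.entries = {}      # No usable cache yet: start empty.
        self.hash_count = 0        # Number of files actually re-read.

    def md5(self, path):
        stat = os.stat(path)
        stamp = [stat.st_size, stat.st_mtime_ns]
        entry = self.entries.get(path)
        if entry and entry["stamp"] == stamp:
            return entry["md5"]    # Unchanged since the last checksum.
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        self.hash_count += 1
        self.entries[path] = {"stamp": stamp, "md5": digest.hexdigest()}
        return self.entries[path]["md5"]

    def save(self):
        with open(self.cache_path, "w") as handle:
            json.dump(self.entries, handle)
```

After a restart, a file whose size and mtime are unchanged is answered from the cache without being read again, which removes the extra I/O that the comment gives as the reason for the default.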
*** Bug 132490 has been marked as a duplicate of this bug. ***
*** Bug 186047 has been marked as a duplicate of this bug. ***
Hello, I have a simple question: is it possible to resume a failed fetch in the background? I start an emerge world, it downloads x packages then inet connection fails. The background fetch stops while the compilation continues. At one point, inet is back while emerge still compiles. Can I restart the fetchonly in the background? (without interrupting the compilation) Thanks, Chris
(In reply to comment #91)
You can't restart it "in the background" of a running emerge instance, but you can do this:

  emerge --resume --fetchonly

(In reply to comment #87)
> Why isn't parallel-fetch on by default yet?

It's enabled by default now, in >=portage-2.1.5.
Just something that comes to mind: why doesn't emerge take advantage of multiple mirrors when downloading? For example, assume emerge is downloading a huge package from the 1st mirror. During that time (simultaneously), it starts downloading the 2nd package from the 2nd mirror, and so on (depending on some configured variable like MAX_PARALLEL_FETCH=3). Just my opinion, what do you think?
(In reply to comment #93)
> Just something that comes to mind: why doesn't emerge take advantage of
> multiple mirrors when downloading? For example, assume emerge is downloading
> a huge package from the 1st mirror. During that time (simultaneously), it
> starts downloading the 2nd package from the 2nd mirror, and so on (depending
> on some configured variable like MAX_PARALLEL_FETCH=3).
>
> Just my opinion, what do you think?

I've thought of adding an emerge --fetch-jobs option, which is similar to --jobs but controls the number of parallel-fetch threads instead of build threads. You'd be able to configure it as a default setting in the EMERGE_DEFAULT_OPTS variable.
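A minimal sketch of what several fetch jobs spread over mirrors might look like. Neither --fetch-jobs nor MAX_PARALLEL_FETCH is confirmed to exist by this thread; fetch_all, fetch_jobs and the download(mirror, distfile) callable are hypothetical names for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def fetch_all(distfiles, mirrors, download, fetch_jobs=3):
    """Download distfiles with at most fetch_jobs concurrent transfers.

    download(mirror, distfile) is a hypothetical callable returning True
    on success. Mirrors are handed out round-robin, so neighbouring
    downloads go to different mirrors.
    """
    assignments = list(zip(distfiles, cycle(mirrors)))
    with ThreadPoolExecutor(max_workers=fetch_jobs) as pool:
        results = pool.map(lambda pair: download(pair[1], pair[0]), assignments)
        return {distfile: ok for (distfile, _), ok in zip(assignments, results)}
```

With three mirrors and fetch_jobs=3, the first three distfiles are requested from three different mirrors at once, which is the behaviour comment #93 asks for.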