Summary: | True parallel fetch for each job running (current parallel-fetch is asynchronous) | |
---|---|---|---
Product: | Portage Development | Reporter: | Marcus Becker <marcus.disi>
Component: | Enhancement/Feature Requests | Assignee: | Portage team <dev-portage>
Status: | UNCONFIRMED | |
Severity: | enhancement | CC: | joe, pacho, slashbeast, zmedico
Priority: | Normal | |
Version: | unspecified | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Package list: | | Runtime testing required: | ---
Bug Depends on: | | |
Bug Blocks: | 377365 | |
Description

Marcus Becker 2012-07-10 15:15:39 UTC

If you have a slow connection, adding more fetch jobs to be processed at once will NOT help anything; it will just stall in a different way. But if one package has ~50mb to download and it stalls a 500kb package, I think it would be an improvement. How many jobs you want to run can be set in make.conf anyway.

(In reply to comment #0)
> 1. It starts with 4 jobs, downloads the first, then the second etc.

It will actually download all 4 in parallel. The relevant code is here:
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=ef58bc7573ddce5e3a5466eea50160b81de8edf4

When downloading in parallel, each fetcher's output goes to the corresponding build log (and that part of the build log is discarded if the fetch is successful).

> 2. lets say the first was a small package of ~500kb and is already done and
> the third is a larger one etc.
> 3. at some stage only 1 job is running because other jobs have to wait for
> it to be downloaded

Something like this could happen if all other jobs depend on the one that's currently being fetched/logged in /var/log/emerge-fetch.log. In order to fix this, we'd have to create separate logs for each fetcher.

(In reply to comment #1)
> If you have a slow connection, adding more fetch jobs to be processed at
> once will NOT help anything, it will just stall in a different way.

We can add a --fetch-jobs=N option so that people can tune the number of concurrent fetch jobs for their connection speed.

One example:
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Starting parallel fetch
>>> Emerging (1 of 13) sys-kernel/linux-firmware-20120708
>>> Emerging (2 of 13) media-libs/libpng-1.5.12
>>> Jobs: 0 of 13 complete, 1 running Load avg: 1.07, 0.86, 0.88
Since linux-firmware is 15M to download, it stalls the other jobs?
(In reply to comment #4)
> One example:
> Calculating dependencies... done!
> >>> Verifying ebuild manifests
> >>> Starting parallel fetch
> >>> Emerging (1 of 13) sys-kernel/linux-firmware-20120708
> >>> Emerging (2 of 13) media-libs/libpng-1.5.12
> >>> Jobs: 0 of 13 complete, 1 running Load avg: 1.07, 0.86, 0.88
>
> since linux-firmware is 15M to download, it stalls the other jobs?

Well, you could be looking at a case of bug 403895 there, which is fixed in portage-2.1.11.x.

slashbeast: I remember we talked about this a few months ago. Is there another bug for this, or is it just this one?

I remember discussing it, I think in regard to golang project dependencies taking ages to fetch back when GODEP was a thing, but I cannot find a bug that I opened for it, so perhaps I never created one. This seems valid, though.

Also, can we rename parallel-fetch to background-fetch? The OP makes a good point that anyone seeing this option is going to think it relates to jobs.

I now have multiple packages with >100 dependencies to download (blame go, rust, and node stuff), and most are only a few hundred KB; each one takes a few seconds to communicate with the mirrors. It adds up to many minutes. A true parallel fetch with multiple fetch jobs at a time would greatly reduce this. dev-vcs/repo seems to default to 4 for fetching git repos (github doesn't seem to like it when going much higher, but 4 has been bulletproof). HTTPS mirror fetching we could probably safely go even higher...

(In reply to Joe Kappus from comment #8)
> Also, can we rename parallel-fetch to background-fetch? OP makes a good
> point that anyone seeing this option is going to think it relates to jobs.

Yeah, or maybe background-prefetch (internals refer to the corresponding fetchers as prefetchers). It feels kind of crazy to rename it after it has existed for nearly 20 years now, so maybe we should just update the documentation to compare/contrast with the sort of parallel fetch that can happen with emerge --jobs.
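As a rough illustration of the bounded parallel fetch being discussed (a worker pool defaulting to 4 jobs, like dev-vcs/repo), here is a minimal Python sketch. The `fetch_all` and `fetch_one` helpers are hypothetical names invented for this example, not existing portage APIs, and the default of 4 is just the repo-style value mentioned above.

```python
# Illustrative sketch only -- fetch_all() is a hypothetical helper,
# not an actual portage API.
import concurrent.futures
import urllib.request


def fetch_one(url):
    """Download one distfile; return (url, success)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
        return url, True
    except OSError:
        return url, False


def fetch_all(urls, fetch=fetch_one, jobs=4):
    """Fetch urls with at most `jobs` concurrent workers.

    With a bounded pool, one large download (say a 15M
    linux-firmware tarball) only ever occupies a single worker,
    while the remaining workers drain the queue of small distfiles.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(fetch, urls))
```

Four workers mirrors repo's default for git fetching; as noted above, HTTPS-only mirror fetching could likely tolerate a higher job count.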
> I now have multiple packages with >100 dependencies to download (blame go,
> rust, node stuff) and most are only a few hundred KB, each one takes a few
> seconds to communicate with the mirrors. It adds up to many minutes.
>
> A true parallel fetch with multiple fetch jobs at a time would greatly
> reduce this. dev-vcs/repo seems to default to 4 for fetching git repos
> (github doesn't seem to like it when going much higher, but 4 has been
> bulletproof). HTTPS mirror fetching we could probably safely go even
> higher...

I'm thinking about how we could handle the logging here. I suppose in this case we could simply send the fetch output to /dev/null (that's what parallel fetch originally did in https://gitweb.gentoo.org/proj/portage.git/commit/?id=0e5af163b1fe7cb5ec9101930ce0905713ed775b), then retry serially with logging for anything that failed.
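The "discard output in parallel, then retry serially with logging" strategy could be sketched as follows. This is a hypothetical illustration, not portage's actual implementation: `parallel_then_serial` and the `quiet` parameter on the injected `fetch` callable are invented names for the example.

```python
# Hypothetical sketch of the proposed logging strategy: a quiet
# parallel first pass, then serial retries with logging enabled.
import concurrent.futures


def parallel_then_serial(files, fetch, jobs=4, log=print):
    """fetch(f, quiet=...) returns True on success, False on failure."""
    # First pass: parallel, output discarded (as the original
    # parallel-fetch did before per-job build logs existed).
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as pool:
        results = dict(pool.map(lambda f: (f, fetch(f, quiet=True)), files))

    # Second pass: serial retries with full logging for anything
    # that failed, so the user sees exactly what went wrong.
    for f, ok in results.items():
        if not ok:
            log("retrying %s with logging enabled" % f)
            results[f] = fetch(f, quiet=False)
    return results
```

The serial second pass keeps the failure output readable (no interleaving) at the cost of re-attempting only the handful of files that failed.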