Summary: | rsync refetches all Manifest files and profiles/use.local.desc every time | ||
---|---|---|---|
Product: | Gentoo Infrastructure | Reporter: | Ulrich Müller <ulm> |
Component: | Other | Assignee: | Gentoo Infrastructure <infra-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alexander, asl, axiator, azamat.hackimov, bertrand, bircoph, bug, che, cloos, dwfreed, grobian, holger, hydrapolic, junghans, konstantinos.smanis, maksbotan, pacho, pesa, phantom4, rdalek1967, rhill, salikov.alexey, sebastien.picavet, stilriv, zlogene, zmedico |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=557962 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 549914 |
Description
Ulrich Müller
2015-08-10 08:02:22 UTC
profiles/use.local.desc suffers from the same problem, BTW.

(In reply to Ulrich Müller from comment #1)
> profiles/use.local.desc suffers from the same problem, BTW.

All the tree suffers from this problem, emerge --sync pulls everything again and again.

Same here, multiple servers.

Adds load to the mirrors, but not sure if it's enough to worry about.

Add this to make.conf

PORTAGE_RSYNC_EXTRA_OPTS="--checksum"

this should either be made default or mirrors should be updated to --checksum from masterdistfiles or we should stop writing files that don't change, we are using repoman manifest in parallel right now to do the signing, if someone has a better method that doesn't rewrite the files...

(In reply to Matthew Thode ( prometheanfire ) from comment #4)
> Adds load to the mirrors, but not sure if it's enough to worry about.
>
> Add this to make.conf
>
> PORTAGE_RSYNC_EXTRA_OPTS="--checksum"

That's not an acceptable solution. With --checksum, rsync on the client's side must read all files, instead of just their inode.

(In reply to Matthew Thode ( prometheanfire ) from comment #5)
> [...] or
> we should stop writing files that don't change,

^^ This, please.

> we are using repoman manifest in parallel right now to do the signing, if
> someone has a better method that doesn't rewrite the files...

Run repoman only if some file in the dir is newer than the (previous) Manifest?

My original scripts actually used 'git diff' to process only the directories that were changed. I don't know why this was replaced by a far less efficient solution.

probably because we were tired :P

(In reply to Matthew Thode ( prometheanfire ) from comment #5)
> we should stop writing files that don't change, we are using repoman
> manifest in parallel right now to do the signing, if someone has a better
> method that doesn't rewrite the files...

Note that egencache --update-manifests was intended for this purpose. It does manifests in parallel with the --jobs and --load-average options.
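
For illustration only, the suggestion above ("run repoman only if some file in the dir is newer than the previous Manifest") might be sketched as below. The function name `needs_manifest_regen` is hypothetical and not taken from any script in this bug. The sketch assumes file mtimes in the staging tree are meaningful, which a fresh git checkout does not guarantee.

```python
import os
from pathlib import Path


def needs_manifest_regen(package_dir: str) -> bool:
    """Return True if the Manifest is missing or older than any other file.

    Hypothetical helper: it only compares mtimes and never reads file
    contents, so it is cheap but trusts the timestamps in the tree.
    """
    pkg = Path(package_dir)
    manifest = pkg / "Manifest"
    if not manifest.exists():
        return True
    manifest_mtime = manifest.stat().st_mtime_ns
    for root, _dirs, files in os.walk(pkg):
        for name in files:
            path = Path(root) / name
            if path == manifest:
                continue  # the Manifest itself is the reference point
            if path.stat().st_mtime_ns > manifest_mtime:
                return True
    return False
```

A caller would then run `repoman manifest` only in directories for which this returns True.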
zmedico:
--update-manifests seems to be extremely slow compared to the other variant we are using.

time find -maxdepth 1 -mindepth 1 -type d \
  | parallel --verbose -j4 'cd {} && repoman manifest'
hot: wallclock ~1m10s, cputime ~4m14s
cold: ~20min wallclock, 80min cpu

--update-manifests: did not finish in 60 minutes with --jobs=4.

egencache command was:
egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
    --portdir=${STAGEDIR} \
    --update-use-local-desc \
    --update-manifests --thin-manifests=n \
    --repo=gentoo \
    >> ${REGEN_LOG_DIR}/${REGEN_LOG_FILE} 2>&1

dwfreed and I started to write a better solution, that was aware of how mtimes needed to be propagated in creating thick Manifests, but I've had some work travel, so it wasn't finished.
(specifically, the mtime on the thick Manifest should be the greatest mtime of any file in a given package)

(In reply to Robin Johnson from comment #10)
> zmedico:
> --update-manifests seems to be extremely slow compared to the other variant
> we are using.

Okay, I'll have to do some profiling to see what's wrong.

> (specifically, the mtime on the thick Manifest should be the greatest mtime
> of any file in a given package)

But eclass changes can cause the DIST entries in the Manifest to change, and non-ebuild files are irrelevant to the Manifest when using thin-manifests. So, I think it would make sense to use the max mtime of the ebuilds and the eclasses they inherit.

(In reply to Zac Medico from comment #11)
> (In reply to Robin Johnson from comment #10)
> > zmedico:
> > --update-manifests seems to be extremely slow compared to the other variant
> > we are using.
>
> Okay, I'll have to do some profiling to see what's wrong.
>
> > (specifically, the mtime on the thick Manifest should be the greatest mtime
> > of any file in a given package)
>
> But eclass changes can cause the DIST entries in the Manifest to change, and
> non-ebuild files are irrelevant to the Manifest when using thin-manifests.
> So, I think it would make sense to use the max mtime of the ebuilds and the
> eclasses they inherit.

If an eclass changes the resulting set of DIST entries, wouldn't you have to regenerate the thin manifest? In which case, it'd propagate down to the thick manifest anyway.

(In reply to dwfreed from comment #12)
> If an eclass changes the resulting set of DIST entries, wouldn't you have to
> regenerate the thin manifest? In which case, it'd propagate down to the
> thick manifest anyway.

Yes, that makes sense. So, "the mtime on the thick Manifest should be the greatest mtime of any file in a given package" is correct if you include the thin Manifest mtime.

(In reply to Zac Medico from comment #13)
> (In reply to dwfreed from comment #12)
> > If an eclass changes the resulting set of DIST entries, wouldn't you have to
> > regenerate the thin manifest? In which case, it'd propagate down to the
> > thick manifest anyway.
>
> Yes, that makes sense. So, "the mtime on the thick Manifest should be the
> greatest mtime of any file in a given package" is correct if you include the
> thin Manifest mtime.

Which I am :)

For the record, here's my script:

https://bitbucket.org/snippets/dwfreed/Roekq/thicken-manifestspy

(In reply to dwfreed from comment #14)
> For the record, here's my script:
>
> https://bitbucket.org/snippets/dwfreed/Roekq/thicken-manifestspy

The mtime code seems like it should work. I've filed bug 557962 to add similar behavior directly to portage's Manifest.write() method.

I wonder if the --update-manifests code of egencache is running before the regular --update code. In my initial implementation of my script, I was not running egencache --update on the repo first, and I noticed that FetchlistDict creation was taking forever; a simple strace showed me that the portage code was sourcing the ebuild to assemble the FetchlistDict. The results of this sourcing would then be otherwise wasted, and so it would have to source the ebuild again for md5-cache generation.
If md5-cache generation is run first, this is used for FetchlistDict generation, and everything is much faster. It doesn't explain why sourcing ebuilds for FetchlistDict generation is orders of magnitude slower than sourcing ebuilds for md5-cache generation, but might explain a large part of it.

(In reply to dwfreed from comment #16)
> The results of this sourcing would then be otherwise wasted, and so it would
> have to source the ebuild again for md5-cache generation. If md5-cache
> generation is run first, this is used for FetchlistDict generation, and
> everything is much faster. It doesn't explain why sourcing ebuilds for
> FetchlistDict generation is orders of magnitude slower than sourcing ebuilds
> for md5-cache generation, but might explain a large part of it.

One thing that makes it slower is that it does not parallelize the cache generation in this case. The cache is generated on-demand in the main process, so the required time will be comparable to egencache --update --jobs=1. It's just not optimized for this case, since the assumption is that you will call egencache --update either before or together with --update-manifests.
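
As a rough illustration of the mtime rule agreed on in comments 10 to 13 (the thick Manifest takes the greatest mtime of any file in the package, with the thin Manifest included), a minimal sketch could look like this. It is not dwfreed's script or the Portage patch. The names `newest_mtime_ns` and `thicken_preserving_mtime` are hypothetical, and the `thicken` callback stands in for whatever actually rewrites the Manifest.

```python
import os
from typing import Callable


def newest_mtime_ns(package_dir: str) -> int:
    """Greatest mtime (in ns) of any file under package_dir, Manifest included."""
    newest = 0
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            mtime = os.stat(os.path.join(root, name)).st_mtime_ns
            newest = max(newest, mtime)
    return newest


def thicken_preserving_mtime(package_dir: str,
                             thicken: Callable[[str], None]) -> int:
    """Rewrite the Manifest, then reset its mtime to the package's newest mtime.

    `thicken` is an assumed callback that rewrites the Manifest in place.
    The newest mtime is sampled before the rewrite, so it still reflects
    the thin Manifest rather than the time of this run.
    """
    manifest = os.path.join(package_dir, "Manifest")
    newest = newest_mtime_ns(package_dir)
    thicken(manifest)
    os.utime(manifest, ns=(newest, newest))
    return newest
```

Because the resulting timestamp depends only on the inputs, regenerating an unchanged package yields the same size and mtime, which is what rsync's default quick check compares.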
possible fix deployed

-find ${STAGEDIR} -maxdepth 1 -mindepth 1 -type d | parallel -j4 'cd {} && repoman manifest' >>${REGEN_LOG_DIR}/${REGEN_LOG_FILE}
+
+# copy cached manifests to avoid regen
+for manifest in ${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/*/*/Manifest; do
+  manifest=${manifest#${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/}
+  # redirect to stderr for packages that were just removed
+  cp ${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/${manifest} ${STAGEDIR}/${manifest} 2>/dev/null
+done
+
+# look for ebuilds that have been touched and need manifests regenerated
+new_manifest_list=$(find ${BASE}/exports/gentoo-x86/ -type f -name Manifest -mmin -30)
+# regerate those manifests
+for manifest in ${new_manifest_list}; do
+  package_dir=${manifest%/Manifest}
+  pushd ${package_dir}
+  repoman manifest
+  popd
+done

 # for egencache, set user/group or make sure the user is in the portage group
 #--update-changelogs \
@@ -109,6 +125,9 @@ egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
 rval=$?
 echo "END REGEN $(date -u)" >> ${TIMESLOG}

+# copy so we don't have to regen all the time
+rsync -am --delete --include='*Manifest' --include='*/' --exclude='*' ${STAGEDIR} ${BASE}/gentoo-x86-manifest-scratch
+

prometheanfire showed me the exact script that's being used to generate the rsync tree, and the following changes will get rid of the nasty hack he added and use my script instead:

2015-08-17 06:44:36 <dwfreed> remove line 100-115, 122, and 129; then run my script at line 129
2015-08-17 06:46:11 <dwfreed> ./thicken-manifests.py -j 4 ${STAGEDIR}

The first run should complete in under 20 minutes, and subsequent runs will complete in less than a minute. The -j (--jobs) flag defaults to the number of CPUs on the host, so if you want to give it all the cores, you can just omit it.
my fix didn't work for the manifests anyway

FWIW, in Prefix we use http://hg.code.sf.net/p/gentooprefixtree/code/file/f225be4b4d76/scripts/rsync-generation/hashgen.c which does it in ~1m and restores the mtime the Manifest had before it was thickened.

wtf? Revert the bad changes and stop bothering users. You see, it's not working (for days) - so what are you waiting for?

(In reply to Per Pomsel from comment #22)
> wtf? Revert the bad changes and stop bothering users.

It's a by-product of the migration from CVS to git, and we can't really go back to CVS at this point. So, the only option is to fix the git -> rsync process.

Are Gentoo's resources so limited that you cannot keep both cvs and git for a while?

(In reply to salikov.alexey from comment #24)
> Are Gentoo's resources so limited that you cannot keep both cvs and git for a while?

We've still got the old cvs repo but it's read-only now. Developers now push everything to git, and cvs is only retained for historical purposes.

I've patched Portage on dipper with a patch from git which causes all Manifests to copy the mtime over from the newest Manifested file. Hopefully this will cause one more full Manifest copy, and then things will get back to normal. Please let me know if it works for you.

It seems to be working as advertised (one last full Manifest pull then things go back to normal).

(In reply to Ulrich Müller from comment #1)
> profiles/use.local.desc suffers from the same problem, BTW.

This part isn't solved. use.local.desc is refetched even if it is unchanged.

(In reply to Ulrich Müller from comment #28)
> (In reply to Ulrich Müller from comment #1)
> > profiles/use.local.desc suffers from the same problem, BTW.
>
> This part isn't solved. use.local.desc is refetched even if it is unchanged.
There's a portage patch in this branch:

https://github.com/zmedico/portage/commits/bug_557192

(In reply to Zac Medico from comment #29)
> (In reply to Ulrich Müller from comment #28)
> > (In reply to Ulrich Müller from comment #1)
> > > profiles/use.local.desc suffers from the same problem, BTW.
> >
> > This part isn't solved. use.local.desc is refetched even if it is unchanged.
>
> There's a portage patch in this branch:
>
> https://github.com/zmedico/portage/commits/bug_557192

In the master branch now:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=041a81dd1b99d538620ea395d1cf1a36c47a7735

Released in portage-2.2.21
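
For readers who want the general idea behind "stop writing files that don't change" (comment 5), here is a minimal sketch. It is not the Portage commit linked above, whose actual approach may differ. `write_if_changed` is a hypothetical helper name, and the sketch assumes the file is small enough to compare in memory.

```python
import os
import tempfile


def write_if_changed(path: str, new_content: bytes) -> bool:
    """Write new_content to path only if it differs from what is on disk.

    Returns True if the file was (re)written, False if it was left alone.
    Leaving an unchanged file alone keeps its mtime, so rsync's default
    size+mtime quick check will not retransfer it.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == new_content:
                return False
    except FileNotFoundError:
        pass  # no previous file, fall through and create it

    # Write to a temporary file in the same directory, then rename over
    # the target so readers never see a partially written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_content)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Note that `tempfile.mkstemp` creates the file with owner-only permissions, so a real generator would likely need to set the mode explicitly before the rename.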