Handling of snapshots seems broken

Output from emerge-delta-webrsync:

Looking for available base versions for a delta
fetching patches
failed fetching snapshot-20050510-20050511.patch.bz2.md5sum
patch_fh size=109142
patch_type=8
verbosity level(1)
src_fh size=19153763
disabling bufferless, patch_count(1) == 1 || forced_reorder(1)
size1=176476160, size2=176619520
reconstruction return=0, commands=16193
result was 16193 commands
versions size is 176619520
applied 1 patches
reordering commands? 1
reconstructing target file based off of dcbuff commands...
collapsing
processing src 0: 10457 commands.
processing src 1: 5736 commands.
reconstruction completed successfully
re-compressing
verifying uncompressed md5
recompressing.
portage-20050510.tar: 9.216:1, 0.868 bits/byte, 89.15% saved, 176619520 in, 19164042 out.
verifying generated tarball
compressed md5 differs, but uncompressed md5 says it right.
bzip2 version incompatability in other words
saving the md5
beginning update to the tree
Syncing local tree...
[ output of tree sync ]
cleansing
removing portage-20050509.tar.bz2
done.

In DISTDIR I now have:

# md5sum portage-200*.tar.bz2
c4728f3a70885597faf00cc32eea2a8a  portage-20050509.tar.bz2
c88c07644dc0392dc2692058af95b5d2  portage-20050510.tar.bz2
# cat portage-200*.tar.bz2.md5sum
c4728f3a70885597faf00cc32eea2a8a  portage-20050509.tar.bz2
6da0b5021c4a946b42c99f4b4bbbb0a1  portage-20050510.tar.bz2

Errors:
1. Contrary to the output, portage-20050509.tar.bz2 was _not_ removed.
2. Contrary to the output, the md5sum of portage-20050510.tar.bz2 was _not_ saved. Instead, the stored md5sum no longer matches the file, which should lead to errors on subsequent runs.
OK, I overlooked /var/delta-webrsync/, where the correct checksums for generated tarballs are stored. Thus, the incremental update works:

Looking for available base versions for a delta
fetching patches
failed fetching snapshot-20050511-20050512.patch.bz2.md5sum
patch_fh size=76934
patch_type=8
verbosity level(1)
src_fh size=19164042
disabling bufferless, patch_count(1) == 1 || forced_reorder(1)
size1=176619520, size2=176885760
reconstruction return=0, commands=11369
result was 11369 commands
versions size is 176885760
applied 1 patches
reordering commands? 1
reconstructing target file based off of dcbuff commands...
collapsing
processing src 0: 7423 commands.
processing src 1: 3946 commands.
reconstruction completed successfully
re-compressing
verifying uncompressed md5
recompressing.
portage-20050511.tar: 9.206:1, 0.869 bits/byte, 89.14% saved, 176885760 in, 19214072 out.
verifying generated tarball
compressed md5 differs, but uncompressed md5 says it right.
bzip2 version incompatability in other words
saving the md5
beginning update to the tree
Syncing local tree...
[ tree sync ]
cleansing
removing portage-20050510.tar.bz2
removing portage-20050509.tar.bz2
done.

Please remove ", which should lead to errors on subsequent runs" from Error 2 in the original bug report.

Proposal: Store all generated data, i.e. the generated tarballs as well, in /var/delta-webrsync/, and do not delete or create files in DISTDIR (exception: downloaded full snapshots and their md5sums).
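The check behind "compressed md5 differs, but uncompressed md5 says it right" can be sketched roughly as below. This is a hypothetical illustration, not the script's actual code: the function name and its two checksum-file arguments are assumptions, loosely modeled on the .md5sum/.umd5sum files the script keeps in STATE_DIR.

```shell
# Hypothetical sketch: accept a regenerated snapshot if either the
# compressed or the uncompressed checksum matches.
# Args: tarball (.tar.bz2), file with the md5 of the compressed tarball,
# file with the md5 of the uncompressed .tar (both in md5sum output format).
verify_snapshot() {
    local tarball="$1" md5file="$2" umd5file="$3"
    local want_c want_u got_c got_u
    want_c=$(cut -d' ' -f1 "$md5file")
    want_u=$(cut -d' ' -f1 "$umd5file")
    got_c=$(md5sum < "$tarball" | cut -d' ' -f1)
    if [ "$got_c" = "$want_c" ]; then
        echo "compressed md5 matches"
        return 0
    fi
    # A different bzip2 version may emit different compressed bytes for
    # identical input, so fall back to checking the decompressed stream.
    got_u=$(bzip2 -dc "$tarball" | md5sum | cut -d' ' -f1)
    if [ "$got_u" = "$want_u" ]; then
        echo "compressed md5 differs, uncompressed md5 matches"
        return 0
    fi
    echo "checksum mismatch"
    return 1
}
```

In the second case the compressed md5 of the local file legitimately differs from the mirror's, which is exactly why storing that file under the mirror's name in DISTDIR causes confusion.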
Hmm. Yeah, shoving the md5-differing tarball elsewhere is possible, I guess. Regarding blocking removal of files, -k should do it for you as well. File creation *will* occur within $DISTDIR, although I can see the point of shifting the compressed-md5-differing target elsewhere.
Error 1: I manually removed $DISTDIR/portage-200* and put in a backup of the full snapshot portage-20050509.tar.bz2 with its portage-20050509.tar.bz2.md5sum. All snapshots and md5sums except portage-20050516.tar.bz2 and portage-20050516.tar.bz2.md5sum have now been removed, so error 1 of the original bug report seems to be fixed.

Error 2: The reason I do not like the generated snapshot file being stored in DISTDIR is that DISTDIR should only contain files that were downloaded from the internet (DISTDIR is a kind of cache), and the generated snapshot is not bytewise identical to the downloadable file of the same name. You could call it cache corruption. So, what is the reason you do not want to put generated snapshots in /var/delta-webrsync/?
The reasoning is that the tarball *is* valid; it's just an issue with how we do verification. Shifting it into a separate dir would work for the time being, since the updated md5 isn't stored in $DISTDIR either...
Thanks for your great programs, Brian! Here, too, the old snapshots are not removed from DISTDIR. This patch solves the problem:

--- emerge-delta-webrsync.~1~	2005-05-11 15:36:43.000000000 +0200
+++ emerge-delta-webrsync	2005-05-18 19:52:49.000000000 +0200
@@ -423,7 +423,7 @@
 	echo "cleansing"
 	for x in $potentials; do
 		echo "removing ${x}"
-		rm "${x}" "${x}.md5sum" &> /dev/null
+		rm "${DISTDIR}/${x}" "${DISTDIR}/${x}.md5sum" &> /dev/null
 		rm "${STATE_DIR}/${x}.md5sum" "${STATE_DIR}/${x}.umd5sum" &> /dev/null
 	done
 fi

It'd be nice if your script detected the use of the dynamic deltup server network (app-portage/getdelta, http://www.ddeltup.org/). Right now the two scripts produce quite chaotic output when used together, because the FETCHCOMMAND tries to download deltas for the patch, md5sum, umd5sum and snapshot before it fetches them from your server (see the attached file sync.log). I think your script should detect FETCHCOMMAND="/usr/bin/getdelta.sh \${URI}" and use wget (or something similar) to download the files needed by emerge-delta-webrsync.

Best regards,
Christian
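The detection suggested above might look something like this minimal sketch. The function name, the match on "getdelta.sh" and the replacement wget command line are all assumptions for illustration; ${URI} and ${DISTDIR} are deliberately left unexpanded, as they are in a FETCHCOMMAND setting.

```shell
# Hypothetical sketch: pick the fetch command for the files that
# emerge-delta-webrsync downloads itself (patches, md5sums, umd5sums).
# If the user's FETCHCOMMAND goes through getdelta.sh, fall back to
# plain wget; otherwise keep the user's setting unchanged.
choose_fetch_command() {
    local fetchcommand="$1"
    case "$fetchcommand" in
        *getdelta.sh*)
            # literal ${DISTDIR}/${URI}, expanded later by the caller
            printf '%s\n' 'wget -P "${DISTDIR}" "${URI}"'
            ;;
        *)
            printf '%s\n' "$fetchcommand"
            ;;
    esac
}
```

As the reply below points out, this kind of special case overrides what the user explicitly configured, so it only makes sense as a temporary workaround.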
Created attachment 59257
sync.log

Log of emerge-delta-webrsync working together with getdelta.sh as FETCHCOMMAND
Some other things that have come to mind:

* The snapshot-200* files that are no longer needed (e.g. snapshot-20050516-20050517.patch.bz2, snapshot-20050516-20050517.patch.bz2.md5sum) should be deleted from DISTDIR as well.
* Maybe the emerge-delta-webrsync ebuild should make sure that the script cannot be executed by someone who is not in the portage group. Right now it can be executed by everyone (although I don't think this harms anybody):
  -rwxr-xr-x 1 root root 11107 May 18 19:52 emerge-delta-webrsync
* I agree with sf that the snapshots should not be built inside DISTDIR.
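Deleting the patches that are no longer needed could be sketched as below, assuming the snapshot-YYYYMMDD-YYYYMMDD.patch.bz2 naming from the example above. The function name and the "newest applied date" argument are hypothetical.

```shell
# Hypothetical sketch: after updating to snapshot date $newest (YYYYMMDD),
# delete delta patches and their .md5sum files whose target date is not
# newer than $newest, since they can no longer be applied.
remove_stale_patches() {
    local distdir="$1" newest="$2" f base target
    for f in "$distdir"/snapshot-*-*.patch.bz2; do
        [ -e "$f" ] || continue       # glob matched nothing
        base=${f##*/}                 # strip the directory part
        target=${base#snapshot-*-}    # -> YYYYMMDD.patch.bz2
        target=${target%%.*}          # -> YYYYMMDD
        if [ "$target" -le "$newest" ]; then
            echo "removing ${base}"
            rm -f "$f" "$f.md5sum"
        fi
    done
}
```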
Reconstruction will occur in /tmp or $PORTAGE_TMPDIR. Re: the DISTDIR deletion, yeah, apparently I suck (you're right) :)

getdelta... hmm. I don't like special cases much. Possible alternatives? Keep in mind that this script has a *finite* lifespan; it's already been removed from cvs head of portage. My intention is to have it handled by a subclass of the sync.snapshot class if/when head hits.

Re: special casing, the issue I see is that if I special case it now, I'll have to special case it in the class, which gets *really* hard since FETCHCOMMAND has been abstracted away into transports.fetchcommand.

I can see doing it as a temp hack since, like I said, emerge-delta-webrsync has a limited shelf life until a proper integration occurs. The special casing/hack won't fly down the line, though, since the user's defined FETCHCOMMAND is just that: what they told portage to use. So... long term?

By the way, regarding doing a release with the fix, I'll -r1 it. Any further tweaks will have to wait a bit, hopefully within a week.

A replacement for the untar/rsync portion is finally working, except that it doesn't support --excludes yet... so I'm finishing that sucker off and intend to use it for the next major bump. It's based on a good chunk of diffball's code, so it's got the usual support for working from compressed files, handling tarballs, yadda yadda yadda. The main thing is that it removes the need to untar, and thus is quite a bit faster overall (a bit faster than rsync anyway, for some screwed-up reason).
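Doing the reconstruction outside DISTDIR, under $PORTAGE_TMPDIR with /tmp as the fallback, could be as simple as the following sketch. The function name and the delta-webrsync.XXXXXX directory prefix are made up for illustration.

```shell
# Hypothetical sketch: create a private working directory for
# reconstruction under $PORTAGE_TMPDIR, falling back to /tmp, so nothing
# is written to DISTDIR until the result has been verified.
make_work_dir() {
    mktemp -d "${PORTAGE_TMPDIR:-/tmp}/delta-webrsync.XXXXXX"
}
```

A caller would typically pair it with a cleanup trap, e.g. workdir=$(make_work_dir) || exit 1 followed by trap 'rm -rf "$workdir"' EXIT, so an aborted run leaves nothing behind.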
(In reply to comment #8)
> getdelta... hmm. I don't like special cases much. Possible alternatives?
> Keep in mind that this script has a *finite* lifespan to it, it's already been
> removed from cvs head of portage; my intention is to have it handled by a
> subclass of the sync.snapshot class if/when head hits.
> Re: special casing, the issue I see is that if I special case it now, I'll have
> to special case it in the class, which gets *really* hard since FETCHCOMMAND has
> been abstracted away, into transports.fetchcommand .
> I can see doing it as a temp hack, since like I said, emerge-delta-webrsync has
> a limited shelf life till a proper integration occurs. The special casing/hack
> won't fly down the line though, since the users defined FETCHCOMMAND is just
> that, what they told portage to use.
> So... long term?

Thanks for this insight into portage development. So I guess I'll just have to live with it. It looks a little bit ugly, but it works great, so this is no problem.

> A replacement for the untar/rsync portion is finally working, exempting
> supporting --excludes... so finishing that sucker off, and intending on next
> major bump using that. Based off of a good chunk of diffballs code, so it's got
> the usual support for working from compressed files, handling tarballs, yadda
> yadda yadda. Main thing is it removes the need to untar, thus is quite a bit
> faster overhaul (bit faster then rsync anyways, for some screwed up reason).

That's great news! I'm especially looking forward to the speed improvement, because here it's *way slower* than rsync (AMD Duron 1300 with 256 MB RAM).
Re comment #3: I do not know where I looked last time, but all snapshots are still there (see comment #5).
In CVS.

If you're game for testing out tarsync, emerge app-arch/tarsync. emerge-delta-webrsync 3 will detect it and use it (it'll also recompress in parallel for you SMP folk, since tarsync is I/O-bound and bzip2 is a bit more CPU-bound). Enjoy.
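The parallel idea, overlapping the CPU-bound bzip2 recompression with the I/O-bound tree update, can be sketched like this. Only "command -v tarsync" is used as the detection test; the function name is hypothetical, and the update command is passed in by the caller rather than guessing tarsync's actual command line.

```shell
# Hypothetical sketch: if tarsync is on PATH, recompress the tarball in
# the background while the tree update runs; otherwise do both serially.
# Args: uncompressed tarball, then the tree-update command and its args.
update_with_optional_tarsync() {
    local tarball="$1" bzpid rc
    shift
    if command -v tarsync >/dev/null 2>&1; then
        bzip2 -kf "$tarball" &        # CPU-bound work in the background
        bzpid=$!
        "$@"                          # I/O-bound tree update in the foreground
        rc=$?
        wait "$bzpid" || return 1
        return "$rc"
    else
        "$@" || return 1
        bzip2 -kf "$tarball"
    fi
}
```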