Bug 92224 - app-portage/emerge-delta-webrsync-2 snapshot handling
Summary: app-portage/emerge-delta-webrsync-2 snapshot handling
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Tools
Hardware: All All
Importance: High normal
Assignee: Brian Harring (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-11 01:25 UTC by sf
Modified: 2005-05-24 05:48 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
sync.log (sync.log,15.96 KB, text/plain)
2005-05-19 00:27 UTC, Christian Schlotter

Description sf 2005-05-11 01:25:15 UTC
Handling of snapshot seems broken

Output from emerge-delta-webrsync:

Looking for available base versions for a delta
fetching patches
failed fetching snapshot-20050510-20050511.patch.bz2.md5sum
patch_fh size=109142
patch_type=8
verbosity level(1)
src_fh size=19153763
disabling bufferless, patch_count(1) == 1 || forced_reorder(1)
size1=176476160, size2=176619520
reconstruction return=0, commands=16193
result was 16193 commands
versions size is 176619520
applied 1 patches
reordering commands? 1
reconstructing target file based off of dcbuff commands...
collapsing
processing src 0: 10457 commands.
processing src 1: 5736 commands.
reconstruction completed successfully
re-compressing
verifying uncompressed md5
recompressing.
  portage-20050510.tar:  9.216:1,  0.868 bits/byte, 89.15% saved, 176619520 in, 19164042 out.
verifying generated tarball 
compressed md5 differs, but uncompressed md5 says it right.  bzip2 version incompatability in other words
saving the md5
beginning update to the tree
Syncing local tree...
[ output of tree sync ]
cleansing
removing portage-20050509.tar.bz2
done.

In DISTDIR I now have

# md5sum portage-200*.tar.bz2
c4728f3a70885597faf00cc32eea2a8a  portage-20050509.tar.bz2
c88c07644dc0392dc2692058af95b5d2  portage-20050510.tar.bz2

# cat portage-200*.tar.bz2.md5sum
c4728f3a70885597faf00cc32eea2a8a  portage-20050509.tar.bz2
6da0b5021c4a946b42c99f4b4bbbb0a1  portage-20050510.tar.bz2

Errors:
1. Contrary to the output, portage-20050509.tar.bz2 was _not_ removed.
2. Contrary to the output, the md5sum of portage-20050510.tar.bz2 was _not_ saved. The stored md5sum is wrong instead, which should lead to errors on subsequent runs.
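
The mismatch in error 2 can be checked mechanically. A minimal sketch, assuming the .md5sum file keeps the hex digest in its first field (as in the listing above); `verify_md5` is a hypothetical helper, not part of emerge-delta-webrsync:

```shell
# Hypothetical helper: succeeds only when the file's MD5 matches the digest
# recorded in its checksum file (default: <file>.md5sum).
verify_md5() {
    local file="$1" sumfile="${2:-$1.md5sum}"
    local expected actual
    # First whitespace-separated field of the first line is taken as the digest.
    expected=$(awk '{print $1; exit}' "$sumfile") || return 2
    actual=$(md5sum "$file" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For the files listed above, `verify_md5 portage-20050510.tar.bz2` would return non-zero, because the computed digest (c88c...) differs from the recorded one (6da0...).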
Comment 1 sf 2005-05-12 03:34:41 UTC
Ok, I overlooked /var/delta-webrsync/, where the correct checksums for generated tarballs are stored.

Thus, the incremental update works:

Looking for available base versions for a delta
fetching patches
failed fetching snapshot-20050511-20050512.patch.bz2.md5sum
patch_fh size=76934
patch_type=8
verbosity level(1)
src_fh size=19164042
disabling bufferless, patch_count(1) == 1 || forced_reorder(1)
size1=176619520, size2=176885760
reconstruction return=0, commands=11369
result was 11369 commands
versions size is 176885760
applied 1 patches
reordering commands? 1
reconstructing target file based off of dcbuff commands...
collapsing
processing src 0: 7423 commands.
processing src 1: 3946 commands.
reconstruction completed successfully
re-compressing
verifying uncompressed md5
recompressing.
  portage-20050511.tar:  9.206:1,  0.869 bits/byte, 89.14% saved, 176885760 in, 19214072 out.
verifying generated tarball 
compressed md5 differs, but uncompressed md5 says it right.  bzip2 version incompatability in other words
saving the md5
beginning update to the tree
Syncing local tree...
[ tree sync ]
cleansing
removing portage-20050510.tar.bz2
removing portage-20050509.tar.bz2
done.

Please remove ", which should lead to errors on subsequent runs" from error 2 in the original bug report.

Proposal:

Store all generated data, i.e. generated tarballs as well, in /var/delta-webrsync/, and do not delete or create files in DISTDIR (exception: downloaded full snapshots and their md5sums).
Comment 2 Brian Harring (RETIRED) gentoo-dev 2005-05-14 18:13:40 UTC
Hmm.  Yeah, shoving the md5-differing tarball elsewhere is possible, I guess.
Regarding blocking removal of files, -k should do it for you also.
File creation *will* occur within $DISTDIR, although I can see the point of shifting the compressed, md5-differing target elsewhere.
Comment 3 sf 2005-05-17 08:59:33 UTC
error 1:

I manually removed $DISTDIR/portage-200* and put in a backup of full snapshot
portage-20050509.tar.bz2 with its portage-20050509.tar.bz2.md5sum.

All snapshots and md5sums with the exception of portage-20050516.tar.bz2 and
portage-20050516.tar.bz2.md5sum were now removed, so error 1 of the original bug
report seems to be fixed.

error 2:

The reason why I do not like the generated snapshot file being stored in DISTDIR
is that DISTDIR should only contain files that were downloaded from the internet
(DISTDIR is a kind of cache), and the generated snapshot is not bytewise
identical to the downloadable file of the same name. You could call it a cache
corruption.

So, what is the reason you do not want to put generated snapshots in
/var/delta-webrsync/ ?
Comment 4 Brian Harring (RETIRED) gentoo-dev 2005-05-17 09:29:57 UTC
Reasoning is that the tarball *is* valid; it's just an issue with how we do
verification.  Shifting it into a separate dir would work for the time being,
since the updated md5 isn't stored in $DISTDIR either...
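
The verification issue comes down to which bytes are hashed: recompressing with a different bzip2 can change the compressed file while leaving the tar stream untouched. A sketch of checking the uncompressed stream instead, assuming the digest sits in the first field of the checksum file; `verify_uncompressed_md5` is a hypothetical name, not the script's actual code:

```shell
# Hypothetical helper: hash the *decompressed* stream and compare it with the
# digest stored in a checksum file (assumed format: digest in the first field).
verify_uncompressed_md5() {
    local tarball="$1" sumfile="$2"
    local expected actual
    expected=$(awk '{print $1; exit}' "$sumfile") || return 2
    # bzip2 -dc writes the decompressed data to stdout without touching the file.
    actual=$(bzip2 -dc "$tarball" | md5sum | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

With this kind of check, two tarballs compressed at different levels or by different bzip2 builds both pass as long as the tar data inside is the same.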
Comment 5 Christian Schlotter 2005-05-19 00:26:21 UTC
Thanks for your great programs, Brian!

The old snapshots are not removed from DISTDIR here either.  This patch solves the problem:
--- emerge-delta-webrsync.~1~   2005-05-11 15:36:43.000000000 +0200
+++ emerge-delta-webrsync       2005-05-18 19:52:49.000000000 +0200
@@ -423,7 +423,7 @@
        echo "cleansing"
        for x in $potentials; do
                echo "removing ${x}"
-               rm "${x}" "${x}.md5sum" &> /dev/null
+               rm "${DISTDIR}/${x}" "${DISTDIR}/${x}.md5sum" &> /dev/null
                rm "${STATE_DIR}/${x}.md5sum" "${STATE_DIR}/${x}.umd5sum" &> /dev/null
        done
 fi
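
Why the patch helps: the old `rm` used bare file names, so it could only succeed if the current directory happened to be DISTDIR, and `&> /dev/null` hid the resulting errors. A sketch of the same cleanup with explicit directories, assuming the loop variable holds bare file names; `cleanse_snapshots` and its arguments are illustrative, not the script's own function:

```shell
# Hypothetical helper: remove stale snapshots and their checksum files.
# Usage: cleanse_snapshots DISTDIR STATE_DIR name...
cleanse_snapshots() {
    local distdir="$1" state_dir="$2"
    shift 2
    local x
    for x in "$@"; do
        echo "removing ${x}"
        # Full paths make the result independent of the current directory;
        # -f keeps missing files from being treated as errors.
        rm -f -- "${distdir}/${x}" "${distdir}/${x}.md5sum" \
                 "${state_dir}/${x}.md5sum" "${state_dir}/${x}.umd5sum"
    done
}
```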

It'd be nice if your script would detect the use of the dynamic deltup server
network (app-portage/getdelta, http://www.ddeltup.org/).  Right now the scripts
produce quite chaotic output when used in conjunction because the FETCHCOMMAND
tries to download deltas for the patch, md5sum, umd5sum and for the snapshot,
before it fetches them from your server (see the attached file sync.log).

I think that your script should detect FETCHCOMMAND="/usr/bin/getdelta.sh
\${URI}" and should use wget (or something similar) to download the files needed
by emerge-delta-webrsync.
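
One way such detection could look, purely as an illustration of the suggestion: inspect FETCHCOMMAND and fall back to plain wget when it points at getdelta.sh. `choose_fetcher` is a hypothetical helper, and the substring match is an assumption about how getdelta is configured:

```shell
# Hypothetical helper: decide which downloader to use for the snapshot files.
# Prints "wget" when the given FETCHCOMMAND invokes getdelta.sh, and
# "fetchcommand" (meaning: use the user's setting unchanged) otherwise.
choose_fetcher() {
    case "${1:-}" in
        *getdelta.sh*) echo "wget" ;;
        *)             echo "fetchcommand" ;;
    esac
}

fetcher=$(choose_fetcher "${FETCHCOMMAND:-}")
echo "fetching snapshot files via: ${fetcher}"
```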

Best regards
Christian
Comment 6 Christian Schlotter 2005-05-19 00:27:56 UTC
Created attachment 59257 [details]
sync.log

log of emerge-delta-webrsync working together with getdelta.sh as FETCHCOMMAND
Comment 7 Christian Schlotter 2005-05-19 02:16:11 UTC
Some other things that have come to my mind:
* The snapshot-200* files which are not needed anymore (e.g.
snapshot-20050516-20050517.patch.bz2,
snapshot-20050516-20050517.patch.bz2.md5sum) should be deleted from DISTDIR also.
* Maybe the emerge-delta-webrsync ebuild should take care that it cannot be
executed by someone who is not in the portage group.  Right now it can be
executed by everyone (although this does not harm anybody, I think):
-rwxr-xr-x  1 root root 11107 May 18 19:52 emerge-delta-webrsync
* I agree with sf that the snapshots should not be built inside DISTDIR.
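
Restricting execution to the portage group, as suggested in the second point, could be done with ordinary ownership and mode bits. A sketch, with `restrict_to_group` as a hypothetical helper rather than anything the ebuild currently does:

```shell
# Hypothetical helper: hand a script to a group and drop access for "other".
# Resulting mode is rwxr-x--- (0750).
restrict_to_group() {
    local file="$1" group="$2"
    chgrp "$group" "$file" && chmod 0750 "$file"
}

# Example call (run as root; the install path is an assumption):
#   restrict_to_group /usr/bin/emerge-delta-webrsync portage
```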
Comment 8 Brian Harring (RETIRED) gentoo-dev 2005-05-19 20:36:26 UTC
Reconstruction will occur in /tmp or $PORTAGE_TMPDIR.  Re: the DISTDIR deletion,
yeah, apparently I suck (you're right) :)

getdelta... hmm.  I don't like special cases much.  Possible alternatives?
Keep in mind that this script has a *finite* lifespan to it, it's already been
removed from cvs head of portage; my intention is to have it handled by a
subclass of the sync.snapshot class if/when head hits.
Re: special casing, the issue I see is that if I special case it now, I'll have
to special case it in the class, which gets *really* hard since FETCHCOMMAND has
been abstracted away, into transports.fetchcommand .
I can see doing it as a temp hack, since like I said, emerge-delta-webrsync has
a limited shelf life till a proper integration occurs.  The special casing/hack
won't fly down the line though, since the users defined FETCHCOMMAND is just
that, what they told portage to use.
So... long term?

By the way, regarding doing a release with the fix, I'll -r1 it.  Any further
tweaks will have to wait a bit, hopefully within a week.  

A replacement for the untar/rsync portion is finally working, except for
--excludes support... so I'm finishing that sucker off and intend to use it in
the next major bump.  It's based on a good chunk of diffball's code, so it's got
the usual support for working from compressed files, handling tarballs, yadda
yadda yadda.  Main thing is it removes the need to untar, and thus is quite a bit
faster overall (a bit faster than rsync anyway, for some screwed up reason).
Comment 9 Christian Schlotter 2005-05-19 23:35:05 UTC
(In reply to comment #8)
> getdelta... hmm.  I don't like special cases much.  Possible alternatives?
> Keep in mind that this script has a *finite* lifespan to it, it's already been
> removed from cvs head of portage; my intention is to have it handled by a
> subclass of the sync.snapshot class if/when head hits.
> Re: special casing, the issue I see is that if I special case it now, I'll have
> to special case it in the class, which gets *really* hard since FETCHCOMMAND has
> been abstracted away, into transports.fetchcommand .
> I can see doing it as a temp hack, since like I said, emerge-delta-webrsync has
> a limited shelf life till a proper integration occurs.  The special casing/hack
> won't fly down the line though, since the users defined FETCHCOMMAND is just
> that, what they told portage to use.
> So... long term?

Thanks for this insight into portage development.  So I guess I'll just have to
live with it.  It looks a little bit ugly, but it works great, so this is no
problem.

> A replacement for the untar/rsync portion is finally working, except for
> --excludes support... so I'm finishing that sucker off and intend to use it in
> the next major bump.  It's based on a good chunk of diffball's code, so it's got
> the usual support for working from compressed files, handling tarballs, yadda
> yadda yadda.  Main thing is it removes the need to untar, and thus is quite a bit
> faster overall (a bit faster than rsync anyway, for some screwed up reason).

That's great news!  I'm especially looking forward to the speed improvement
because here it's *way slower* than rsync (AMD Duron 1300 with 256 MB RAM).
Comment 10 sf 2005-05-20 05:02:05 UTC
Re Comment #3

I do not know where I looked last time but all snapshots are still there (see
comment #5).
Comment 11 Brian Harring (RETIRED) gentoo-dev 2005-05-24 05:48:11 UTC
In CVS.
If you're game for testing out tarsync, emerge app-arch/tarsync.
emerge-delta-webrsync 3 will detect it, and use it (it'll also go parallel in
recompressing for you smp folk, since tarsync is io bound and bzip2 is a bit
more proc bound).

Enjoy.