Bug 36657 - patch and binary diffs for distfiles
Summary: patch and binary diffs for distfiles
Status: RESOLVED DUPLICATE of bug 24433
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Conceptual/Abstract Ideas
Hardware: All All
Importance: High enhancement
Assignee: Portage team
Reported: 2003-12-28 08:22 UTC by Brad Allen
Modified: 2005-07-17 13:06 UTC

Description Brad Allen 2003-12-28 08:22:25 UTC
Compressed files are hard to diff.  However, if the previous
suggestions are all followed (SHA1, RMD160, and GnuPG detached signatures,
each computed on both the distributed file and its uncompressed form,
with some reasonable management of all those verifications, such as a
compressed tar file holding them in the native file format of each
program, e.g., .gpg/.sign/.sig/.asc files for gpg detached signatures),
then checking the uncompressed contents of a file should also be
possible.  Since it is hard to reproduce archive creation exactly (tar
may not always produce the same result even from the same files),
distributing a set of verifications for all subfiles (computed on
their contents and all included metadata) would enable the following:

*  patch diffs on ASCII files and binary diffs for binary files
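
A minimal sketch of such per-subfile verifications, assuming Python
and a tar archive.  The function name member_manifest and the choice
of metadata fields (name, mode, mtime, size) are illustrative
assumptions, not anything Portage provides:

```python
import hashlib
import io
import tarfile


def member_manifest(tar_bytes):
    """Map each regular file in a tar archive to a SHA-1 digest
    computed over selected metadata plus the file contents."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for info in tar:
            if not info.isfile():
                continue
            digest = hashlib.sha1()
            # Metadata fields chosen for illustration only.
            meta = f"{info.name}\0{info.mode:o}\0{info.mtime}\0{info.size}\0"
            digest.update(meta.encode("utf-8"))
            digest.update(tar.extractfile(info).read())
            manifest[info.name] = digest.hexdigest()
    return manifest
```

Because the digests are keyed by member name, two archives that hold
the same files in a different order yield the same manifest even
though the archive bytes differ.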

Emerge's downloading subcomponent would be able to verify that an
older version of a file in /usr/portage/distfiles on the local system
is healthy, then check whether there is a patch file set that would
take it from that older version to the new version.  In some cases,
this could be downloaded from a third party, e.g.,
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/patch-2.6.1.bz2 (and its
verification files patch-2.6.1.sign, patch-2.6.1.bz2.sign, and
patch-2.6.1.gz.sign).  In almost all cases, an automatically created
Gentoo version should be available.  Portage emerge should be able
to know this, get it, and process it.  It would apply the diffs
to the old downloaded version, and then verify the result against
the distributed verifications.
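
The verify, patch, verify sequence could be sketched roughly as
follows.  This assumes Python; upgrade_distfile and sha1_hex are
hypothetical names, and apply_delta stands in for whatever patch or
binary-diff tool would actually be used (it is passed in as a plain
callable here):

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def upgrade_distfile(old, delta, old_sha1, new_sha1, apply_delta):
    """Verify the old file, apply the delta, then verify the result."""
    if sha1_hex(old) != old_sha1:
        raise ValueError("local distfile is damaged; fetch the full file")
    new = apply_delta(old, delta)
    if sha1_hex(new) != new_sha1:
        raise ValueError("patched result failed verification")
    return new
```

Either failure would be a signal to fall back to an ordinary full
download.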

Before downloading the many patch sets and final-file verification
sets it would need in order to verify such an incremental update,
emerge should do a simple total-bytes comparison of the amount to be
downloaded in each case.  It should simply download the new version
when that is less than the lesser of 105% of the size of the
diff, or (only used for comparison when available via caching)
five seconds more of download time than the diff and any additionally
necessary verification sets (those needed only to verify such
patching).  This way, extra patch downloading would not be performed
when it does not make sense.
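
One possible reading of that rule, as a short Python sketch.  The
function and parameter names are assumptions, patch_bytes is taken to
mean the diff plus any extra verification sets, and the five-second
allowance is converted to bytes using a cached transfer rate when one
is known:

```python
def prefer_full_download(full_bytes, patch_bytes, bytes_per_second=None):
    """Return True when fetching the whole new file is the better choice."""
    # First threshold: 105% of the diff-based download.
    limit = 1.05 * patch_bytes
    if bytes_per_second is not None:
        # Second threshold: at most five seconds of extra transfer time.
        limit = min(limit, patch_bytes + 5.0 * bytes_per_second)
    return full_bytes < limit
```

For example, prefer_full_download(1000, 990) is true because the full
file is barely larger than the patch set, while
prefer_full_download(1000, 900) is false.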

Patching would be done for the various archive types you choose to support:

	tar
	zip
	cpio
	afio
	ar

Any subarchives those contain would have to be considered too
(recursive fixing), depending on the package and its ability to
withstand that (theoretically, embedded checksums might work
since they're updated too, or might fail due to variations in
internal archiving, depending on their implementation; where
they fail, patch sets would not be used or would be hand-crafted
to work ONLY if appropriate (which in Gentoo it doesn't seem to be;
it uses pure distfiles)).  E.g., a tarball containing a gzipped
tar file (I've seen a lot of those), or a zip file containing a
bzip2ed tar file (silly example I've never witnessed), or a tar
file containing zip files (very common) would have to be considered
for nested diffing and patching.
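
A rough Python sketch of walking such nested archives, limited to
the tar and zip formats that the standard library can read.  The
names open_members and walk_nested and the "!/" path separator are
illustrative assumptions.  Tar is tried before zip because a tar file
that holds a zip can otherwise be mistaken for a zip file:

```python
import io
import tarfile
import zipfile


def open_members(data):
    """Return (name, bytes) pairs if data is a tar (optionally
    compressed) or zip archive, otherwise an empty list."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
            return [(i.name, tar.extractfile(i).read())
                    for i in tar if i.isfile()]
    except (tarfile.TarError, OSError, EOFError):
        pass  # not a tar archive; try zip next
    if data[:2] == b"PK" and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as z:
            return [(i.filename, z.read(i))
                    for i in z.infolist() if not i.is_dir()]
    return []


def walk_nested(data, prefix=""):
    """Yield (path, bytes) for every leaf file, descending into
    archives stored inside archives."""
    for name, payload in open_members(data):
        path = prefix + name
        children = list(walk_nested(payload, prefix=path + "!/"))
        if children:
            yield from children
        else:
            yield path, payload
```

Each leaf path could then be diffed or verified on its own, for
instance pkg.tar.gz!/src/main.c for a file inside a gzipped tar file
inside the outer tarball.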

In addition, local resources should be considered by emerge:
will it take more space to patch the file than to just get a new
one, and if so, is there enough space for one approach but not
the other?  If so, it should use the approach that fits.  (I suppose
the situation where it would take more space to download a new file
than patch an old one *could* be considered, in order to force
a patch download, but how often is this the case?!  Probably
never ...)
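
The space check could be a small fallback step, sketched here in
Python.  pick_approach is a hypothetical name, and the caller is
assumed to have already estimated how many bytes each approach needs
and which approach the bandwidth comparison prefers:

```python
def pick_approach(preferred, free_bytes, full_need, patch_need):
    """Return "full" or "patch", whichever fits in free_bytes,
    trying the preferred approach first; None if neither fits."""
    needs = {"full": full_need, "patch": patch_need}
    other = "patch" if preferred == "full" else "full"
    for choice in (preferred, other):
        if needs[choice] <= free_bytes:
            return choice
    return None
```

The free space itself could come from shutil.disk_usage on the
distfiles directory.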


** PURPOSE **

The purpose of the above recommendation is to save network bandwidth.

Here, where anything faster than a POTS modem is either expensive,
slowish ISDN or a local connection costing $300/month or more (with
heavy implementation costs), this would save end users a lot.  More
important, however, is making Gentoo upgrading a better network
citizen by putting less load on backbones, Gentoo network
distributors, and all other components.
Comment 1 SpanKY gentoo-dev 2003-12-28 10:41:42 UTC

*** This bug has been marked as a duplicate of 24433 ***