Compressed files are hard to diff. However, if the previous suggestions are all followed (SHA1, RMD160, and GnuPG detached signatures, each computed over both the distributed file and its uncompressed form, with some reasonable management of all those verifications, such as a compressed tar file containing them in the native file format for each program, e.g., .gpg/.sign/.sig/.asc files for GnuPG detached signatures), then checking the uncompressed contents of a file should also be possible.

Since it is hard to reproduce archive creation exactly (tar may not always produce the same result even from all the same files), distributing a set of verifications for all subfiles (covering their contents and all included metadata) would enable the following:

* patch diffs for ASCII files and binary diffs for binary files

Emerge's downloading subcomponent would be able to verify a healthy older version of a file in /usr/portage/distfiles on the local system, then check whether there is a patch file set that would bring it from that old version to the new one. In some cases, this could be downloaded from a third party, e.g., ftp://ftp.kernel.org/pub/linux/kernel/v2.6/patch-2.6.1.bz2 (and its verification files patch-2.6.1.sign, patch-2.6.1.bz2.sign, and patch-2.6.1.gz.sign). In almost all cases, an automatically created Gentoo version should be available. Emerge should be able to discover this, fetch it, and process it: apply the diffs to the old downloaded version, then verify the result against the new version's verifications.
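The "verify a healthy older version before patching" step could look roughly like the following minimal Python sketch. The function names (`sha1_of_file`, `old_version_is_healthy`) and the demo filenames are illustrative, not anything Portage actually provides; a real implementation would also check RMD160 and the GnuPG detached signature before trusting the file as a patch base.

```python
import hashlib
import os
import tempfile

def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large distfiles need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def old_version_is_healthy(path, expected_sha1):
    """An old distfile is only a valid patch base if it verifies exactly."""
    return os.path.exists(path) and sha1_of_file(path) == expected_sha1

# Demo: write a stand-in "old distfile" and verify it before patching.
with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "foo-1.0.tar")
    with open(old, "wb") as f:
        f.write(b"old release contents")
    expected = hashlib.sha1(b"old release contents").hexdigest()
    print(old_version_is_healthy(old, expected))  # True: safe to patch
    print(old_version_is_healthy(old, "0" * 40))  # False: fetch the full file
```

Only when this check passes would emerge bother fetching the patch set; a corrupt or locally modified distfile means falling back to a full download.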
Before downloading all the patch sets and final file verification sets needed to verify such an incremental update, emerge should do a simple comparison of the total bytes to be downloaded in each case. It should simply download the new version outright whenever its size is less than the lesser of: 105% of the size of the diff plus any additionally necessary verification sets (needed only to verify such patching), or (usable for comparison only when a bandwidth estimate is available via caching) whatever would take no more than five seconds longer to download than the diff and those verification sets. This way, extra patch downloading would not be performed when it does not make sense.

Patching would be done for the various archive types you deem worth supporting: tar, zip, cpio, afio, ar. Any subarchives those contain would have to be considered too (recursive patching), depending on the package and its ability to withstand that. Theoretically, embedded checksums might work, since they would be updated too, or might fail due to variations in internal archiving, depending on their implementation; where they fail, patch sets would either not be used or would be hand-crafted to work, and ONLY if appropriate (which in Gentoo it doesn't seem to be; it uses pure distfiles). E.g., a tarball containing a gzipped tar file (I've seen a lot of those), a zip file containing a bzip2ed tar file (a silly example I've never witnessed), or a tar file containing zip files (very common) would have to be considered for nested diffing and patching.

In addition, emerge should consider local resources: will it take more disk space to patch the file than to just fetch a new one, and if so, is there enough space for one approach but not the other? If so, use the approach that fits. (I suppose the reverse situation, where downloading a new file would take more space than patching an old one, *could* be considered in order to force a patch download, but how often is that the case?! Probably never ...)

** PURPOSE **

The purpose of the above recommendation is to save network bandwidth.
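The "lesser of 105% or five extra seconds" rule above can be sketched as a small decision function. Everything here is illustrative: the function name, parameters, and the idea of a cached `bandwidth_bps` estimate are assumptions for the sketch, not existing Portage behavior.

```python
def should_patch(patch_bytes, verify_bytes, full_bytes, bandwidth_bps=None):
    """Return True when fetching the patch set beats a full download.

    Download the new version outright when its size falls below the lesser
    of (a) 105% of the patch-plus-verification download, or (b) that
    download plus five seconds' worth of transfer at the cached bandwidth
    estimate (usable only when such an estimate exists).
    """
    incremental = patch_bytes + verify_bytes
    threshold = 1.05 * incremental
    if bandwidth_bps is not None:
        threshold = min(threshold, incremental + 5.0 * bandwidth_bps)
    return full_bytes >= threshold

# A 5 MB release with a 100 KB patch: patching clearly wins.
print(should_patch(patch_bytes=100_000, verify_bytes=5_000,
                   full_bytes=5_000_000))    # True
# A diff nearly as large as the release: just download the new version.
print(should_patch(patch_bytes=4_900_000, verify_bytes=5_000,
                   full_bytes=5_000_000))    # False
```

The local disk-space check would be a second, independent gate applied after this one: even when patching wins on bandwidth, fall back to the full download if there is not enough space to hold both the old file and the patched result.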
Here, where anything above a POTS modem is either expensive, slowish ISDN or a $300/month (or more) local connection with heavy implementation costs, this would save end users a lot. More important, however, is making Gentoo upgrading a better network citizen by reducing the load on backbones, Gentoo network distributors, and all other components.
*** This bug has been marked as a duplicate of 24433 ***