File verifications should be generated and checkable for every file in both its default compressed format as distributed and its uncompressed format. Then, whenever Portage finds a file that is either uncompressed, or compressed but failing verification, it must try to verify the uncompressed version (decompressing the file if it has to) to see whether it is just an alternate form of the same file.

This would let administrators do a lot of things:

* Use Debian "orig" tars, which are almost always just gzipped copies of the same tarball that the main program distributes bzip2ed.
* Use manually downloaded files.
* Store files in any format: uncompressed, compressed quickly (gzip -1), or compressed well (bzip2), regardless of the original version.

Reproducing someone else's compressed file byte for byte with bzip2 or gzip is often extremely difficult, if not impossible, without their copy of the compressed file. Since the uncompressed file is always supposed to be identical, verifying against it makes all of the above possible. This would save a lot of disk space, download time, etc.

For my previous submissions regarding SHA1, RMD160, and MD5, and the other one which also considers parallel public key cryptographic signatures, all of those verifications should also be created for the uncompressed version of every file.
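To make the fallback concrete, here is a minimal sketch of the check described above. It is not Portage code: the function names and return labels are hypothetical, only SHA1 is shown, and only gzip and bzip2 are tried.

```python
import bz2
import gzip
import hashlib
import zlib

# Hypothetical helpers for illustration; none of these names exist in Portage.
OPENERS = (gzip.open, bz2.open)


def sha1_of_stream(stream, chunk_size=1 << 16):
    """Hash a binary stream in chunks so large distfiles need not fit in memory."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path):
    """SHA1 of the file exactly as it sits on disk."""
    with open(path, "rb") as stream:
        return sha1_of_stream(stream)


def sha1_of_uncompressed(path):
    """SHA1 of the decompressed payload, or None if no known decompressor accepts the file."""
    for opener in OPENERS:
        try:
            with opener(path, "rb") as stream:
                return sha1_of_stream(stream)
        except (OSError, EOFError, zlib.error):
            continue  # not this format (or damaged); try the next one
    return None


def classify(path, distributed_sha1, uncompressed_sha1):
    """Say which of the two recorded digests the file satisfies, if any."""
    raw = sha1_of_file(path)
    if raw == distributed_sha1:
        return "as-distributed"
    if raw == uncompressed_sha1:
        return "uncompressed"
    if sha1_of_uncompressed(path) == uncompressed_sha1:
        return "recompressed"
    return None
```

With this, a gzip -1 copy of a tarball that was distributed as bzip2 would come back as "recompressed" rather than being rejected and downloaded again.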
Cryptographic signatures, because they are a pain to make:

* When made automatically by some program, should be made for all versions of the file (two: compressed and uncompressed);
* When made manually, should be made on the final compressed version as distributed, since this is faster to verify, but *could* be made on the uncompressed version, since Portage would know to decompress the file for verification purposes as explained above.

Then an automatic process would come through with purpose-made keys for each level: developer, program distributor, program developer (in backwards order of time of possible creation). It would verify the manually generated signatures and make an automated signature for the other versions ONLY where a manual one is missing, perhaps with one automatic key for every manual key so verified. Each automatic key would ONLY be used once the corresponding manual signature has been verified and, as I said, only when that manual signature does not already cover both the compressed and uncompressed versions. The Gentoo distribution would do this. It would be an indication to Gentoo management programs and users that Gentoo has verified those signatures for all compression types managed (in this case, the one compression type used for that file in distfiles, and the one uncompressed equivalent).

I have not surveyed distfiles to find out whether more than a nominal variety of compression types exists. I can imagine the usual types are:

  bzip2
  gzip

with less usual:

  zip

and very unusual, if even existent:

  freeze
  compress (gzip knows this)
  compact (gzip should know this)

and perhaps even:

  pgp
  gpg

(pgp & gpg can use compression when encrypting, and perhaps when producing binary output for in-file signing, too.)

Magic files could be used to determine which decompression program to run for these verifications.
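As a rough illustration of the magic-based detection, the sketch below matches the leading bytes of a file against a small hard-coded table. The table and the function name are assumptions made for this example; a real implementation would more likely consult the system magic database, and freeze, compact, pgp, and gpg are left out here.

```python
# Leading-byte signatures for the more common formats listed above.
# Hypothetical table for illustration only; it is not taken from Portage.
MAGIC_PREFIXES = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x9d", "compress"),
)


def detect_compression(header):
    """Return the format name for the first bytes of a file, or None if unrecognized."""
    for prefix, name in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return name
    return None


def detect_compression_of_file(path):
    """Read just enough of the file to compare against the table."""
    with open(path, "rb") as stream:
        return detect_compression(stream.read(8))
```

A result of None would mean the file is either already uncompressed or in a format the table does not know, so it would be hashed as-is.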
Actually, "zip" is difficult; you could put it on the back burner, but eventually you would want a signature for each of its possible components; if a zip contained four large files and six small ones, you'd rather have 10 file verification sets than have to download the entire zip again.

---------

If all my recommendations are implemented, and I think the first two need to be (SHA1 & RMD160; public key signatures), there could be literally dozens of verifications per file. Verification management would make sense:

* Specify a simple single file format for these signatures.

I have two main ideas. First, my first idea:

* A tar or zip file containing many subfiles of the various verifications:
  + one subfile for every gpg/pgp detached signature;
  + one subfile for each type of checksum (one for rmd160, one for sha1, and one for md5, which can potentially be used by md5sum -c) for each type of compression (default compressed (bzip2/gzip/etc.), and uncompressed)

Now, my other idea, which seems much more difficult to implement, less flexible, harder to use, and just not a good idea; don't do it, use the tarfile idea explained above.

* rmd160, sha1, and md5sum could all go into one table of four columns: the rmd160, sha1, and md5sum values and the filename. The meanings of the columns could be specified in a number of ways:
  + a text identifier
  + a column position
* GnuPG keys could also be ASCII-encoded if not already, and then put into this file, with proper delimiters. The delimiters could possibly use the following existing tools, as your coders find best:
  + MIME
  + XML

Of course, you could define a mechanism close to or equal to MIME components for these items which is a subset of XML ... or you can create your own.
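For what the recommended tarfile layout might look like, here is a small sketch that packs one checksum subfile per algorithm and per compression state into a tar archive. The member naming scheme ("sha1.compressed" and so on) and the function names are made up for this example, rmd160 is left out because not every Python build provides it, and detached signatures are omitted.

```python
import hashlib
import io
import tarfile

ALGORITHMS = ("sha1", "md5")  # rmd160 omitted: availability depends on the build


def checksum_line(algorithm, data, filename):
    """One line in the format read by md5sum -c and sha1sum -c."""
    return f"{hashlib.new(algorithm, data).hexdigest()}  {filename}\n"


def build_bundle(variants):
    """Pack checksum subfiles into a tar archive held in memory.

    variants is a list of (variant, filename, data) tuples, for example
    ("compressed", "foo.tar.bz2", ...) and ("uncompressed", "foo.tar", ...).
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as bundle:
        for algorithm in ALGORITHMS:
            for variant, filename, data in variants:
                body = checksum_line(algorithm, data, filename).encode("ascii")
                member = tarfile.TarInfo(name=f"{algorithm}.{variant}")
                member.size = len(body)
                bundle.addfile(member, io.BytesIO(body))
    return buffer.getvalue()


def read_bundle(blob):
    """Return a mapping of subfile name to its text content."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as bundle:
        return {
            member.name: bundle.extractfile(member).read().decode("ascii")
            for member in bundle.getmembers()
        }
```

Because each subfile is an ordinary checksum list, the md5 ones could be extracted and handed to md5sum -c unchanged, which is the main attraction of this layout over a custom multi-column file.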
Define 'uncompressed version' ... do you mean you want to generate hashes on every single file inside of a tarball?
Closing (not going to implement it without a good reason). Aside from that, the alt-digest db I mentioned in GLEP 25 would allow this for distfiles.