File verifications should be generated and checkable for all
files in both their default compressed format as distributed,
and in their uncompressed format.
Then, whenever Portage finds a file that is either uncompressed
or compressed with bad verification, it must try to verify its
uncompressed version (by uncompressing it if it has to) to see
if it was just an alternate path for the same file. This way,
lots of things can be done by administrators:
* Use Debian "orig" tars, which are almost always just gzipped
copies of the same tarball the main program distributes as bzip2.
* Use manually downloaded files
* Store files in any format: uncompressed, compressed quickly
(gzip -1) or compressed well (bzip2), regardless of original
Reproducing, byte for byte, the compressed file that someone else
made with bzip2 or gzip is often extremely difficult, if not impossible,
without their copy of the compressed file.
Since the uncompressed file is always supposed to be identical, this is
This would save a lot of disk space, downloading time, etc.
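A minimal sketch of the fallback check described above, assuming only gzip and bzip2 need handling. The function names are hypothetical and this is not Portage code; it simply hashes the uncompressed payload, however the file happens to be stored on disk.

```python
import bz2
import gzip
import hashlib


def _hash_stream(fileobj, algo):
    """Feed a file object through a hash in chunks and return the hex digest."""
    h = hashlib.new(algo)
    for chunk in iter(lambda: fileobj.read(65536), b""):
        h.update(chunk)
    return h.hexdigest()


def uncompressed_digest(path, algo="sha1"):
    """Digest of the uncompressed content of path (hypothetical helper).

    Tries gzip, then bzip2; if neither format matches, the file is
    treated as already uncompressed.
    """
    for opener in (gzip.open, bz2.open):
        try:
            with opener(path, "rb") as f:
                return _hash_stream(f, algo)
        except (OSError, EOFError):
            continue  # not in this format; try the next one
    with open(path, "rb") as f:
        return _hash_stream(f, algo)
```

With this, a tarball recompressed with gzip -1, one compressed with bzip2, and the bare tar all yield the same digest, so one recorded value covers every storage format.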
For my previous submissions regarding SHA1, RMD160, and MD5, and
the other one which considers parallel public key cryptographic
signatures as well, all those versions should be created for the
uncompressed version of all files. Cryptographic signatures,
because they are a pain to make:
* When done automatically by some program, should be done for all
versions of file (two: compressed and uncompressed);
* When done manually, should be done on the ultimate compressed
version as distributed, since this is faster to verify, but *could*
be done on the uncompressed version, since Portage would know to
decompress the file for verification purposes as explained above;
then, an automatic process would come through with purpose-made
keys for each level: developer, program distributor, program
developer (in backwards order of time of possible creation),
verifying such manually generated signings, and making an automated
signature for any missing manual signings for other versions
ONLY where the manual ones are missing, perhaps one automatic
key for every manual key so verified; this key would ONLY be used
for each manual signature when that manual signature is verified,
and as I said, only when that manual signature is not on both
compressed and uncompressed versions. Gentoo distribution would
do this. This would be an indication to Gentoo management programs
and users that Gentoo has verified those signings for all compression
types managed (in this case, the one compression type used for that
file in distfiles, and the one uncompressed equivalent).
I have not surveyed distfiles to find out if more than just a nominal
variety of compression types exist. I can imagine the usual types are:
with less usual:
and very unusual if even existent:
compress (gzip knows this)
compact (gzip should know this)
and perhaps even:
(pgp & gpg can use compression when encrypting, and perhaps when
doing binary output for in-file signing, too.)
Magic files could be used to determine which uncompression program
to run for these verifications.
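The magic-number lookup could be sketched roughly as follows. The table and function names are hypothetical, only a few well-known signatures are listed, and the commands are just one plausible choice per format.

```python
# Leading bytes of each format, paired with a command that writes the
# uncompressed data to stdout. Illustrative only; not an exhaustive list.
SIGNATURES = [
    (b"\x1f\x8b", ["gzip", "-dc"]),      # gzip
    (b"BZh", ["bzip2", "-dc"]),          # bzip2
    (b"\x1f\x9d", ["gzip", "-dc"]),      # compress (.Z); gzip can read it
    (b"PK\x03\x04", ["unzip", "-p"]),    # zip archive
]


def decompressor_for(head):
    """Return the command for data starting with head, or None if unknown."""
    for magic, command in SIGNATURES:
        if head.startswith(magic):
            return command
    return None


def decompressor_for_file(path):
    """Look at the first bytes of path and pick a decompression command."""
    with open(path, "rb") as f:
        return decompressor_for(f.read(8))
```

A return value of None would mean "treat the file as already uncompressed", so the file name extension never has to be trusted.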
Actually, "zip" is difficult; you could put that on a back burner,
but eventually, you would want to have a signature for all of its
possible components; if zip contained four large files and six small
ones, you'd rather have 10 file verification sets than have to
download the entire zip again.
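For the zip case, per-component verification might look something like this sketch. The function name is hypothetical; it computes one digest per archive member so that each component can be checked on its own.

```python
import hashlib
import zipfile


def zip_member_digests(zip_source, algo="sha1"):
    """Map each file inside a zip archive to the digest of its content.

    zip_source may be a path or a seekable binary file object.
    """
    digests = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            h = hashlib.new(algo)
            with zf.open(info) as member:
                for chunk in iter(lambda: member.read(65536), b""):
                    h.update(chunk)
            digests[info.filename] = h.hexdigest()
    return digests
```

A zip holding ten files would give ten entries, so a mismatch points at the one damaged component instead of condemning the whole archive.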
If all my recommendations are implemented, which I think the first two
need to be (SHA1 & RMD160; public key signatures), there could be
literally dozens of verifications per file. Verification management
would make sense:
* Specify a simple single file format for these signings:
Two main ideas are good; first, my first idea:
* Tar or zipfile containing many subfiles of the various verifications:
+ one subfile for every gpg/pgp detached signature;
+ one subfile for each type of checksum (one for rmd160, one for
sha1, and one for md5 which can potentially be used by md5sum -c)
for each type of compression (default compressed (bzip2/gzip/etc.),
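As a rough illustration of the tarfile idea, the sketch below writes one checksum subfile per algorithm into a single tar. The bundle layout, the subfile naming, and the function name are all assumptions made for the example; detached signatures would be added as further subfiles in the same way.

```python
import hashlib
import io
import tarfile


def build_verification_bundle(bundle_path, filename, data, algos=("sha1", "md5")):
    """Write a tar holding one checksum subfile per algorithm (hypothetical layout).

    Each subfile holds a single "<hexdigest>  <filename>" line, the
    layout that md5sum -c reads.
    """
    with tarfile.open(bundle_path, "w") as tar:
        for algo in algos:
            digest = hashlib.new(algo, data).hexdigest()
            line = f"{digest}  {filename}\n".encode()
            info = tarfile.TarInfo(name=f"{filename}.{algo}")
            info.size = len(line)
            tar.addfile(info, io.BytesIO(line))
```

Keeping each verification in its own subfile means new checksum types or signatures can be added later without changing the format of the existing ones.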
Now, my other idea, which seems much more difficult to implement, less
flexible, harder to use, and just not a good idea; don't do it -- use
the tarfile idea explained above.
* rmd160, sha1, and md5sum could all share one file with four columns:
the rmd160, sha1, and md5 digests, and the filename.
Specifying meanings of the columns could be done a number of ways:
+ a text identifier
+ a column position
* GnuPG keys could also be ASCII-encoded if not already, and then
put into this file, with proper delimiters.
The delimiters could possibly use the following existing tools,
as your coders find best:
Of course, you could define a mechanism close to or equal to MIME
components for these items which is a subset of XML ...
or you can create your own.
define 'uncompressed version' ...
you mean you want to generate hashes on every single file inside of a tarball?
Closing (not going to implement it without a good reason).
Aside from that, the alt-digest db I mentioned in GLEP 25 would allow this for distfiles.