Bug 36655 - File verifications on uncompressed version of all files too
Summary: File verifications on uncompressed version of all files too
Status: RESOLVED WONTFIX
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Conceptual/Abstract Ideas
Hardware: All All
Importance: High enhancement
Assignee: Portage team
URL: http://glep.gentoo.org/glep-0025.html
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-12-28 07:58 UTC by Brad Allen
Modified: 2005-02-27 23:33 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Brad Allen 2003-12-28 07:58:32 UTC
File verifications should be generated and checkable for all
files in both their default compressed format as distributed,
and in their uncompressed format.

Then, whenever Portage finds a file that is uncompressed, or a
compressed file that fails verification, it must try to verify the
uncompressed version (decompressing it if it has to) to see whether
it is just the same file arriving by an alternate path.  This way,
administrators can do lots of things:

*  Use Debian "orig" tars, which are almost always just gzipped
   copies of the same source that is distributed as bzip2.

*  Use manually downloaded files

*  Store files in any format:  uncompressed, compressed quickly
   (gzip -1) or compressed well (bzip2), regardless of original
   version

Reproducing, byte for byte, the compressed file that someone else made
with bzip2 or gzip is often extremely difficult if not impossible
without their copy of the compressed file.

Since the uncompressed file is always supposed to be identical,
verifying against it makes all of the above possible.
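
A minimal sketch of that fallback, assuming Python and SHA1 only.  The
names uncompressed_sha1 and verify are hypothetical and are not Portage
functions; the point is only that a gzip copy and a bzip2 copy of the
same payload differ as files but share one uncompressed digest.

```python
import bz2
import gzip
import hashlib
import io


def uncompressed_sha1(fileobj, compression):
    """Return the SHA1 hex digest of the decompressed payload.

    `compression` is "gzip", "bzip2" or "none" (hypothetical labels).
    """
    if compression == "gzip":
        stream = gzip.GzipFile(fileobj=fileobj)
    elif compression == "bzip2":
        stream = bz2.BZ2File(fileobj)
    else:
        stream = fileobj
    digest = hashlib.sha1()
    # Read in chunks so large distfiles are never held in memory whole.
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(data, compression, compressed_digest, uncompressed_digest):
    """Try the as-distributed digest first, then the uncompressed one."""
    if hashlib.sha1(data).hexdigest() == compressed_digest:
        return "compressed"
    if uncompressed_sha1(io.BytesIO(data), compression) == uncompressed_digest:
        return "uncompressed"
    return None


payload = b"example tarball contents\n" * 1000
as_bzip2 = bz2.compress(payload)                 # as distributed
as_gzip = gzip.compress(payload, compresslevel=1)  # locally recompressed

recorded_compressed = hashlib.sha1(as_bzip2).hexdigest()
recorded_uncompressed = hashlib.sha1(payload).hexdigest()

print(verify(as_bzip2, "bzip2", recorded_compressed, recorded_uncompressed))
print(verify(as_gzip, "gzip", recorded_compressed, recorded_uncompressed))
```

The first call matches on the distributed file directly; the second
only matches after decompression, which is the alternate-path case
described above.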

This would save a lot of disk space, downloading time, etc.

For my previous submissions regarding SHA1, RMD160, and MD5, and
the other one which considers parallel public key cryptographic
signatures as well, all those versions should be created for the
uncompressed version of all files.  Cryptographic signatures,
because they are a pain to make:

 *  When done automatically by some program, should be done for all
    versions of file (two:  compressed and uncompressed);
 *  When done manually, should be done on the ultimate compressed
    version as distributed, since this is faster to verify, but *could*
    be done on the uncompressed version, since Portage would know to
    decompress the file for verification purposes as explained above;

    then, an automatic process would come through with purpose-made
    keys for each level:  developer, program distributor, program
    developer (in backwards order of time of possible creation),
    verifying such manually generated signatures, and making an
    automated signature for any other version ONLY where the manual
    one is missing, perhaps with one automatic key for every manual
    key so verified.  This key would ONLY be used once that manual
    signature has been verified, and, as I said, only when that manual
    signature is not on both compressed and uncompressed versions.
    Gentoo distribution would do this.  This would be an indication to
    Gentoo management programs and users that Gentoo has verified those
    signatures for all compression types managed (in this case, the one
    compression type used for that file in distfiles, and the one
    uncompressed equivalent).

I have not surveyed distfiles to find out if more than just a nominal
variety of compression types exist.  I can imagine the usual types are:
	bzip2
	gzip
with less usual:
	zip
and very unusual if even existent:
	freeze
	compress (gzip knows this)
	compact (gzip should know this)
and perhaps even:
	pgp
	gpg
(pgp & gpg can use compression when encrypting, and perhaps when
doing binary output for in-file signing, too.)

Magic files could be used to determine which uncompression program
to run for these verifications.
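
As a rough illustration of the magic-number idea, here is a sketch in
Python that inspects the leading bytes of a file.  The table and the
function name detect_compression are assumptions for this example; a
real implementation would more likely reuse file(1)'s magic database.

```python
import bz2
import gzip

# Leading magic bytes for a few of the formats listed above.
# "PK\x03\x04" is the local file header of a non-empty zip archive.
MAGIC = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x9d", "compress"),
)


def detect_compression(header):
    """Return a format name for the given leading bytes, or None."""
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return None


samples = {
    "gzip sample": gzip.compress(b"payload"),
    "bzip2 sample": bz2.compress(b"payload"),
    "plain sample": b"payload",
}
for label, data in samples.items():
    print(label, "->", detect_compression(data[:4]))
```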

Actually, "zip" is difficult; you could put that on a back burner,
but eventually, you would want to have a signature for all of its
possible components; if zip contained four large files and six small
ones, you'd rather have 10 file verification sets than have to
download the entire zip again.
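
A sketch of the per-component idea for zip, assuming Python's zipfile
module and SHA1.  The function name member_digests is hypothetical;
it records one digest per archive member so each member can be checked
on its own.

```python
import hashlib
import io
import zipfile


def member_digests(zip_bytes):
    """Map each member name to the SHA1 of its uncompressed contents."""
    digests = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no data to verify
            data = archive.read(info)
            digests[info.filename] = hashlib.sha1(data).hexdigest()
    return digests


# Build a small archive in memory to demonstrate.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("large/one.bin", b"\0" * 4096)
    archive.writestr("small/readme.txt", b"hello\n")

for name, digest in sorted(member_digests(buffer.getvalue()).items()):
    print(digest, name)
```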


---------

If all my recommendations are implemented, which I think the first two
need to be (SHA1 & RMD160; public key signatures), there could be
literally dozens of verifications per file.  Verification management
would make sense:

*  Specify a simple single file format for these signings:

I have two ideas; first, the one I prefer:

*  Tar or zipfile containing many subfiles of the various verifications:
   + one subfile for every gpg/pgp detached signature;
   + one subfile for each type of checksum (one for rmd160, one for
     sha1, and one for md5 which can potentially be used by md5sum -c)
     for each type of compression (default compressed (bzip2/gzip/etc.),
     and uncompressed)
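
To make the tarfile idea concrete, here is a sketch assuming Python's
tarfile module.  The member naming ("compressed/sha1" and so on) and
the function name build_bundle are invented for this example, and
detached signatures are left out.  RMD160 is omitted too, because
hashlib only offers it when the underlying OpenSSL build does.  Each
subfile holds one "digest  filename" line, the layout md5sum -c reads.

```python
import hashlib
import io
import tarfile


def build_bundle(variants):
    """Return tar bytes with one checksum subfile per variant and algorithm.

    `variants` maps a label such as "compressed" or "uncompressed" to a
    (filename, data) pair.  The labels are hypothetical.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as bundle:
        for label, (filename, data) in sorted(variants.items()):
            for algo in ("sha1", "md5"):
                digest = hashlib.new(algo, data).hexdigest()
                line = f"{digest}  {filename}\n".encode()
                info = tarfile.TarInfo(name=f"{label}/{algo}")
                info.size = len(line)
                bundle.addfile(info, io.BytesIO(line))
    return buffer.getvalue()


bundle_bytes = build_bundle({
    "compressed": ("foo-1.0.tar.bz2", b"pretend bzip2 bytes"),
    "uncompressed": ("foo-1.0.tar", b"pretend tar bytes"),
})
with tarfile.open(fileobj=io.BytesIO(bundle_bytes)) as bundle:
    for member in bundle.getmembers():
        print(member.name, bundle.extractfile(member).read().decode(), end="")
```

Keeping one subfile per checksum type means the md5 subfiles can be
fed to existing tools unchanged, which is the flexibility argued for
above.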

Now, my other idea, which seems much more difficult to implement, less
flexible, harder to use, and just not a good idea; don't do it -- use
the tarfile idea explained above.

*  rmd160, sha1, and md5 could all share one file with four columns:
   the rmd160, the sha1, the md5, and the filename.
   Specifying meanings of the columns could be done a number of ways:
   + a text identifier
   + a column position

*  GnuPG keys could also be ASCII-encoded if not already, and then
   put into this file, with proper delimiters.

The delimiters could possibly use the following existing tools,
as your coders find best:

+ MIME
+ XML

Of course, you could define a mechanism close to or equal to MIME
components for these items which is a subset of XML ...

or you can create your own.
Comment 1 SpanKY gentoo-dev 2003-12-28 10:32:36 UTC
define 'uncompressed version' ...

you mean you want to generate hashes on every single file inside of a tarball ?
Comment 2 Brian Harring gentoo-dev 2005-02-27 23:33:37 UTC
Closing (not going to implement it without a good reason).
Aside from that, the alt-digest db I mentioned in glep 25 would allow this for distfiles.