Portage Package List

In this document, I describe a package-list scheme to replace the rsync of
ebuilds.  It is inspired by Debian GNU/Linux and is intended to provide
several advantages over the current method.

The first advantage is that Portage would no longer require a long-winded
`emerge sync` of all ebuilds and meta-data.  Instead, an `emerge sync` would
retrieve only the package lists and other important data from the mirror.
This would reduce both local filesystem usage and the load on the rsync
mirrors.

The second advantage is that, with proper tweaking, this scheme could make it
easier to implement binary package acquisition with Portage.  This would also
require stricter dependency control; but if done properly, the system could
react automatically to USE flag changes.

== Structure ==
The current Portage tree structure is sketched below.  This is not an exact
representation of the structure, just a general idea.

/usr/portage/
/usr/portage/cat-egory/
/usr/portage/cat-egory/package/
/usr/portage/cat-egory/package/ebuilds.ebuild
/usr/portage/cat-egory/package/Manifest
/usr/portage/cat-egory/package/ChangeLog
/usr/portage/cat-egory/package/files/
/usr/portage/cat-egory/package/files/digest
/usr/portage/cat-egory/package/files/patches.diff
/usr/portage/eclass/
/usr/portage/eclass/some.eclass
/usr/portage/profiles/
/usr/portage/profiles/*

The new structure would be as below.

/usr/portage/
/usr/portage/categories/
/usr/portage/categories/cat-egory.list.gz
/usr/portage/eclass/
/usr/portage/eclass/some.eclass
/usr/portage/profiles/
/usr/portage/profiles/*

cat-egory.list.gz (or .bz2, or just plaintext) would contain all of the
meta-data about each ebuild that is relevant to the dependency calculation
process.  This data would be used to determine what to download and when to
download it, as well as which packages are needed.
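
The proposal does not fix a format for cat-egory.list, so the following is
only a sketch of how such a list might be read.  It assumes a Debian-style
layout (blank-line-separated stanzas of "Key: value" lines); the field names
Package, Version, and Depend, and both function names, are hypothetical.

```python
import gzip

# Hypothetical list contents; the stanza layout and field names are assumptions.
SAMPLE = """\
Package: foo
Version: 1.0
Depend: cat-egory/bar

Package: bar
Version: 2.1
Depend:
"""


def parse_category_list(text):
    """Split blank-line-separated stanzas into dicts of field -> value."""
    records = []
    for stanza in text.strip().split("\n\n"):
        record = {}
        for line in stanza.splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        records.append(record)
    return records


def load_category_list(data):
    """Decompress gzip bytes (as read from cat-egory.list.gz) and parse them."""
    return parse_category_list(gzip.decompress(data).decode("utf-8"))


# Stand-in for reading /usr/portage/categories/cat-egory.list.gz from disk.
compressed = gzip.compress(SAMPLE.encode("utf-8"))
records = load_category_list(compressed)
```

A dependency resolver would then work from `records` alone, without any
ebuild being present on disk.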

The ebuilds would no longer be in the Portage tree.  Upon merge, Portage would
download the ebuild, Manifest, digest, and appropriate patches to /tmp, and
then execute the ebuild.  In this manner, the list of files to hash and
transfer (now almost 100000 entries) becomes much shorter, although the
individual files to be hashed and transferred become larger.
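
As a rough illustration of the merge-time step, the sketch below computes
which files would have to be fetched for one package.  It assumes the mirror
keeps the per-package layout of the current tree; the function name, the
ebuild file naming, and the example version are assumptions, and no download
is actually performed.

```python
import posixpath


def files_to_fetch(category, package, version, patches=()):
    """Return mirror-relative paths needed to build one package.

    Assumes the mirror keeps the per-package directory layout of the
    current tree; this is an illustration, not Portage's real behaviour.
    """
    base = posixpath.join(category, package)
    wanted = [
        posixpath.join(base, f"{package}-{version}.ebuild"),
        posixpath.join(base, "Manifest"),
        posixpath.join(base, "files", "digest"),
    ]
    wanted.extend(posixpath.join(base, "files", name) for name in patches)
    return wanted


paths = files_to_fetch("cat-egory", "package", "1.0", ["patches.diff"])
```

The list of patches would itself have to come from cat-egory.list, since the
files/ directory is no longer available locally.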

Compression of the cat-egory.list will reduce both the network overhead of
transfer and the CPU overhead of hashing by reducing the size of the data to
be checked.  This will incur a one-time overhead of compression.
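
The effect can be seen with a small experiment.  The sketch below compresses
a synthetic list and hashes the compressed bytes; the stanza format is the
same hypothetical one as above, and synthetic data of this kind is far more
repetitive than real meta-data, so the ratio it produces is not a prediction.

```python
import gzip
import hashlib

# Synthetic, highly repetitive stand-in for a real cat-egory.list.
raw = "".join(
    f"Package: pkg{i}\nVersion: 1.{i}\nDepend: cat-egory/pkg{i + 1}\n\n"
    for i in range(1000)
).encode("utf-8")

packed = gzip.compress(raw)  # one-time cost, paid when the list is generated
ratio = len(packed) / len(raw)

# Only the smaller compressed file needs to be hashed and transferred.
digest = hashlib.sha256(packed).hexdigest()

print(f"raw={len(raw)} bytes, packed={len(packed)} bytes, ratio={ratio:.2f}")
```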

Generating the cat-egory.list files will incur the overhead of indexing the
CVS tree before it is pushed to the rsync mirrors.  I suggest that all rsync
mirrors be two steps behind:  while the mirrors are fetching the tree as of
(-1), the current tree (0) is being scanned and indexed.  This requires that a
snapshot of the CVS tree can be taken for indexing; otherwise there will be
either a loss of service (CVS commits must not occur during the indexing) or a
lack of consistency in the cat-egory.list.
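
The need for a snapshot can be shown with a toy model.  Here a dictionary
stands in for the CVS tree and `build_index` for the indexer; both are
hypothetical, and the point is only that indexing a frozen copy stays
consistent even when a commit lands in the meantime.

```python
import copy

# Toy stand-in for the live CVS tree: package name -> newest version.
live_tree = {"cat-egory/foo": "1.0", "cat-egory/bar": "2.1"}


def build_index(tree):
    """Produce sorted list lines from a tree that must not change meanwhile."""
    return sorted(f"{name} {version}" for name, version in tree.items())


snapshot = copy.deepcopy(live_tree)   # freeze the tree as of (0)
live_tree["cat-egory/foo"] = "1.1"    # a commit arrives during indexing
index = build_index(snapshot)         # the index still describes (0) exactly
```

The commit to foo is not lost; it simply appears in the next indexing cycle.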

Users may benefit if cat-egory.list holds full package descriptions, as
Debian's Packages.gz does.  This will, of course, increase file size.