Bug 167668 - Update default rsync excludes to skip digests
Summary: Update default rsync excludes to skip digests
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Conceptual/Abstract Ideas
Hardware: All Linux
Importance: High normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 167107
Reported: 2007-02-19 19:50 UTC by Chris Gianelloni (RETIRED)
Modified: 2007-05-08 08:14 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
add --exclude='**/files/digest-*' to PORTAGE_RSYNC_OPTS (rsync_excludes.patch, 1.20 KB, patch)
2007-02-20 20:43 UTC, Zac Medico
use an rsync "hide" filter rule so that stale digests are removed on the receiving side (rsync_excludes.patch, 1.23 KB, patch)
2007-02-21 03:54 UTC, Zac Medico
script that scans the whole portage tree for missing distfile hashes (check_dist_hashes.py, 1.25 KB, text/plain)
2007-02-22 09:05 UTC, Zac Medico
use an rsync "hide" filter rule so that stale digests are removed on the receiving side (rsync_excludes.patch, 1.23 KB, patch)
2007-02-22 23:50 UTC, Zac Medico
prune empty ${FILESDIR}s by adding --prune-empty-dirs to PORTAGE_RSYNC_OPTS (prune_empty_dirs.patch, 1.46 KB, patch)
2007-02-26 03:43 UTC, Zac Medico

Description Chris Gianelloni (RETIRED) gentoo-dev 2007-02-19 19:50:08 UTC
This one is another simple one.  If we take a certain version of portage, say the upcoming 2.1.2-r10, and add this support to it, then we can remove all of the digests from the snapshot.  This will decrease the snapshot size significantly.  Adding a default rsync exclude for the digests keeps users from downloading them again.

The same trigger from bug #167667 that disables digest generation could be used to enable these excludes, if such functionality was even deemed necessary.
Comment 1 Zac Medico gentoo-dev 2007-02-20 20:43:38 UTC
Created attachment 110770 [details, diff]
add --exclude='**/files/digest-*' to PORTAGE_RSYNC_OPTS

I've tested this patch and it seems to do the trick.  The only annoying thing that I can think of is that people who already have a tree with the **/files/digest-* files will not have them purged automatically.  This means that some stale **/files/digest-* may be left in the user's tree.  When packages are removed, there will be empty directories left containing only digest-* files.  However, portage will ignore these files anyway, so it's not a show stopper.
Comment 2 Chris Gianelloni (RETIRED) gentoo-dev 2007-02-20 22:51:06 UTC
I don't consider that a problem.  We can simply tell people via the GWN, elog/einfo, etc.
Comment 3 Zac Medico gentoo-dev 2007-02-21 03:54:02 UTC
Created attachment 110787 [details, diff]
use an rsync "hide" filter rule so that stale digests are removed on the receiving side

Given the current code in portage, we actually need rsync to clean up the stale digests so that portage doesn't try to parse them and verify files against them (since they can become inaccurate once they're no longer synced).
Comment 4 Zac Medico gentoo-dev 2007-02-22 09:05:53 UTC
Created attachment 110967 [details]
script that scans the whole portage tree for missing distfile hashes

The default behavior of this script is to list packages with distfiles that have no Manifest2 hashes.  It can also take hash names as command line arguments if you want to require specific hashes.  Here are counts from a recent run on the gentoo-x86 cvs tree:

$ ./check_dist_hashes.py | wc -l
37
$ ./check_dist_hashes.py SHA1 | wc -l
54

Once all the packages have SHA1, we should be able to safely add the  **/files/digest-* exclusion to the default PORTAGE_RSYNC_OPTS.
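
The attached script is not reproduced in this report.  As a rough illustration of the kind of check described above, here is a minimal Python sketch.  It assumes the Manifest2 line layout "DIST <file> <size> <HASH> <value> ..." and only looks at DIST entries that are already present in a Manifest; it will not notice distfiles that are missing from the Manifest entirely.  The function names (dist_entries, missing_hashes, scan_tree) are hypothetical and are not taken from check_dist_hashes.py.

```python
import os


def dist_entries(manifest_text):
    """Yield (filename, set of hash names) for each DIST line."""
    for line in manifest_text.splitlines():
        fields = line.split()
        # Assumed layout: TYPE name size HASH value [HASH value ...]
        if len(fields) < 3 or fields[0] != "DIST":
            continue
        yield fields[1], set(fields[3::2])


def missing_hashes(manifest_text, required=()):
    """Return {distfile: missing hash names}.

    With no required hashes, report only distfiles that carry no hash at all.
    """
    problems = {}
    for name, hashes in dist_entries(manifest_text):
        if required:
            missing = set(required) - hashes
            if missing:
                problems[name] = missing
        elif not hashes:
            problems[name] = set()
    return problems


def scan_tree(root, required=()):
    """Yield the relative path of every package directory with a problem."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "Manifest" in filenames:
            with open(os.path.join(dirpath, "Manifest")) as manifest:
                if missing_hashes(manifest.read(), required):
                    yield os.path.relpath(dirpath, root)
```

Calling scan_tree("/usr/portage", ["SHA1"]) would then list the package directories whose Manifest has at least one DIST entry without a SHA1 hash, which corresponds to the second command above.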
Comment 5 Marius Mauch (RETIRED) gentoo-dev 2007-02-22 13:30:49 UTC
(In reply to comment #4)
> $ ./check_dist_hashes.py | wc -l
> 37
> $ ./check_dist_hashes.py SHA1 | wc -l
> 54

Just to make this clear: The second command is the important one.
Comment 6 Zac Medico gentoo-dev 2007-02-22 23:50:30 UTC
Created attachment 111010 [details, diff]
use an rsync "hide" filter rule so that stale digests are removed on the receiving side

This is in svn now.  The single quotes have been removed because emerge --sync doesn't use a shell to spawn rsync and rsync will not accept the quotes.
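
To illustrate the quoting issue, here is a small Python sketch.  It assumes the option string is split on whitespace and handed to rsync as an argument list without a shell; the variable names are made up for the example and this is not the actual portage code.

```python
import shlex

setting = "--exclude='**/files/digest-*'"

# A shell (or shlex, which follows shell quoting rules) strips the quotes
# before the program sees the argument.
shell_style = shlex.split(setting)

# Plain whitespace splitting keeps them, so rsync would receive a pattern
# that literally begins and ends with a quote character.
plain_split = setting.split()

print(shell_style)  # ['--exclude=**/files/digest-*']
print(plain_split)  # ["--exclude='**/files/digest-*'"]
```

With the quotes left in, the pattern would not match any digest file, so dropping them from the default options is the simpler fix when no shell is involved.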
Comment 7 Zac Medico gentoo-dev 2007-02-24 00:09:33 UTC
There are still 24 remaining packages that are unfetchable and have missing SHA1 digests.  I think we can neglect those though, since they will simply result in a "missing digest" error and users encountering that can file a bug for the maintainer to update the Manifest:

app-emulation/cedega
app-emulation/crossover-office-bin
app-emulation/vm-arm
app-emulation/vmware-gsx-console
dev-db/oracle-instantclient-basic
dev-db/oracle-instantclient-jdbc
dev-db/oracle-instantclient-sqlplus
dev-java/jdbc2-oracle
dev-java/jdbc3-oracle
dev-java/jess-bin
dev-lang/icc
dev-lang/ifc
dev-lang/palmos-sdk
dev-libs/matrixssl
dev-util/elfsh
games-fps/unreal-tournament-infiltration
net-im/wildfire
net-misc/cisco-aironet-client-utils
net-misc/freenet6
net-misc/nxserver-business
net-misc/nxserver-enterprise
net-misc/nxserver-personal
sci-electronics/modelsim
sci-electronics/systemc
Comment 8 Zac Medico gentoo-dev 2007-02-24 04:14:47 UTC
This has been released in 2.1.2-r11.
Comment 9 Chris Gianelloni (RETIRED) gentoo-dev 2007-02-25 10:33:34 UTC
I'll fix quite a few of these packages, since I have the distfiles for them.
Comment 10 Ulrich Müller gentoo-dev 2007-02-25 19:00:21 UTC
Is there a possibility to automatically skip/remove the resulting empty "files" directories, too:

$ find /usr/portage -type d -name files -empty | wc -l
6755

For an ext2/ext3 filesystem, this would save another 27 MB of space.
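
For reference, the count and the space estimate can be reproduced with a short Python sketch.  It assumes that each empty directory occupies exactly one 4096-byte block, which is a common ext2/ext3 default but not something this report states; the function names are hypothetical.

```python
import os


def empty_files_dirs(root):
    """Return paths of directories named 'files' that contain no entries."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) == "files" and not dirnames and not filenames:
            found.append(dirpath)
    return found


def estimated_savings(count, block_size=4096):
    """Bytes reclaimed if every empty directory uses one block (assumption)."""
    return count * block_size
```

Under that assumption, estimated_savings(6755) gives 27,668,480 bytes, which is in line with the roughly 27 MB figure above.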
Comment 11 Zac Medico gentoo-dev 2007-02-25 21:51:05 UTC
(In reply to comment #10)
> Is there a possibility to automatically skip/remove the resulting empty "files"
> directories, too:

The -m or --prune-empty-dirs rsync option should do that.  You can add that to PORTAGE_RSYNC_EXTRA_OPTS for now and I don't see a reason why we can't also add that to the default PORTAGE_RSYNC_OPTS.
Comment 12 Zac Medico gentoo-dev 2007-02-26 03:43:14 UTC
Created attachment 111253 [details, diff]
prune empty ${FILESDIR}s by adding --prune-empty-dirs to PORTAGE_RSYNC_OPTS

I'm not sure if this patch is worthy of a 2.1.2-r12 revbump, but it is ready and waiting in svn r6071 of the 2.1.2 branch.
Comment 13 Zac Medico gentoo-dev 2007-02-27 07:40:59 UTC
(In reply to comment #12)
> Created an attachment (id=111253) [edit]
> prune empty ${FILESDIR}s by adding --prune-empty-dirs to PORTAGE_RSYNC_OPTS

This has been released in 2.1.2-r12.
Comment 14 Zac Medico gentoo-dev 2007-03-02 08:31:31 UTC
(In reply to comment #12)
> Created an attachment (id=111253) [edit]
> prune empty ${FILESDIR}s by adding --prune-empty-dirs to PORTAGE_RSYNC_OPTS

This has been reverted in portage-2.1.2-r13 due to bug #168646. Users can add it to PORTAGE_RSYNC_EXTRA_OPTS if they want.
Comment 15 Zac Medico gentoo-dev 2007-05-08 08:14:33 UTC
*** Bug 177591 has been marked as a duplicate of this bug. ***