I run the source mirror mirror.mdfnet.se. This server also has a public rsync mirror for the portage tree. After portage started to verify the tree after each sync (an awesome and much needed feature btw), my daily syncs against the mirror started failing every now and then with messages like:
Manifest mismatch for metadata/Manifest.gz
__size__: expected: 2148, have: 2147
I figured it probably had to do with bad verification of the tree on my mirror and/or that the mirror was halfway through syncing itself.
I checked the documentation at https://wiki.gentoo.org/wiki/Project:Infrastructure/Mirrors/Rsync and it looks like no tree verification is done even on the official rsync community mirrors. In my opinion a mirror should only serve data once it has verified that the entire tree is intact. A sync against a half-synced, or broken, mirror just wastes time and resources, as the user will have to sync the tree a second time.
I created a heavily modified version of the sync-script which:
* uses eix-sync to sync and verify against upstream (retries once if first sync fails)
* does a local rsync to a non-active portage tree
* updates a symlink to point to the newly synced portage tree
* at the next re-sync it syncs to the previously active tree, so it basically always has two portage trees: one active, and one inactive
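
To illustrate the idea without the eix-sync dependency, here is a rough Python sketch of the two-tree flip described above. This is not the attached script. The names pick_inactive, activate and publish are made up for illustration, and sync_and_verify and copy_tree are hypothetical callables standing in for the upstream sync/verification and the local rsync step. It assumes a POSIX filesystem, where renaming over an existing symlink is atomic.

```python
import os
import tempfile


def pick_inactive(link, tree_a, tree_b):
    """Return whichever of the two trees the symlink does NOT point to."""
    active = os.path.realpath(link) if os.path.islink(link) else None
    return tree_b if active == os.path.realpath(tree_a) else tree_a


def activate(link, target):
    """Repoint `link` at `target` by renaming a temporary symlink over it."""
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2): clients see the old or the new tree


def publish(link, tree_a, tree_b, sync_and_verify, copy_tree, attempts=2):
    """Flip the served tree only after a sync that passed verification.

    sync_and_verify() -> bool and copy_tree(target_dir) are hypothetical
    callables; a real mirror would run its sync tool and rsync here.
    attempts=2 means "retry once if the first sync fails".
    """
    for _ in range(attempts):
        if sync_and_verify():
            target = pick_inactive(link, tree_a, tree_b)
            copy_tree(target)       # fill the tree nobody is reading from
            activate(link, target)  # then make it the active one
            return target
    return None  # every attempt failed: keep serving the old tree


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    a, b = os.path.join(root, "tree-a"), os.path.join(root, "tree-b")
    link = os.path.join(root, "current")
    os.mkdir(a)
    os.mkdir(b)
    activate(link, a)
    # Stand-in callables: the sync always "succeeds", the copy does nothing.
    print(publish(link, a, b, lambda: True, lambda target: None))
```

The rsync daemon would then serve the path behind the symlink, so a client syncing in the middle of an update still sees the last tree that passed verification.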
I'm not proposing that this should be used as-is, as it is just an ugly hack that can only run on Gentoo machines (as it uses eix-sync) and is extremely untested, but I do think we should do *something* to improve the quality of the mirrors.
Steps to Reproduce:
1. sync against a community mirror
Actual Results:
verification fails more often than reasonable
Expected Results:
verification should pass
Created attachment 558234
example script that verifies tree before serving it