I just had an idea: allow users to serve as mirrors for one another, BitTorrent-style. On a strictly opt-in basis, users could volunteer their machines as mirror sources for other users, preferably in their own geographical area. It seems like a useful idea, and it may take some load off the mirrors.
Good idea. What is your vision for bringing this to reality?
# AHM

I call the scheme "Ad Hoc Mirror", or AHM. It comes in three parts: the Coordinator, the AHM itself, and any clients wishing to use the service of an AHM.

## Coordinator

A Gentoo-hosted Coordinator keeps rough track of the anonymized identities of willing volunteers who opt in as Ad Hoc Mirrors, or AHMs for short.

On a new install, or on an existing system once this is implemented, a Gentoo system could use a client-side application or script of some sort to register with the Coordinator as an AHM and get a unique token that it can then use to "log in" and make itself available. The Coordinator in turn would keep a live, geographically aware database of some sort and assign incoming syncers to nearby AHMs.

The Coordinator's job is to maintain a database of AHMs and provide a means for them to "log in". Presumably it's a bad idea to allow one AHM to hijack another's session.

It is analogous to a tracker in a BitTorrent swarm.

## AHM

The Ad Hoc Mirror in turn would run a lightweight daemon, possibly startable as an init.d service, that logs the system in to the Coordinator as an AHM and provides an inbound IP that clients can use to request downloads.

When an AHM onlines itself, it connects to the Coordinator and reports three things:

* The IP address clients can use to connect
* Version information for its data (example: rsync timestamp or git commit)
* Any desired load limits, which the Coordinator will obey when it provides AHMs to clients

It's expected that an AHM would only volunteer for syncing the ebuild tree or other lightweight operations. Mirroring actual package payloads is probably a heavyweight operation best left to dedicated mirrors.

Of note, it is likely possible to have AHMs serve the ebuild tree over both git and rsync.

It is analogous to a seed in a BitTorrent swarm.

## Client

A Client is any Gentoo system that wants to sync itself.
It would ask the Coordinator for any available AHMs, then use the returned IP to initiate a direct connection with an AHM and sync itself from it.

The Coordinator will find an AHM that meets the following criteria:

* Has newer information than the client (computable by comparing the version information)
* Is not overloaded or unresponsive
* Is near enough to the client geographically that downloading from it is more optimal than downloading from an "official" mirror

Also, it's probably a good idea for any client that is also serving as an AHM to update its version information with the Coordinator once it completes a sync.

It is analogous to a leecher in a BitTorrent swarm.

# Details

Presumably the Coordinator could use some sort of load-balancing mechanism to distribute clients evenly among AHMs and prevent any particular AHM from being overloaded.

It goes without saying that a system volunteering as an AHM does so on an opt-in basis, and can drop out of service any time it sees fit. To encourage adoption, it could default to opt-in, possibly through a USE flag in Portage that causes a conditional RDEPEND on the AHM mechanism.

It's also advisable for the Coordinator to send periodic pings to online AHMs to keep its database clean. Any AHM that fails to respond to pings should presumably be dropped from the online database.

Privacy is a concern, so whatever method an AHM uses to identify itself to the Coordinator should:

* Be creatable ad hoc by the Coordinator
* Possibly be encrypted to prevent MITM attacks or anything else that would cause corruption
* Require a minimum level of refreshing, i.e., the identity it is given should expire after 30 days
* Be anonymous beyond identifying a particular AHM

The AHM should also be free to purge or "forget" its API key for the Coordinator, so to speak, and request a new one at any time. If the Coordinator is doing proper housekeeping of its AHM account database, dormant keys should be flushed out eventually.
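The Coordinator behaviour described so far (AHMs reporting an address, a version and a load limit; clients being matched against the three criteria under Client; pings pruning unresponsive AHMs) could be sketched roughly as follows. Everything here is hypothetical and only illustrates the idea: the names `AhmRecord`, `Coordinator`, `assign` and `prune`, the assumption that the version is an rsync-style Unix timestamp (git commit hashes are not directly orderable), the load limit expressed as a maximum number of concurrent clients, a coarse latitude/longitude per host (e.g. from GeoIP), great-circle distance as a stand-in for "more optimal than an official mirror", and the 5-minute ping timeout.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical value; the proposal does not specify a timeout.
PING_TIMEOUT_S = 300  # drop an AHM that has not answered a ping in 5 minutes


@dataclass
class AhmRecord:
    """What an AHM reports when it onlines itself (field names are illustrative)."""
    key: str            # API key issued by the Coordinator
    ip: str             # address clients connect to
    version: int        # tree version, assumed to be an rsync timestamp (Unix time)
    max_clients: int    # self-declared load limit
    lat: float          # coarse location, e.g. from GeoIP (assumption)
    lon: float
    active_clients: int = 0
    last_pong: float = field(default_factory=time.time)


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class Coordinator:
    def __init__(self) -> None:
        self.online: dict[str, AhmRecord] = {}

    def online_ahm(self, rec: AhmRecord) -> None:
        self.online[rec.key] = rec

    def record_pong(self, key: str, now: Optional[float] = None) -> None:
        """Note that an AHM answered a ping."""
        if key in self.online:
            self.online[key].last_pong = time.time() if now is None else now

    def prune(self, now: Optional[float] = None) -> None:
        """Drop AHMs that have not answered a ping within PING_TIMEOUT_S."""
        now = time.time() if now is None else now
        stale = [k for k, r in self.online.items() if now - r.last_pong > PING_TIMEOUT_S]
        for k in stale:
            del self.online[k]

    def assign(self, client_version: int, lat: float, lon: float,
               official_mirror_km: float) -> Optional[AhmRecord]:
        """Pick an AHM for a client, or None to fall back to an official mirror."""
        candidates = [
            r for r in self.online.values()
            if r.version > client_version            # has newer data than the client
            and r.active_clients < r.max_clients     # not overloaded
            and distance_km(lat, lon, r.lat, r.lon) < official_mirror_km  # closer than the mirror
        ]
        if not candidates:
            return None
        # Spread load: least-utilized AHM first, nearest one as the tie-breaker.
        best = min(candidates, key=lambda r: (r.active_clients / r.max_clients,
                                              distance_km(lat, lon, r.lat, r.lon)))
        best.active_clients += 1
        return best
```

Ordering by utilization first is just one way to read "distribute clients evenly"; a real Coordinator would also need AHMs or clients to report when a sync finishes so that `active_clients` can be decremented again.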
Possibly the Coordinator could index the issued AHM API keys in an LRU fashion and periodically revoke dormant keys. It's probably important for the AHM-side package to check its key's timestamp and drop the key if it's too old. It's also probably important for the Coordinator to blacklist a key for a while once it expires, before recycling it, to prevent key collisions.

Since new keys can be generated at any time and an AHM needs to respond to periodic pings to remain online, it's probably best to err on the side of caution with keys. AHMs encountering any potential corruption should be encouraged to discard their current key and register a new one with the Coordinator, and a Coordinator encountering any "funny business" involving a key should blacklist that key for a few months until it expires and can be safely recycled.

## N.B.

"Coordinator" is just my SRE-inspired name for whatever runs on Gentoo's infra to keep track of the AHMs.
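The key housekeeping proposed above (30-day expiry, an LRU-ordered index for revoking dormant keys, a quarantine blacklist before a key could ever be reissued, and an AHM-side age check) might look something like this minimal sketch. The class and function names are hypothetical, keys are assumed to come from Python's `secrets.token_urlsafe`, and the 7-day dormancy window and 90-day quarantine are placeholders for "dormant" and "a few months". It also assumes the `now` values passed in never go backwards, since the LRU walk relies on that ordering. With 256-bit random tokens, accidental collisions are practically impossible, so the blacklist mostly serves to refuse expired or abused keys; the collision check in `issue` only mirrors the proposal.

```python
import secrets
from collections import OrderedDict

# The 30-day lifetime comes from the proposal; the other values are placeholders.
KEY_LIFETIME_S = 30 * 86400   # a key expires 30 days after it was issued
DORMANT_S = 7 * 86400         # hypothetical: revoke keys unused for a week
BLACKLIST_S = 90 * 86400      # "a few months" of quarantine before reissue


class KeyRegistry:
    """Coordinator-side store of AHM API keys (hypothetical design)."""

    def __init__(self) -> None:
        # key -> (issued_at, last_used), kept in least-recently-used-first order
        self.active: OrderedDict[str, tuple[float, float]] = OrderedDict()
        # key -> time at which it was blacklisted
        self.blacklist: dict[str, float] = {}

    def issue(self, now: float) -> str:
        """Create a fresh key, never handing out one that is live or quarantined."""
        while True:
            key = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
            if key not in self.active and key not in self.blacklist:
                break
        self.active[key] = (now, now)
        return key

    def authenticate(self, key: str, now: float) -> bool:
        """Accept a login or pong and refresh the key's LRU position."""
        entry = self.active.get(key)
        if entry is None:
            return False  # unknown, revoked, or blacklisted
        issued_at, _ = entry
        if now - issued_at > KEY_LIFETIME_S:
            self.revoke(key, now)  # expired: the AHM must register again
            return False
        self.active[key] = (issued_at, now)
        self.active.move_to_end(key)  # most recently used keys live at the end
        return True

    def revoke(self, key: str, now: float) -> None:
        """Expiry, dormancy, an AHM forgetting its key, or suspected abuse."""
        self.active.pop(key, None)
        self.blacklist[key] = now

    def housekeep(self, now: float) -> None:
        # Dormant keys cluster at the LRU end; stop at the first one still in use.
        while self.active:
            key, (_, last_used) = next(iter(self.active.items()))
            if now - last_used <= DORMANT_S:
                break
            self.revoke(key, now)
        # Keys still in use but past their lifetime can sit anywhere in the order.
        for key in [k for k, (iss, _) in self.active.items() if now - iss > KEY_LIFETIME_S]:
            self.revoke(key, now)
        # Lift the quarantine once it has run its course.
        for key in [k for k, t in self.blacklist.items() if now - t > BLACKLIST_S]:
            del self.blacklist[key]


def key_is_fresh(issued_at: float, now: float) -> bool:
    """AHM-side check: discard the local key once it is past its lifetime."""
    return now - issued_at <= KEY_LIFETIME_S
```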
IMHO, this should be implemented as three separate packages, one each for the client, the Coordinator, and the AHM, all separate from Portage itself. Maybe the client portion could be an esync module to keep it pluggable with Portage.
You are going to need to actually build something if you want us to use it. In general I don't think we need more mirroring capacity (we have plenty of rsync mirrors). One advantage would be to build a privacy-centric mirroring system, but I'm not sure your draft proposal meets those requirements. -A