Bug 686106 - (portage?) suggestion: allow users to volunteer as ad-hoc mirrors
Summary: (portage?) suggestion: allow users to volunteer as ad-hoc mirrors
Status: RESOLVED NEEDINFO
Alias: None
Product: Mirrors
Classification: Unclassified
Component: Feature Request
Hardware: All Linux
Importance: Normal normal
Assignee: Mirror Admins
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-16 15:48 UTC by Raymond Jennings
Modified: 2020-03-11 18:00 UTC

See Also:
Package list:
Runtime testing required: ---


Description Raymond Jennings 2019-05-16 15:48:23 UTC
I just had an idea to allow users to serve as mirrors for one another, BitTorrent-style.

On a strictly opt-in basis, allow users to volunteer their machines as mirror sources for other users, preferably in their own geographical area.

It seems like a useful idea, and may take some load off the mirrors.
Comment 1 Tomáš Mózes 2019-05-16 19:03:34 UTC
Good idea, what is your vision of bringing this to reality?
Comment 2 Raymond Jennings 2019-05-16 21:07:56 UTC
# AHM

I call the scheme "Ad Hoc Mirror", or AHM.

It comes in three parts: the Coordinator, the AHM itself, and any clients wishing to use the service of an AHM.

## Coordinator

A Gentoo-hosted coordinator that keeps rough track of the anonymized identities of volunteers who wish to opt in as Ad Hoc Mirrors, or AHMs for short.

Once this is implemented, a Gentoo system (for example on a new install) could use a client-side application or script of some sort to register with the coordinator as an AHM and get a unique token that it can then use to "log in" and make itself available.

The coordinator in turn would keep a live, geographically aware database of some sort, and could assign incoming syncers to nearby AHMs.

The Coordinator's job is to maintain a database of AHMs, and provide a means for them to "log in".  Presumably it's a bad idea to allow one AHM to hijack another's session.

It is analogous to a tracker in a BitTorrent swarm.

## AHM

The Ad Hoc Mirror in turn would run a lightweight daemon, possibly startable as an init.d service, that would log the system in to the coordinator as an AHM, providing an inbound IP that could be used by clients to request downloads.

When an AHM onlines itself, it connects to the coordinator and reports three things:

* IP address clients can use to connect
* Version information for its data (example: rsync timestamp or git commit)
* Any desired load limits, which the coordinator will obey when it provides AHMs to clients
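
As a rough illustration of the three items above, here is a minimal Python sketch of what the "I'm online" message could look like. Nothing here is specified by the proposal: the `Announce` class, its field names, and the choice of JSON as the wire format are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Announce:
    """What an AHM reports when it comes online (hypothetical field names)."""
    address: str                       # IP address clients can use to connect
    tree_timestamp: int                # version of its data, here the Unix time of its last sync
    max_clients: Optional[int] = None  # desired load limit; None means no preference


def encode_announce(msg: Announce) -> str:
    """Serialize the message as JSON for sending to the Coordinator."""
    return json.dumps(asdict(msg), sort_keys=True)


def decode_announce(raw: str) -> Announce:
    """Parse the message again on the Coordinator side."""
    return Announce(**json.loads(raw))
```

A git-based AHM would presumably report a commit hash instead of a timestamp, in which case the Coordinator would need some other way to decide which version is newer.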

It's expected that an AHM would only volunteer for syncing the ebuild tree or other lightweight operations.  Mirroring actual package payloads is probably a heavyweight operation best left to dedicated mirrors.

Of note, it is likely possible to have AHMs serve the ebuild tree both by git as well as by rsync.

It is analogous to a seed in a BitTorrent swarm.

## Client

A Client is any gentoo system that wants to sync itself.

It would ask the Coordinator for any available AHMs, and then use the returned IP to initiate a direct connection with the AHM and sync itself from it.

The Coordinator will find an AHM that meets the following criteria:

* Has newer information than the client (computable from comparing the version information)
* Is not overloaded or unresponsive
* Is near enough to the client geographically that it's more optimal to download from them than from an "official" mirror.
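
The three criteria above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the `Mirror` record, the `pick_ahm` function, and the fixed `max_km` cutoff (standing in for "near enough that it beats an official mirror") are all made up for the example, and versions are assumed to be comparable timestamps.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mirror:
    """One online AHM as the Coordinator might track it (hypothetical record)."""
    address: str
    tree_timestamp: int   # version of the AHM's data (Unix time of last sync)
    lat: float
    lon: float
    active_clients: int
    max_clients: int      # load limit the AHM asked for
    responsive: bool = True


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_ahm(mirrors: List[Mirror], client_timestamp: int,
             client_lat: float, client_lon: float,
             max_km: float = 500.0) -> Optional[Mirror]:
    """Return the nearest AHM that is newer, not overloaded, and close enough."""
    candidates = [
        m for m in mirrors
        if m.tree_timestamp > client_timestamp        # has newer information
        and m.responsive                              # is not unresponsive
        and m.active_clients < m.max_clients          # is not overloaded
        and distance_km(client_lat, client_lon, m.lat, m.lon) <= max_km
    ]
    if not candidates:
        return None  # caller would fall back to an "official" mirror
    return min(candidates,
               key=lambda m: distance_km(client_lat, client_lon, m.lat, m.lon))
```

Geographic distance is only a proxy for network distance, so a real Coordinator might prefer measured latency if it had any.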

Also, it's probably a good idea for any client that is also serving as an AHM to update its version information to the Coordinator once it completes a sync.

It is analogous to a leecher in a BitTorrent swarm.

# Details

Presumably the Coordinator could use some sort of load balancing mechanism to distribute clients evenly to AHMs to prevent any particular AHM from being overloaded.
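
One possible load balancing mechanism, sketched in Python: hand each new client to the AHM that is currently using the smallest fraction of its own load limit. The `LoadBalancer` class and its method names are hypothetical, and "least loaded relative to its limit" is just one policy among many.

```python
from typing import Dict, Optional


class LoadBalancer:
    """Tracks how many clients each AHM is serving (hypothetical helper)."""

    def __init__(self) -> None:
        self.limits: Dict[str, int] = {}   # address -> load limit the AHM asked for
        self.active: Dict[str, int] = {}   # address -> clients currently assigned

    def add_ahm(self, address: str, max_clients: int) -> None:
        self.limits[address] = max_clients
        self.active.setdefault(address, 0)

    def assign(self) -> Optional[str]:
        """Pick the AHM with the lowest load ratio, or None if all are full."""
        free = [a for a in self.limits if self.active[a] < self.limits[a]]
        if not free:
            return None
        # Ties are broken by address so the result is deterministic.
        chosen = min(free, key=lambda a: (self.active[a] / self.limits[a], a))
        self.active[chosen] += 1
        return chosen

    def release(self, address: str) -> None:
        """Call when a client finishes syncing from the given AHM."""
        if self.active.get(address, 0) > 0:
            self.active[address] -= 1
```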

It goes without saying that a system wishing to volunteer as an AHM is doing so on an opt-in basis, and can drop from service any time it sees fit. To encourage adoption it could default to opt-in, possibly through a USE flag in portage that causes a conditional RDEPEND on the AHM mechanism.

It's also advisable for the Coordinator to send periodic pings to online AHMs to keep its database clean.  Any AHM that fails to respond to pings should presumably be dropped from the online database.
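
A minimal Python sketch of that housekeeping, assuming the Coordinator records the time of each AHM's last ping reply and drops anything silent for longer than a timeout. The `OnlineTable` class and the 180-second default are made up for the example.

```python
import time
from typing import Dict, List, Optional


class OnlineTable:
    """Online AHMs keyed by address, with the time of their last ping reply."""

    def __init__(self, timeout_s: float = 180.0) -> None:
        self.timeout_s = timeout_s          # assumed cutoff, not from the proposal
        self.last_pong: Dict[str, float] = {}

    def record_pong(self, address: str, now: Optional[float] = None) -> None:
        """Note that an AHM answered a ping (or just came online)."""
        self.last_pong[address] = time.monotonic() if now is None else now

    def prune(self, now: Optional[float] = None) -> List[str]:
        """Drop every AHM that has been silent too long; return what was dropped."""
        now = time.monotonic() if now is None else now
        dead = [a for a, seen in self.last_pong.items() if now - seen > self.timeout_s]
        for address in dead:
            del self.last_pong[address]
        return dead
```

The `now` parameter exists only so the logic can be exercised without waiting; a running Coordinator would call `prune()` from a periodic timer.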

Privacy is a concern, so whatever method an AHM uses to identify itself to the Coordinator should:

* Be creatable ad-hoc by the Coordinator
* Possibly be encrypted to prevent MITM or anything else that would cause corruption
* Require a minimum level of refreshing, i.e., have it given an identity that will expire after 30 days
* Be anonymous beyond identifying a particular AHM

The AHM should also be free at any time to purge or "forget" its API key for the coordinator, so to speak, and request a new one at any time. If the Coordinator is doing proper housekeeping of its AHM account database, dormant keys should be flushed out eventually. Possibly the Coordinator could index the issued AHM API keys in an LRU fashion and periodically revoke dormant keys. It's probably important for the AHM-side package to check its key's timestamp and drop it if it's too old. It's also probably important for the coordinator to blacklist a key for a while once it expires before recycling it to prevent key collisions.

Since new keys can be generated at any time and an AHM needs to respond to periodic pings to remain online, it's probably best to err on the side of caution for keys.  AHMs encountering any potential corruption should be encouraged to discard their current key and register a new one with the Coordinator, and Coordinators encountering any "funny business" involving a key should blacklist the key for a few months until it expires and can be safely recycled.
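
The key lifecycle described above (issue, expire after 30 days, blacklist for a while before recycling) could look roughly like this in Python. The `KeyRegistry` class and the 90-day quarantine are assumptions for the sake of the example; only the 30-day lifetime comes from the text.

```python
import secrets
from typing import Dict

KEY_LIFETIME_S = 30 * 24 * 3600   # 30-day expiry, as suggested above
QUARANTINE_S = 90 * 24 * 3600     # assumption: "a few months" of blacklisting


class KeyRegistry:
    """Coordinator-side bookkeeping for AHM keys (hypothetical helper)."""

    def __init__(self) -> None:
        self.expires_at: Dict[str, float] = {}          # live key -> expiry time
        self.blacklisted_until: Dict[str, float] = {}   # retired key -> end of quarantine

    def issue(self, now: float) -> str:
        """Create a fresh random key that is neither live nor quarantined."""
        while True:
            key = secrets.token_urlsafe(32)
            if key not in self.expires_at and self.blacklisted_until.get(key, 0) <= now:
                break
        self.blacklisted_until.pop(key, None)
        self.expires_at[key] = now + KEY_LIFETIME_S
        return key

    def is_valid(self, key: str, now: float) -> bool:
        exp = self.expires_at.get(key)
        return exp is not None and now < exp

    def revoke(self, key: str, now: float) -> None:
        """Retire a key (expired, forgotten, or suspicious) and quarantine it."""
        if self.expires_at.pop(key, None) is not None:
            self.blacklisted_until[key] = now + QUARANTINE_S

    def sweep(self, now: float) -> None:
        """Periodic housekeeping: retire expired keys, then end finished quarantines."""
        for key in [k for k, exp in self.expires_at.items() if exp <= now]:
            self.revoke(key, now)
        for key in [k for k, until in self.blacklisted_until.items() if until <= now]:
            del self.blacklisted_until[key]
```

With 256-bit random keys like these, accidental collisions are practically impossible, so the quarantine mostly matters if keys are short or derived from something guessable.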

## N.B.

"Coordinator" is just my SRE-inspired name for whatever runs on Gentoo's infra to keep track of the AHMs.
Comment 3 Raymond Jennings 2019-05-16 21:20:10 UTC
Implementing this, IMHO, should be done in three separate packages, one each for the client, coordinator, and AHM, all separate from portage.

Maybe the client portion could be an esync module to keep it pluggable with portage.
Comment 4 Alec Warner (RETIRED) archtester gentoo-dev Security 2020-03-11 18:00:58 UTC
You are going to need to actually build something if you want us to use it.

In general I don't think we need more mirroring capacity (we have plenty of rsync mirrors). One advantage would be to build a privacy-centric mirroring system, but I'm not sure your draft proposal meets those requirements.

-A