Bug 78407 - p2p mirroring of distfiles using bittorrent
Summary: p2p mirroring of distfiles using bittorrent
Status: RESOLVED LATER
Alias: None
Product: Mirrors
Classification: Unclassified
Component: Feature Request (show other bugs)
Hardware: All
Importance: High normal (vote)
Assignee: Mirror Admins
URL:
Whiteboard:
Keywords:
Duplicates: 221251 (view as bug list)
Depends on:
Blocks:
 
Reported: 2005-01-17 12:51 UTC by Loki
Modified: 2008-05-10 18:05 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Description Loki 2005-01-17 12:51:10 UTC
I would like it to be possible to download not only the LiveCDs over BitTorrent, but distfiles as well.

This should not be too difficult to implement, at least theoretically.

1. Portage downloads the .torrent (or gets it when syncing) and then uses the user's favorite BitTorrent app to download the file. make.conf could have options for choosing the client and setting a maximum upload rate, as well as for how quickly the client should exit after downloading, defaulting to right away.

pros: 
* for mirrors to have less load
* for dedicated users to be able to help share distfiles
* more or less guaranteed ability to download all Portage files
* no scrolling through unavailable/dead/slow mirrors

cons: 
* possibly a longer time users would have to wait before connecting to mirrors
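
Concretely, the make.conf side of this proposal might look like the sketch below. To be clear, none of the `BITTORRENT_*` knobs or the `torrent-fetch` wrapper exist; they only illustrate the kind of options described above. `FEATURES` and `FETCHCOMMAND` are real Portage variables, shown to indicate where a torrent-aware downloader could plug in.

```shell
# /etc/make.conf -- hypothetical sketch, not real Portage options
FEATURES="bittorrent"                 # opt in to torrent downloads
BITTORRENT_CLIENT="/usr/bin/ctorrent" # which client to drive
BITTORRENT_MAX_UPLOAD="20"            # KiB/s upload cap while seeding
BITTORRENT_LINGER="0"                 # seconds to keep seeding after completion

# FETCHCOMMAND is Portage's existing hook for replacing the downloader;
# a torrent-aware wrapper script could be dropped in like this:
FETCHCOMMAND="/usr/local/bin/torrent-fetch \"\${URI}\" \"\${DISTDIR}/\${FILE}\""
```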
Comment 1 Nicholas Jones (RETIRED) gentoo-dev 2005-01-17 19:22:31 UTC
Ah! You cite theory! That's a key ingredient.
I'll cite the majority of the opposition to moving forward with this.
If someone could find fixes for all of these issues, it'd help, but the
potential for this to be reality any time soon is very very low.

The client would need to be in C, I imagine... And would need very
succinct and finite control over the system.

> pros: 
> * for mirrors to have less load

Some numbers to start with... Right now we have over 23,000 distfiles.
We have a large number of donated servers and bandwidth that we have no control over.

So, to manage BitTorrent for distfiles, we would have to run 23,000 active
instances of a BitTorrent "file seeder". I don't know if you've tried
to do BitTorrent with substantial numbers of files, like 150, but it
tends to hurt systems. The widely spread accesses to file chunks result
in incredible seeking and extreme degradation. I wouldn't say it was
exponential, but it's definitely beyond linear. Startup is also painful.
The initial checks to ensure pieces are valid take a complete linear
read, so every time the server daemon goes down it has to read the entire
distfiles repository from end to end AND perform SHA-1 on each block.
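
That startup cost can be sketched in a few lines: a restarting seeder must re-read every distfile and SHA-1 each fixed-size piece before it can serve anything, so restart time grows linearly with the total size of the repository. A minimal sketch (the piece size and directory layout are illustrative, not Gentoo's actual setup):

```python
import hashlib
import os

PIECE_SIZE = 256 * 1024  # a typical BitTorrent piece length


def hash_pieces(path, piece_size=PIECE_SIZE):
    """Linearly read one file and SHA-1 every piece, as a seeder
    must do on startup to learn which pieces it can serve."""
    digests = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            digests.append(hashlib.sha1(piece).hexdigest())
    return digests


def startup_check(distdir):
    """Re-hash an entire distfiles tree: every byte is read once,
    which is why a daemon restart over 23,000 files is painful."""
    total_bytes = 0
    for name in sorted(os.listdir(distdir)):
        path = os.path.join(distdir, name)
        if os.path.isfile(path):
            hash_pieces(path)
            total_bytes += os.path.getsize(path)
    return total_bytes
```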

Given an AMAZING server tool, I would say it's reasonably sane to put
down 400 commonly-used files per server. So we'd need around 30-50 servers
to handle the server daemons and all 23,000 files.

> * for dedicated users to be able to help share distfiles

The strain this would put on users is well beyond feasible. Granted,
they would only be sharing files that they use, so we could gain the
benefit of fast, common files, but we would also need a system by which a
user can control the daemons and their effects on system performance.
Even sharing 20 files is quite noticeable on a desktop box with 512M RAM.

> * more or less garranteed ability to download all portage files

Well, this would hinge on the ability to get the files posted and available.
We'd also need infrastructure to aid verification of torrents. I, and others,
have already created a system which can manage this reasonably well, but the
support for verification is not there. Adding torrents to the tree would
result in a massive inflation of the tree, and batch torrents would provide
no benefit, as we cannot use them for partial lists based on USE flags.

> * no scrolling through unavailable/dead/slow mirrors

Ever tried a torrent where users stopped seeding soon after retrieving the
full file? It can take a very long time to acquire proper peers that allow
you that same potential to leech. It's quite possible, with a list of several
hundred to several thousand peers, to _never_ connect to one.

Related to the 30-50 server issue above, if we only have scattered master
seeds and no useful or reachable peers, you may never get the file as the
master is more than likely saturated. I am inflating the saturation a little
but it is not that unlikely.

> cons: 
> * possibly a longer time users would have to wait before connecting to mirrors

Yes, related to the above.
Comment 2 Loki 2005-01-18 22:51:17 UTC
Well, there is a C-based torrent client in Portage, named ctorrent. It has very low memory and CPU usage.

I am somewhat confused as to what you meant by the problem of too many seeks, since the mirrors are being downloaded from at different points all the time anyway. I don't see how it would be any different, especially if you take into account that there would be much less downloading going on overall, thanks to BitTorrent.

I must admit that, at present, BitTorrent may not be the best way to make all distfiles available for p2p download, but I'm certain we could do this for larger downloads such as those for games and XFree86, stuff that is 50 MB+.

Oh, and I would also like to point to the very popular show Naruto, which has tens of thousands of people downloading it as soon as it comes out. In those cases I never seem to have a problem getting the file, even though I'm certain most people stop the BT client as soon as they notice the download has completed.

We don't have to do an all-or-nothing scenario. We could start with:

* adding the ability for ebuilds to have an optional TORRENT variable taking a simple numeric argument: 1 = off (the default); 2 = make a single torrent of all the ebuild's files; 3 = make torrents of files grouped according to USE variables; 4 = make torrents only of groups of distfiles larger than 20 MB. This also means we won't need "verification" (which I believe is what you were referring to), as we will create the torrents ourselves: a script creates the torrents based on these options, makes them available on the mirrors, and adds them to the tracker, which could run on one of Gentoo's "controlled" servers.

* adding a FEATURES value in make.conf, say bittorrent, to indicate whether people wish to use BitTorrent to download the distfiles of ebuilds that have it enabled.

* creating a small torrent-client control script that checks every 15 seconds or so whether the indicated file has reached the required md5sum (size is not always a good indicator). Once it has, the script tells Portage to go on with the installation, then kills the BitTorrent client a set number of seconds/cycles after discovering that the file has completed, after which the script itself exits.

* creating some tracker mirrors and/or getting the mirrors to carry the .torrent files for download. As they are small, they shouldn't take up much space or bandwidth.
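
The control script in the third bullet could be as small as the sketch below. Everything here is hypothetical (the client command, the polling interval, the linger behavior); a real script would also want a timeout so a stalled download cannot poll forever.

```python
import hashlib
import subprocess
import time


def md5sum(path):
    """md5 of a file, read in blocks so large distfiles don't fill RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def fetch_via_torrent(torrent, distfile, expected_md5,
                      client=("ctorrent",), interval=15, linger_cycles=1):
    """Start a torrent client, poll the target file's md5sum every
    `interval` seconds, and kill the client `linger_cycles` polls after
    the download is complete (so it briefly keeps seeding).
    NOTE: a production version would add an overall timeout."""
    proc = subprocess.Popen(list(client) + [torrent])
    done_for = 0
    while done_for <= linger_cycles:
        time.sleep(interval)
        try:
            if md5sum(distfile) == expected_md5:
                done_for += 1  # complete; linger a little before killing
        except FileNotFoundError:
            pass  # client has not created the file yet
    proc.terminate()
    proc.wait()
```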

From my understanding, the above should be more or less feasible, and I believe I have addressed all the concerns so far, as the number of torrents has been drastically reduced to only those that would actually be useful.
Comment 3 Nicholas Jones (RETIRED) gentoo-dev 2005-01-19 08:27:53 UTC
Regarding seeking: BitTorrent shuffles 128k/256k chunks around, meaning
that if you have 100 users retrieving a single file, you have the
potential for 100 non-contiguous segments of that file to be read into
memory. That is the seeking. Now take that example and expand it to
100 files with X users per file. You get completely random seeking
if the environment is saturated, like your Naruto episodes.
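
The arithmetic behind that seek pattern is simple: a request for piece `i` means a seek to byte offset `i * piece_length`, so requests for effectively random piece indices translate directly into scattered offsets across the file. A small illustration (the 256 KiB piece length is typical; the shuffled indices just stand in for many independent peers):

```python
import random

PIECE_LENGTH = 256 * 1024  # bytes per BitTorrent piece


def piece_offset(index, piece_length=PIECE_LENGTH):
    """Byte offset a seeder must seek to in order to serve one piece."""
    return index * piece_length


def request_offsets(n_pieces, seed=0):
    """Offsets for one pass over a file when peers request pieces in
    effectively random order (rarest-first selection, many peers):
    the same total bytes are read, but in a scattered, seek-heavy order."""
    indices = list(range(n_pieces))
    random.Random(seed).shuffle(indices)
    return [piece_offset(i) for i in indices]
```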

ctorrent would be a decent start, but it provides nowhere near the
monitoring and load-sharing capabilities required to manage something
of this scale. I've used it. It's extremely minimalistic.

> * creating a small torrent client control script to check every 15
> seconds [...] kill bittorrent after [...] the file has completed. 

This scenario almost eliminates the usefulness of bittorrent. Only in
an update storm beyond the mirror-system's bandwidth capability would
this be of any benefit.

Comment 4 Loki 2005-01-19 14:34:11 UTC
As for seeking, I still maintain that there will be fewer clients downloading at a time per server, or at least not nearly as many. And there is no disadvantage in the way BitTorrent distributes files: in normal circumstances more than one person is downloading a file at the same time anyway, and they are almost always, if not always, downloading from different points in the file. Besides that, if the server is using LVM, then the file is broken up into 32 MB sections all over the partition; and on any filesystem you also have to note that the file itself is broken up into a bunch of chunks and only appears to be one file, and those chunks may not even be close together: they could be on opposite ends of the partition.

ctorrent is minimalistic, but we do not need all that many features for the servers/mirrors, as all they have to do is seed. It's up to the users which clients they wish to use anyway.

As for BitTorrent being shut down after completion of the download: note that the file is being shared WHILE it is being downloaded, not only afterwards; that is the beauty of BitTorrent. As we are going to be using large files, downloading will take a while. Besides all that, we could set the default number of cycles before the torrent exits after noticing completion much higher, or even increase the interval at which the file is checked to once every 30 seconds.

Also note that many mirrors are experiencing high bandwidth loads and are thereby rather slow; otherwise I wouldn't have bothered to submit this bug in the first place.

Oh, and by the way, I am not sure if you understood me, but I was saying that the Naruto episodes download very well... I do not experience any kind of environment saturation, as you call it. And as I would be sharing as much as anyone else on the network, I'm sure no one else would be experiencing any kind of saturation either.

Comment 5 Nicholas Jones (RETIRED) gentoo-dev 2005-01-20 07:46:34 UTC
Try sharing all 130+ episodes of Naruto at once. Even just one leech on
each file is difficult for most computers with reasonable amounts of RAM.

I have 4 listed mirrors giving me over 2M/s. They hardly seem inundated
to me. As far as the infrastructure is concerned, the amount of effort
and resources needed to support BitTorrent at this point far outweighs the
benefits. The mirrors are nowhere near "oversubscribed".

Those that cannot figure out how to find a reasonably fast mirror
probably will be unlikely to set up and understand a reasonably fast
BitTorrent download and all the quirks it might inject into their
asynchronous internet connections.

ctorrent is fine for users, it's not fine for infrastructure.
Comment 6 Loki 2005-01-20 22:00:41 UTC
>Try sharing all 130+ episodes of Naruto at once.
We are not asking people to keep their torrent clients running for long.

>the amount of effort and resources to support bittorrent at this point far outweighs the
>benefits. The mirrors are no where near "over subscribed".
I wish to disagree: the resources required will diminish significantly, and the effort involved will not be any greater than it would be for the addition of any new Portage feature.
 
You note that the mirrors are not oversubscribed, and then you go on to say:

>Those that cannot figure out how to find a reasonably fast mirror
>probably will be unlikely to set up and understand a reasonably fast
>BitTorrent download and all the quirks it might inject into their
>asynchronous internet connections.

So, as you must admit, there are servers experiencing high load, and it takes some fine-tuning to achieve even reasonable download speeds.

I disagree with your second statement, however, as there are only two scenarios I can think of (if you can think of others, feel free to state them) in which users would have to pay any attention At All to their BitTorrent client. One is to decrease their upload speed, which isn't going to raise their download speed any and would in any case be easily configured. The other is if they have outgoing ports 6881-6999 closed. That is only a problem if you're running Windows XP SP2, and I somehow doubt that our users would be running Windows XP SP2. However, I must admit there is the potential that the administrator of their connection has disabled those ports; in that scenario, the user may simply change the ports used to 80 or 21 or any of the multiple ports they DO have open.

And as I have mentioned in previous posts, if they Can't figure it out, they can always take it out of their FEATURES options in /etc/make.conf.

>ctorrent is fine for users, it's not fine for infrastructure.
This I find very confusing: what would the servers be acting as, then? To be honest, I've never heard of an option in BitTorrent that would increase the load on any particular computer because it was labeled "server" or "infrastructure" or anything of the like. To my understanding, BitTorrent treats all users equally; it's one of those egalitarian things, lol.
 
Comment 7 Jeffrey Forman (RETIRED) gentoo-dev 2005-01-31 06:39:51 UTC
We'll just tuck this under our pillow and save it for later.

-Jeffrey
Comment 8 Andrew Gaffney (RETIRED) gentoo-dev 2008-05-10 18:05:44 UTC
*** Bug 221251 has been marked as a duplicate of this bug. ***