Bug 85098 - Reducing syncing time for mirrored files
Summary: Reducing syncing time for mirrored files
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Dev box issues
Hardware: All All
Importance: High normal
Assignee: Gentoo Infrastructure
 
Reported: 2005-03-13 08:46 UTC by Stefan Schweizer (RETIRED)
Modified: 2007-01-07 10:06 UTC

Description Stefan Schweizer (RETIRED) gentoo-dev 2005-03-13 08:46:28 UTC
Hi,

Can we please have a public server where devs can upload their source tarballs so that they do not have to wait for /space/distfiles-local?
Of course, the files on that server would also need to get mirrored in gentoo: later on.
Comment 1 Lance Albertson (RETIRED) gentoo-dev 2005-03-13 08:53:26 UTC
I have been wanting to set something up so that our devs don't have to host anything on dev.g.o anymore, to stop things like this. The rough idea I had was to use the servers we have in rsync.g.o, in a round-robin DNS, to serve said files. To make things easier on us, keeping it to an rsync module might work better, since we wouldn't have to worry about setting up Apache and maintaining it. I'd also have to check with our sponsors whether they mind the extra bandwidth this may cause. Another stipulation would be that no files with nomirror stated in the ebuild go on this site. This is only a system that would update, say, every 15 minutes so that files can get out quickly instead of waiting on the mirror system. It *WILL NOT* be a replacement for the mirror system though.

To make sure our devs put files on mirrors, it might be a good idea if we had cfengine delete files that are older than X days (say 30 days) so that they won't abuse this system.
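
A minimal sketch of the expiry idea above, written in Python rather than as a cfengine policy (that substitution is an assumption made for illustration). The function names, the flat directory layout, and the 30-day default are hypothetical; only the "delete files older than X days" rule comes from the comment.

```python
import os
import time


def expired_files(directory, max_age_days=30, now=None):
    """Return paths of regular files in `directory` older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(directory):
        # Only plain files are considered; subdirectories and symlinks are left alone.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def purge(directory, max_age_days=30, dry_run=True):
    """Delete expired files; with dry_run=True only report what would be removed."""
    stale = expired_files(directory, max_age_days)
    if not dry_run:
        for path in stale:
            os.remove(path)
    return stale
```

Defaulting to `dry_run=True` means a scheduled job only removes files once someone has looked at the list it would delete.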
Comment 2 Lance Albertson (RETIRED) gentoo-dev 2005-03-13 09:24:08 UTC
After some discussion on IRC in #-dev, I think the best option would be to use a very simple HTTP server. So far the one I'd use is thttpd [1]. It's very small, scalable, easy to configure, and fast! I'll have to run this past klieber to make sure this won't be a huge issue for him, but I think this may be the best option to get our devs to quit hosting distfiles on dev.g.o. I can then actually start enforcing it and say we have something else in place :)

[1] http://www.acme.com/software/thttpd/
Comment 3 Lance Albertson (RETIRED) gentoo-dev 2005-03-13 13:49:41 UTC
For now, I'd like this bug to be focused on solving the long wait for files to hit mirrors. Later on we could work on a solution for files that are on dev.g.o for other reasons. Basically, this is just a temporary mirroring solution until the files hit the mirrors.
Comment 4 Stefan Schweizer (RETIRED) gentoo-dev 2005-08-11 02:47:00 UTC
well, some files also vanish from the mirrors automagically:
http://bugs.gentoo.org/show_bug.cgi?id=100260

Maybe it makes sense to just have a server that hosts all files that are
gentoo-only, where we can upload stuff directly.

Would solve a) the waiting problem, b) the file removed problem.

Your thttpd idea sounds suitable to solve this for me.
Comment 5 Kurt Lieber (RETIRED) gentoo-dev 2005-08-11 07:39:24 UTC
I don't really see the need for yet another mirroring system.  I'd rather look
at ways of reducing the time it takes files to get from distfiles-local to the
master mirror.  Even if we had a special thttpd server for this, it would still
be the same single point of failure that toucan is.  It wouldn't be anything
that devs could permanently set SRC_URI to and never worry about again -- it
would only be temporary.  
Comment 6 Brian Harring (RETIRED) gentoo-dev 2005-08-13 08:43:54 UTC
It's not exactly another mirroring system; you already have distfiles-local +
some quicky rsync'ing that doesn't play too well with mirror-dist.

What's proposed here is a helluva lot cleaner than the existing distfiles-local
+ rsync cronjobbing that's in use, something that got brought up and I wanted
when mirror-dist got added.

Short version of the problem is that mirror-dist cannot be authoritative for
distfiles-local because it can't verify the files and add them into the master
distfiles dir; the files get autodumped in.

Makes for an interesting tug of war when checksum information disagrees, screwing
things up.  Some central fall back box that updates can be pushed to (and are
immediately live/accessible on), which mirror-dist can use for managing the
master distfiles makes things a helluva lot cleaner/better from where I'm sitting.

My 2 cents at least. :)
Comment 7 Kurt Lieber (RETIRED) gentoo-dev 2005-08-14 11:41:03 UTC
so if the problem is files getting autodumped onto osprey...why not have them
rsync'd to a temporary directory and then have mirror-dist handle the transfer
once it's comfortable/knowledgeable about the files?
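
The staging-directory idea can be sketched roughly as follows. This is not mirror-dist's code: the function names, the `expected` mapping, and the choice of SHA-256 are all assumptions for illustration. The point is only that a file leaves the temporary directory after its checksum has been confirmed, instead of being autodumped into the master distfiles dir.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def promote_verified(staging_dir, master_dir, expected):
    """Move files from staging to master only when their checksum matches.

    `expected` maps file name -> hex SHA-256 digest (hypothetical input; in
    practice it would be derived from the digests recorded in the tree).
    Returns (promoted, rejected) lists of file names.
    """
    promoted, rejected = [], []
    for name in sorted(os.listdir(staging_dir)):
        source = os.path.join(staging_dir, name)
        if not os.path.isfile(source):
            continue
        if expected.get(name) == sha256_of(source):
            shutil.move(source, os.path.join(master_dir, name))
            promoted.append(name)
        else:
            # Unknown or mismatching files stay in staging for a later pass.
            rejected.append(name)
    return promoted, rejected
```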

Comment 8 Brian Harring (RETIRED) gentoo-dev 2005-08-15 15:22:08 UTC
figuring (worst case) 5 hours for a file to deploy, mainly.
Well aware the docs state that people should deploy some obscene # of hours
before pushing stuff into the tree, but people pretty much ride the line on
pushing stuff out; don't get me wrong, addressing the schizo interaction of
distfiles-local with mirror-dist would be wonderful, but a fallback
mirror/server that devs can dump to (patches mirrors as lance labeled it once
upon a time) still seems like the best approach; cuts down on the upload delays
among other things :)

as always, my ill informed 2 cents
Comment 9 Lance Albertson (RETIRED) gentoo-dev 2005-08-15 18:39:30 UTC
Well, there are roughly two problems we could solve with this. 

a) Reducing the time it takes to get files/patches from our developers to the
mirrors by providing a temporary place that the ebuild could be coded to look in

b) Provide a more permanent place for gentoo-only hosted projects/patches/etc.
Right now there's a lot of stuff pointing to toucan for files when we could
provide something a little more robust (maybe just two servers for fallback) to
hold such files. 

I'm not sure about the semantics, from the portage point of view, of how to make
the first option work, but I can definitely see the value of this setup for
solving the latter problem.

Of course we wouldn't want the second option to become a playground for folks to
get something up quick, it'd be just for a permanent URI for a patch or gentoo
hosted tarball instead of pointing it at toucan.

I kind of like your idea of changing how distfiles-local works on toucan,
possibly having it put files in a location for mirror-dist to sort, or we
could just put them in a temporary location alongside the patches stuff until
they hit the mirrors. 

What do you think? (Hopefully that all made sense)
Comment 10 Stefan Schweizer (RETIRED) gentoo-dev 2005-08-15 21:53:59 UTC
(In reply to comment #9)
> provide something a little more robust (maybe just two servers for fallback) to
> hold such files. 

I don't think it is necessary to have more than one server for the first 4-5
hours especially as many people have not even synced in that time ..
And then the distfiles that are gentoo hosted are not many .. usually they are
hosted somewhere else.

> Of course we wouldn't want the second option to become a playground for folks to
> get something up quick, it'd be just for a permanent URI for a patch or gentoo
> hosted tarball instead of pointing it at toucan.

Why not? What we move to /space/distfiles-local is usually gentoo-mirrored stuff
that can't be fetched elsewhere.
So imo it makes sense to solve both problems with one server where we can point
our SRC_URIs to and where the files are fast uploaded and kept.
The problem here is that we just need one small infra server dedicated to be a
fallback and first 4-5 hours mirror where we have a nice SRC_URI containing
gentoo and fast upload access ..
Comment 11 Brian Harring (RETIRED) gentoo-dev 2005-09-15 04:50:55 UTC
One thing to note, although if rsync is smart enough this may not be an issue.
The current distfiles-local setup with rsync'ing of stuff off of toucan over to
osprey assumes toucan is sane; right now it's giving stats for files, but no
data for any files.
If rsync is stupid, that results in corruption of distfiles-local files on
osprey, resulting in those files eventually getting nuked from the main mirror.
Bad thing :)

Willing to bet rsync isn't that stupid, but it is indicative of why the current
pull + push doesn't mesh well, and why it should be pull only (at least for the
master distfiles dir updates).
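
One way to guard against the "stats but no data" case, sketched under assumptions: before pulling a file, check that the number of bytes that can actually be read matches the size the source reports. The function names are hypothetical, and nothing here reflects how rsync itself behaves.

```python
import os


def is_readable_in_full(path, chunk_size=1 << 20):
    """Return True only if the bytes actually read match the size stat() reports."""
    try:
        expected = os.stat(path).st_size
        seen = 0
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                seen += len(chunk)
    except OSError:
        # Metadata without readable data (or no metadata at all) counts as unsafe.
        return False
    return seen == expected


def safe_to_pull(paths):
    """Split candidate paths into (ok, skipped) before any copy is attempted."""
    ok = [p for p in paths if is_readable_in_full(p)]
    skipped = [p for p in paths if p not in ok]
    return ok, skipped
```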
Comment 12 Corey Shields 2005-09-30 22:23:00 UTC
Once we get a new dev.g.o, toucan could be repurposed as the 4-5 hour buffer 
server.  Not making any promises as the method of doing this is still up in the 
air, but just pointing out the opportunity. 
 
-C 
Comment 13 Brian Harring (RETIRED) gentoo-dev 2005-11-29 11:15:51 UTC
Corey, any eta on the new toucan btw?  Saw some postings, but nothing concrete,
just nudging so that it's jotted down here and current state of things are
viewable in a glance.  Thanks
Comment 14 Stefan Schweizer (RETIRED) gentoo-dev 2006-01-20 12:30:01 UTC
Please do not forget about this. Issues a) and b) in comment #9 still persist.

A consistent webspace would solve this: just take a spare server, give it a nice URL, and give us easy access without a 4-hour buffer.
Comment 15 Brian Harring (RETIRED) gentoo-dev 2006-02-09 02:59:13 UTC
If this is ever implemented, mirror-dist already supports the required functionality for it; in the update_distfiles.sh script on osprey, just change MIRROR_OVERIDES, whitespace delimited.

gentoo http://patches.gentoo.org/distfiles/

in the overrides file should do the trick when/if this patches.g.o (or whatever the mirror is named) is implemented.
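
For illustration only, here is how a whitespace-delimited override line like the one above could be read and applied. The real format that mirror-dist expects is not shown in this bug beyond that one line, so the comment syntax, the function names, and the fallback URL `http://mirror.example.org/distfiles` are assumptions.

```python
def parse_overrides(text):
    """Parse lines of the form '<mirror-name> <url> [<url> ...]'.

    Blank lines and lines starting with '#' are skipped (the comment syntax
    is an assumption). Returns {mirror_name: [url, ...]}.
    """
    overrides = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        name, urls = fields[0], fields[1:]
        overrides.setdefault(name, []).extend(urls)
    return overrides


def candidate_urls(filename, mirror, overrides, default_urls):
    """Build fetch URLs: override hosts are tried first, then the regular mirrors."""
    bases = overrides.get(mirror, []) + list(default_urls)
    return [base.rstrip("/") + "/" + filename for base in bases]


overrides = parse_overrides("gentoo http://patches.gentoo.org/distfiles/\n")
urls = candidate_urls(
    "foo-1.0.tar.bz2",  # hypothetical distfile name
    "gentoo",
    overrides,
    ["http://mirror.example.org/distfiles"],  # hypothetical regular mirror
)
```

Putting the override host first makes a freshly uploaded file fetchable straight away, with the regular mirrors as the fallback once they have caught up.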
Comment 16 Stefan Schweizer (RETIRED) gentoo-dev 2006-06-11 10:31:33 UTC
so, any update on this is appreciated. I have recently switched to using gentooexperimental.org for seed-hosting the tarballs. I would rather have an infra box to point SRC_URI to.
Download problems because a file is not yet available still happen from time to time.
This is a bug that is easy to solve, please do something about it. Cshields pointed out we can use the old toucan for this purpose, can this please be implemented?
Comment 17 Lance Albertson (RETIRED) gentoo-dev 2006-06-11 11:48:30 UTC
(In reply to comment #16)
> so, any update on this is appreciated. I have recently switched to using
> gentooexperimental.org for seed-hosting the tarballs. I would rather have an
> infra box to point SRC_URI to.
> Download problems because a file is not yet available still happen from time to
> time.
> This is a bug that is easy to solve, please do something about it. Cshields
> pointed out we can use the old toucan for this purpose, can this please be
> implemented?

I sent an email to zmedico last night concerning this, so it's still on the todo list. And no, it's not as easy as you think it is to do it *right*. So please be patient.
Comment 18 Stefan Schweizer (RETIRED) gentoo-dev 2007-01-07 10:06:21 UTC
There seems to be little interest in solving this or it is too hard for infra to do it. Anyway I am closing it as WORKSFORME because we have another way of reducing the syncing time. Thanks.

If you want to resume work on this feel free to reopen.