Bug 176186 - Gentoo projects file hosting
Summary: Gentoo projects file hosting
Status: IN_PROGRESS
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: High enhancement
Assignee: Gentoo Infrastructure
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-26 23:40 UTC by Daniel Drake (RETIRED)
Modified: 2022-10-09 19:05 UTC
CC List: 16 users

See Also:
Package list:
Runtime testing required: ---


Attachments
IRC log from Jan 2011 (vhost-log.txt,3.93 KB, text/plain)
2011-02-22 15:59 UTC, Jeremy Olexa (darkside) (RETIRED)
kup-server-gitolite-subcmd.patch (kup-server-gitolite-subcmd.patch,6.33 KB, patch)
2022-07-10 16:50 UTC, Robin Johnson

Description Daniel Drake (RETIRED) gentoo-dev 2007-04-26 23:40:40 UTC
Most ebuilds in the portage tree reference a permanent URL for the distfiles they require. This is because good upstreams don't purge their distfiles. The Gentoo mirrors drop the files sometime after the ebuilds have gone away, but because upstream still host them, people using old ebuilds/frozen trees/etc are happy.

For Gentoo-hosted projects such as genpatches, we have no permanent hosting, which causes problems with frozen trees, previous release media, people wanting to revert to old ebuilds, etc.

Could infra please provide a simple hosting service for projects like this?
I would envision some box which we can scp files to, and then they are available over http at e.g. http://projects.gentoo.org/project/filename
Access would be available to listed developers only.

I wouldn't say any redundancy is needed -- occasional downtime would generally be OK (Gentoo mirrors would be used in most cases anyway), and besides, many upstreams do go down for short periods on occasion.

I'd advise that regular backups should be taken.

Besides genpatches this would probably be useful for genkernel, catalyst, portage, sandbox, pax-utils, baselayout, and others.

In terms of space requirements, genpatches grows at a rate of about 10 MB per year. I would seed any new distfiles host with the patch tarballs going back to 2.6.11, which currently total 22 MB in size.


When discussing this before, we were also talking about moving http://dev.gentoo.org/~dsd/genpatches to something like http://genpatches.gentoo.org and having it auto-updated. I'd like to put that idea on hold for now, and consider it as a separate project from implementing a simple permanent distfiles host.
Comment 1 Daniel Drake (RETIRED) gentoo-dev 2007-08-22 20:14:12 UTC
Is there anything I can do to help move this along? i.e. does it need discussion on some mailing list, comment by a specific individual, or are you guys just busy? Thanks!
Comment 2 Alec Warner (RETIRED) archtester gentoo-dev Security 2007-08-23 05:08:05 UTC
It seems pretty trivial to implement in theory.

/space/gentoo-projects/ on pecker gets rsynced to this box.

How would you implement file removals?

I'd prefer more information (re how the service would operate) in order to move the request along.

Infra obviously can't do much unless they know the service requirements (and really, deleting is the only 'hard' one).

-Alec
Comment 3 Daniel Drake (RETIRED) gentoo-dev 2007-08-23 18:02:26 UTC
I'd never want any deletion. I'm requesting permanent hosting here, similar to what you might have on SourceForge or any other upstream host.

If there are space concerns, I would suggest placing a size limit on a per-project basis. If you gave me 500 MB for genpatches, it would take me about 50 years to fill it.
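
As a rough check of that estimate, here is a minimal sketch that uses only the figures quoted in this bug (a 22 MB seed, about 10 MB of growth per year, and the suggested 500 MB cap). The function name is hypothetical and constant growth is an assumption.

```python
def years_until_full(cap_mb: float, seed_mb: float, growth_mb_per_year: float) -> float:
    """Years until a directory hits its size cap, assuming constant growth."""
    if growth_mb_per_year <= 0:
        raise ValueError("growth_mb_per_year must be positive")
    return max(cap_mb - seed_mb, 0.0) / growth_mb_per_year


# Figures quoted in this bug: 22 MB seed, about 10 MB/year, 500 MB cap.
print(years_until_full(cap_mb=500, seed_mb=22, growth_mb_per_year=10))
```

With the seed included, the result is a little under 48 years, which is in line with the "about 50 years" figure.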
Comment 4 Alec Warner (RETIRED) archtester gentoo-dev Security 2007-08-23 18:53:37 UTC
(In reply to comment #3)
> I'd never want any deletion. I'm requesting permanent hosting here. Similar as
> you might have on sourceforge or any other upstream host.
> 
> If there are space concerns, I would suggest placing a size limit on a
> per-project basis. If you gave me 500mb for genpatches, it would take me about
> 50 years to fill it.

I don't think that's possible.  You have to EOL files sometime.  Particularly if someone is running a frozen Gentoo, they probably will mirror all the sources to be 100% sure of not getting screwed by upstream (just as they should be mirroring Debian or CentOS or Ubuntu).  Mirroring all our sources is not expensive these days (100 gigs).

So we can make the provisioning non-automated.  You ask for a folder in /space/projects/ and then we put a size cap on that (your 500megs per project).

It makes the push a bit harder then, but it might be possible to come up with something that works.  You get per-project space then, and we can track usage by the # of projects * space per project.

-Alec
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2007-08-23 22:57:27 UTC
I have no desire to EOL the files, ever. Like the projects I host on sourceforge, I will never delete those distfiles. There are many more reasons why having files around forever is useful than just the frozen-portage-tree situation.

Nevertheless, if infra decide it's not possible to store these things forever, then that's fine - you can only do what you can do. Just please consider my request for completely permanent hosting. Thanks :)
Comment 6 William L. Thomson Jr. (RETIRED) gentoo-dev 2008-06-16 23:50:02 UTC
If/when this bug gets worked on, could the scope be extended beyond projects to overlays as well? That way we would have a place to store sources, large patches, etc., without constantly hitting upstream for sources or storing/serving the stuff from a third-party location/server.
Comment 7 Ulrich Müller gentoo-dev 2009-06-15 06:39:32 UTC
The Emacs team would also be interested in this, for the packages where we are our own upstream (e.g. app-emacs/gentoo-syntax). Our space requirements would be rather small, currently less than 5 MB in total.
Comment 8 Ben de Groot (RETIRED) gentoo-dev 2010-04-09 19:32:37 UTC
This is one of those things that in my opinion is so obviously a good idea, that it makes one wonder why three years later this is still not implemented...
Comment 9 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-05-18 22:37:42 UTC
Reporting in as infra on this now.

I'd like to propose it as the following:

1. /space/projects-local/$PROJECTNAME/ on dev.g.o
2. Access to place files on the service is limited to Gentoo developers ONLY.
3. Mirrored to /projects/ on mirrors.
4. gentoo-projects entry in thirdparty mirrors, so usable as mirror://gentoo-projects/$PROJECTNAME/$FILENAME
5. NO projects.g.o vhost.
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-05-19 07:09:49 UTC
The infra folk had a discussion in IRC about this, and alternatives, and the plan has been adjusted slightly to include a hardlink (details on some reasons afterwards).

Proposal:
---------
1. /space/projects-local/$PROJECTNAME/ on dev.g.o, developers copy files to here just like the existing distfiles-local and experimental-local.
2. Access to place files on the service is limited to Gentoo developers ONLY.
3. Mirrored to /projects/ on mirrors.
4. HARDLINK content to /distfiles/ on mirrors.
5. gentoo-projects entry in thirdparty mirrors, so usable as mirror://gentoo-projects/$PROJECTNAME/$FILENAME
6. Contents of /space/projects-local/ may be cleared occasionally, but files will persist on the mirror layer.
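
To illustrate item 5, here is a minimal sketch of how a mirror:// URI could be expanded against a thirdpartymirrors-style table (a mirror group name followed by its base URLs). It is not Portage's actual code; the function name, the example.org host and the file name are all hypothetical.

```python
from typing import Dict, List

MIRROR_PREFIX = "mirror://"


def expand_mirror_uri(uri: str, thirdpartymirrors: Dict[str, List[str]]) -> List[str]:
    """Expand mirror://NAME/PATH into one candidate URL per base URL of NAME."""
    if not uri.startswith(MIRROR_PREFIX):
        return [uri]  # ordinary URL, nothing to expand
    name, _, path = uri[len(MIRROR_PREFIX):].partition("/")
    if name not in thirdpartymirrors:
        raise KeyError(f"unknown mirror group: {name}")
    return [base.rstrip("/") + "/" + path for base in thirdpartymirrors[name]]


# Hypothetical mirror table and file name, for illustration only.
mirrors = {"gentoo-projects": ["https://mirror.example.org/projects/"]}
print(expand_mirror_uri("mirror://gentoo-projects/myproject/myproject-1.0.tar.bz2", mirrors))
```

Because the ebuild only names the mirror group, the set of hosts behind it can change without touching SRC_URI.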

Reasonings:
-----------
- Having a vhost, along with a lot of other suggested solutions, has one downside:
Any content added to a site and referenced from SRC_URI will ALSO end up on /distfiles/ by the nature of the Gentoo distfiles fetching that takes place on the master mirror. Even my initial gentoo-projects suggestion fell afoul of this. The hardlinks prevent this problem as the files are mirrored out.
- Additionally, if we introduced a vhost, we would still have the issue of availability. Comparing our mirror tiers with any single vhost we have, the mirrors have MUCH higher availability.

Not yet solved:
---------------
ferringb pointed out that we should find a way to reliably track what the digests of files SHOULD be, and who introduced a file.
Comment 11 Ulrich Müller gentoo-dev 2010-05-19 12:55:37 UTC
(In reply to comment #10)
> Proposal:
> ---------
> 1. /space/projects-local/$PROJECTNAME/ on dev.g.o, developers copy files to
> here just like the existing distfiles-local and experimental-local.
> 2. Access to place files on the service is limited to Gentoo developers ONLY.
> 3. Mirrored to /projects/ on mirrors.
> 4. HARDLINK content to /distfiles/ on mirrors.
> 5. gentoo-projects entry in thirdparty mirrors, so usable as
> mirror://gentoo-projects/$PROJECTNAME/$FILENAME
> 6. Contents of /space/projects-local/ may be cleared occasionally, but files
> will persist on the mirror layer.

Looks good to me.

> ferringb pointed out that we should find a way to reliably track what the
> digests of files SHOULD be, and who introduced a file.

Require that files are accompanied by a detached PGP signature? Looking at other projects, this seems to be the common practice.
Comment 12 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-08-06 01:28:55 UTC
I like Robin's proposal with Ulrich's addition, and it would be nice to have, like... ASAP?

I've actually got quite a bit of content to push there... I would suggest allowing subdirectories though, just to be able to organise stuff in a decent form...

Thanks!
Comment 13 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-08-06 19:22:04 UTC
subdirs within each $project you mean?
Comment 14 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-08-06 19:29:25 UTC
Yeps
Comment 15 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2011-01-20 04:22:53 UTC
Updated proposal. Included signatures for file validity AND tracing who uploaded or is responsible for a given file.

Proposal:
---------
1. /space/projects-local/$PROJECTNAME/ on dev.g.o
1.1. developers copy files to here just like the existing distfiles-local and experimental-local.
1.2. subdirectories are permitted. 
1.3. No namespace conflicts are permitted between ANY files (regardless of dirs)
1.4. DIGESTS-${DATE} must be uploaded with files. Contains GPG-clearsigned Manifest2 data for files.
2. Access to place files on the service is limited to Gentoo developers ONLY.
3. Mirrored to /projects/ on mirrors.
3.1. Every file MUST have a matching DIGESTS entry to be mirrored.
3.2. Per directory, Merge all DIGESTS-* files to a single DIGESTS file for user validation.
4. HARDLINK content to /distfiles/ on mirrors, with directories stripped
5. gentoo-projects entry in thirdparty mirrors, so usable as
mirror://gentoo-projects/$PROJECTNAME/$FILENAME
6. Contents of /space/projects-local/ may be cleared occasionally, but files
will persist on the mirror layer.
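
Items 1.4 and 3.1 can be sketched roughly as follows. This is only an illustration under stated assumptions: it writes Manifest2-style `DIST <name> <size> <HASH> <value> ...` lines, the choice of SHA256 and SHA512 is an assumption (the proposal names no hash set), all function names are hypothetical, and the GPG clearsigning and signature verification steps are left out entirely.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple

Entry = Tuple[int, Dict[str, str]]  # (size in bytes, {hash name: hex digest})


def manifest_line(path: Path) -> str:
    """Build one Manifest2-style DIST line for a file."""
    data = path.read_bytes()
    return "DIST {} {} SHA256 {} SHA512 {}".format(
        path.name, len(data),
        hashlib.sha256(data).hexdigest(),
        hashlib.sha512(data).hexdigest(),
    )


def parse_digests(text: str) -> Dict[str, Entry]:
    """Collect DIST entries; other lines (e.g. PGP armor) are skipped."""
    entries: Dict[str, Entry] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "DIST":
            continue
        hashes = dict(zip(fields[3::2], fields[4::2]))
        entries[fields[1]] = (int(fields[2]), hashes)
    return entries


def is_mirrorable(path: Path, entries: Dict[str, Entry]) -> bool:
    """True only if the file has a DIGESTS entry whose size and SHA512 match."""
    entry = entries.get(path.name)
    if entry is None:
        return False
    size, hashes = entry
    data = path.read_bytes()
    return len(data) == size and hashes.get("SHA512") == hashlib.sha512(data).hexdigest()
```

A mirroring job built along these lines would skip any file for which `is_mirrorable` returns false, and would still need to check the signature on the DIGESTS file itself.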

Reasonings:
-----------
- Having a vhost, along with a lot of other suggested solutions, has one
downside:
Any content added to a site and referenced from SRC_URI will ALSO end up on
/distfiles/ by the nature of the Gentoo distfiles fetching that takes place on
the master mirror. Even my initial gentoo-projects suggestion fell afoul of this.
The hardlinks prevent this problem as the files are mirrored out.
- Additionally, if we introduced a vhost, we would still have the issue of
availability. Comparing our mirror tiers with any single vhost we have, the
mirrors have MUCH higher availability.
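
The hardlink step (item 4, "with directories stripped") together with the no-conflict rule (item 1.3) could look something like the sketch below. It is not infra's tooling: the function name and directory layout are hypothetical, skipping files whose names start with DIGESTS is an assumption, and hardlinks only work when both trees live on the same filesystem.

```python
import os
from pathlib import Path
from typing import Dict, List


def hardlink_flat(projects_root: Path, distfiles: Path) -> List[Path]:
    """Hardlink every file under projects_root into distfiles, dropping directories."""
    by_name: Dict[str, Path] = {}
    for src in sorted(projects_root.rglob("*")):
        # Assumption: per-directory DIGESTS files are not exported to /distfiles/.
        if not src.is_file() or src.name.startswith("DIGESTS"):
            continue
        if src.name in by_name:
            raise FileExistsError(f"{src} clashes with {by_name[src.name]}")
        by_name[src.name] = src

    distfiles.mkdir(parents=True, exist_ok=True)
    linked: List[Path] = []
    for name, src in by_name.items():
        dst = distfiles / name
        if dst.exists():
            if dst.samefile(src):
                continue  # already linked on an earlier run
            raise FileExistsError(f"{dst} exists and is not a link to {src}")
        os.link(src, dst)  # same inode, so no extra disk space is used
        linked.append(dst)
    return linked
```

All name clashes are detected before any link is created, so a conflicting upload leaves /distfiles/ untouched.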
Comment 16 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-02-22 15:59:26 UTC
Created attachment 263505 [details]
IRC log from Jan 2011

In this log, we discussed the previously proposed solution and the merits of going back to a vhost. The vhost idea won again, so initial stages of implementation will be happening soon as this is one of the projects on my list. Robin has stated that he will be working on the checksum/verification aspect and I will be getting the automation bits in place. I'm posting this log as a reminder to myself.
Comment 17 Mike Pagano gentoo-dev 2013-04-09 00:17:53 UTC
Guys, any thoughts on moving forward with the proposed solution?
Comment 18 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2013-04-09 00:21:31 UTC
(In reply to comment #17)
> Guys, any thoughts on moving forward with the proposed solution?
Can somebody make a GSoC project idea out of this please, and put it on the wiki?
I never did get all the tools written.
Comment 19 Alex Legler (RETIRED) archtester gentoo-dev Security 2014-12-30 02:15:49 UTC
Yay, updates after briefly talking to robbat2 about this:
http://linux.die.net/man/1/kup is a tool for a similar purpose used by kernel.org that already covers most of the needed features.

Special things:
- Every project has its own directory, as well as a distfiles/ subdirectory that is exported to mirrors. All files in all distfiles/ directories are thus required to have a unique filename. The kup tool will reject files with duplicate names. Subdirectories can be created and will be recursed for that purpose.
- Files currently found in CVS (gentoo/xml/proj/..) can be uploaded to the project directory directly and are not bound by the filename uniqueness rules. These are only available via (probably) projects.g.o
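
The filename uniqueness rule for distfiles/ directories could be enforced at upload time roughly as sketched below. This is not kup's code and does not use kup's interfaces; the function names and the `<root>/<project>/distfiles/` layout are assumptions taken from the description above.

```python
from pathlib import Path
from typing import Optional


def find_existing(root: Path, filename: str) -> Optional[Path]:
    """Return an existing file with this name in any project's distfiles/ tree."""
    for distdir in sorted(root.glob("*/distfiles")):
        for path in sorted(distdir.rglob("*")):  # subdirectories are recursed
            if path.is_file() and path.name == filename:
                return path
    return None


def check_upload(root: Path, project: str, relpath: str) -> Path:
    """Return the destination for an upload, or raise if the name is taken."""
    name = Path(relpath).name
    clash = find_existing(root, name)
    if clash is not None:
        raise FileExistsError(f"{name} already exists as {clash}")
    return root / project / "distfiles" / relpath
```

Files placed directly in a project directory, outside distfiles/, are never scanned, which matches the exemption described in the second bullet.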

Next up:
- Select a machine
- Patch kup-server to enforce filename uniqueness
Comment 20 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-04-12 09:07:24 UTC
Update: kup looked reasonable to try setting up, but we need the missing piece for gitolite integration. Robin has emailed the kernel.org folks.
Comment 21 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2022-07-10 16:50:56 UTC
Created attachment 790865 [details, diff]
kup-server-gitolite-subcmd.patch

Raw kup gitolite patch from Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Comment 22 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-07-24 17:21:56 UTC
*** Bug 780324 has been marked as a duplicate of this bug. ***
Comment 23 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-10-09 19:05:53 UTC
Dropping council@ as agreed in September meeting.