Most ebuilds in the portage tree reference a permanent URL for the distfiles they require. This is because good upstreams don't purge their distfiles. The Gentoo mirrors drop the files sometime after the ebuilds have gone away, but because upstream still hosts them, people using old ebuilds, frozen trees, etc. are happy.

For Gentoo-hosted projects such as genpatches, we have no permanent hosting, which causes problems with frozen trees, previous release media, people wanting to revert to old ebuilds, etc.

Could infra please provide a simple hosting service for projects like this? I would envision some box which we can scp files to, and then they are available over http at e.g. http://projects.gentoo.org/project/filename

Access would be available to listed developers only. I wouldn't say any redundancy is needed -- occasional downtime would generally be OK (Gentoo mirrors would be used in most cases anyway), and besides, many upstreams do go down for short periods on occasion. I'd advise that regular backups should be taken.

Besides genpatches this would probably be useful for genkernel, catalyst, portage, sandbox, pax-utils, baselayout, and others.

In terms of space requirements, genpatches grows at a rate of about 10 MB per year. I would seed any new distfiles host with the patch tarballs going back to 2.6.11, which currently total 22 MB in size.

When discussing this before, we were also talking about moving http://dev.gentoo.org/~dsd/genpatches to something like http://genpatches.gentoo.org and having it auto-updated. I'd like to put that idea on hold for now, and consider it as a separate project from implementing a simple permanent distfiles host.
Is there anything I can do to help move this along? i.e. does it need discussion on some mailing list, comment by a specific individual, or are you guys just busy? Thanks!
It seems pretty trivial to implement in theory. /space/gentoo-projects/ on pecker gets rsynced to this box. How would you implement file removals? I'd prefer more information (re how the service would operate) in order to move the request along. Infra obviously can't do much unless they know the service requirements (and really deleting is the only 'hard' one) -Alec
I'd never want any deletion. I'm requesting permanent hosting here, similar to what you might have on SourceForge or any other upstream host. If there are space concerns, I would suggest placing a size limit on a per-project basis. If you gave me 500 MB for genpatches, it would take me about 50 years to fill it.
(In reply to comment #3)
> I'd never want any deletion. I'm requesting permanent hosting here. Similar as
> you might have on sourceforge or any other upstream host.
>
> If there are space concerns, I would suggest placing a size limit on a
> per-project basis. If you gave me 500mb for genpatches, it would take me about
> 50 years to fill it.

I don't think that's possible. You have to EOL files sometime. In particular, someone running a frozen Gentoo will probably mirror all the sources to be 100% sure of not getting screwed by upstream (just as they should when mirroring Debian, CentOS or Ubuntu). Mirroring all our sources is not expensive these days (100 gigs).

So we can make the provisioning non-automated. You ask for a folder in /space/projects/ and then we put a size cap on that (your 500 megs per project). It makes the push a bit harder, but it might be possible to come up with something that works. You get per-project space then, and we can track usage by the # of projects * space per project.

-Alec
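For illustration only, the per-project cap and the "# of projects * space per project" bound could be sketched roughly as below. The function names and the 500 MiB constant are hypothetical (the thread only floats 500 megs as an example), and nothing here reflects an actual infra script.

```python
import os

# Hypothetical cap; the thread only mentions "500 megs per project" as an example.
PROJECT_CAP_BYTES = 500 * 1024 * 1024


def project_usage(project_dir):
    """Return the total size in bytes of all regular files under project_dir."""
    total = 0
    for root, _dirs, files in os.walk(project_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so a link cannot inflate or escape the count.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def can_accept(project_dir, incoming_bytes, cap=PROJECT_CAP_BYTES):
    """True if adding incoming_bytes keeps the project at or under its cap."""
    return project_usage(project_dir) + incoming_bytes <= cap


def worst_case_bytes(num_projects, cap=PROJECT_CAP_BYTES):
    """Upper bound on total space: number of projects * space per project."""
    return num_projects * cap
```

With a cap like this, the push step would call `can_accept()` before copying a file, and capacity planning only needs `worst_case_bytes()`.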
I have no desire to EOL the files, ever. Like the projects I host on sourceforge, I will never delete those distfiles. There are many more reasons why having files around forever is useful than just the frozen-portage-tree situation. Nevertheless, if infra decide it's not possible to store these things forever, then that's fine - you can only do what you can do. Just please consider my request for completely permanent hosting. Thanks :)
If/when this bug gets worked on, could the scope be extended beyond projects to overlays as well? That would give us a place to store sources, large patches, etc., without constantly hitting upstream for sources or storing/serving the stuff from a third-party location/server.
The Emacs team would also be interested in this, for the packages where we are our own upstream (e.g. app-emacs/gentoo-syntax). Our space requirements would be rather small, currently less than 5 MB in total.
This is one of those things that in my opinion is so obviously a good idea, that it makes one wonder why three years later this is still not implemented...
Reporting in as infra on this now. I'd like to propose it as the following:
1. /space/projects-local/$PROJECTNAME/ on dev.g.o
2. Access to place files on the service is limited to Gentoo developers ONLY.
3. Mirrored to /projects/ on mirrors.
4. gentoo-projects entry in thirdparty mirrors, so usable as mirror://gentoo-projects/$PROJECTNAME/$FILENAME
5. NO projects.g.o vhost.
The infra folk had a discussion in IRC about this, and alternatives, and the plan has been adjusted slightly to include a hardlink (details on some reasons afterwards).

Proposal:
---------
1. /space/projects-local/$PROJECTNAME/ on dev.g.o, developers copy files to here just like the existing distfiles-local and experimental-local.
2. Access to place files on the service is limited to Gentoo developers ONLY.
3. Mirrored to /projects/ on mirrors.
4. HARDLINK content to /distfiles/ on mirrors.
5. gentoo-projects entry in thirdparty mirrors, so usable as mirror://gentoo-projects/$PROJECTNAME/$FILENAME
6. Contents of /space/projects-local/ may be cleared occasionally, but files will persist on the mirror layer.

Reasonings:
-----------
- Having a vhost, along with a lot of other suggested solutions, has one downside: any content added to a site and referenced from SRC_URI will ALSO end up on /distfiles/ by the nature of the Gentoo distfiles fetching that takes place on the master mirror. Even my initial gentoo-projects suggestion fell afoul of this. The hardlinks prevent this problem as the files are mirrored out.
- Additionally, if we introduced a vhost, we would still have the issue of availability. Comparing our mirror tiers with any single vhost we have, the mirrors have MUCH higher availability.

Not yet solved:
---------------
ferringb pointed out that we should find a way to reliably track what the digests of files SHOULD be, and who introduced a file.
(In reply to comment #10)
> Proposal:
> ---------
> 1. /space/projects-local/$PROJECTNAME/ on dev.g.o, developers copy files to
> here just like the existing distfiles-local and experimental-local.
> 2. Access to place files on the service is limited to Gentoo developers ONLY.
> 3. Mirrored to /projects/ on mirrors.
> 4. HARDLINK content to /distfiles/ on mirrors.
> 5. gentoo-projects entry in thirdparty mirrors, so usable as
> mirror://gentoo-projects/$PROJECTNAME/$FILENAME
> 6. Contents of /space/projects-local/ may be cleared occasionally, but files
> will persist on the mirror layer.

Looks good to me.

> ferringb pointed out that we should find a way to reliably track what the
> digests of files SHOULD be, and who introduced a file.

Require that files are accompanied by a detached PGP signature? Looking at other projects, this seems to be the common practice.
I like Robin's proposal with Ulrich's addition, and it would be nice to have, like... ASAP? I've actually got quite a bit of content to push there... I would suggest allowing subdirectories though, just to be able to organise stuff in a decent form... Thanks!
subdirs within each $project you mean?
Yeps
Updated proposal. Included signatures for file validity AND tracing who uploaded or is responsible for a given file.

Proposal:
---------
1. /space/projects-local/$PROJECTNAME/ on dev.g.o
1.1. Developers copy files to here just like the existing distfiles-local and experimental-local.
1.2. Subdirectories are permitted.
1.3. No namespace conflicts are permitted between ANY files (regardless of dirs).
1.4. DIGESTS-${DATE} must be uploaded with files. Contains GPG-clearsigned Manifest2 data for files.
2. Access to place files on the service is limited to Gentoo developers ONLY.
3. Mirrored to /projects/ on mirrors.
3.1. Every file MUST have a matching DIGESTS entry to be mirrored.
3.2. Per directory, merge all DIGESTS-* files to a single DIGESTS file for user validation.
4. HARDLINK content to /distfiles/ on mirrors, with directories stripped.
5. gentoo-projects entry in thirdparty mirrors, so usable as mirror://gentoo-projects/$PROJECTNAME/$FILENAME
6. Contents of /space/projects-local/ may be cleared occasionally, but files will persist on the mirror layer.

Reasonings:
-----------
- Having a vhost, along with a lot of other suggested solutions, has one downside: any content added to a site and referenced from SRC_URI will ALSO end up on /distfiles/ by the nature of the Gentoo distfiles fetching that takes place on the master mirror. Even my initial gentoo-projects suggestion fell afoul of this. The hardlinks prevent this problem as the files are mirrored out.
- Additionally, if we introduced a vhost, we would still have the issue of availability. Comparing our mirror tiers with any single vhost we have, the mirrors have MUCH higher availability.
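A rough sketch of how steps 3.1 and 4 of the proposal might fit together: only files with a matching DIGESTS entry are mirrored, and each one is hardlinked into a flat distfiles directory with its subdirectories stripped. This assumes the DIGESTS-* files carry Manifest2-style `DIST <name> <size> <HASH> <hex> ...` lines; the function names are hypothetical, only SHA256 is checked, and the GPG clearsignature is deliberately NOT verified here (that would be a separate step).

```python
import hashlib
import os


def parse_digests(text):
    """Parse Manifest2-style 'DIST name size HASH hex ...' lines into a dict.

    Assumption: clearsign armor and any line not starting with 'DIST' are
    skipped; the signature itself is NOT verified in this sketch.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "DIST":
            continue
        name, size = parts[1], int(parts[2])
        hashes = dict(zip(parts[3::2], parts[4::2]))
        entries[name] = (size, hashes)
    return entries


def file_matches(path, size, hashes):
    """Check the size and SHA256 of path against a DIGESTS entry."""
    if os.path.getsize(path) != size or "SHA256" not in hashes:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == hashes["SHA256"].lower()


def mirror_project(project_dir, distfiles_dir):
    """Hardlink every digest-covered file into distfiles_dir, dirs stripped.

    Returns (linked, rejected) lists of paths relative to project_dir.
    A file is rejected if it has no valid DIGESTS entry in its directory
    or if its basename already exists in distfiles_dir.
    """
    linked, rejected = [], []
    for root, _dirs, files in os.walk(project_dir):
        # Merge all DIGESTS-* files found in this directory (step 3.2).
        entries = {}
        for name in sorted(files):
            if name.startswith("DIGESTS-"):
                with open(os.path.join(root, name)) as f:
                    entries.update(parse_digests(f.read()))
        for name in sorted(files):
            if name.startswith("DIGESTS"):
                continue
            src = os.path.join(root, name)
            rel = os.path.relpath(src, project_dir)
            dest = os.path.join(distfiles_dir, name)
            ok = name in entries and file_matches(src, *entries[name])
            if ok and not os.path.exists(dest):
                os.link(src, dest)  # requires both paths on one filesystem
                linked.append(rel)
            else:
                rejected.append(rel)
    return linked, rejected
```

Because the destination is flat, the "no namespace conflicts" rule (1.3) falls out of the `os.path.exists(dest)` check: a second file with the same basename is rejected rather than overwriting the first.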
Created attachment 263505
IRC log from Jan 2011

In this log, we discussed the previously proposed solution and the merits of going back to a vhost. The vhost idea won again, so initial stages of implementation will be happening soon as this is one of the projects on my list. Robin has stated that he will be working on the checksum/verification aspect and I will be getting the automation bits in place. I'm posting this log as a reminder to myself.
Guys, any thoughts on moving forward with the proposed solution?
(In reply to comment #17)
> Guys, any thoughts on moving forward with the proposed solution?

Can somebody make a GSoC project idea out of this please, and put it on the wiki? I never did get all the tools written.
Yay, updates after briefly talking to robbat2 about this: http://linux.die.net/man/1/kup is a tool for a similar purpose used by kernel.org that already covers most of the needed features.

Special things:
- Every project has its own directory, as well as a distfiles/ subdirectory that is exported to mirrors. All files in all distfiles/ directories are thus required to have a unique filename. The kup tool will reject files with duplicate names. Subdirectories can be created and will be recursed for that purpose.
- Files currently found in CVS (gentoo/xml/proj/..) can be uploaded to the project directory directly and are not bound by the filename uniqueness rules. These are only available via (probably) projects.g.o

Next up:
- Select a machine
- Patch kup-server to enforce filename uniqueness
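The uniqueness rule above could be checked along these lines. This is only a sketch of the rule as described in this comment, not kup-server code or its API; the directory layout (`<root>/<project>/distfiles/...`) and both function names are assumptions made for illustration.

```python
import os


def existing_names(projects_root):
    """Map each basename found under any */distfiles/ tree to its path."""
    seen = {}
    for project in sorted(os.listdir(projects_root)):
        dist = os.path.join(projects_root, project, "distfiles")
        if not os.path.isdir(dist):
            continue
        # Recurse into subdirectories: uniqueness is by basename only.
        for root, _dirs, files in os.walk(dist):
            for name in files:
                seen.setdefault(name, os.path.join(root, name))
    return seen


def check_upload(projects_root, relpath):
    """Return None if the upload is acceptable, else the conflicting path.

    relpath is relative to projects_root, e.g. 'emacs/distfiles/foo.tar'.
    Uploads outside a project's distfiles/ directory are not bound by the
    uniqueness rule and are always accepted by this check.
    """
    parts = os.path.normpath(relpath).split(os.sep)
    if len(parts) < 3 or parts[1] != "distfiles":
        return None
    return existing_names(projects_root).get(parts[-1])
```

A real implementation would likely keep an index instead of walking every tree per upload, and would need to decide whether re-uploading an identical file to the same path counts as a conflict; this sketch treats it as one.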
Update: kup looked reasonable to try setting up, but we need the missing piece for gitolite integration. Robin has emailed the kernel.org folks.
Created attachment 790865
kup-server-gitolite-subcmd.patch

Raw kup gitolite patch from Konstantin Ryabitsev <konstantin@linuxfoundation.org>
*** Bug 780324 has been marked as a duplicate of this bug. ***
Dropping council@ as agreed in September meeting.