
Bug 391439

Summary: extensible uri protocols
Product: Gentoo Hosted Projects Reporter: SpanKY <vapier>
Component: PMS/EAPI Assignee: PMS/EAPI <pms>
Status: CONFIRMED ---    
Severity: enhancement CC: chewi, esigra, m.seifert
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on: 334275    
Bug Blocks: 174380    

Description SpanKY gentoo-dev 2011-11-22 20:45:25 UTC
for uri protocols that are not explicitly specified by PMS, perhaps we can allow the user to locally extend things based on their needs

for example, if they have local ebuilds with SRC_URI="sftp://...", they could define a way to handle the sftp protocol for their site usage.

obvious possible solutions:
 - use FETCHCOMMAND_<protocol> variable
 - execute `pms_fetch_<protocol> ${DISTDIR} <url>`
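
A minimal sketch of the first option, assuming a package manager that looks up a per-protocol command template and falls back to a generic one. The `${DISTDIR}`, `${FILE}` and `${URI}` placeholders follow the style of Portage's FETCHCOMMAND; the function name `build_fetch_command`, the `settings` dictionary and the `my-sftp-get` tool are hypothetical and only there for illustration.

```python
import shlex
from urllib.parse import urlsplit


def build_fetch_command(uri, distdir, settings):
    """Return the argv for fetching `uri`, or None if no handler is configured."""
    scheme = urlsplit(uri).scheme.upper()
    # Prefer a protocol-specific template, then fall back to the generic one.
    template = settings.get("FETCHCOMMAND_" + scheme) or settings.get("FETCHCOMMAND")
    if template is None:
        return None
    # Assumption: the distfile is named after the last path component of the URI.
    filename = uri.rsplit("/", 1)[-1]
    values = {"DISTDIR": distdir, "FILE": filename, "URI": uri}
    argv = []
    for word in shlex.split(template):
        for name, value in values.items():
            word = word.replace("${" + name + "}", value)
        argv.append(word)
    return argv


# Hypothetical user configuration: "my-sftp-get" is a made-up local tool.
settings = {
    "FETCHCOMMAND": 'wget -O "${DISTDIR}/${FILE}" "${URI}"',
    "FETCHCOMMAND_SFTP": 'my-sftp-get "${URI}" "${DISTDIR}/${FILE}"',
}
```

With this configuration an sftp:// URI would be handed to the site-specific tool, while any scheme without its own entry would go through the generic command.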
Comment 1 James Le Cuirot gentoo-dev 2016-10-04 22:02:28 UTC
I have recently posted to gentoo-dev about this. Let's get this show on the road. We're already closer than the above comment would suggest.

The FETCHCOMMAND_<protocol> variable was already implemented before this bug report was even filed. Perhaps the report was merely intended to get this into PMS.

Simply allowing the user to manually configure this has limited benefit though. What we really need is for packages to be able to extend the list by dropping scripts in place. That way any package can use any protocol without user intervention as long as the package needed to satisfy that protocol is added to DEPEND.

The existing FETCHCOMMAND default values are defined in /usr/share/portage/config/make.globals. For a quick solution, change this to a directory and then packages could drop files in there. This could potentially have other uses though I can't think of any right now.

The drawback is that this solution is specific to Portage (and pkgcore) so packages would need to install a different file to support Paludis as well. As it happens, Paludis already uses the second of the above suggestions, albeit with slightly different arguments. See the Paludis documentation for the details.

http://paludis.exherbo.org/configuration/fetchers.html

This inherently allows packages to extend the protocol list by installing new scripts. If Portage (and pkgcore) were to support these fetcher scripts as well then only one script per protocol would be necessary. Unfortunately Paludis chose to bake its resume logic into the docurl script, whereas Portage handles this internally, so one side may have to make adjustments. Incidentally, Paludis already supports the file:// protocol requested in bug #334275.

http://git.exherbo.org/paludis/paludis.git/tree/paludis/fetchers/demos/docurl
http://git.exherbo.org/paludis/paludis.git/tree/paludis/fetchers/dofile

We would also need to consider migration of existing user configuration. Paludis allows for scripts under /etc to override those under /usr/share. Upon installing the new version of Portage, if FETCHCOMMAND is found to not match the existing default, we could transform its value into the new form and drop the resulting script under /etc.

Whaddya think?
Comment 2 Kristian Fiskerstrand (RETIRED) gentoo-dev 2016-10-05 11:08:12 UTC
Extensible protocols in themselves seem like a bad idea; the protocols in use in SRC_URI should be part of PMS.

Why not just use another variable for this along with an eclass?
Comment 3 James Le Cuirot gentoo-dev 2016-10-05 12:03:48 UTC
(In reply to Kristian Fiskerstrand from comment #2)
> Extensible protocols in themselves seem like a bad idea; the protocols in use
> in SRC_URI should be part of PMS.

Please give some reasoning.

> Why not just use another variable for this along with an eclass?

Eclasses are currently not involved in the fetching process. If you want them to be then you'll need to explain what you mean.

axs mentioned the src_fetch() idea on the list, which is good for VCS, but it raises more questions than it answers. How would it deal with checksums? Would it be used for all protocols including http? Would users still be able to customise it to the extent they can now and if so, how? When all you need to do is fetch a single file (as opposed to a repository), this doesn't seem like the way to go.
Comment 4 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2016-10-05 13:32:47 UTC
And this doesn't raise any questions at all?

For a start, how to deal with dependencies? Does 'emerge --fetchonly' now install packages? What about keywords on fetch dependencies? What about mapping filenames from URIs? Do http/ftp rules work for this? Do we have to invent another script wrapper to generate filenames? How to handle mirrors? What about people with restricted Internet access? How to handle multiple files? How to protect people from 'emerge --fetchonly' executing arbitrary scripts, leaving junk in arbitrary locations?
Comment 5 James Le Cuirot gentoo-dev 2016-10-05 14:33:51 UTC
(In reply to Michał Górny from comment #4)
> And this doesn't raise any questions at all?

Fair play, let me try to tackle these.

> For a start, how to deal with dependencies? Does 'emerge --fetchonly' now
> install packages?

That's a good point, I hadn't considered --fetchonly. It fetches everything that it can and while it does fail visibly, it doesn't stop immediately on a failure such as a 404. I don't think it would be unreasonable to skip protocols that can't be dealt with yet.
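
To illustrate the skipping idea, here is a small sketch that splits a list of URIs into those whose scheme currently has a handler and those that would be deferred. The function name `partition_fetchable` and the `available_schemes` set are hypothetical; this is not how Portage implements --fetchonly.

```python
from urllib.parse import urlsplit


def partition_fetchable(uris, available_schemes):
    """Split uris into (fetchable, skipped) based on the schemes that have a handler."""
    fetchable, skipped = [], []
    for uri in uris:
        scheme = urlsplit(uri).scheme.lower()
        if scheme in available_schemes:
            fetchable.append(uri)
        else:
            # No handler installed yet: report it and move on instead of aborting.
            skipped.append(uri)
    return fetchable, skipped
```

The skipped list could be reported at the end of the run, in the same way a 404 is reported today without stopping the remaining fetches.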

> What about keywords on fetch dependencies?

What about them? If a package uses a protocol that needs a dependency then that dependency should go in DEPEND and it will obviously need to have the necessary keywords, just like any other dependency. This should be sufficient given what I said about --fetchonly above.

> What about
> mapping filenames from URIs?

We already have ->, which I intend to use with gogdownloader:// as the resulting filename does not match the URI. Am I missing something?
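
For reference, a minimal sketch of how the arrow form maps a URI to a distfile name. It assumes a flat SRC_URI string with no USE conditionals or grouping; the function name `parse_src_uri` and the example URIs are hypothetical.

```python
def parse_src_uri(src_uri):
    """Map each URI in a flat SRC_URI string to its distfile name."""
    tokens = src_uri.split()
    pairs = []
    i = 0
    while i < len(tokens):
        uri = tokens[i]
        if i + 2 < len(tokens) and tokens[i + 1] == "->":
            # "uri -> name": the file is stored under the explicit name.
            pairs.append((uri, tokens[i + 2]))
            i += 3
        else:
            # Plain URI: the name defaults to the last path component.
            pairs.append((uri, uri.rsplit("/", 1)[-1]))
            i += 1
    return pairs
```

So an entry such as `gogdownloader://some_game/installer -> some_game-1.0.sh` (a made-up URI) would be saved as `some_game-1.0.sh` regardless of what the URI looks like.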

> Do http/ftp rules work for this?

I don't know what you mean by this.

> Do we have to
> invent another script wrapper to generate filenames?

Also not sure about this one.

> How to handle mirrors?

I hadn't considered that but it's probably fair to say that any protocol we add at this point is going to be obscure and not widely mirrored.

> What about people with restricted Internet access?

You mean like proxies? gogdownloader:// is really just HTTP + OAuth. The client uses libcurl so I imagine that proxy environment variables are respected. Failing that, I have said that I intend for gogdownloader:// to be opt-in via a flag so anyone behind a proxy can just download the file via their browser as usual.

For other protocols, if they're not opt-in then the implication is that the sources are not directly available by any other means. If proxies aren't supported then such users would not be able to fetch anyway.

> How to handle multiple
> files?

We currently handle SRC_URI entries individually. Are you suggesting that some protocol URIs may somehow map onto multiple files? Unless we're talking about repository URIs (which don't really fit SRC_URI semantics) then this seems unlikely.

> How to protect people from 'emerge --fetchonly' executing arbitrary
> scripts, leaving junk in arbitrary locations?

We should have some faith in our own developers but if you really want to be sure then the sandbox should be applied to the fetch process. I don't believe it currently is as I was able to write to /etc/lgogdownloader. I'll verify whether the client actually needs to write there when fetching as opposed to just logging in. If it does then we need to allow the sandbox to be configured for fetching.
Comment 6 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2016-10-05 20:54:00 UTC
What I'm pointing out is that you're trying to turn a very specific problem into a generic solution… but your generic solution doesn't really work for many things besides the specific problem you're trying to solve. In other words, three layers of complexity and no real benefit for anyone.
Comment 7 James Le Cuirot gentoo-dev 2016-10-06 13:25:30 UTC
(In reply to Michał Górny from comment #6)
> What I'm pointing out is that you're trying to turn a very specific problem
> into a generic solution… but your generic solution doesn't really work for
> many things besides the specific problem you're trying to solve. In other
> words, three layers of complexity and no real benefit for anyone.

Please point that out a bit more clearly next time. :P

I do appreciate what you're saying given that there isn't exactly a queue of protocols needing this and it would still leave us with VCS eclasses abusing src_unpack.

Allow me to suggest a compromise. Scrap the idea of making the list freely extensible and have gogdownloader:// included in PMS with the caveat that an additional dependency is required. Portage (plus pkgcore) can define FETCHCOMMAND_GOGDOWNLOADER as I suggested in my mail and Paludis can include its own script. This weighs in at around 2 new lines for each. Not a big deal.

Now you may say that this still doesn't address any of the concerns you raised but if src_fetch is the alternative then that doesn't really help either. src_fetch will still require additional properly keyworded dependencies, which may or may not be installed with --fetchonly, it probably won't support mirrors, and it will still require sandboxing.

At the same time, it won't necessarily include the features you otherwise get for free, namely Portage's resume logic and its ability to verify checksums against the Manifest. Admittedly that's not a deal breaker as lgogdownloader does have these features built in.

I'm not saying that I wouldn't use src_fetch but that's a larger piece of work and no one seems in a hurry to implement it. It's hard for me to justify the time when just 2 lines could do what I need. I'd at least like some help.
Comment 8 Ulrich Müller gentoo-dev 2016-10-06 18:18:15 UTC
(In reply to James Le Cuirot from comment #7)
> Scrap the idea of making the list freely extensible and have
> gogdownloader:// included in PMS with the caveat that an additional
> dependency is required.

This bug is about "extensible URI protocols", so please stay on this topic. For any new ideas, a new bug should be opened.

Also I am not sure if hardcoding proprietary URI schemes in the package manager would be a good idea. In the case of gogdownloader, their webpage seems to indicate that there are at least two incompatible versions of the protocol:
https://www.gog.com/support/website_help/downloads_and_games (11., "old GOG Downloader").
Comment 9 James Le Cuirot gentoo-dev 2016-10-06 20:00:00 UTC
(In reply to Ulrich Müller from comment #8)
> (In reply to James Le Cuirot from comment #7)
> > Scrap the idea of making the list freely extensible and have
> > gogdownloader:// included in PMS with the caveat that an additional
> > dependency is required.
> 
> This bug is about "extensible URI protocols", so please stay on this topic.
> For any new ideas, a new bug should be opened.

If we can't have extensible protocols then I am trying to find my way towards the next best thing so that any other protocols that come along may follow suit. I am also not ruling out src_fetch at this point, which would ultimately resolve this bug.

> Also I am not sure if hardcoding proprietary URI schemes in the package
> manager would be a good idea. In the case of gogdownloader, their webpage
> seems to indicate that there are at least two incompatible versions of
> the protocol:
> https://www.gog.com/support/website_help/downloads_and_games (11., "old GOG
> Downloader").

I don't like the idea of a proprietary scheme in PMS either, which is why I was pushing for proper extensible support. I don't think versioning is an issue when there's only one client that would surely do whatever it can to adapt. It wouldn't be great if the scheme were to die but all the ebuilds would need adjusting whether the scheme is mentioned in PMS or not. That wouldn't be the end of the world and I'd gladly take on the responsibility. What would upset me most is if this idea were ultimately blocked by politics more than technical limitations.