Following bug 186454 let's concentrate all pkg level metadata to individual ebuilds.
Many eclasses set HOMEPAGE. How do you suggest to handle it?
Also many ebuilds reference the variable, most often for elog messages like "for documentation see ${HOMEPAGE}", or "download ${A} from ${HOMEPAGE}" for fetch restricted packages. Some (e.g. sys-apps/portage) even use it as constant at compile time. Maybe we should move metadata from XML to something that can be sourced (by the package manager) into the ebuild environment?
Per package eclasses!
(In reply to comment #2) > Maybe we should move metadata from XML to something that can be sourced (by the > package manager) into the ebuild environment? Well, if we can assume that the package manager has at least a minimal XML parser, the package manager can simply parse metadata.xml and export relevant variables to the ebuild environment. (In reply to comment #3) > Per package eclasses! That might be overkill, since we don't necessarily need full bash syntax support. For example, a format like make.defaults might serve the purpose well.
Assuming the package manager can read XML means that when it can't (e.g. due to libxml2 getting broken again), the package manager isn't usable...
(In reply to comment #5) > Assuming the package manager can read XML means that when it can't (e.g. due to > libxml2 getting broken again), the package manager isn't usable... So, if we went that route, then package managers that rely on libraries like expat or libxml2 would be forced to have an xml parser bundled or statically linked. Portage falls into this category, since python's xml libraries are implemented on top of expat.
Does this make a real difference from how metadata.xml is handled currently?
Stuff in XML is generally a pain in the ass to work with. Plus that whole "restrict" thing is just silly. Two mechanisms for essentially the same thing was never a good idea...
Maybe migrate from XML to JSON?
JSON is just like XML except with even less sanity checking.
I mentioned JSON because portage can rely on the json module that's included with Python, which is not vulnerable to the kind of breakage that Python's xml libraries are. Going back to my suggestion from comment #4 regarding a format like make.defaults, how about if we add a metadata.env file that only contains shell-like variable assignments and no executable code? The package manager will automatically source this file before the ebuild, if the ebuild's EAPI supports it. Therefore, the package manager will have to parse the EAPI assignment from the head of the ebuild, in order to decide whether or not the ebuild's EAPI supports metadata.env.
You can't add global-scope-changing behaviour, and ebuilds can't be "parsed". We already had this discussion.
(In reply to comment #12) > You can't add global-scope-changing behaviour, and ebuilds can't be "parsed". > We already had this discussion. Seems quite feasible in practice, but I'd rather not argue about it.
The problem with "quite feasible" is that it means "usually works", which in turn means that a) sometimes it doesn't work, and b) when it doesn't work, it doesn't work *badly*, because everyone assumes that it always does work. A large number of existing design screw-ups were caused that philosophy; we should be looking at fixing those, not adding even more.
(In reply to comment #12) > ebuilds can't be "parsed". For the purpose of parsing the EAPI assignment, I'd suggest that PMS retroactively require that the assignment be in the first N lines, where N is something like 30 or so. Also, PMS should specify a regular expression that all package managers can use to parse the assignment.
...or we could solve the problem properly, and not break stuff.
(In reply to comment #15) > For the purpose of parsing the EAPI assignment, I'd suggest that PMS > retroactively require that the assignment be in the first N lines, where N > is something like 30 or so. Also, PMS should specify a regular expression > that all package managers can use to parse the assignment. Such a solution could work, but it doesn't look very elegant. (We would have to require that the EAPI determined by parsing and the one determined by processing the ebuild with bash are identical, i.e. abort with an error if they're different.) I'd rather add a new command that would source metadata.env. Or maybe extend the inherit command, such that metadata.env would be implicitly sourced when the first inherit command is encountered.
(In reply to comment #17) > Such a solution could work, but it doesn't look very elegant. (We would have to > require that the EAPI determined by parsing and the one determined by > processing the ebuild with bash are identical, i.e. abort with an error if > they're different.) The fact that we can compare the probed EAPI to the actual EAPI variable after the ebuild is sourced seems like a perfect sanity check. We could easily detect inconsistencies and flag such ebuilds as invalid, providing a reliable feedback mechanism to ebuild developers. > I'd rather add a new command that would source metadata.env. Or maybe extend > the inherit command, such that metadata.env would be implicitly sourced when > the first inherit command is encountered. Now what your describing here sounds less elegant to me. Being able to reliably probe the EAPI has numerous advantages, one of the most important being that a package manager can detect that a particular ebuild's EAPI is unsupported without the need to incur the performance penalty of a bash process.
GLEP 55 gives you every advantage you ask for, plus an even larger performance improvement, and it also provides the *reliably* part.
(In reply to comment #19) > GLEP 55 gives you every advantage you ask for, plus an even larger performance > improvement, and it also provides the *reliably* part. There are advantages and disadvantages to both approaches, but the performance difference is negligible, especially if you consider that package managers typically use cached metadata to obtain the EAPI.
Uh, the directory read on the ebuild directory comes *before* checking cache. But even ignoring the performance gain, GLEP 55 is still the only technically sound solution.
I think the probing approach described in comment #15 is pretty sound, especially if you implement the sanity check discussed in comment #18. So, I don't think it's accurate to say that GLEP 55 is the only sound solution.
"pretty sound" is not "sound", "mostly works" is not "works", "pragmatic" is not "correct", and a system that is built upon a composite of things that are "pretty sound", "mostly working" and "pragmatic" does not itself "mostly work".
(In reply to comment #23) > "pretty sound" is not "sound", "mostly works" is not "works", "pragmatic" is > not "correct", and a system that is built upon a composite of things that are > "pretty sound", "mostly working" and "pragmatic" does not itself "mostly work". You, sir, are a master of FUD. /me bows
And you are a master of nailing more bits onto the side of a slowly collapsing building in such a way that usually only a few people die from falling debris each time.
Regardless of whether or not we ever migrate to a file name based system like GLEP 55, I think that for existing EAPIs, it would be useful for PMS to require that ebuilds support an EAPI probing approach like the one described in comment #15.
You mean you want us to put in another hack so we can postpone fixing it properly for even longer?
I would consider the inclusion of EAPI probing in PMS to be a useful enhancement to support existing EAPIs.
But it's not safe, and it doesn't solve the global scope problem.
EAPI probing is quite safe if you implement a feedback mechanism like the one discussed in comment #18. It enables use of EAPI to change the global scope environment that's used to source ebuilds, as long as the package manager supports EAPI probing. Therefore, global scope changes should not be made until EAPI probing support has been deployed for a reasonable amount of time.
...so in other words it's like GLEP 55, except less powerful, less reliable, with a long delay before introduction, and without future proofing.
EAPI probing would enhance support for existing EAPIs, and would not interfere with the implementation of GLEP 55, so they should not be considered as mutually exclusive proposals.
It was inevitable that someone would mention GLEP 55. However, it was voted down: <http://www.gentoo.org/proj/en/council/meeting-logs/20100823-summary.txt>. In the same meeting, the council also requested that a solution for the issues raised by GLEP 55 should be found. The method proposed in comment #15 and comment #18 could work in practice, and it shouldn't be difficult to verify if it works for all ebuilds in the tree. What I don't like about it is that each ebuilds would have to be read twice which might bring some performance penalty. I remember that long time ago we've been promised some benchmarks: <http://www.gentoo.org/proj/en/council/meeting-logs/20090326-summary.txt> What has happened to them?
(In reply to comment #33) > What I don't like about it is that each ebuilds would have to be read twice > which might bring some performance penalty. There's zero difference for the vast majority of accesses, since the EAPI is accessible via the metadata cache. So, there's not much point in doing a benchmark for that case. For the case where the ebuild actually needs to be opened twice, like when running emerge --regen or egencache, the buffer cache should make the difference negligible. So, there's not much point in benchmarking that case either.
(In reply to comment #33) > It was inevitable that someone would mention GLEP 55. However, it was voted > down: Then perhaps it's time for the Council to reconsider, given that GLEP 55 is clearly still needed.
Sorry that I inject some facts into the discussion. ;) I've scanned the gentoo-x86 tree using this (GNU sed) regexp: ^[ \t]*EAPI=\([0-9]\+\|"[0-9]\+"\|'[0-9]\+'\)[ \t]*\(#.*\)\?$ (Using [0-9] for now, because PMS doesn't specify what characters are allowed in an EAPI name.) The EAPI determined this way agrees with the one in the metadata cache, except for the following three ebuilds: dev-ml/bin-prot-2.0.3 www-servers/apache-2.2.21-r1 www-servers/apache-2.2.21-r2 The first one defines EAPI twice, and the other two fail because apache-2.eclass defines EAPI=2.
While I still thing GLEP 55 is the best solution, as a second choice having a specially formatted comment seems more elegant than imposing restrictions on the syntax of the actual bash code. It would also mean that the EAPI variable could be made read-only before sourcing the ebuild, eliminating the concern about it being changed inconsistently with the parsed value. There's no need to change existing EAPIs - anything without the comment could be handled the current way. The only real disadvantage that I can see is that it would require a delay before actually being used, since current PM versions would assume EAPI 0 for such ebuilds. During the GLEP 55 discussions some people suggested that they'd be OK with a one-time extension change to introduce this sort of thing, so that's a possible solution.
(In reply to comment #37) > The only real disadvantage that I can see is that it would require a delay > before actually being used, since current PM versions would assume EAPI 0 > for such ebuilds. Not necessarily. We could still require that ebuilds assign the EAPI in the traditional way, at least for a transition period. This would imply that the variable cannot be made readonly, at least not immediately. > During the GLEP 55 discussions some people suggested that they'd be OK with > a one-time extension change to introduce this sort of thing, so that's a > possible solution. You mean the *.eb that was proposed previously? I for my part wouldn't be opposed against it. (But was there consensus about it? I don't remember.)
(In reply to comment #38) > Not necessarily. We could still require that ebuilds assign the EAPI in the > traditional way, at least for a transition period. This would imply that the > variable cannot be made readonly, at least not immediately. That could work, but it does complicate things. > You mean the *.eb that was proposed previously? I for my part wouldn't be > opposed against it. (But was there consensus about it? I don't remember.) I'm not fussy about the exact extension, but I believe that was suggested, yes.
(In reply to comment #37) > While I still thing GLEP 55 is the best solution, as a second choice having a > specially formatted comment seems more elegant than imposing restrictions on > the syntax of the actual bash code. This feels hair splitting, but one could also say that it seems more elegant simply make use of existing bash assignment syntax than to embed metadata inside comments.
Have you any idea how hard it is to parse "bash assignment syntax"?
(In reply to comment #41) > Have you any idea how hard it is to parse "bash assignment syntax"? The idea is to have PMS specify a regular expression that the EAPI assignment is required to match, any ebuilds that do not match will be flagged as invalid, using the feedback mechanism discussed in comment #18.
...so not "bash assignment syntax" at all then. You want to introduce a different syntax, just for one variable, whose rules apply only to that variable and not anywhere else.
(In reply to comment #43) > ...so not "bash assignment syntax" at all then. It would be a subset of bash assignment. Something like the expression shown in comment #36. The 3 ebuilds that didn't match would be automatically flagged as invalid, and would have to be fixed.
(In reply to comment #40) > This feels hair splitting, but one could also say that it seems more elegant > simply make use of existing bash assignment syntax than to embed metadata > inside comments. Metadata in comments means you don't have to contort yourself around the rules of the host language - since it's ignored by the host, you can define it however you want (as long as you avoid the end-of-comment marker of course)
(In reply to comment #45) > (In reply to comment #40) > > This feels hair splitting, but one could also say that it seems more elegant > > simply make use of existing bash assignment syntax than to embed metadata > > inside comments. > > Metadata in comments means you don't have to contort yourself around the rules > of the host language - since it's ignored by the host, you can define it > however you want (as long as you avoid the end-of-comment marker of course) Sure, but a subset of bash assignment syntax would work perfectly well, and would fit the existing framework with minimal changes, so the comment approach feels like over-engineering in comparison.
(In reply to comment #46) > (In reply to comment #45) > > (In reply to comment #40) > > > This feels hair splitting, but one could also say that it seems more elegant > > > simply make use of existing bash assignment syntax than to embed metadata > > > inside comments. > > > > Metadata in comments means you don't have to contort yourself around the rules > > of the host language - since it's ignored by the host, you can define it > > however you want (as long as you avoid the end-of-comment marker of course) > > Sure, but a subset of bash assignment syntax would work perfectly well, and > would fit the existing framework with minimal changes, so the comment approach > feels like over-engineering in comparison. We're not doing bash assignment parsing. Whatever faults you may have w/ my original way of getting EAPI moving forward, I wasn't that stupid at least. Afaik, betelgeuse and myself are the only two in this discussion who have actually tried it (regex hacks in portage do not count, period)- it's a fucking rats nest, and trying to keep it matching bash changing behaviour is pointless maintenance work. Bluntly, it's a stupid, stupid idea and is a fricking hack. Worse, no one has gotten around to outlining the claimed gains; we already can lock eapi readonly if we wanted to for phase execution. Continuing my "this is stupid" thread; exactly how did we get from betelgeuse trying to concentrate pkg metadata to ebuilds (which there is no supporting reason given for beyond a bug he filed in '07), to this rehashing of eapi? Frankly, I don't particularly agree w/ Petteri's original request *anyways*.
(In reply to comment #47) > actually tried it (regex hacks in portage do not count, period)- it's a fucking > rats nest, and trying to keep it matching bash changing behaviour is pointless > maintenance work. The idea is to have PMS define a regex that matches a subset of bash assignment syntax, which would be sufficient for EAPI assignments. If bash extends assignment syntax in the future, we wouldn't have to match that unless we decided that you wanted to support the new syntax for EAPI assignments (unlikely).
(In reply to comment #47) > We're not doing bash assignment parsing. Whatever faults you may have w/ my > original way of getting EAPI moving forward, I wasn't that stupid at least. > Afaik, betelgeuse and myself are the only two in this discussion who have > actually tried it (regex hacks in portage do not count, period)- it's a > fucking rats nest, and trying to keep it matching bash changing behaviour is > pointless maintenance work. Do you really expect that the syntax of the bash assignment statement will change in a way that isn't downward compatible? If it really would, we would have much larger problems. I think it is quite safe if we require a subset of bash 3.2 syntax (of even POSIX shell syntax) here. Of course, if designing everything from scratch, I guess nobody would go for such a solution. We have existing EAPIs though, and AFAICS the following approaches are possible: 1) Parse the bash assignment. I believe that it would work in practice. A sanity check as suggested in comment #18 could be added. It's also the least intrusive solution. Still, it's sort of a hack. 2) Define the EAPI in a special header comment. With the additional bonus that the pattern could be added to sys-apps/file magic and to syntax rules of text editors. The problem is that it's incompatible with existing ebuilds, so one of the following would be required: a) Additionally have an EAPI assignment statement in the ebuild for some transition period. This would mean that the variable cannot be readonly. b) Do a one time change of the file extension.
Please take the discussions about EAPI parsing to the mailing lists where it belongs.
(In reply to comment #50) > Please take the discussions about EAPI parsing to the mailing lists where it > belongs. I think that discussing it here is perfectly fine. pms-bugs@g.o should reach everyone who is closely involved with EAPIs. Once we've reached consensus here (or find out that we don't), a broader discussion can still be started on gentoo-dev, and we can escalate it to the council if necessary. I'd rather avoid the sort of noise that we had with previous discussions in -dev about the issue.
It seems that the real problem is that "PMS doesn't specify what characters are allowed in an EAPI name" so why not just settle that issue? Something like the repo/ keyword names: ^[[:alnum:]]([[:alnum:]_-]*[[:alnum:]])?$ ie "An EAPI name can contain any of [A-Za-z0-9_-] but must not start or end with a hyphen or an underscore," would be my suggestion. I can't really see that it's such a hardship for ebuild authors to use: EAPI=token (no quotes, and no whitespace in the token, which is terminated by whitespace.) It's trivial to scan for, simple to remember and easy to read, as well as being compatible both now and into the future, so long as you state what a token is. Coupled with the sanity-check, that seems pretty watertight (so long as ebuilds set the EAPI, and not eclasses.) The scanner doesn't have to use a regex to verify; it's simple to do in C, so it can be done anywhere. I'll even volunteer to write it in C so you can wrap it in whatever else, if you like. As previously stated, there is no performance hit worth discussing. If you're using the cache (which is the vast majority) there can't be. If you're opening the ebuild anyway, the scan will simply load the file into cache which would already have to be done, and take a trivial amount of time to give a value which can be rejected if the mangler doesn't support it, or compared to the final value (after you've waited an inordinately longer time for the shell to execute.) After all, you *want* that data, right? Having the EAPI in XML would take an awful lot longer as well as being more fragile; having it as a specially parsed comment sounds a lot like a regression: it would need at least the same amount of time, and would require a change in format, along with all the attendant aggravation for no gain. Encoding EAPI in the filename breaks encapsulation amongst other issues, and was shot down a long time ago, after all sides had put their points clearly, thoroughly and forcefully, so let's not rehash that discussion, please. With respect to line number limits, doesn't EAPI have to be set before inherit? In that case, the scanner could just bail on seeing that. (It isn't hard to scan for if you allow quotes and whitespace and other wacky characters. I just don't see the need for it- after all, other stuff is restricted specifically so it's easy to match and use. Is anyone really concerned that such be allowed in an EAPI name?)
(In reply to comment #52) > With respect to line number limits, doesn't EAPI have to be set before > inherit? In that case, the scanner could just bail on seeing that. That seems reasonable, but in order to keep the algorithm as simple as possible, I would suggest to have PMS simply state that the first match of the EAPI assignment pattern found in the first N lines be used, where N is something like 30 lines. If no match is found after N lines, then the EAPI defaults to "0". When designing this search algorithm, we should consider two competing factors: (1) Keep the search algorithm simple as possible. (2) Support the largest possible subset of bash assignment syntax. As we try to strike a balance between these two competing factors, (1) should take precedence over (2), since any EAPI assignments that do not work with our chosen search algorithm can simply be detected and fixed as discussed in comment #18.
(In reply to comment #52) > I can't really see that it's such a hardship for ebuild authors to use: > EAPI=token > > (no quotes, and no whitespace in the token, which is terminated by > whitespace.) And how many ebuilds follow that? Even if you change the entire tree, people will still get it wrong, since it's an arbitrary restriction that doesn't apply to anything else and has no obvious reason for being there.
(In reply to comment #53) > (In reply to comment #52) > > With respect to line number limits, doesn't EAPI have to be set before > > inherit? In that case, the scanner could just bail on seeing that. We don't know how future EAPIs will look like. Maybe inherit syntax will change? So the scanner shouldn't try to be too clever. > That seems reasonable, but in order to keep the algorithm as simple as > possible, I would suggest to have PMS simply state that the first match of > the EAPI assignment pattern found in the first N lines be used, where N is > something like 30 lines. I'd be more strict here, and require the assignment to be within the first 10 lines, or in the first non-comment line, or both. The few deviating ebuilds in the tree (about 10 packages if we require N=10) can be easily fixed, and for the future this can be enforced with repoman. > If no match is found after N lines, then the EAPI defaults to "0". Hm, this would be a retroactive change. One could imagine also a non-retroactive variant, where the result of the scan would be ignored for EAPIs 0 to 4. Would the latter have any disadvantages?
(In reply to comment #55) > > If no match is found after N lines, then the EAPI defaults to "0". > > Hm, this would be a retroactive change. One could imagine also a > non-retroactive variant, where the result of the scan would be ignored for > EAPIs 0 to 4. Would the latter have any disadvantages? The only disadvantage is that is adds an additional case to handle in the code, versus treating all EAPIs the same.
Also, it delays detection of invalid EAPI assignments in ebuilds with EAPI 5 and later. That's somewhat annoying, since you have to spawn bash before you can find out that it has an invalid EAPI 5 assignment hidden somewhere.
(In reply to comment #57) > Also, it delays detection of invalid EAPI assignments in ebuilds with EAPI 5 > and later. That's somewhat annoying, since you have to spawn bash before you > can find out that it has an invalid EAPI 5 assignment hidden somewhere. Actually, there's no disadvantage here, since you have to spawn bash to validate the probed EAPI in any case.
Maybe we should work out the details, so that both proposals can be discussed in gentoo-dev. Proposal 1: - Ebuilds must contain at most one EAPI assignment statement. - It must occur within the first N lines of the ebuild (N=10 and N=30 were suggested). - The statement must match the following regular expression (ERE syntax): ^[ \t]*EAPI=(['"]?)([A-Za-z0-9._+-]*)\1[ \t]*(#.*)?$ - To be decided if this will apply retroactively to all EAPIs or only to EAPI 5 and later. Note: The first and third point are already fulfilled by all ebuilds in the Portage tree. The second point will require few ebuilds to be changed (9 packages for N=10, or 2 packages for N=30). Proposal 2: - The EAPI must be declared in a special comment in the first line of the ebuild's header. - No specific syntax has been suggested yet. How about: "The first line of the ebuild must contain the exact string 'ebuild', followed by whitespace, followed by the EAPI, followed by end-of-line or whitespace." - To be decided if this should be combined with a one time change of the file extension (like .ebuild -> .eb). If not, an EAPI assignment statement in the ebuild is still required, at least for a transition period.
(In reply to comment #54) > > I can't really see that it's such a hardship for ebuild authors to use: > > EAPI=token > > > > (no quotes, and no whitespace in the token, which is terminated by > > whitespace.) > > And how many ebuilds follow that? > > Even if you change the entire tree, people will still get it wrong, since > it's an arbitrary restriction that doesn't apply to anything else and has no > obvious reason for being there. Fine, so we allow quoting. It still makes sense to specify the allowed format of the EAPI name in PMS, just as is done for so many other tokens like keywords or USE flags. After all, the EAPI name defines everything else that can happen, so its omission seems surprising. (In reply to comment #55) > > > With respect to line number limits, doesn't EAPI have to be set before > > > inherit? In that case, the scanner could just bail on seeing that. > > We don't know how future EAPIs will look like. Maybe inherit syntax will > change? So the scanner shouldn't try to be too clever. > Yeah fair enough.
<http://git.overlays.gentoo.org/gitweb/?p=proj/pms.git;a=commit;h=5ef86ba1e5962154db37d2af806e45de6027884b>
So, at the moment it seems, it is impossible to inherit EAPI from an eclass. Any idea fix for that?
(In reply to comment #62) > So, at the moment it seems, it is impossible to inherit EAPI from an eclass. > Any idea fix for that? It is disallowed and we are going to build a guillotine for doing that.
(In reply to comment #62) > So, at the moment it seems, it is impossible to inherit EAPI from an eclass. > Any idea fix for that? You'll have to set it at the top of the ebuild, without exception. If the eclass requires a particular EAPI, then inside the eclass, you can check if the ebuild has set an appropriate EAPI and die if it has not.
In mpd overlay, all the plugins inherit EAPI from gmpc-plugin.eclass, not within the ebuild itself. In gmpc-plugin.eclass first line is: EAPI=2. So, what I understand here is; I should add the EAPI directly to the ebuilds and remove it from the gmpc-plugin.eclass not to define EAPI twice. Is there anything else to do? If not, I'll edit the ebuilds in overlay.
(In reply to comment #65) > In mpd overlay, all the plugins inherit EAPI from gmpc-plugin.eclass, not > within the ebuild itself. In gmpc-plugin.eclass first line is: > EAPI=2. Such usage was never allowed, see: <http://devmanual.gentoo.org/ebuild-writing/eapi/#usage-of-eapis> (Basically, the behaviour of the "inherit" command in future EAPIs cannot be taken for granted. So the EAPI must be defined before.) > So, what I understand here is; I should add the EAPI directly to the ebuilds > and remove it from the gmpc-plugin.eclass not to define EAPI twice. Exactly.
Thanks guys. I updated the mpd overlay.