Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 402167 - Parse ebuilds for the EAPI assignment
Summary: Parse ebuilds for the EAPI assignment
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: PMS/EAPI (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: PMS/EAPI
URL:
Whiteboard:
Keywords:
Depends on: 411875
Blocks: 409383
  Show dependency tree
 
Reported: 2012-02-04 15:19 UTC by Petteri Räty (RETIRED)
Modified: 2012-05-13 11:14 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Petteri Räty (RETIRED) gentoo-dev 2012-02-04 15:19:36 UTC
Following bug 186454 let's concentrate all pkg level metadata to individual ebuilds.
Comment 1 Arfrever Frehtes Taifersar Arahesis 2012-02-04 15:40:57 UTC
Many eclasses set HOMEPAGE. How do you suggest to handle it?
Comment 2 Ulrich Müller gentoo-dev 2012-02-05 09:34:37 UTC
Also many ebuilds reference the variable, most often for elog messages like "for documentation see ${HOMEPAGE}", or "download ${A} from ${HOMEPAGE}" for fetch restricted packages. Some (e.g. sys-apps/portage) even use it as constant at compile time.

Maybe we should move metadata from XML to something that can be sourced (by the package manager) into the ebuild environment?
Comment 3 Ciaran McCreesh 2012-02-05 13:24:40 UTC
Per package eclasses!
Comment 4 Zac Medico gentoo-dev 2012-02-05 13:43:58 UTC
(In reply to comment #2)
> Maybe we should move metadata from XML to something that can be sourced (by the
> package manager) into the ebuild environment?

Well, if we can assume that the package manager has at least a minimal XML parser, the package manager can simply parse metadata.xml and export relevant variables to the ebuild environment.

(In reply to comment #3)
> Per package eclasses!

That might be overkill, since we don't necessarily need full bash syntax support. For example, a format like make.defaults might serve the purpose well.
Comment 5 Ciaran McCreesh 2012-02-05 13:49:53 UTC
Assuming the package manager can read XML means that when it can't (e.g. due to libxml2 getting broken again), the package manager isn't usable...
Comment 6 Zac Medico gentoo-dev 2012-02-05 14:35:17 UTC
(In reply to comment #5)
> Assuming the package manager can read XML means that when it can't (e.g. due to
> libxml2 getting broken again), the package manager isn't usable...

So, if we went that route, then package managers that rely on libraries like expat or libxml2 would be forced to have an xml parser bundled or statically linked. Portage falls into this category, since python's xml libraries are implemented on top of expat.
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2012-02-05 14:37:03 UTC
Does this make a real difference from how metadata.xml is handled currently?
Comment 8 Ciaran McCreesh 2012-02-05 14:42:07 UTC
Stuff in XML is generally a pain in the ass to work with. Plus that whole "restrict" thing is just silly. Two mechanisms for essentially the same thing was never a good idea...
Comment 9 Zac Medico gentoo-dev 2012-02-22 19:54:44 UTC
Maybe migrate from XML to JSON?
Comment 10 Ciaran McCreesh 2012-02-22 20:09:13 UTC
JSON is just like XML except with even less sanity checking.
Comment 11 Zac Medico gentoo-dev 2012-02-24 20:33:02 UTC
I mentioned JSON because portage can rely on the json module that's included with Python, which is not vulnerable to the kind of breakage that Python's xml libraries are.

Going back to my suggestion from comment #4 regarding a format like make.defaults, how about if we add a metadata.env file that only contains shell-like variable assignments and no executable code? The package manager will automatically source this file before the ebuild, if the ebuild's EAPI supports it. Therefore, the package manager will have to parse the EAPI assignment from the head of the ebuild, in order to decide whether or not the ebuild's EAPI supports metadata.env.
Comment 12 Ciaran McCreesh 2012-02-24 20:38:28 UTC
You can't add global-scope-changing behaviour, and ebuilds can't be "parsed". We already had this discussion.
Comment 13 Zac Medico gentoo-dev 2012-02-24 20:48:06 UTC
(In reply to comment #12)
> You can't add global-scope-changing behaviour, and ebuilds can't be "parsed".
> We already had this discussion.

Seems quite feasible in practice, but I'd rather not argue about it.
Comment 14 Ciaran McCreesh 2012-02-24 20:54:53 UTC
The problem with "quite feasible" is that it means "usually works", which in turn means that a) sometimes it doesn't work, and b) when it doesn't work, it doesn't work *badly*, because everyone assumes that it always does work. A large number of existing design screw-ups were caused that philosophy; we should be looking at fixing those, not adding even more.
Comment 15 Zac Medico gentoo-dev 2012-02-25 03:30:48 UTC
(In reply to comment #12)
> ebuilds can't be "parsed".

For the purpose of parsing the EAPI assignment, I'd suggest that PMS retroactively require that the assignment be in the first N lines, where N is something like 30 or so. Also, PMS should specify a regular expression that all package managers can use to parse the assignment.
Comment 16 Ciaran McCreesh 2012-02-25 13:48:46 UTC
...or we could solve the problem properly, and not break stuff.
Comment 17 Ulrich Müller gentoo-dev 2012-02-25 19:01:53 UTC
(In reply to comment #15)
> For the purpose of parsing the EAPI assignment, I'd suggest that PMS
> retroactively require that the assignment be in the first N lines, where N
> is something like 30 or so. Also, PMS should specify a regular expression
> that all package managers can use to parse the assignment.

Such a solution could work, but it doesn't look very elegant. (We would have to require that the EAPI determined by parsing and the one determined by processing the ebuild with bash are identical, i.e. abort with an error if they're different.)

I'd rather add a new command that would source metadata.env. Or maybe extend the inherit command, such that metadata.env would be implicitly sourced when the first inherit command is encountered.
Comment 18 Zac Medico gentoo-dev 2012-02-25 23:15:28 UTC
(In reply to comment #17)
> Such a solution could work, but it doesn't look very elegant. (We would have to
> require that the EAPI determined by parsing and the one determined by
> processing the ebuild with bash are identical, i.e. abort with an error if
> they're different.)

The fact that we can compare the probed EAPI to the actual EAPI variable after the ebuild is sourced seems like a perfect sanity check. We could easily detect inconsistencies and flag such ebuilds as invalid, providing a reliable feedback mechanism to ebuild developers.

> I'd rather add a new command that would source metadata.env. Or maybe extend
> the inherit command, such that metadata.env would be implicitly sourced when
> the first inherit command is encountered.

Now what your describing here sounds less elegant to me. Being able to reliably probe the EAPI has numerous advantages, one of the most important being that a package manager can detect that a particular ebuild's EAPI is unsupported without the need to incur the performance penalty of a bash process.
Comment 19 Ciaran McCreesh 2012-02-25 23:20:52 UTC
GLEP 55 gives you every advantage you ask for, plus an even larger performance improvement, and it also provides the *reliably* part.
Comment 20 Zac Medico gentoo-dev 2012-02-25 23:26:06 UTC
(In reply to comment #19)
> GLEP 55 gives you every advantage you ask for, plus an even larger performance
> improvement, and it also provides the *reliably* part.

There are advantages and disadvantages to both approaches, but the performance difference is negligible, especially if you consider that package managers typically use cached metadata to obtain the EAPI.
Comment 21 Ciaran McCreesh 2012-02-25 23:29:33 UTC
Uh, the directory read on the ebuild directory comes *before* checking cache. But even ignoring the performance gain, GLEP 55 is still the only technically sound solution.
Comment 22 Zac Medico gentoo-dev 2012-02-25 23:33:10 UTC
I think the probing approach described in comment #15 is pretty sound, especially if you implement the sanity check discussed in comment #18. So, I don't think it's accurate to say that GLEP 55 is the only sound solution.
Comment 23 Ciaran McCreesh 2012-02-25 23:43:26 UTC
"pretty sound" is not "sound", "mostly works" is not "works", "pragmatic" is not "correct", and a system that is built upon a composite of things that are "pretty sound", "mostly working" and "pragmatic" does not itself "mostly work".
Comment 24 Zac Medico gentoo-dev 2012-02-25 23:46:06 UTC
(In reply to comment #23)
> "pretty sound" is not "sound", "mostly works" is not "works", "pragmatic" is
> not "correct", and a system that is built upon a composite of things that are
> "pretty sound", "mostly working" and "pragmatic" does not itself "mostly work".

You, sir, are a master of FUD. /me bows
Comment 25 Ciaran McCreesh 2012-02-25 23:53:22 UTC
And you are a master of nailing more bits onto the side of a slowly collapsing building in such a way that usually only a few people die from falling debris each time.
Comment 26 Zac Medico gentoo-dev 2012-02-26 00:05:15 UTC
Regardless of whether or not we ever migrate to a file name based system like GLEP 55, I think that for existing EAPIs, it would be useful for PMS to require that ebuilds support an EAPI probing approach like the one described in comment #15.
Comment 27 Ciaran McCreesh 2012-02-26 00:14:59 UTC
You mean you want us to put in another hack so we can postpone fixing it properly for even longer?
Comment 28 Zac Medico gentoo-dev 2012-02-26 00:17:01 UTC
I would consider the inclusion of EAPI probing in PMS to be a useful enhancement to support existing EAPIs.
Comment 29 Ciaran McCreesh 2012-02-26 00:20:35 UTC
But it's not safe, and it doesn't solve the global scope problem.
Comment 30 Zac Medico gentoo-dev 2012-02-26 00:30:28 UTC
EAPI probing is quite safe if you implement a feedback mechanism like the one discussed in comment #18. It enables use of EAPI to change the global scope environment that's used to source ebuilds, as long as the package manager supports EAPI probing. Therefore, global scope changes should not be made until EAPI probing support has been deployed for a reasonable amount of time.
Comment 31 Ciaran McCreesh 2012-02-26 00:33:05 UTC
...so in other words it's like GLEP 55, except less powerful, less reliable, with a long delay before introduction, and without future proofing.
Comment 32 Zac Medico gentoo-dev 2012-02-26 00:37:47 UTC
EAPI probing would enhance support for existing EAPIs, and would not interfere with the implementation of GLEP 55, so they should not be considered as mutually exclusive proposals.
Comment 33 Ulrich Müller gentoo-dev 2012-02-26 09:14:38 UTC
It was inevitable that someone would mention GLEP 55. However, it was voted down: <http://www.gentoo.org/proj/en/council/meeting-logs/20100823-summary.txt>. In the same meeting, the council also requested that a solution for the issues raised by GLEP 55 should be found.

The method proposed in comment #15 and comment #18 could work in practice, and it shouldn't be difficult to verify if it works for all ebuilds in the tree.

What I don't like about it is that each ebuilds would have to be read twice which might bring some performance penalty. I remember that long time ago we've been promised some benchmarks: <http://www.gentoo.org/proj/en/council/meeting-logs/20090326-summary.txt> What has happened to them?
Comment 34 Zac Medico gentoo-dev 2012-02-26 09:38:05 UTC
(In reply to comment #33)
> What I don't like about it is that each ebuilds would have to be read twice
> which might bring some performance penalty.

There's zero difference for the vast majority of accesses, since the EAPI is accessible via the metadata cache. So, there's not much point in doing a benchmark for that case.

For the case where the ebuild actually needs to be opened twice, like when running emerge --regen or egencache, the buffer cache should make the difference negligible. So, there's not much point in benchmarking that case either.
Comment 35 Ciaran McCreesh 2012-02-26 12:18:19 UTC
(In reply to comment #33)
> It was inevitable that someone would mention GLEP 55. However, it was voted
> down:

Then perhaps it's time for the Council to reconsider, given that GLEP 55 is clearly still needed.
Comment 36 Ulrich Müller gentoo-dev 2012-02-26 15:21:17 UTC
Sorry that I inject some facts into the discussion. ;) I've scanned the gentoo-x86 tree using this (GNU sed) regexp:

  ^[ \t]*EAPI=\([0-9]\+\|"[0-9]\+"\|'[0-9]\+'\)[ \t]*\(#.*\)\?$

(Using [0-9] for now, because PMS doesn't specify what characters are allowed in an EAPI name.) The EAPI determined this way agrees with the one in the metadata cache, except for the following three ebuilds:

   dev-ml/bin-prot-2.0.3
   www-servers/apache-2.2.21-r1
   www-servers/apache-2.2.21-r2

The first one defines EAPI twice, and the other two fail because apache-2.eclass defines EAPI=2.
Comment 37 David Leverton 2012-02-26 16:24:09 UTC
While I still thing GLEP 55 is the best solution, as a second choice having a specially formatted comment seems more elegant than imposing restrictions on the syntax of the actual bash code.  It would also mean that the EAPI variable could be made read-only before sourcing the ebuild, eliminating the concern about it being changed inconsistently with the parsed value.  There's no need to change existing EAPIs - anything without the comment could be handled the current way.

The only real disadvantage that I can see is that it would require a delay before actually being used, since current PM versions would assume EAPI 0 for such ebuilds.  During the GLEP 55 discussions some people suggested that they'd be OK with a one-time extension change to introduce this sort of thing, so that's a possible solution.
Comment 38 Ulrich Müller gentoo-dev 2012-02-26 18:53:41 UTC
(In reply to comment #37)
> The only real disadvantage that I can see is that it would require a delay
> before actually being used, since current PM versions would assume EAPI 0
> for such ebuilds.

Not necessarily. We could still require that ebuilds assign the EAPI in the traditional way, at least for a transition period. This would imply that the variable cannot be made readonly, at least not immediately.

> During the GLEP 55 discussions some people suggested that they'd be OK with
> a one-time extension change to introduce this sort of thing, so that's a
> possible solution.

You mean the *.eb that was proposed previously? I for my part wouldn't be opposed against it. (But was there consensus about it? I don't remember.)
Comment 39 David Leverton 2012-02-26 20:34:52 UTC
(In reply to comment #38)
> Not necessarily. We could still require that ebuilds assign the EAPI in the
> traditional way, at least for a transition period. This would imply that the
> variable cannot be made readonly, at least not immediately.

That could work, but it does complicate things.

> You mean the *.eb that was proposed previously? I for my part wouldn't be
> opposed against it. (But was there consensus about it? I don't remember.)

I'm not fussy about the exact extension, but I believe that was suggested, yes.
Comment 40 Zac Medico gentoo-dev 2012-02-26 23:50:11 UTC
(In reply to comment #37)
> While I still thing GLEP 55 is the best solution, as a second choice having a
> specially formatted comment seems more elegant than imposing restrictions on
> the syntax of the actual bash code.

This feels hair splitting, but one could also say that it seems more elegant simply make use of existing bash assignment syntax than to embed metadata inside comments.
Comment 41 Ciaran McCreesh 2012-02-27 07:27:13 UTC
Have you any idea how hard it is to parse "bash assignment syntax"?
Comment 42 Zac Medico gentoo-dev 2012-02-27 07:36:45 UTC
(In reply to comment #41)
> Have you any idea how hard it is to parse "bash assignment syntax"?

The idea is to have PMS specify a regular expression that the EAPI assignment is required to match, any ebuilds that do not match will be flagged as invalid, using the feedback mechanism discussed in comment #18.
Comment 43 Ciaran McCreesh 2012-02-27 07:44:00 UTC
...so not "bash assignment syntax" at all then. You want to introduce a different syntax, just for one variable, whose rules apply only to that variable and not anywhere else.
Comment 44 Zac Medico gentoo-dev 2012-02-27 07:46:01 UTC
(In reply to comment #43)
> ...so not "bash assignment syntax" at all then.

It would be a subset of bash assignment. Something like the expression shown in comment #36. The 3 ebuilds that didn't match would be automatically flagged as invalid, and would have to be fixed.
Comment 45 David Leverton 2012-02-27 20:30:11 UTC
(In reply to comment #40)
> This feels hair splitting, but one could also say that it seems more elegant
> simply make use of existing bash assignment syntax than to embed metadata
> inside comments.

Metadata in comments means you don't have to contort yourself around the rules of the host language - since it's ignored by the host, you can define it however you want (as long as you avoid the end-of-comment marker of course)
Comment 46 Zac Medico gentoo-dev 2012-02-27 21:41:00 UTC
(In reply to comment #45)
> (In reply to comment #40)
> > This feels hair splitting, but one could also say that it seems more elegant
> > simply make use of existing bash assignment syntax than to embed metadata
> > inside comments.
> 
> Metadata in comments means you don't have to contort yourself around the rules
> of the host language - since it's ignored by the host, you can define it
> however you want (as long as you avoid the end-of-comment marker of course)

Sure, but a subset of bash assignment syntax would work perfectly well, and would fit the existing framework with minimal changes, so the comment approach feels like over-engineering in comparison.
Comment 47 Brian Harring (RETIRED) gentoo-dev 2012-02-27 23:16:32 UTC
(In reply to comment #46)
> (In reply to comment #45)
> > (In reply to comment #40)
> > > This feels hair splitting, but one could also say that it seems more elegant
> > > simply make use of existing bash assignment syntax than to embed metadata
> > > inside comments.
> > 
> > Metadata in comments means you don't have to contort yourself around the rules
> > of the host language - since it's ignored by the host, you can define it
> > however you want (as long as you avoid the end-of-comment marker of course)
> 
> Sure, but a subset of bash assignment syntax would work perfectly well, and
> would fit the existing framework with minimal changes, so the comment approach
> feels like over-engineering in comparison.

We're not doing bash assignment parsing.  Whatever faults you may have w/ my original way of getting EAPI moving forward, I wasn't that stupid at least.  Afaik, betelgeuse and myself are the only two in this discussion who have actually tried it (regex hacks in portage do not count, period)- it's a fucking rats nest, and trying to keep it matching bash changing behaviour is pointless maintenance work.

Bluntly, it's a stupid, stupid idea and is a fricking hack.  Worse, no one has gotten around to outlining the claimed gains; we already can lock eapi readonly if we wanted to for phase execution.

Continuing my "this is stupid" thread; exactly how did we get from betelgeuse trying to concentrate pkg metadata to ebuilds (which there is no supporting reason given for beyond a bug he filed in '07), to this rehashing of eapi?

Frankly, I don't particularly agree w/ Petteri's original request *anyways*.
Comment 48 Zac Medico gentoo-dev 2012-02-27 23:24:00 UTC
(In reply to comment #47)
> actually tried it (regex hacks in portage do not count, period)- it's a fucking
> rats nest, and trying to keep it matching bash changing behaviour is pointless
> maintenance work.

The idea is to have PMS define a regex that matches a subset of bash assignment syntax, which would be sufficient for EAPI assignments. If bash extends assignment syntax in the future, we wouldn't have to match that unless we decided that you wanted to support the new syntax for EAPI assignments (unlikely).
Comment 49 Ulrich Müller gentoo-dev 2012-02-28 08:13:36 UTC
(In reply to comment #47)
> We're not doing bash assignment parsing.  Whatever faults you may have w/ my
> original way of getting EAPI moving forward, I wasn't that stupid at least. 
> Afaik, betelgeuse and myself are the only two in this discussion who have
> actually tried it (regex hacks in portage do not count, period)- it's a
> fucking rats nest, and trying to keep it matching bash changing behaviour is
> pointless maintenance work.

Do you really expect that the syntax of the bash assignment statement will change in a way that isn't downward compatible? If it really would, we would have much larger problems. I think it is quite safe if we require a subset of bash 3.2 syntax (of even POSIX shell syntax) here.

Of course, if designing everything from scratch, I guess nobody would go for such a solution. We have existing EAPIs though, and AFAICS the following approaches are possible:

1) Parse the bash assignment. I believe that it would work in practice.
   A sanity check as suggested in comment #18 could be added. It's also the
   least intrusive solution. Still, it's sort of a hack.

2) Define the EAPI in a special header comment. With the additional bonus that
   the pattern could be added to sys-apps/file magic and to syntax rules of
   text editors. The problem is that it's incompatible with existing ebuilds,
   so one of the following would be required:
   a) Additionally have an EAPI assignment statement in the ebuild for some
      transition period. This would mean that the variable cannot be readonly.
   b) Do a one time change of the file extension.
Comment 50 Petteri Räty (RETIRED) gentoo-dev 2012-02-28 09:41:02 UTC
Please take the discussions about EAPI parsing to the mailing lists where it belongs.
Comment 51 Ulrich Müller gentoo-dev 2012-02-28 13:10:10 UTC
(In reply to comment #50)
> Please take the discussions about EAPI parsing to the mailing lists where it
> belongs.

I think that discussing it here is perfectly fine. pms-bugs@g.o should reach everyone who is closely involved with EAPIs. Once we've reached consensus here (or find out that we don't), a broader discussion can still be started on gentoo-dev, and we can escalate it to the council if necessary. I'd rather avoid the sort of noise that we had with previous discussions in -dev about the issue.
Comment 52 Steve L 2012-02-28 19:19:20 UTC
It seems that the real problem is that "PMS doesn't specify what characters are allowed in an EAPI name" so why not just settle that issue?
Something like the repo/ keyword names:
^[[:alnum:]]([[:alnum:]_-]*[[:alnum:]])?$
ie "An EAPI name can contain any of [A-Za-z0-9_-] but must not start or end with a hyphen or an underscore," would be my suggestion.

I can't really see that it's such a hardship for ebuild authors to use:
EAPI=token

(no quotes, and no whitespace in the token, which is terminated by whitespace.)

It's trivial to scan for, simple to remember and easy to read, as well as being compatible both now and into the future, so long as you state what a token is.
Coupled with the sanity-check, that seems pretty watertight (so long as ebuilds set the EAPI, and not eclasses.)

The scanner doesn't have to use a regex to verify; it's simple to do in C, so it can be done anywhere. I'll even volunteer to write it in C so you can wrap it in whatever else, if you like.

As previously stated, there is no performance hit worth discussing. If you're using the cache (which is the vast majority) there can't be. If you're opening the ebuild anyway, the scan will simply load the file into cache which would already have to be done, and take a trivial amount of time to give a value which can be rejected if the mangler doesn't support it, or compared to the final value (after you've waited an inordinately longer time for the shell to execute.) After all, you *want* that data, right?

Having the EAPI in XML would take an awful lot longer as well as being more fragile; having it as a specially parsed comment sounds a lot like a regression: it would need at least the same amount of time, and would require a change in format, along with all the attendant aggravation for no gain. Encoding EAPI in the filename breaks encapsulation amongst other issues, and was shot down a long time ago, after all sides had put their points clearly, thoroughly and forcefully, so let's not rehash that discussion, please.

With respect to line number limits, doesn't EAPI have to be set before inherit? In that case, the scanner could just bail on seeing that.

(It isn't hard to scan for if you allow quotes and whitespace and other wacky characters. I just don't see the need for it- after all, other stuff is restricted specifically so it's easy to match and use. Is anyone really concerned that such be allowed in an EAPI name?)
Comment 53 Zac Medico gentoo-dev 2012-02-29 03:32:03 UTC
(In reply to comment #52)
> With respect to line number limits, doesn't EAPI have to be set before
> inherit? In that case, the scanner could just bail on seeing that.

That seems reasonable, but in order to keep the algorithm as simple as possible, I would suggest to have PMS simply state that the first match of the EAPI assignment pattern found in the first N lines be used, where N is something like 30 lines. If no match is found after N lines, then the EAPI defaults to "0".

When designing this search algorithm, we should consider two competing factors:

  (1) Keep the search algorithm simple as possible.
  (2) Support the largest possible subset of bash assignment syntax.

As we try to strike a balance between these two competing factors, (1) should take precedence over (2), since any EAPI assignments that do not work with our chosen search algorithm can simply be detected and fixed as discussed in comment #18.
Comment 54 Ciaran McCreesh 2012-02-29 07:32:39 UTC
(In reply to comment #52)
> I can't really see that it's such a hardship for ebuild authors to use:
> EAPI=token
> 
> (no quotes, and no whitespace in the token, which is terminated by
> whitespace.)

And how many ebuilds follow that?

Even if you change the entire tree, people will still get it wrong, since it's an arbitrary restriction that doesn't apply to anything else and has no obvious reason for being there.
Comment 55 Ulrich Müller gentoo-dev 2012-02-29 10:00:56 UTC
(In reply to comment #53)
> (In reply to comment #52)
> > With respect to line number limits, doesn't EAPI have to be set before
> > inherit? In that case, the scanner could just bail on seeing that.

We don't know how future EAPIs will look like. Maybe inherit syntax will change? So the scanner shouldn't try to be too clever.

> That seems reasonable, but in order to keep the algorithm as simple as
> possible, I would suggest to have PMS simply state that the first match of
> the EAPI assignment pattern found in the first N lines be used, where N is
> something like 30 lines.

I'd be more strict here, and require the assignment to be within the first 10 lines, or in the first non-comment line, or both. The few deviating ebuilds in the tree (about 10 packages if we require N=10) can be easily fixed, and for the future this can be enforced with repoman.

> If no match is found after N lines, then the EAPI defaults to "0".

Hm, this would be a retroactive change. One could imagine also a non-retroactive variant, where the result of the scan would be ignored for EAPIs 0 to 4. Would the latter have any disadvantages?
Comment 56 Zac Medico gentoo-dev 2012-02-29 10:10:17 UTC
(In reply to comment #55)
> > If no match is found after N lines, then the EAPI defaults to "0".
> 
> Hm, this would be a retroactive change. One could imagine also a
> non-retroactive variant, where the result of the scan would be ignored for
> EAPIs 0 to 4. Would the latter have any disadvantages?

The only disadvantage is that is adds an additional case to handle in the code, versus treating all EAPIs the same.
Comment 57 Zac Medico gentoo-dev 2012-02-29 10:17:15 UTC
Also, it delays detection of invalid EAPI assignments in ebuilds with EAPI 5 and later. That's somewhat annoying, since you have to spawn bash before you can find out that it has an invalid EAPI 5 assignment hidden somewhere.
Comment 58 Zac Medico gentoo-dev 2012-02-29 12:09:15 UTC
(In reply to comment #57)
> Also, it delays detection of invalid EAPI assignments in ebuilds with EAPI 5
> and later. That's somewhat annoying, since you have to spawn bash before you
> can find out that it has an invalid EAPI 5 assignment hidden somewhere.

Actually, there's no disadvantage here, since you have to spawn bash to validate the probed EAPI in any case.
Comment 59 Ulrich Müller gentoo-dev 2012-03-01 08:21:44 UTC
Maybe we should work out the details, so that both proposals can be discussed in gentoo-dev.

Proposal 1:
- Ebuilds must contain at most one EAPI assignment statement.
- It must occur within the first N lines of the ebuild (N=10 and N=30 were
  suggested).
- The statement must match the following regular expression (ERE syntax):
  ^[ \t]*EAPI=(['"]?)([A-Za-z0-9._+-]*)\1[ \t]*(#.*)?$
- To be decided if this will apply retroactively to all EAPIs or only to EAPI 5
  and later.
Note: The first and third point are already fulfilled by all ebuilds in the Portage tree. The second point will require few ebuilds to be changed (9 packages for N=10, or 2 packages for N=30).

Proposal 2:
- The EAPI must be declared in a special comment in the first line of the
  ebuild's header.
- No specific syntax has been suggested yet. How about: "The first line of the
  ebuild must contain the exact string 'ebuild', followed by whitespace,
  followed by the EAPI, followed by end-of-line or whitespace."
- To be decided if this should be combined with a one time change of the file
  extension (like .ebuild -> .eb). If not, an EAPI assignment statement in the
  ebuild is still required, at least for a transition period.
Comment 60 Steve L 2012-03-18 21:50:29 UTC
(In reply to comment #54)
> > I can't really see that it's such a hardship for ebuild authors to use:
> > EAPI=token
> > 
> > (no quotes, and no whitespace in the token, which is terminated by
> > whitespace.)
> 
> And how many ebuilds follow that?
> 
> Even if you change the entire tree, people will still get it wrong, since
> it's an arbitrary restriction that doesn't apply to anything else and has no
> obvious reason for being there.

Fine, so we allow quoting. It still makes sense to specify the allowed format of the EAPI name in PMS, just as is done for so many other tokens like keywords or USE flags. After all, the EAPI name defines everything else that can happen, so its omission seems surprising.

(In reply to comment #55)
> > > With respect to line number limits, doesn't EAPI have to be set before
> > > inherit? In that case, the scanner could just bail on seeing that.
> 
> We don't know how future EAPIs will look like. Maybe inherit syntax will
> change? So the scanner shouldn't try to be too clever.
>
Yeah fair enough.
Comment 62 keenblade 2012-05-13 07:11:00 UTC
So, at the moment it seems, it is impossible to inherit EAPI from an eclass. Any idea fix for that?
Comment 63 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2012-05-13 07:15:18 UTC
(In reply to comment #62)
> So, at the moment it seems, it is impossible to inherit EAPI from an eclass.
> Any idea fix for that?

It is disallowed and we are going to build a guillotine for doing that.
Comment 64 Zac Medico gentoo-dev 2012-05-13 07:19:01 UTC
(In reply to comment #62)
> So, at the moment it seems, it is impossible to inherit EAPI from an eclass.
> Any idea fix for that?

You'll have to set it at the top of the ebuild, without exception. If the eclass requires a particular EAPI, then inside the eclass, you can check if the ebuild has set an appropriate EAPI and die if it has not.
Comment 65 keenblade 2012-05-13 08:57:00 UTC
In mpd overlay, all the plugins inherit EAPI from gmpc-plugin.eclass, not within the ebuild itself. In gmpc-plugin.eclass first line is:
EAPI=2.
So, what I understand here is; I should add the EAPI directly to the ebuilds and remove it from the gmpc-plugin.eclass not to define EAPI twice. 
Is there anything else to do? If not, I'll edit the ebuilds in overlay.
Comment 66 Ulrich Müller gentoo-dev 2012-05-13 09:03:52 UTC
(In reply to comment #65)
> In mpd overlay, all the plugins inherit EAPI from gmpc-plugin.eclass, not
> within the ebuild itself. In gmpc-plugin.eclass first line is:
> EAPI=2.

Such usage was never allowed, see:
<http://devmanual.gentoo.org/ebuild-writing/eapi/#usage-of-eapis>

(Basically, the behaviour of the "inherit" command in future EAPIs cannot be taken for granted. So the EAPI must be defined before.)

> So, what I understand here is; I should add the EAPI directly to the ebuilds
> and remove it from the gmpc-plugin.eclass not to define EAPI twice.

Exactly.
Comment 67 keenblade 2012-05-13 11:14:06 UTC
Thanks guys. I updated the mpd overlay.