
Bug 526456

Summary: [Future EAPI] Introduce a more flexible yet simpler version syntax
Product: Gentoo Hosted Projects
Reporter: Michał Górny <mgorny>
Component: PMS/EAPI
Assignee: PMS/EAPI <pms>
Status: CONFIRMED
Severity: normal
CC: dev-portage, esigra, hasufell, tsmksubc
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 174380    

Description Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-10-22 10:27:45 UTC
Our current rules for ebuild versions are... well, bad. The rules are complex yet pretty limited, requiring additional substitutions and conversions. The version comparison is outlined using 7 (!) algorithms, mostly because the scheme tries hard to invent specific solutions to a few different issues.

I would like to request replacing it with the syntax and comparison algorithm used by pkg-config/rpm [1]. More specifically:


Syntax: any alphanumeric characters plus separators, including the tilde '~'. For consistency, we may limit it to the same characters as for package names, and require it to start with an alphanumeric character.

Comparison algorithm:

1. split version number into groups of adjacent letters, groups of adjacent digits and tildes, discarding remaining separators:

  '1.2.3~b4' -> ('1', '2', '3', '~', 'b', '4')
  '5.052b17' -> ('5', '052', 'b', '17')

2. parse all digit groups as decimal numbers:

  ('1', '2', '3', '~', 'b', '4') -> (1, 2, 3, '~', 'b', 4)
  ('5', '052', 'b', '17') -> (5, 52, 'b', 17)

3. to simplify the explanation, let's add end-of-array '$' mark here:

  (1, 2, 3, '~', 'b', 4, $)
  (5, 52, 'b', 17, $)

4. compare left-to-right until one of the components is compared unequal:

4a. two numbers are compared using numeric comparison,

4b. two strings are compared using lexical comparison,

4c. two tildes are considered equal, and two ends-of-array as well,

4d. two different types of groups are compared as follows: number > string > end-of-array > '~'.

Example: 1.1.1 > 1.1b > 1.1 > 1.1~a (alpha pre-release).
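The comparison steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the name `split_version` and the rank constants are made up here, a separator is taken to be anything that is not an ASCII letter, digit or tilde, and none of this is the rpm or pkg-config code.

```python
import re

# Rank of each group type, following step 4d:
# number > string > end-of-array > '~'.
TILDE, END, STRING, NUMBER = range(4)


def split_version(version):
    """Steps 1-3: split into typed groups and append an end-of-array mark.

    Each group becomes a (rank, value) pair. Python compares lists of
    tuples left to right, so the result can be used directly as a sort key:
    groups of different types are decided by rank alone (4d), groups of
    the same type by their value (4a-4c).
    """
    groups = []
    # Separators match none of the alternatives and are silently dropped.
    for token in re.findall(r"[0-9]+|[A-Za-z]+|~", version):
        if token == "~":
            groups.append((TILDE, 0))
        elif token.isdigit():
            groups.append((NUMBER, int(token)))  # step 2: decimal number
        else:
            groups.append((STRING, token))
    groups.append((END, 0))  # step 3: the '$' mark
    return groups


versions = ["1.1", "1.1~a", "1.1.1", "1.1b"]
print(sorted(versions, key=split_version))
# ['1.1~a', '1.1', '1.1b', '1.1.1']
```

Because the end mark is always the last group, two keys compare equal only when every group matches, so no separate length check is needed in this sketch.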


The new syntax is simpler yet more flexible. It can also properly handle existing cases with small modifications:

  1.1_alphaN -> 1.1~alphaN
  1.1_betaN -> 1.1~betaN
  1.1_preN -> 1.1~preN
  1.1_rcN -> 1.1~rcN

without having to store specific rules about whether '_pre' or '_rc' is newer. Since alphas and betas can be named the way upstream names them ('1.1~aN' and '1.1~bN' instead of '1.1_alphaN' and '1.1_betaN'), the version substitutions can be made simpler as well.


I've talked to Zac about this and he believes that implementing this in a new EAPI is possible. Since the package manager needs to parse EAPI before attempting to use a package, having an EAPI-dependent version scheme shouldn't be a big issue. The new version scheme is flexible enough to allow 'converting' old version numbers to the new scheme for comparison whenever necessary, using the mapping outlined above for the _alpha .. _rc suffixes.
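As a rough illustration of that conversion, the four suffix rewrites listed above could be done with a single substitution. The helper name `to_new_syntax` is hypothetical, and any other suffix (for example '_p') is deliberately left alone because the list above does not cover it.

```python
import re

# Only the four pre-release suffixes listed above are rewritten (assumption:
# everything else in an old-style version string stays unchanged).
LEGACY_PRERELEASE = re.compile(r"_(alpha|beta|pre|rc)")


def to_new_syntax(version):
    """Rewrite e.g. '1.1_rc2' to '1.1~rc2' so it can be compared under the new rules."""
    return LEGACY_PRERELEASE.sub(r"~\1", version)


for old in ["1.1_alpha3", "1.1_beta1", "1.1_pre2", "1.1_rc2", "1.1_p1"]:
    print(old, "->", to_new_syntax(old))
```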


[1]:http://rpm.org/gitweb?p=rpm.git;a=blob;f=lib/rpmvercmp.c;h=b3d08faa4a31354d821cb259106b182410e452fb;hb=HEAD#l16
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-10-22 10:41:53 UTC
A bit more clarification to string splitting:

1. 'separators' can't include '-' to allow clear splitting between PN and PV (sorry, Debian),

2. version must start with a digit (to allow clear distinction from the '-rN' revision),

3. we remove the dumb 'PN can't end with something looking like a version' rule which is dumb because it's a dumb rule. We can clearly distinguish which contexts take PN, and which take PF (PV), and in the latter case we can right-split on '-' to split version. No point in making things hard for the sake of it.
Comment 2 Ulrich Müller gentoo-dev 2014-10-22 11:05:23 UTC
Given the nasty surprise we just have seen in bug 526234, I suggest that we don't use any characters outside of the POSIX portable filename character set in our version syntax. That is, alphanumeric, ".", "-", and "_".

(In reply to Michał Górny from comment #0)
> digits and tildes, discarding remaining separators:
> 
>   '1.2.3~b4' -> ('1', '2', '3', '~', 'b', '4')
>   '5.052b17' -> ('5', '052', 'b', '17')
> 
> 2. parse all digit groups as decimal numbers:
> 
>   ('1', '2', '3', '~', 'b', '4') -> (1, 2, 3, '~', 'b', 4)
>   ('5', '052', 'b', '17') -> (5, 52, 'b', 17)

So "5.052b17" would be equal to "5.52b17", and "1.01" equal to "1.1"?

> 4b. two strings are compared using lexical comparison,

Do I get this right, that would imply:

   1.0b < 1.0_beta1   
   1.0_p1 < 1.0_pre1
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-10-22 11:47:30 UTC
(In reply to Ulrich Müller from comment #2)
> Given the nasty surprise we just have seen in bug 526234, I suggest that we
> don't use any characters outside of the POSIX portable filename character
> set in our version syntax. That is, alphanumeric, ".", "-", and "_".

This kills the whole proposal since '~' is needed to mean 'pre-release'. I don't really feel like we should limit ourselves based on incompetency of bash upstream.

> (In reply to Michał Górny from comment #0)
> > digits and tildes, discarding remaining separators:
> > 
> >   '1.2.3~b4' -> ('1', '2', '3', '~', 'b', '4')
> >   '5.052b17' -> ('5', '052', 'b', '17')
> > 
> > 2. parse all digit groups as decimal numbers:
> > 
> >   ('1', '2', '3', '~', 'b', '4') -> (1, 2, 3, '~', 'b', 4)
> >   ('5', '052', 'b', '17') -> (5, 52, 'b', 17)
> 
> So "5.052b17" would be equal to "5.52b17", and "1.01" equal to "1.1"?

Yes. Unless we want to compare zero-prefixed as strings, I'm fine either way. However, it makes the rules more complex, and comes back when comparing '1.10' to '1.010'...

> > 4b. two strings are compared using lexical comparison,
> 
> Do I get this right, that would imply:
> 
>    1.0b < 1.0_beta1   
>    1.0_p1 < 1.0_pre1

Yes. Because you are supposed to use '1.0~beta1' and '1.0~pre1'.
Comment 4 Ulrich Müller gentoo-dev 2014-10-22 12:29:41 UTC
(In reply to Michał Górny from comment #3)
> This kills the whole proposal since '~' is needed to mean 'pre-release'.
> I don't really feel like we should limit ourselves based on incompetency
> of bash upstream.

This is not only about bash. Sticking to the "portable character set" pretty much guarantees that things will work on all target systems and with all tools.

> > So "5.052b17" would be equal to "5.52b17", and "1.01" equal to "1.1"?
> 
> Yes. Unless we want to compare zero-prefixed as strings, I'm fine either
> way. However, it makes the rules more complex, and comes back when comparing
> '1.10' to '1.010'...

I'd rather see our existing ambiguities resolved (like 1.01 being considered equal to 1.010) instead of new ones introduced.

> > Do I get this right, that would imply:
> > 
> >    1.0b < 1.0_beta1   
> >    1.0_p1 < 1.0_pre1
> 
> Yes. Because you are supposed to use '1.0~beta1' and '1.0~pre1'.

Sorry, but I don't see how you could introduce it with an EAPI then. What algorithm would you use to compare version numbers between different EAPIs?

For example, suppose a package has ebuilds foo-1.010_beta1, foo-1.01_p1, and foo-1.2 (all EAPI 5) and you add foo-1.010a and foo-1.01c, which are under your future EAPI. What would be the ordering of these versions?
Comment 5 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-10-22 12:36:53 UTC
(In reply to Ulrich Müller from comment #4)
> (In reply to Michał Górny from comment #3)
> > This kills the whole proposal since '~' is needed to mean 'pre-release'.
> > I don't really feel like we should limit ourselves based on incompetency
> > of bash upstream.
> 
> This is not only about bash. Sticking to the "portable character set" pretty
> much guarantees that things will work on all target systems and with all
> tools.
> 
> > > So "5.052b17" would be equal to "5.52b17", and "1.01" equal to "1.1"?
> > 
> > Yes. Unless we want to compare zero-prefixed as strings, I'm fine either
> > way. However, it makes the rules more complex, and comes back when comparing
> > '1.10' to '1.010'...
> 
> I'd rather see our existing ambiguities resolved (like 1.01 being considered
> equal to 1.010) instead of new ones introduced.

Suggest a good solution.

> > > Do I get this right, that would imply:
> > > 
> > >    1.0b < 1.0_beta1   
> > >    1.0_p1 < 1.0_pre1
> > 
> > Yes. Because you are supposed to use '1.0~beta1' and '1.0~pre1'.
> 
> Sorry, but I don't see how you could introduce it with an EAPI then. What
> algorithm would you use to compare version numbers between different EAPIs?
> 
> For example, suppose a package has ebuilds foo-1.010_beta1, foo-1.01_p1, and
> foo-1.2 (all EAPI 5) and you add foo-1.010a and foo-1.01c, which are under
> your future EAPI. What would be the ordering of these versions?

As I already explained, the old EAPI versions would be mapped to new EAPI. That is, '_beta1' would be treated as '~beta1'. That would allow us to preserve the same ordering.

I'm not sure how all that '010', '01' stuff works right now. I guess we can preserve the current rules to avoid issues.
Comment 6 Ulrich Müller gentoo-dev 2014-10-22 13:20:33 UTC
(In reply to Michał Górny from comment #5)
> > I'd rather see our existing ambiguities resolved (like 1.01 being considered
> > equal to 1.010) instead of new ones introduced.
> 
> Suggest a good solution.

I haven't thought intensively about it yet, but I believe that skipping steps 2 and 3 in PMS algorithm 3 (i.e., don't remove any trailing 0s) would get rid of this ambiguity.

The second ambiguity is that we use integer comparison for the first version component, which leads to:
0 == 00 == 000 < 01 == 1 < 09 == 9 < 010 == 10

Using the algorithm of filevercmp() from gnulib (basically, fall back to string comparison if integers are equal), this would be resolved to:
0 < 00 < 000 < 01 < 1 < 09 < 9 < 010 < 10
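A minimal Python sketch of that tie-breaking rule, following only the one-line description given here (compare as integers, then fall back to string comparison when the integers are equal). It is not the gnulib filevercmp() code, and `component_key` is a name made up for illustration.

```python
def component_key(component):
    """Order a digit string by numeric value first, then by its literal spelling."""
    return (int(component), component)


components = ["10", "9", "010", "1", "00", "09", "0", "01", "000"]
print(sorted(components, key=component_key))
# ['0', '00', '000', '01', '1', '09', '9', '010', '10']
```

With this key no two differently spelled components compare equal, which is what removes the ambiguity.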

> > For example, suppose a package has ebuilds foo-1.010_beta1, foo-1.01_p1, and
> > foo-1.2 (all EAPI 5) and you add foo-1.010a and foo-1.01c, which are under
> > your future EAPI. What would be the ordering of these versions?
> 
> As I already explained, the old EAPI versions would be mapped to new EAPI.
> That is, '_beta1' would be treated as '~beta1'. That would allow us to
> preserve the same ordering.
> 
> I'm not sure how all that '010', '01' stuff works right now. I guess we can
> preserve the current rules to avoid issues.

Under present EAPIs, order would be:
   1.010_beta1 < 1.01_p1 < 1.010a < 1.01c < 1.2

And if I understand your proposal right, order would be like this:
   1.01c < 1.01_p1 < 1.2 < 1.010_beta1 < 1.010a

I just don't see how these could be combined in a sane way, unless the new version comparison would preserve the old sort order.
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-10-22 14:06:53 UTC
(In reply to Ulrich Müller from comment #6)
> (In reply to Michał Górny from comment #5)
> > > I'd rather see our existing ambiguities resolved (like 1.01 being considered
> > > equal to 1.010) instead of new ones introduced.
> > 
> > Suggest a good solution.
> 
> I haven't thought intensively about it yet, but I believe that skipping
> steps 2 and 3 in PMS algorithm 3 (i.e., don't remove any trailing 0s) would
> get rid of this ambiguity.
> 
> The second ambiguity is that we use integer comparison for the first version
> component, which leads to:
> 0 == 00 == 000 < 01 == 1 < 09 == 9 < 010 == 10
> 
> Using the algorithm of filevercmp() from gnulib (basically, fall back to
> string comparison if integers are equal), this would be resolved to:
> 0 < 00 < 000 < 01 < 1 < 09 < 9 < 010 < 10

Sounds like it would work. Not sure though if it really helps anyone, or if it rather makes things less predictable. Because having 09 < 1 feels... non-algebraic at least :).

> > > For example, suppose a package has ebuilds foo-1.010_beta1, foo-1.01_p1, and
> > > foo-1.2 (all EAPI 5) and you add foo-1.010a and foo-1.01c, which are under
> > > your future EAPI. What would be the ordering of these versions?
> > 
> > As I already explained, the old EAPI versions would be mapped to new EAPI.
> > That is, '_beta1' would be treated as '~beta1'. That would allow us to
> > preserve the same ordering.
> > 
> > I'm not sure how all that '010', '01' stuff works right now. I guess we can
> > preserve the current rules to avoid issues.
> 
> Under present EAPIs, order would be:
>    1.010_beta1 < 1.01_p1 < 1.010a < 1.01c < 1.2
> 
> And if I understand your proposal right, order would be like this:
>    1.01c < 1.01_p1 < 1.2 < 1.010_beta1 < 1.010a
> 
> I just don't see how these could be combined in a sane way, unless the new
> version comparison would preserve the old sort order.

Ok, that _p issue I didn't notice. So we could supposedly preserve '_', and treat it as comparing less than letters and digits.

The '1.01' vs '1.010' is still unsolved by either of our proposals but I'm not sure if we're really supposed to handle it. We probably would have to map it into some sane version for comparison.
Comment 8 Fabian Groffen gentoo-dev 2014-10-22 14:48:40 UTC
1) What are the problems with the current syntax?
2) You aim for a "simpler" syntax, yet what you propose looks more complex to me at first glance, and you want to make it more flexible as well. Perhaps we should agree that versioning *IS* complex because of the likes of our upstreams?
Comment 9 Ciaran McCreesh 2014-10-22 15:05:19 UTC
On the one hand, you *can* add new version rules in new EAPIs, so long as all version formats can be mapped into one "superformat" that has a consistent ordering.

On the other hand, you can't do this sensibly if you have to parse the ebuilds first to get the EAPI. We're back to needing the GLEP that must not be named.
Comment 10 Ulrich Müller gentoo-dev 2014-10-22 15:18:04 UTC
Link to a previous discussion:
http://thread.gmane.org/gmane.linux.gentoo.devel/61302/focus=61364