Bug 270108 - Limit length of version components to 18 digits
Summary: Limit length of version components to 18 digits
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: PMS/EAPI
Hardware: All All
Importance: High enhancement
Assignee: PMS/EAPI
URL: http://www.gentoo.org/proj/en/council...
Whiteboard:
Keywords:
Depends on:
Blocks: 335925
Reported: 2009-05-16 22:37 UTC by Ulrich Müller
Modified: 2017-10-08 16:10 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
Patch for repoman (repoman.diff, 823 bytes, patch)
2010-05-12 15:34 UTC, Ulrich Müller
PMS patch (0001-Limit-integer-part-of-version-components-to-18-digit.patch, 1.15 KB, patch)
2017-09-23 07:49 UTC, Ulrich Müller
Updated PMS patch (0001-Support-for-integer-version-components-up-to-18-digi.patch, 1.20 KB, patch)
2017-09-24 09:37 UTC, Ulrich Müller
PMS patch v3 (0001-Clarify-that-version-components-can-have-arbitrary-l.patch, 1.12 KB, patch)
2017-09-29 17:32 UTC, Ulrich Müller

Description Ulrich Müller gentoo-dev 2009-05-16 22:37:27 UTC
From the council meeting of 2008-06-12, <http://www.gentoo.org/proj/en/council/meeting-logs/20080612-summary.txt>:
> PMS: Versions can have >8 digits. If you want a maximum limit, discuss
> it with relevant people and propose one.

I propose that the maximum length of a numeric version component should be limited to 18 digits, for the following reasons:
- The underlying language of ebuilds is bash-3.0 which can work with integers
  up to 2^63 - 1. This is just enough to represent all 18 digit decimal numbers.
  It doesn't make much sense to allow for longer version components since shell
  arithmetic is commonly used for version comparison in ebuilds and eclasses.
- Maximum length used in the Portage tree is 14 digits, used by two packages
  (sys-apps/net-tools and sys-block/btrace) for a date-based version number of
  the form yyyymmddhhmmss. Only 15 packages are using version components with
  more than 8 digits. So limiting it to 18 will impose no serious limitation.
- It may be convenient for any tools that work on ebuild repositories if they
  can represent versions by numeric types. (As a side remark, it's clear that
  versions with leading zeros require special treatment.)
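
The arithmetic behind the 18-digit figure can be checked with a minimal sketch. The helper name fits_in_int64 is hypothetical and only illustrates the range argument; it is not taken from Portage or any eclass.

```python
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def fits_in_int64(component: str) -> bool:
    """Return True if a decimal version component fits in a signed 64-bit integer."""
    return int(component) <= INT64_MAX


largest_18_digit = "9" * 18
largest_19_digit = "9" * 19

print(len(str(INT64_MAX)))               # INT64_MAX itself has 19 digits
print(fits_in_int64(largest_18_digit))   # every 18-digit number fits
print(fits_in_int64(largest_19_digit))   # but not every 19-digit number
```

So 18 is the largest digit count for which every possible component is representable, even though some 19-digit values would still fit.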
Comment 1 Alex HeadHunter Pyattaev 2009-05-17 09:43:21 UTC
(In reply to comment #0)
Agreed. This could be very useful for any non-script-based Portage utilities, which need to know exactly how large the integer for a version component must be. And indeed, if someone for some reason exceeds the existing technical limits, they will be greatly surprised =)
Comment 2 Ulrich Müller gentoo-dev 2009-05-17 10:20:11 UTC
There was also a long discussion of the topic at the council meeting of 2008-05-08 from 21:28 to 22:23, see <http://www.gentoo.org/proj/en/council/meeting-logs/20080508.txt>, and the quick summary at <http://dev.gentoo.org/~dberkholz/20080508-summary.txt>:
"The council generally favored allowing versions to have <=18 digits. This allows them to fit into 64 bits (18 signed digits or 19 unsigned) and gives them an upper bound, which some implementations of version parsing could find useful."

Some limitations in existing programs:
- portage-utils-0.2 uses 64 bit integers for numeric version components
  (uint64_t in functions atom_compare and atom_explode).
- versionator.eclass uses bash arithmetic and fails at 19 digits:
     $ version_is_at_least 0 9876543210987654321 && echo true || echo false
     false
     $ version_sort 1 2 3 9876543210987654321
     9876543210987654321 1 2 3
  18 digit versions seem to be fine, though

I can prepare a patch for PMS if there is consensus.
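
The versionator output above is consistent with the 19-digit value wrapping around to a negative number. The following sketch only simulates that effect in Python, assuming silent two's-complement wraparound to 64 bits; wrap_int64 is a hypothetical helper, not part of versionator.eclass.

```python
def wrap_int64(value: int) -> int:
    """Reduce an arbitrary integer to the signed 64-bit range (two's complement)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


wrapped = wrap_int64(9876543210987654321)

print(wrapped < 0)                               # the value has become negative
print(wrapped >= 0)                              # so an "at least 0" test fails
print(sorted([1, 2, 3, wrapped])[0] == wrapped)  # and it sorts before 1, 2, 3
```

An 18-digit value passes through wrap_int64 unchanged, which matches the observation that 18 digit versions are fine.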
Comment 3 Ciaran McCreesh 2009-05-17 14:26:56 UTC
I'd rather not. It just encourages people to represent parts as integers, which they aren't. Those leading zeroes are significant.

As for versionator... It was supposed to have been replaced by a PM internal ages ago, but we can't do that for much-discussed reasons...
Comment 4 Ulrich Müller gentoo-dev 2009-05-20 19:22:36 UTC
(In reply to comment #3)
> I'd rather not. It just encourages people to represent parts as integers,
> which they aren't. Those leading zeroes are significant.

This doesn't necessarily imply a string type. For example, numeric version components could be represented as two integers, with the second one counting the leading zeros. (And for example portage-utils treats the leading zeros case first, then does numeric comparison with uint64_t.)
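
A minimal sketch of such a two-integer representation, assuming a component is a plain run of ASCII digits. The names encode_component and decode_component are hypothetical and do not come from portage-utils.

```python
import re
from typing import Tuple


def encode_component(component: str) -> Tuple[int, int]:
    """Split a numeric component into (number of leading zeros, integer value)."""
    if not re.fullmatch(r"[0-9]+", component):
        raise ValueError("not a numeric version component: %r" % component)
    stripped = component.lstrip("0") or "0"  # keep one digit for all-zero input
    return len(component) - len(stripped), int(stripped)


def decode_component(zeros: int, value: int) -> str:
    """Rebuild the original string from the pair produced by encode_component."""
    return "0" * zeros + str(value)


print(encode_component("007"))   # two leading zeros, value 7
print(decode_component(2, 7))    # back to the original string
```

The pair keeps the information that a plain integer would lose, so "7" and "007" remain distinguishable.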

> As for versionator... It was supposed to have been replaced by a PM internal
> ages ago, but we can't do that for much-discussed reasons...

But is it feasible to change it to a string-based approach with unlimited precision? I'd expect that this would make it even slower.
Comment 5 Ciaran McCreesh 2009-05-20 19:27:35 UTC
(In reply to comment #4)
> > As for versionator... It was supposed to have been replaced by a PM internal
> > ages ago, but we can't do that for much-discussed reasons...
> 
> But is it feasible to change it to a string-based approach with unlimited
> precision? I'd expect that this would make it even slower.

Versionator's problem isn't its slowness -- it's not slow enough to even come close to the bash costs of generating metadata. Versionator's problem is that it was written back in the days of the old version rules, and even then it didn't cover every side case.

Given the complexity and cost of parsing version specs, this is something that should only be done by the package manager's library. We should be moving towards making that happen, rather than adding in limitations to encourage people to carry on not doing things properly.
Comment 6 Ulrich Müller gentoo-dev 2009-06-12 20:21:56 UTC
@portage-utils: Any opinion on this matter? Can app-portage/portage-utils be changed to support arbitrarily long version numbers, or should we limit the length of components to 18 digits?
Comment 7 Alex HeadHunter Pyattaev 2009-06-13 10:30:42 UTC
Hey people, this argument is not really about anything in particular. The problem is not the length limit, but the absence of a standard implementation. There should be a standard for version number processing; which implementation it requires does not really matter. What matters is that it will be the ONLY implementation. I don't know much about Gentoo internals, so I can only propose that some of the Gentoo gods could post a draft for further discussion (e.g. a forum thread), and this bug can be closed with a link to that discussion.

Comment 8 Ulrich Müller gentoo-dev 2009-06-13 11:00:11 UTC
(In reply to comment #7)
> The problem is not the length limit, but the absence of a standard
> implementation. There should be a standard for version number processing;
> which implementation it requires does not really matter.

Sorry, but this bug is about having a length limit or not, and nothing else. The algorithm for version comparison is precisely described in PMS, see <http://dev.gentoo.org/~gentoofan23/pms/eapi-2-approved/pms.html#x1-270002.3>, and we have at least three independent implementations of it in the package managers (in Python, C++, and C, for Portage, Paludis, and Pkgcore, respectively) which I think are all correct. (However I just noticed that Pkgcore is using "long long" in one place.)

> What matters is that it will be the ONLY implementation.

I think that's not easily possible. (If you need an example: In ebuild global scope we are limited to bash, so we cannot call anything that's written in C or Python.)
Comment 9 Ulrich Müller gentoo-dev 2009-08-21 21:45:55 UTC
Coming back to this. In other contexts there's Postel's prescription:
"Be generous in what you accept, rigorous in what you emit."

Translated to our case this would mean:
 - Package managers should accept version components with arbitrary length.
 - In-tree usage should be limited to 18 digits (because for longer ones
   there _is_ breakage with existing code, as pointed out above).
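
The in-tree half of this proposal amounts to a simple QA check. The sketch below is an illustration only and makes no claim about how repoman implements it; exceeds_digit_limit and MAX_DIGITS are hypothetical names, and the check flags any run of more than 18 consecutive digits anywhere in the version string.

```python
import re

MAX_DIGITS = 18  # QA limit proposed in this bug

# Matches any run of at least MAX_DIGITS + 1 consecutive digits.
_LONG_DIGIT_RUN = re.compile(r"[0-9]{%d,}" % (MAX_DIGITS + 1))


def exceeds_digit_limit(version: str) -> bool:
    """Return True if any numeric part of the version has more than MAX_DIGITS digits."""
    return _LONG_DIGIT_RUN.search(version) is not None


print(exceeds_digit_limit("20090516223727"))       # 14 digits: within the limit
print(exceeds_digit_limit("1.9876543210987654321"))  # 19 digits: flagged
```

A package manager following the first half of the proposal would accept both versions; only the QA tool would complain about the second.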
Comment 10 Ciaran McCreesh 2009-08-21 21:49:53 UTC
If we're looking at it that way, the 18 digit limit could be considered a pure QA issue, and thus be grounds for repoman but not PMS.
Comment 11 Ulrich Müller gentoo-dev 2009-08-21 21:55:04 UTC
(In reply to comment #10)
> If we're looking at it that way, the 18 digit limit could be considered a pure
> QA issue, and thus be grounds for repoman but not PMS.

"This document aims to fully describe the format of an ebuild repository and the ebuilds therein, [...]"
Comment 12 Ciaran McCreesh 2009-08-21 21:57:53 UTC
Look at it like DESCRIPTION: from a QA perspective, it's supposed to be below $something characters, and it's reasonable to enforce that from the repoman side. From a PMS perspective, there's no imposed limit, and package managers should accept arbitrary lengths.
Comment 13 Martin Väth 2009-08-22 16:26:37 UTC
Sorry to step into the older discussion; I just thought I would share my
experience from eix, which also needs to parse and compare versions.
Originally, this was implemented with integers, but I soon changed
this to string handling, so I really can compare both approaches.

It turned out that neither speed nor required storage had changed
dramatically. In fact, what is really needed for version numbers is
no arithmetic but only comparison, and this can be done essentially by
one lexicographical comparison which is practically as fast as integer
comparison in almost any language (special cases with leading zeros
have to be treated separately anyway).
Concerning the storage, the situation depends of course on the format:
Uncompressed, strings are usually even better, because almost no part
really uses 8 bytes (needed for 64 bits); if strings and integers both
are stored in reasonably compressed format, there is also not much
difference.
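
One possible reading of the string-based comparison, as a sketch rather than the code eix actually uses: for digit strings without leading zeros, the longer string is the larger number, and strings of equal length compare lexicographically. The name compare_digit_strings is hypothetical, and the leading-zero special cases are deliberately left out.

```python
def compare_digit_strings(a: str, b: str) -> int:
    """Compare two decimal strings without leading zeros; return -1, 0 or 1."""
    key_a, key_b = (len(a), a), (len(b), b)
    return (key_a > key_b) - (key_a < key_b)


versions = ["3", "9876543210987654321", "1", "2"]

print(compare_digit_strings("9", "10"))                   # 9 is smaller than 10
print(sorted(versions, key=lambda s: (len(s), s))[-1])    # the 19-digit value sorts last
```

No conversion to an integer takes place, so the length of the component does not matter.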

So from my experience, one should really not encourage people to use
integers for version parts as it gives practically no advantages but
just introduces some limits (and BTW requires an additional step to
transform the string into an integer which also costs a lot of time).

Another remark: Although it should be possible to fix versionator.eclass
to work with strings instead of integers, I would just like to advertise
that >=eix-0.16.0 contains the binary "versionsort" which is essentially
an improved (and much faster with non-quadratic runtime) variant of
version_sort from the versionator.eclass.
So if speed really is an important issue, ebuilds needing it for building
might just DEPEND on >=eix-0.16.0 ...
Comment 14 Ulrich Müller gentoo-dev 2010-05-12 15:32:58 UTC
(In reply to comment #10)
> If we're looking at it that way, the 18 digit limit could be considered a
> pure QA issue, and thus be grounds for repoman but not PMS.

So let's change this issue into a repoman bug. (The devmanual has already been taken care of in bug 282303.)

See comment #0 and comment #2 for the rationale.
Comment 15 Ulrich Müller gentoo-dev 2010-05-12 15:34:33 UTC
Created attachment 231233
Patch for repoman

Attached patch works for me.
Comment 16 Zac Medico gentoo-dev 2010-05-12 21:02:59 UTC
(In reply to comment #15)
> Created an attachment (id=231233)
> Patch for repoman

Thanks, that's in git now:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=d4c5043ef89d320086c6dafb946039cc96a3792c
Comment 17 Zac Medico gentoo-dev 2010-08-23 21:55:13 UTC
This is in 2.2_rc68.
Comment 18 Zac Medico gentoo-dev 2010-09-04 08:11:12 UTC
This is fixed in 2.1.9.
Comment 19 Ulrich Müller gentoo-dev 2017-09-23 07:35:56 UTC
Reopening, and reassigning to PMS again.

IMHO it doesn't make much sense to guarantee arbitrary length of components in PMS, while limiting them to 18 digits in the devmanual and enforcing the limit with repoman.
Comment 20 Ulrich Müller gentoo-dev 2017-09-23 07:49:16 UTC
Created attachment 496124
PMS patch
Comment 21 Ulrich Müller gentoo-dev 2017-09-24 09:37:24 UTC
Created attachment 496258
Updated PMS patch

Updated wording, more focusing on what shall be supported as a minimum requirement (namely, 18 digits), and what is desirable (namely, arbitrary length).


I also want to note that the limitations mentioned in comment #2 are still present more than 8 years later:

> - portage-utils-0.2 uses 64 bit integers for numeric version components
portage-utils-0.64 is still using atoll(3) and uint64_t.

> - versionator.eclass uses bash arithmetic and fails at 19 digits [...]
Also that limitation is still in place. (We might support arbitrary length in eapi7-ver.eclass but it would be significantly slower than arithmetic tests.)

Neither of them is surprising, because with the 18 digits QA limit in place these code paths will never be tested.
Comment 22 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-09-24 10:10:58 UTC
Me no have strong opinion.

On one hand, I don't like imposing arbitrary limits on numbers that are used as strings most of the time, especially since most of the implementations use unlimited precision and it's not very hard to combine optimized logic with an unlimited precision fallback.

On the other hand, I really don't see us using more than 18 digits for any relatively sane project. It's even enough to express a timestamp with one millisecond precision (as a continuous number). In other words, I doubt it will ever happen unless someone purposefully uses awfully long version numbers to break stuff.
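
As a quick check of the timestamp remark, the sketch below extends the yyyymmddhhmmss form mentioned in comment #0 with three millisecond digits. The function name timestamp_version is hypothetical.

```python
from datetime import datetime


def timestamp_version(moment: datetime) -> str:
    """Format a datetime as yyyymmddhhmmss followed by three millisecond digits."""
    return moment.strftime("%Y%m%d%H%M%S") + "%03d" % (moment.microsecond // 1000)


example = timestamp_version(datetime(2017, 9, 24, 10, 10, 58, 123000))

print(example)       # 20170924101058123
print(len(example))  # 17 digits, still below the 18-digit limit
```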

But then, given the recent trend in quickly growing version numbers, I wouldn't be surprised if someone started to use unary coding for version numbers.
Comment 23 Ulrich Müller gentoo-dev 2017-09-29 17:32:37 UTC
Created attachment 497120
PMS patch v3

Thinking about it, limiting the length of versions would have the consequence that foo-1234567890123456789 would be a valid package name, while foo-123456789012345678 would be invalid (by https://projects.gentoo.org/pms/6/pms.html#x1-210003.1.2).

So, I retract this. Let's make it explicit though that there is no limit. See attached patch.
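
The consequence described here can be illustrated with a deliberately simplified sketch. It assumes a version syntax of dot-separated digit runs only (no letters, suffixes or revisions) and the rule that a package name must not end in a hyphen followed by something matching the version syntax. Both helper functions are hypothetical and much narrower than the real PMS rules.

```python
import re
from typing import Optional


def version_pattern(max_digits: Optional[int] = None) -> "re.Pattern[str]":
    """Simplified version syntax: dot-separated digit runs, optionally length-limited."""
    digits = "[0-9]+" if max_digits is None else "[0-9]{1,%d}" % max_digits
    return re.compile(r"%s(\.%s)*" % (digits, digits))


def is_valid_package_name(name: str, max_digits: Optional[int] = None) -> bool:
    """Reject names ending in a hyphen followed by text matching the version syntax."""
    _head, hyphen, tail = name.rpartition("-")
    if not hyphen:
        return True
    return version_pattern(max_digits).fullmatch(tail) is None


# With an 18-digit limit, the 19-digit suffix is no longer a version.
print(is_valid_package_name("foo-1234567890123456789", max_digits=18))  # True
print(is_valid_package_name("foo-123456789012345678", max_digits=18))   # False

# Without a limit, both suffixes are versions, so neither is a valid name.
print(is_valid_package_name("foo-1234567890123456789"))                 # False
```

Leaving the length unlimited avoids this asymmetry, since every digit run after the final hyphen is treated the same way.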
Comment 24 Ulrich Müller gentoo-dev 2017-10-08 16:10:17 UTC
(In reply to Ulrich Müller from comment #23)
> Created attachment 497120
> PMS patch v3

Pushed. Closing again.