Summary: | Limit length of version components to 18 digits | ||
---|---|---|---|
Product: | Gentoo Hosted Projects | Reporter: | Ulrich Müller <ulm> |
Component: | PMS/EAPI | Assignee: | Package Manager Specification <pms> |
Status: | RESOLVED WORKSFORME | ||
Severity: | enhancement | CC: | alex.pyattaev, esigra, martin, mgorny |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://www.gentoo.org/proj/en/council/meeting-logs/20080508.txt | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=188449 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 335925 | ||
Attachments: |
Patch for repoman
PMS patch
Updated PMS patch
PMS patch v3 |
Description
Ulrich Müller
(In reply to comment #0)
> From the council meeting of 2008-06-12,
> <http://www.gentoo.org/proj/en/council/meeting-logs/20080612-summary.txt>:
> > PMS: Versions can have >8 digits. If you want a maximum limit, discuss
> > it with relevant people and propose one.
>
> I propose that the maximum length of a numeric version component should be
> limited to 18 digits, for the following reasons:
> - The underlying language of ebuilds is bash-3.0 which can work with integers
>   up to 2^63 - 1. This is just enough to represent all 18 digit decimal
>   numbers.
>   It doesn't make much sense to allow for longer version components since shell
>   arithmetic is commonly used for version comparison in ebuilds and eclasses.
> - Maximum length used in the Portage tree is 14 digits, used by two packages
>   (sys-apps/net-tools and sys-block/btrace) for a date-based version number of
>   the form yyyymmddhhmmss. Only 15 packages are using version components with
>   more than 8 digits. So limiting it to 18 will impose no serious limitation.
> - It may be convenient for any tools that work on ebuild repositories if they
>   can represent versions by numeric types. (As a side remark, it's clear that
>   versions with leading zeros require special treatment.)

Agreed. This could be very useful for any non-script based portage utils to know exactly how large the integer for a version should be. And indeed, if someone for some reason exceeds existing technical limits, he will be greatly surprised =)

There was also a long discussion of the topic at the council meeting of 2008-05-08 from 21:28 to 22:23, see <http://www.gentoo.org/proj/en/council/meeting-logs/20080508.txt>, and the quick summary at <http://dev.gentoo.org/~dberkholz/20080508-summary.txt>:

"The council generally favored allowing versions to have <=18 digits. This allows them to fit into 64 bits (18 signed digits or 19 unsigned) and gives them an upper bound, which some implementations of version parsing could find useful."
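The digit counts in the council summary can be checked directly. The following is a minimal sketch in Python, whose integers have arbitrary precision, so the bounds themselves cannot overflow. The helper names `max_safe_digits` and `wrap_int64` are made up for this illustration, and the wraparound function models 64-bit two's complement storage, which is an assumption about how a given tool holds such values.

```python
INT64_MAX = 2**63 - 1    # largest signed 64-bit value
UINT64_MAX = 2**64 - 1   # largest unsigned 64-bit value


def max_safe_digits(limit: int) -> int:
    """Largest n such that every n-digit decimal number is <= limit."""
    n = 0
    while 10 ** (n + 1) - 1 <= limit:
        n += 1
    return n


def wrap_int64(value: int) -> int:
    """Reduce value modulo 2**64 and read it as signed (assumed model)."""
    value &= UINT64_MAX
    return value - 2**64 if value > INT64_MAX else value


if __name__ == "__main__":
    print("signed 64 bit holds all numbers of up to",
          max_safe_digits(INT64_MAX), "digits")
    print("unsigned 64 bit holds all numbers of up to",
          max_safe_digits(UINT64_MAX), "digits")
    # A 19 digit value above 2**63 - 1 no longer fits the signed type.
    print("9876543210987654321 stored as int64:",
          wrap_int64(9876543210987654321))
```

Under this model, a 19 digit value such as 9876543210987654321 exceeds 2^63 - 1 and turns negative when stored as a signed 64-bit integer, while every 18 digit value is stored unchanged.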
Some limitations in existing programs:
- portage-utils-0.2 uses 64 bit integers for numeric version components
  (uint64_t in functions atom_compare and atom_explode).
- versionator.eclass uses bash arithmetic and fails at 19 digits:
  $ version_is_at_least 0 9876543210987654321 && echo true || echo false
  false
  $ version_sort 1 2 3 9876543210987654321
  9876543210987654321 1 2 3
  18 digit versions seem to be fine, though.

I can prepare a patch for PMS if there is consensus.

I'd rather not. It just encourages people to represent parts as integers, which they aren't. Those leading zeroes are significant.

As for versionator... It was supposed to have been replaced by a PM internal ages ago, but we can't do that for much-discussed reasons...

(In reply to comment #3)
> I'd rather not. It just encourages people to represent parts as integers,
> which they aren't. Those leading zeroes are significant.

This doesn't necessarily imply a string type. For example, numeric version components could be represented as two integers, with the second one counting the leading zeros. (And for example portage-utils treats the leading zeros case first, then does numeric comparison with uint64_t.)

> As for versionator... It was supposed to have been replaced by a PM internal
> ages ago, but we can't do that for much-discussed reasons...

But is it feasible to change it to a string-based approach with unlimited precision? I'd expect that this would make it even slower.

(In reply to comment #4)
> > As for versionator... It was supposed to have been replaced by a PM internal
> > ages ago, but we can't do that for much-discussed reasons...
>
> But is it feasible to change it to a string-based approach with unlimited
> precision? I'd expect that this would make it even slower.

Versionator's problem isn't its slowness -- it's not slow enough to even come close to the bash costs of generating metadata.
Versionator's problem is that it was written back in the days of the old version rules, and even then it didn't cover every side case. Given the complexity and cost of parsing version specs, this is something that should only be done by the package manager's library. We should be moving towards making that happen, rather than adding in limitations to encourage people to carry on not doing things properly.

@portage-utils: Any opinion on this matter? Can app-portage/portage-utils be changed to support arbitrarily long version numbers, or should we limit the length of components to 18 digits?

Hey people, this argument is about nothing in particular. The problem is not in length limit, but in absence of standard implementation. There should be some standard for version number processing, what implementation it would require does not really matter. What matters, it will be the ONLY implementation. I don't really know much about Gentoo internals, so I can only propose that some of the Gentoo gods could post a draft for further discussion (e.g. a forum thread), and this bug can be closed with a link to that discussion.

(In reply to comment #7)
> The problem is not in length limit, but in absence of standard
> implementation. There should be some standard for version number processing,
> what implementation it would require does not really matter.

Sorry, but this bug is about having a length limit or not, and nothing else.

The algorithm for version comparison is precisely described in PMS, see <http://dev.gentoo.org/~gentoofan23/pms/eapi-2-approved/pms.html#x1-270002.3>, and we have at least three independent implementations of it in the package managers (in Python, C++, and C, for Portage, Paludis, and Pkgcore, respectively) which I think are all correct. (However, I just noticed that Pkgcore is using "long long" in one place.)

> What matters, it will be the ONLY implementation.

I think that's not easily possible.
(If you need an example: In ebuild global scope we are limited to bash, so we cannot call anything that's written in C or Python.)

Coming back to this. In other contexts there's Postel's prescription: "Be generous in what you accept, rigorous in what you emit."

Translated to our case this would mean:
- Package managers should accept version components with arbitrary length.
- In-tree usage should be limited to 18 digits (because for longer ones there _is_ breakage with existing code, as pointed out above).

If we're looking at it that way, the 18 digit limit could be considered a pure QA issue, and thus be grounds for repoman but not PMS.

(In reply to comment #10)
> If we're looking at it that way, the 18 digit limit could be considered a pure
> QA issue, and thus be grounds for repoman but not PMS.

"This document aims to fully describe the format of an ebuild repository and the ebuilds therein, [...]"

Look at it like DESCRIPTION: from a QA perspective, it's supposed to be below $something characters, and it's reasonable to enforce that from the repoman side. From a PMS perspective, there's no imposed limit, and package managers should accept arbitrary lengths.

Sorry to step into the older discussion, I just thought that I'd share my experience from eix, which also needs to parse and compare versions. Originally, this was implemented with integers, but I soon changed this to string handling, so I really can compare both approaches.

It turned out that neither speed nor required storage changed dramatically. In fact, what is really needed for version numbers is no arithmetic but only comparison, and this can be done essentially by one lexicographical comparison which is practically as fast as integer comparison in almost any language (special cases with leading zeros have to be treated separately anyway).
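A string-only comparison of numeric components, as described above, might look like the following Python sketch. It is not eix's code. It follows one reading of the PMS rules for numeric components (first component compared as a number; later components compared as strings with trailing zeros stripped if either has a leading zero) and ignores letters, suffixes and revisions. All function names are made up for this illustration.

```python
def _cmp(a, b) -> int:
    """Three-way comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_numeric_strings(a: str, b: str) -> int:
    """Compare two digit strings by value without converting to int."""
    a = a.lstrip("0") or "0"
    b = b.lstrip("0") or "0"
    if len(a) != len(b):
        # Without leading zeros, the longer digit string is the larger number.
        return _cmp(len(a), len(b))
    # Equal length: lexicographic order equals numeric order.
    return _cmp(a, b)


def compare_component(a: str, b: str, first: bool) -> int:
    """Compare one numeric component; leading zeros are a special case."""
    if not first and (a.startswith("0") or b.startswith("0")):
        return _cmp(a.rstrip("0"), b.rstrip("0"))
    return compare_numeric_strings(a, b)


def compare_dotted(v1: str, v2: str) -> int:
    """Compare purely numeric dotted versions such as '1.02.3'."""
    p1, p2 = v1.split("."), v2.split(".")
    for i, (a, b) in enumerate(zip(p1, p2)):
        result = compare_component(a, b, first=(i == 0))
        if result:
            return result
    # All shared components equal: the version with more components is greater.
    return _cmp(len(p1), len(p2))


if __name__ == "__main__":
    print(compare_dotted("1.9876543210987654321", "1.3"))
    print(compare_dotted("1.01", "1.1"))
```

Because nothing is ever converted to a fixed-width integer, the length of a component does not matter in this approach.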
Concerning the storage, the situation depends of course on the format: Uncompressed, strings are usually even better, because almost no part really uses 8 bytes (needed for 64 bits); if strings and integers both are stored in a reasonably compressed format, there is also not much difference.

So from my experience, one should really not encourage people to use integers for version parts as it gives practically no advantages but just introduces some limits (and BTW requires an additional step to transform the string into an integer which also costs a lot of time).

Another remark: Although it should be possible to fix versionator.eclass to work with strings instead of integers, I would just like to advertise that >=eix-0.16.0 contains the binary "versionsort" which is essentially an improved (and much faster, with non-quadratic runtime) variant of version_sort from the versionator.eclass. So if speed really is an important issue, ebuilds needing it for building might just DEPEND on >=eix-0.16.0 ...

(In reply to comment #10)
> If we're looking at it that way, the 18 digit limit could be considered a
> pure QA issue, and thus be grounds for repoman but not PMS.

So let's change this issue into a repoman bug. (The devmanual has already been taken care of in bug 282303.)

See comment #0 and comment #2 for the rationale.

Created attachment 231233
Patch for repoman
Attached patch works for me.
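For illustration only, a QA check of this kind could be sketched as follows in Python. This is not the attached patch and does not use any repoman API; the function name `overlong_components` and the constant `MAX_DIGITS` are hypothetical, and the sketch assumes that every run of digits in the version string counts as a numeric component, leading zeros included.

```python
import re

MAX_DIGITS = 18  # QA limit discussed in this bug


def overlong_components(version: str, max_digits: int = MAX_DIGITS) -> list:
    """Return every run of digits in version that is longer than max_digits."""
    return [
        match.group()
        for match in re.finditer(r"[0-9]+", version)
        if len(match.group()) > max_digits
    ]


if __name__ == "__main__":
    for candidate in ("20091231235959", "1.2.9876543210987654321-r1"):
        bad = overlong_components(candidate)
        status = "too long: " + ", ".join(bad) if bad else "ok"
        print(candidate, "->", status)
```

A 14 digit date-based version passes such a check, while a 19 digit component is reported.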
(In reply to comment #15)
> Created an attachment (id=231233)
> Patch for repoman

Thanks, that's in git now: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=d4c5043ef89d320086c6dafb946039cc96a3792c

This is in 2.2_rc68.

This is fixed in 2.1.9.

Reopening, and reassigning to PMS again.

IMHO it doesn't make much sense to guarantee arbitrary length of components in PMS, while limiting them to 18 digits in the devmanual and enforcing the limit with repoman.

Created attachment 496124
PMS patch
Created attachment 496258
Updated PMS patch

Updated wording, focusing more on what shall be supported as a minimum requirement (namely, 18 digits), and what is desirable (namely, arbitrary length).

I also want to note that the limitations mentioned in comment #2 are still present more than 8 years later:

> - portage-utils-0.2 uses 64 bit integers for numeric version components

portage-utils-0.64 is still using atoll(3) and uint64_t.

> - versionator.eclass uses bash arithmetic and fails at 19 digits [...]

That limitation is still in place, too. (We might support arbitrary length in eapi7-ver.eclass but it would be significantly slower than arithmetic tests.)

Neither of them is surprising, because with the 18 digits QA limit in place these code paths will never be tested.

Me no have strong opinion. On one hand, I don't like imposing arbitrary limits on numbers that are used as strings most of the time, especially since most of the implementations use unlimited precision and it's not very hard to combine optimized logic with an unlimited precision fallback.

On the other hand, I really don't see us using more than 18 digits for any relatively sane project. It's even enough to express a timestamp with one millisecond precision (as a continuous number). In other words, I doubt it will ever happen unless someone purposefully uses awfully long version numbers to break stuff.

But then, given the recent trend in quickly growing version numbers, I wouldn't be surprised if someone started to use unary coding for version numbers.

Created attachment 497120
PMS patch v3

Thinking about it, limiting the length of versions would have the consequence that foo-1234567890123456789 would be a valid package name, while foo-123456789012345678 would be invalid (by https://projects.gentoo.org/pms/6/pms.html#x1-210003.1.2).

So, I retract this. Let's make it explicit though that there is no limit. See attached patch.
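The package name consequence described above can be reproduced with a short Python sketch. The regular expressions are a simplified rendering of the PMS name and version syntax, written for this illustration and not taken from any package manager. The `max_digits` parameter models the hypothetical 18 digit limit, applied here only to the dotted numeric components, and the function names are made up.

```python
import re

# Simplified package name syntax (assumption): allowed characters only.
NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9+_-]*")


def version_regex(max_digits=None):
    """Build a simplified version pattern, optionally with a digit limit."""
    num = r"[0-9]+" if max_digits is None else r"[0-9]{1,%d}" % max_digits
    return re.compile(
        rf"{num}(\.{num})*[a-z]?((_alpha|_beta|_pre|_rc|_p)[0-9]*)*(-r[0-9]+)?"
    )


def is_valid_package_name(name: str, max_digits=None) -> bool:
    """A name must not end in a hyphen followed by something version-like."""
    if not NAME.fullmatch(name):
        return False
    version = version_regex(max_digits)
    for i, char in enumerate(name):
        if char == "-" and version.fullmatch(name[i + 1:]):
            return False
    return True


if __name__ == "__main__":
    for name in ("foo-1234567890123456789", "foo-123456789012345678"):
        print(name,
              "| no limit:", is_valid_package_name(name),
              "| 18 digit limit:", is_valid_package_name(name, 18))
```

With no limit, both names are rejected because their tails parse as versions. With an 18 digit limit, the 19 digit tail stops being a version, so only that name becomes valid.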
(In reply to Ulrich Müller from comment #23)
> Created attachment 497120
> PMS patch v3

Pushed. Closing again. |