Summary: | [Future EAPI] Functions for version comparison and version component expansion | ||
---|---|---|---|
Product: | Gentoo Hosted Projects | Reporter: | Ulrich Müller <ulm> |
Component: | PMS/EAPI | Assignee: | PMS/EAPI <pms> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | esigra, mgorny, pacho |
Priority: | Highest | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/eapi7-ver.eclass | ||
Whiteboard: | in-eapi-7 | ||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 174380 | ||
Attachments: | Sample implementation in bash |
Description
Ulrich Müller
2013-08-23 07:11:11 UTC
I'd go for two-argument only, one-argument is going to be visually confusing. Let's not make PMS any worse that it is now, while there's no real gain from not writing ${PVR}. As for operators, I think we should stick with the ones used in dep/atom strings for the sake of consistency. We are used to having to quote has_version/best_version arguments, so I guess we'll get used to quote this. And my +1 in general for the idea. While it's not something used widely, this is a logic that's necessary in the PM anyway, and indirectly exposed partially best_version / has_version. (In reply to Michał Górny from comment #1) > I'd go for two-argument only, one-argument is going to be visually > confusing. Let's not make PMS any worse that it is now, while there's no > real gain from not writing ${PVR}. How about putting the operator first then (i.e. Polish notation)? version_test OP VERSION1 VERSION2 > As for operators, I think we should stick with the ones used in dep/atom > strings for the sake of consistency. We are used to having to quote > has_version/best_version arguments, so I guess we'll get used to quote this. That would be =, <, <=, >, and >= then. No != operator? Should ~ be included too? It is asymmetric: "Equal to the specified version, except the revision part of the matching package may be greater than the revision part of the specified version (-r0 is assumed if no revision is explicitly stated)." (In reply to Ulrich Müller from comment #2) > (In reply to Michał Górny from comment #1) > > I'd go for two-argument only, one-argument is going to be visually > > confusing. Let's not make PMS any worse that it is now, while there's no > > real gain from not writing ${PVR}. > > How about putting the operator first then (i.e. Polish notation)? > > version_test OP VERSION1 VERSION2 No, IMO that's going to be not natural to most of the people. We should stick with something simple and explicit: if version_test $(getversion) '>=' 3.4; then ... fi well, when i look at it, -ge would feel fine there too, as bash syntax. > > As for operators, I think we should stick with the ones used in dep/atom > > strings for the sake of consistency. We are used to having to quote > > has_version/best_version arguments, so I guess we'll get used to quote this. > > That would be =, <, <=, >, and >= then. No != operator? Tricky, but there could !* as in blocker syntax :). > Should ~ be included too? It is asymmetric: > "Equal to the specified version, except the revision part of the matching > package may be greater than the revision part of the specified version (-r0 > is assumed if no revision is explicitly stated)." Hmm, maybe it should. if version_test ${llvm-version} ~ 3.4; then ... fi which makes me also think, what about wildcards. Will: if version_test ${llvm-version} == '3.4*'; then ... fi be allowed? All of versionator should be in the PM really. (In reply to Michał Górny from comment #3) > well, when i look at it, -ge would feel fine there too, as bash syntax. That's why it was my first suggestion. ;-) > which makes me also think, what about wildcards. Will: > > if version_test ${llvm-version} == '3.4*'; then > ... > fi > > be allowed? Please don't, or it will get complicated quickly. Next question will be if we also include slots and USE flags. (In reply to Ciaran McCreesh from comment #4) > All of versionator should be in the PM really. Really, all of its two line bash wrapper functions? Let's see, what can be done better in the package manager: - get_all_version_components / get_version_components (but only one function for it, with a flag like -a or --all) - version_compare / version_is_at_least version_sort would also be more efficient if implemented in the PM. However, only two packages (dev-util/nvidia-cuda-toolkit and dev-python/pycuda) are using this function, so presumably it's not worth it. > (In reply to Ciaran McCreesh from comment #4) > > All of versionator should be in the PM really. > > Really, all of its two line bash wrapper functions? *shrug* Would avoid people having to use an eclass for some but not all of them. But I don't really mind, so long as... > - get_all_version_components / get_version_components > (but only one function for it, with a flag like -a or --all) ...at least that isn't done in bash. > version_sort would also be more efficient if implemented in the PM. However, > only two packages (dev-util/nvidia-cuda-toolkit and dev-python/pycuda) are > using this function, so presumably it's not worth it. I wrote that for eselect, rather than tree use. Oh well. (In reply to Ciaran McCreesh from comment #6) > *shrug* Would avoid people having to use an eclass for some but not all of > them. But I don't really mind, so long as... > > > - get_all_version_components / get_version_components > > (but only one function for it, with a flag like -a or --all) > > ...at least that isn't done in bash. O.K., so we have: version_test VERSION1 OP VERSION2 # OP is -eq, -ne, etc. as in test(1) version_components [-a] VERSION Or is "version_expand" a better name for the second function? I've grepped the tree for uses of versionator.eclass functions. Here are the stats for *.ebuild (numbers are no. of packages, not ebuilds): get_version_component_range 447 replace_version_separator 149 replace_all_version_separators 58 get_major_version 52 delete_all_version_separators 31 version_is_at_least 29 delete_version_separator 27 get_version_components 11 get_after_major_version 8 version_format_string 7 get_version_component_count 6 version_compare 4 version_sort 1 get_last_version_component_index 1 get_all_version_components 1 This way obtained using: $ for a in $(grep '() {' eclass/versionator.eclass | cut -d'(' -f1); do echo -n "$a "; find -name '*.ebuild' -exec grep $a {} + | cut -d/ -f2-3 | uniq | wc -l; done | sort -k2 -n -r Ok, so let's state my point here. First of all, I believe there could be two reasons for including a particular function/feature in PM here. Either: 1. it is used commonly, or 2. it 'strongly' belongs in PMS. For example, it is something that PM does anyway so there's no point in having two implementations around. The version comparison function fits (2). It's not used that frequently but it's something that PM does all the time. There's no point in keeping two implementations of it in two different places. Some of the other functions fit (1). Though, I don't feel like adding half a dozen functions is the best thing to do. If possible, I would rather prefer, well, trimming them to one or two functions that can do the most common tasks. get_version_component_range() seems to be the most commonly used function. It is used both with ${PV} and other versions. That's probably the function we should target first. We can squash at least get_major_version() and get_after_major_version() into it. Then there's replace_version_separator() (which can be used with pattern or index) and replace_all_version_separators(). We could supposedly squash them into another function. Then, we can also squash delete_all_version_separators() and delete_version_separator() into it. And I think we shouldn't go further than that. This leaves get_version_components(), version_format_string(), get_version_component_count(), get_last_version_component_index() and get_all_version_components() in the field. But those are used quite rarely, to be honest. Another thing is that most of versionator.eclass' function names don't make real sense. For example, it's not immediately clear what's the difference between get_version_component_range() and get_version_components(). We should avoid repeating the same confusion within EAPI. (In reply to Michał Górny from comment #9) > The version comparison function fits (2). It's not used that frequently but > it's something that PM does all the time. There's no point in keeping two > implementations of it in two different places. Going from there: 1. version_test VERSION1 OP VERSION2 (as outlined in comment #7) > Some of the other functions fit (1). Though, I don't feel like adding half a > dozen functions is the best thing to do. If possible, I would rather prefer, > well, trimming them to one or two functions that can do the most common > tasks. I agree that we shouldn't have ten or more functions. > get_version_component_range() seems to be the most commonly used function. > It is used both with ${PV} and other versions. That's probably the function > we should target first. We can squash at least get_major_version() and > get_after_major_version() into it. 2. version_part RANGE [VERSION] RANGE is either a single number or START-[END], with END defaulting to the total number of components. VERSION defaults to ${PV}. > Then there's replace_version_separator() (which can be used with pattern or > index) and replace_all_version_separators(). We could supposedly squash them > into another function. Then, we can also squash > delete_all_version_separators() and delete_version_separator() into it. 3. version_separator_replace RANGE REPLACEMENT [VERSION] RANGE is a single number or START-[END]; in the latter case all separators from START to END are replaced. Pattern substitution can be done in bash (like ${PV/./-}) so maybe we need not implement it. > 2. version_part RANGE [VERSION] > RANGE is either a single number or START-[END], with END defaulting to > the total number of components. > VERSION defaults to ${PV}. Coming back to this. It seems that PMS wording and versionator.eclass don't agree on what is part of components and what is a separator. PMS says "followed by zero or more of the suffixes _alpha, _beta, _pre, _rc or _p, which themselves may be suffixed by an optional integer" and "followed by the suffix -r followed immediately by an integer". So, 3.0_p2-r1 would have components "3", "0", "_p2", and "-r1" with separators ".", "", and "". Whereas versionator.eclass interprets it as having components "3", "0", "p2", and "r1" with separators ".", "_", and "-". IMHO we should follow versionator.eclass here, which would also agree with ${PR} being "r1" (but _not_ "-r1"). > 3. version_separator_replace RANGE REPLACEMENT [VERSION] > RANGE is a single number or START-[END]; in the latter case all separators > from START to END are replaced. I wonder if empty separators should be counted here? For example, "3.0c-r1" would have three separators ".", "", and "-". (This would be different from replace_version_separator() of versionator.eclass which counts only the nonempty ones.) Recently, it was clarified that a suffix and the integer following it count as separate version components: https://gitweb.gentoo.org/proj/pms.git/diff/names.tex?id=cb99e4dcb5837626320b1fba0277d0fa7c1c9829 I wonder if the proposed replacements for get_version_component_range() and replace_version_separator() should do any validation of their version argument, i.e. should they check if it is valid version syntax under PMS? Without such validation, things would boil down to simple pattern matching: - Strings of digits and strings of letters are version components. - Anything between such components (possibly including the empty string?) is a version separator. For example, 12.3d_pre5 would have the five components "12", "3", "d", "pre", and "5" with the four separators ".", "", "_" and "". Well, version_test is quite clearly useful. However, for the other things I think we should do some in-repo research on how things are used and how they could be replaced without making it PITA and thoroughly confusing ;-). Updated stats: get_version_component_range 485 replace_version_separator 142 replace_all_version_separators 59 get_major_version 54 version_is_at_least 37 delete_all_version_separators 29 delete_version_separator 17 get_version_components 9 get_version_component_count 9 version_format_string 8 get_after_major_version 5 version_compare 4 get_all_version_components 4 get_last_version_component_index 3 version_sort 2 So, after some brainstorming yesterday, /me and ulm came to the following conclusions (@ulm, please correct me if I recall wrong). There will be three functions which should cover almost all of versionator.eclass (names subject to change): version_test, version_replace_sep, version_get_comp. version_test [V1] OP V2 ----------------------- - tests whether V1 is in OP relation to V2, following PMS version comparison algorithm. - V1 is optional and defaults to ${PVR}. - OP is one of -gt, -ge, -lt, -le, -eq, -ne, with the 5 first being equivalent to dep spec operators >, >=, <, <=, = respectively and the last one would provide simple negation of =. - We probably won't provide equivalent for '~' since it's easy enough to provide version without revision (${PV} or ${SOMEVERSION%-r*}). - Though we probably should provide '*' for -eq & -ne equivalent to the one supported in PM version syntax. - V1 & V2 must be valid PMS versions. Version splitting ----------------- For the two remaining functions, we introduce simplified and more flexible version splitting rules that are intended to work better with non-Gentoo versions (and transforming Gentoo versions into non-Gentoo versions). A version string is composed of one or more version components, delimited by (possibly empty) version separators. Each component is a sequence made purely of letters [a-zA-Z] or purely of digits [0-9]. Separators are sequences of characters outside [a-zA-Z0-9]. A version string must start and end with a version component. In other words, version like: 1.2.3b_alpha4 would be split into: 1 . 2 . 3 '' b _ alpha '' 4 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ c1 s1 c2 s2 c3 s3 c4 s4 c5 s5 c6 (c being component indexes, s being separator indexes) replace_version_sep RANGE REPL [RANGE REPL...] [VERSION] -------------------------------------------------------- - replaces version separators by index. Replacement by string can be accomplished using regular ${PV/_/-} syntax. - RANGE = START[-[END]]. One digit means specific index, digit + '-' means specific index and all separators following it, full range means separators from START to END, both START and END included. - START and END both must not be larger than the last separator index. - If RANGE includes empty separators, they are replaced as well. IOW, $(replace_version_sep 3 - 1.2.3b) == '1.2.3-b'. - REPL is a replacement string. It can be any string, and will be repeated for each replaced separator. - Multiple pairs of RANGE & REPL can be provided for better replacement efficiency. The specified RANGEs must be disjunctive. - VERSION is optional and defaults to ${PV} (I don't think we want ${PVR} here). get_version_comp RANGE [VERSION] -------------------------------- - gets part of version string, containing components from specified RANGE along with their separators. - RANGE = START[-[END]]. One digit means specific index, digit + '-' means specific index and all components following it, full range means components from START to END, both START and END included. - START and END both must not be larger than the last component index. - All separators contained between requested version components are included. Separator preceding and following them are not. Final notes ----------- Other versionator.eclass functionalities can be easily obtained by specific use of one or two of the functions: * get_version_component_range -> version_get_comp * replace_version_separator -> version_replace_sep or ${PV/...} * replace_all_version_separators -> version_replace_sep 1- * get_major_version -> version_get_comp 1 * version_is_at_least -> version_test [...] -ge ... * delete_all_version_separators -> version_replace_sep 1- '' * delete_version_separator -> version_replace_sep N '' * get_version_components -> version_replace_sep 1- ' ' * get_version_component_count -> above into array, ${#comps[@]} * version_format_string -> some combination of version_replace_sep and version_get_comp. * get_after_major_version -> version_get_comp 2- * version_compare -> version_test * get_all_version_components -> no replacement since it both includes components in separators for no good reason ;-) * version_sort -> no replacement, that's just a crazy thing Created attachment 418072 [details] Sample implementation in bash A sample implementation is attached, which shows that this can be coded in bash in a reasonably compact form. Some differences to what we had previously: (In reply to Michał Górny from comment #15) > A version string is composed of one or more version components, delimited by > (possibly empty) version separators. Each component is a sequence made > purely of letters [a-zA-Z] or purely of digits [0-9]. Separators are > sequences of characters outside [a-zA-Z0-9]. A version string must start and > end with a version component. If it doesn't, then an implicit first (or last) empty component will be added. This is more in line with what cut(1) does. IMHO we should only error out when range syntax is malformed, but allow an arbitrary version string. > replace_version_sep RANGE REPL [RANGE REPL...] [VERSION] > -------------------------------------------------------- > - START and END both must not be larger than the last separator index. Don't error out here, but silently ignore anything beyond the last separator. > - Multiple pairs of RANGE & REPL can be provided for better replacement > efficiency. The specified RANGEs must be disjunctive. Last range wins. > get_version_comp RANGE [VERSION] > -------------------------------- > - START and END both must not be larger than the last component index. Again, silently ignore anything beyond the last separator. This is in line with cut(1) behaviour. IMHO, all this hinges on the version comparison function being implemented in a way that is easily and efficiently accessible from ebuilds. Using IPC or duplicating the implementation in bash would render this proposal meaningless. The version splitting functions will have to be implemented in bash in any case, and IIUC the conclusion was _not_ to use PMS version rules there, because the functions are intended to convert from and to non-Gentoo versions. So also there the argument why this should go into the package manager (as opposed to an eclass) is weakened. The work already done could be used as the basis of a versionator-r1.eclass, of course. > IMHO, all this hinges on the version comparison function being implemented in a > way that is easily and efficiently accessible from ebuilds. Using IPC or > duplicating the implementation in bash would render this proposal meaningless. IPC implemented correctly is plenty fast enough. > The version splitting functions will have to be implemented in bash in any > case, and IIUC the conclusion was _not_ to use PMS version rules there, because > the functions are intended to convert from and to non-Gentoo versions. I suspect this argument doesn't hold up to scrutiny: where do ebuilds take non-Gentoo versions as inputs? (In reply to Ciaran McCreesh from comment #18) > IPC implemented correctly is plenty fast enough. Even in global scope? Several popular eclasses call version_is_at_least in global scope. > I suspect this argument doesn't hold up to scrutiny: where do ebuilds take > non-Gentoo versions as inputs? There are examples in the tree, e.g. ebuilds doing pattern substitution in PV and using the result as argument for get_version_component_range. The current reference implementation is provided for testing purposes as eapi7-ver.eclass in ::gentoo. The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=08d405f72e85125b64f5074e49714e759d85eaf6 commit 08d405f72e85125b64f5074e49714e759d85eaf6 Author: Michał Górny <mgorny@gentoo.org> AuthorDate: 2018-02-19 15:38:28 +0000 Commit: Michał Górny <mgorny@gentoo.org> CommitDate: 2018-03-11 11:43:41 +0000 Add EAPI 7 version comparison & manipulation functions The function code is copied from eapi7-ver.eclass which has been written by Ulrich Müller and me. Bug: https://bugs.gentoo.org/482170 bin/eapi.sh | 4 + bin/eapi7-ver-funcs.sh | 191 ++++++++++++++++++++ bin/isolated-functions.sh | 4 + pym/portage/tests/bin/test_eapi7_ver_funcs.py | 240 ++++++++++++++++++++++++++ 4 files changed, 439 insertions(+)} The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/pms.git/commit/?id=c9c5a2f0cf7671d73a78a544d28c361ecd5a2b3c commit c9c5a2f0cf7671d73a78a544d28c361ecd5a2b3c Author: Michał Górny <mgorny@gentoo.org> AuthorDate: 2017-09-27 21:23:22 +0000 Commit: Ulrich Müller <ulm@gentoo.org> CommitDate: 2018-04-05 16:48:17 +0000 EAPI 7 has version manipulation and comparison functions Bug: https://bugs.gentoo.org/482170 eapi-differences.tex | 4 +++ pkg-mgr-commands.tex | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 84 insertions(+)} |