Summary: | rewrite of version handling code | ||
---|---|---|---|
Product: | Portage Development | Reporter: | John Mylchreest (RETIRED) <johnm> |
Component: | Core - Ebuild Support | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | danarmak, dberkholz, dolsen, ed, henrik, sascha-gentoo-bugzilla, sbc28 |
Priority: | Lowest | Keywords: | InVCS |
Version: | 2.2 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 115839 | ||
Attachments: |
rewritten version handling functions
A list of current ebuilds that contain an unnumbered endversion code which allows numberless suffixes test script Patch to gentoo-howto.xml Porthole's new string sort routine include changed pkgsplit() function, changed cvs versioning syntax |
Description
John Mylchreest (RETIRED)
2004-01-06 11:14:15 UTC
Created attachment 25267 [details]
rewritten version handling functions
Don't know who wrote the old versioning code, but it hurts to even look at it.
I've rewritten it incorporating a few enhancements:
- the version can now contain an arbitrary number of _beta_alpha_pre_rc_p
suffixes
- if the version string starts with cvs- it's always newer than a non-cvs
version
- undefined comparison between x.yA and x.y.z switched: with the old code x.yA
was newer, now x.y.z is newer (quick poll in -dev showed 4:0 votes for the new
behavior)
The code is IMO a lot cleaner and more readable, also ververify and vercmp now
use the same regexp to check version strings. Speed of the old and the new code
is about the same. Last but not least the new code is about 50% shorter.
It seems very good. I've noticed three things (But I didn't run the code, so maybe my fault..) 1. 1.0.0_{pre/alpha...} wouldn't match. regrex should be like re.compile("^(cvs-)?(\\d+)((\\.\\d+)*)([a-zA-Z]?)(_(pre|p|beta|alpha|rc)\\d*)?(\ -r(\\d+))?$") If you use the regrex, you might need to change another lines like .group(n) as well. 2. These codes are unnecessary. # Let's make life easy and use integers unless we're forced to use floats elif (vlist1[i][0] != "0" and vlist2[i][0] != "0"): list1.append(string.atoi(vlist1[i])) list2.append(string.atoi(vlist2[i])) 3. suffix_value = {"pre": -2, "p": 0, "alpha": -4, "beta": -3, "rc": -1} Maybe, "p":1 is better because "": 0 is reasonable. (I think if you regards "" as 0, this code might be more simpler.) ok, let me reply to this: 1) I don't think a suffix without a number should be valid as it will be problematic in version comparison (and if you don't need a number you can just as well use a 1) 2) they are necessary, otherwise 1.2 would be greater than 1.10 3) I just copied that from the old code, as long as the ordering is the same and the value of "p" is changed in all places the actual value shouldn't matter much hi genone, 1) there are several _pre/_alpha ebuilds in the portage tree. I don't understand it's valid or not from the developer policy document. If not, we need to fix them before the patch will be included. 2) In comparing 1.2 and 1.10 case, it will be 0.2 < 1. Is this problem? 3) Old codes are not always good, so you made this patch?. If we can make it easily and more readable, we should fix it. 1) do you have a list of these packages ? I think this would be -dev material 2) without that case we would have 1.2 > 1.10 (as 0.2 > 0.10, it's a string concatenation, not an addition), the old code assured the right comparison by adding trailing zeroes until both numbers had the same lengths before adding the decimal point 3) as I said, I don't care much about the actual value as long as the ordering is the same (forget the comment about "in all places" I made before). Created attachment 25711 [details]
A list of current ebuilds that contain an unnumbered endversion
1) I talked with carpaski about this. He said _alpha/_pre was valid. If you have idea of this, can you talk with him? 2) I don't understand when the logic is needed. Can you show me the example case? I'd like to include this patch to portage. 2) The same logic is actually used in the existing code. The code that is outlined above ensures that a version string 2.6.1 is less than 2.6.10 (to use kernel versions as the example) by comparing the final component as the integers 1 and 10 respectively. If the code is omitted, the float-method that follows will convert the final components to .1 and .10 respectively causing 2.6.1 to equal 2.6.10. I thought like this.. 1 -> 0.1 10 -> 1 then, 0.1 < 1 When floats are used, the integer component is always 0, so that 10 and 01 become .1 and .01. Essentially, the code ensures that two different strings are never equal. I have not found any examples in portage, which is a good thing, but as an example use the version string "1.1" and "1.01". This is unusual because "1.1" would normally be "1.10". These would be considered equal without the code. To sum up, another example :) ver1="2.1.5" ver2="2.01.50" vlist1=["2","1","5"] vlist2=["2","01","50"] # Without integers the third element is equal: list1=[.2,0.1,0.5] list2=[.2,0.01,0.5] # Without floats the second element is equal: list1=[2,1,5] list2=[2,1,50] # With both: list1=[2,0.1,5] list2=[2,0.01,50] OK. I understand that now :) Thanks. 1. You can't ignore old used syntax. Installed packages will more than likely break portage. So the no-number syntax on suffixes needs to be supported. 2. The code doesn't match portage in how it compares ending letters. Justify the differences and/or check all the usages. Here's some code I wrote to test this stuff out... First run takes a while, but it caches the versions when it finishes. http://gentoo.twobit.net/portage/check_vers/ [ver1, ver2, portage_result, genone_result] >>> import portage >>> portage.pickle_read("bad_list.pickle")[0] ['0.2', '0j', -106, 2] >>> portage.pickle_read("bad_list.pickle")[500] ['2.2a', '2.2.13', 97, -13] 1.) Yes, I realized that and it's not really a problem to support the old syntax, will change the patch 2.) As I said in comment #1 the new behavior seems to have more support by devs (IIRC the people who answered were vapier, weeve, ciaranm and bazik). The ebuild docs don't cover this case, so nobody should rely on the current behavior. The logic behind this is that 2b should be the same as 2b.0.0 which would be newer than 2.1.0 (as 2b > 2). I also wrote a little test script to check for real problems (only comparing versions in a package, not between different packages) using your code as a template, which results in only 2 actual problems (app-arch/arj and media-video/kavi2svcd). Personally I don't care, that's why I asked other people in the first place. I'll put this on -dev for further discussion. Created attachment 26490 [details]
code which allows numberless suffixes
Created attachment 26491 [details]
test script
takes only a few seconds to run, so caching isn't required.
Looking at these results, I see that current portage results in 2.2a > 2.2.13 and the rewrite does the reverse.
[ver1, ver2, portage_result, genone_result]
['0.2', '0j', -106, 2]
['2.2a', '2.2.13', 97, -13]
I attempted this with the unit testing as well (which is currently based on 2.0.47 I believe).
Here it is with 2.0.50-r1:
>>> import portage
>>> portage.vercmp("1.0b","1.0.1")
98
Your regexp would allow "1.0.0_p_pre1_alpha"(example). Jason: yeah, just was about to change my code when I realized that I had it all backwards in my last comment :-/ So the rewrite handles the case foo-1.0.1 vs foo-1.0b correctly if we take arj and kavi2svcd as guideline (according to the dates in the Changelog). Nick, Nakano: want to commit? Docs-Team: can we add a notice about this in the ebuild docs? Created attachment 26883 [details, diff]
Patch to gentoo-howto.xml
This patch adds information to the Gentoo Development Howto (in which ebuilds
are explained). It adds the notion that (1) x.yz is less recent than x.y.t and
that (2) one can use a combination of suffixes.
Is this correct? And are we talking here about the suffixes reflecting upstream
versions or also about the Gentoo-specific ones (i.e. those after -r#)?
There is a logic error in the new code. I just re-wrote porthole's version string sorting code to follow(use) this new code. Of course it is somewhat modified, but follows nearly all the same steps. The error is in assuming the when there is a lack of a suffix in a version you are appending a suffix number of "0". It should be the highest number possible for a suffix. I changed my code to pad an empty suffix with "9999999999" (10 digits in length). I found out by stumbling upon dev-embedded/sdcc package that uses dates as the suffix number. Here is the version list: ['dev-embedded/sdcc-2.4.0_p20040507', 'dev-embedded/sdcc-2.4.0_p20040331', 'dev-embedded/sdcc-2.4.0', 'dev-embedded/sdcc-2.4.0_p20040304'] Since all -2.4.0_pXXXXXXXX versions are leading up to 2.4.0 , then versions without a suffix should get the highest suffix rank possible. and the sorted list: ['dev-embedded/sdcc-2.4.0_p20040304', 'dev-embedded/sdcc-2.4.0_p20040331', 'dev-embedded/sdcc-2.4.0_p20040507', 'dev-embedded/sdcc-2.4.0'] Your new code snipit: for i in range(0, max(len(list1), len(list2))): if len(list1) <= i: s1 = ("p","0") else: s1 = suffix_regexp.match(list1[i]).groups() if len(list2) <= i: s2 = ("p","0") Should become: for i in range(0, max(len(list1), len(list2))): if len(list1) <= i: s1 = ("p","9999999999") else: s1 = suffix_regexp.match(list1[i]).groups() if len(list2) <= i: s2 = ("p","9999999999") Brian Dolbec <dol-sen> Created attachment 31566 [details]
Porthole's new string sort routine
This is the modified code for the portholes sort routine which I re-wrote to
use this new code logic/methods. It seems to work quite nicely.
Brian
I have one more question. Masatomo Nakano wrote: Your regexp would allow "1.0.0_p_pre1_alpha"(example). If I try to include that version string, portage.catpkgsplit(ebuild) returns an empty string. The new code then returns None from this code. match1 = ver_regexp.match(val1) # checking that the versions are valid if not match1 or not match1.groups(): dprint("!!! syntax error in version:") dprint(val1) return None This is the proper behaviour isn't it? Brian, With the versions of dev-embedded/sdcc that you mentioned, the correct order should be: ['dev-embedded/sdcc-2.4.0', 'dev-embedded/sdcc-2.4.0_p20040304', 'dev-embedded/sdcc-2.4.0_p20040331', 'dev-embedded/sdcc-2.4.0_p20040507'] The _p suffix is short for patch level. If the dated ebuilds are in fact earlier than the final then it is a bug in the naming scheme. The _pre, _rc, _beta and _alpha suffixes exist for that purpose. With regard to the "1.0.0_p_pre1_alpha" ability, it was talked about on IRC and decided against. The versioning rewrite should match the prior functionality exactly. Jason (or anyone else), can you send me a log of that conversation, I can't remember it and the main point I started this rewrite was to extend the versioning system. I don't have a log, but I seem to remember Nick saying something to that effect. I may be wrong. I had assumed that the extended version wasn't working because you had intentionally adjusted it so. If that's not the case, I guess it's a bug. ;) *** Bug 51135 has been marked as a duplicate of this bug. *** Created attachment 31767 [details]
include changed pkgsplit() function, changed cvs versioning syntax
Ok, I forgot to fix the pkgsplit function, and thinking about it portage would
likely get problems with the new cvs versioning code and old -cvs ebuilds, so I
changed the syntax for cvs versions a bit.
Two short remarks 1. mysql-4.1.6-gamma 2. Couldn't the comparison algorithm be overridable from each package? Make the portage default clean and easy and leave it up to the problematic packages to provide an algorithm for their needs. Think Strategy Pattern. 2. Couldn't the comparison algorithm be overridable from each package? Make the portage default clean and easy and leave it up to the problematic packages to provide an algorithm for their needs. Think Strategy Pattern. Err.. No. Think "massive pita trying to export pythonic code out of a bash ebuild, and into a core portion of portage's pkg handling, on a per pkg basis"... Side note, documenting the allowed version components would be a good thing... A bunch of stuff is already documented in ebuild(5). See e.g. the "Atom Versions" section. Re: Think "massive pita trying to export pythonic code out of a bash ebuild, and into a core portion of portage's pkg handling, on a per pkg basis"... Of course, a really simple (but probably hopelessly slow) sloution would be to just deklare version relationships, without any algorithm. Either as a linked-list-of-ebuild kind of thing, or a simple list in the package dir. Fixed on or before 2.0.51.22-r1 Looking through the batch of bugs, I'm not sure that some of these are actually fixed in stable. Others, the requirements have possibly changed after the initial fix was committed. If you think this bug has been closed incorrectly, please reopen or ask that it be reopened. Yeah, this one definitely wasn't in .51. Now in trunk. Released in 2.1_pre1. *** Bug 9202 has been marked as a duplicate of this bug. *** |