The general idea is proposed here: [03:44] * slyfox (~slyfox@gentoo/developer/slyfox) has joined [03:45] <slyfox> decided to have a peek who portage takes 7 minutes to do nothing on my system [03:45] <slyfox> most of the time is spent in regexp matching in vercmp function [03:47] <slyfox> http://dpaste.com/04GKZ0Z [03:48] <slyfox> would be nice if portage converted string version representation into list of integers only once :) [04:44] <slyfox> this lazy hack seems to shave off a minute of runtime: http://dpaste.com/2DV7KG9.txt contents of the paste: diff --git a/pym/portage/versions.py b/pym/portage/versions.py index a028d9353..77b6732d7 100644 --- a/pym/portage/versions.py +++ b/pym/portage/versions.py @@ -270,2 +270,12 @@ def vercmp(ver1, ver2, silent=1): +def memoize(f): + memo = {} + def helper(vl,vr): + if (vl,vr) not in memo: + memo[(vl,vr)] = f(vl,vr) + return memo[(vl,vr)] + return helper + +vercmp = memoize(vercmp) + def pkgcmp(pkg1, pkg2): I prefer not to memoize with global variables, since that turns into a global memory leak for some use-cases. I would create a class like this and have vercmp save the cache as an attribute of the version string: if sys.hexversion < 0x3000000: _unicode = unicode else: _unicode = str class _version(_unicode): __slots__ = ('_vercmp_cache',) If the vercmp function is passed two _version instances having separate caches, it can merge their caches together.
There's a patch in the following branch: https://github.com/zmedico/portage/tree/bug_603994
@slyflox, can you test my patch to see how the performance compares to your patch?
Created attachment 457858 [details, diff] vercmp: memoize results
Created attachment 457896 [details, diff] vercmp: memoize results * optimize for vercmp(ver1, ver2) == -1 * vercmp(ver2, ver1) * make depgraph.schedulerGraph() clear the vercmp cache(s)
I didn't test against my patch but did test against vanilla portage. My command: $ time emerge -vuDN @world @preserved-rebuild --keep-going --jobs=2 --with-bdeps=y --complete-graph -p --backtrack=100 NO PATCH: 5 minutes 50 seconds +/- 5 seconds Zac's PATCH: 5 minutes 20 seconds +/- 5 seconds
Created attachment 458274 [details, diff] vercmp: memoize results (In reply to Sergei Trofimovich from comment #5) > I didn't test against my patch but did test against vanilla portage. > > My command: > > $ time emerge -vuDN @world @preserved-rebuild --keep-going --jobs=2 > --with-bdeps=y --complete-graph -p --backtrack=100 > > NO PATCH: 5 minutes 50 seconds +/- 5 seconds > Zac's PATCH: 5 minutes 20 seconds +/- 5 seconds I've updated the patch to use a config._vercmp_cache attribute which serves as a shared cache that dbapi instances propagate to the _pkg_str instances that they create. This should give some more cache hits.
Created attachment 458276 [details, diff] vercmp: memoize results Fixed bug in bintree.py.
We handled this in bug 732378. *** This bug has been marked as a duplicate of bug 732378 ***