Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 603994 - sys-apps/portage: memoize vercmp function
Summary: sys-apps/portage: memoize vercmp function
Status: RESOLVED DUPLICATE of bug 732378
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All All
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2016-12-29 07:45 UTC by Zac Medico
Modified: 2022-03-05 03:28 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
vercmp: memoize results (vercmp-memoize-results-bug-603994.patch,2.83 KB, patch)
2016-12-29 20:48 UTC, Zac Medico
Details | Diff
vercmp: memoize results (vercmp-memoize-results-bug-603994.patch,3.52 KB, patch)
2016-12-29 23:39 UTC, Zac Medico
Details | Diff
vercmp: memoize results (vercmp-memoize-results-bug-603994.patch,6.52 KB, patch)
2017-01-02 03:41 UTC, Zac Medico
Details | Diff
vercmp: memoize results (vercmp-memoize-results-bug-603994.patch,6.52 KB, patch)
2017-01-02 04:18 UTC, Zac Medico
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zac Medico gentoo-dev 2016-12-29 07:45:44 UTC
The general idea is proposed here:

[03:44] * slyfox (~slyfox@gentoo/developer/slyfox) has joined
[03:45] <slyfox> decided to have a peek who portage takes 7 minutes to do nothing on my system
[03:45] <slyfox> most of the time is spent in regexp matching in vercmp function
[03:47] <slyfox> http://dpaste.com/04GKZ0Z
[03:48] <slyfox> would be nice if portage converted string version representation into list of integers only once :)
[04:44] <slyfox> this lazy hack seems to shave off a minute of runtime: http://dpaste.com/2DV7KG9.txt


contents of the paste:

diff --git a/pym/portage/versions.py b/pym/portage/versions.py
index a028d9353..77b6732d7 100644
--- a/pym/portage/versions.py
+++ b/pym/portage/versions.py
@@ -270,2 +270,12 @@ def vercmp(ver1, ver2, silent=1):

+def memoize(f):
+    memo = {}
+    def helper(vl,vr):
+        if (vl,vr) not in memo:
+            memo[(vl,vr)] = f(vl,vr)
+        return memo[(vl,vr)]
+    return helper
+
+vercmp = memoize(vercmp)
+
 def pkgcmp(pkg1, pkg2):

I prefer not to memoize with global variables, since that turns into a
global memory leak for some use-cases. I would create a class like this and have vercmp save the cache as an attribute of the version string:

if sys.hexversion < 0x3000000:
    _unicode = unicode
else:
    _unicode = str

class _version(_unicode):
    __slots__ = ('_vercmp_cache',)

If the vercmp function is passed two _version instances having separate caches, it can merge their caches together.
Comment 1 Zac Medico gentoo-dev 2016-12-29 10:58:08 UTC
There's a patch in the following branch:

https://github.com/zmedico/portage/tree/bug_603994
Comment 2 Zac Medico gentoo-dev 2016-12-29 11:00:29 UTC
@slyflox, can you test my patch to see how the performance compares to your patch?
Comment 3 Zac Medico gentoo-dev 2016-12-29 20:48:22 UTC
Created attachment 457858 [details, diff]
vercmp: memoize results
Comment 4 Zac Medico gentoo-dev 2016-12-29 23:39:40 UTC
Created attachment 457896 [details, diff]
vercmp: memoize results

* optimize for vercmp(ver1, ver2) == -1 * vercmp(ver2, ver1)
* make depgraph.schedulerGraph() clear the vercmp cache(s)
Comment 5 Sergei Trofimovich (RETIRED) gentoo-dev 2016-12-30 15:18:11 UTC
I didn't test against my patch but did test against vanilla portage.

My command:

$ time emerge -vuDN @world @preserved-rebuild --keep-going --jobs=2 --with-bdeps=y --complete-graph -p --backtrack=100

NO PATCH:    5 minutes 50 seconds +/- 5 seconds
Zac's PATCH: 5 minutes 20 seconds +/- 5 seconds
Comment 6 Zac Medico gentoo-dev 2017-01-02 03:41:19 UTC
Created attachment 458274 [details, diff]
vercmp: memoize results

(In reply to Sergei Trofimovich from comment #5)
> I didn't test against my patch but did test against vanilla portage.
> 
> My command:
> 
> $ time emerge -vuDN @world @preserved-rebuild --keep-going --jobs=2
> --with-bdeps=y --complete-graph -p --backtrack=100
> 
> NO PATCH:    5 minutes 50 seconds +/- 5 seconds
> Zac's PATCH: 5 minutes 20 seconds +/- 5 seconds

I've updated the patch to use a config._vercmp_cache attribute which serves as a shared cache that dbapi instances propagate to the _pkg_str instances that they create. This should give some more cache hits.
Comment 7 Zac Medico gentoo-dev 2017-01-02 04:18:32 UTC
Created attachment 458276 [details, diff]
vercmp: memoize results

Fixed bug in bintree.py.
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-03-05 03:28:50 UTC
We handled this in bug 732378.

*** This bug has been marked as a duplicate of bug 732378 ***