Having run "equery belongs md5sum" which seemed to run slowly, I investigated and saw that most of the run time is due to regular expression matching. The regular expression used has some redundancy: /md5sum$|^/current/working/dir/md5sum$
Although we have two branches, the first branch will match everything the second will. The second branch is there in case a symlink in the current path points to a file with a different name. So for example, let /bin/ksh -> /bin/mksh, then executing "equery belongs ksh" in /bin will search for /ksh$|^/bin/mksh$.
I have written a patch to improve this while retaining the symlink functionality. This improves the run time of "equery belongs" by 9% for me. I have tested this only on Python 3.3 and don't know Python well myself. This was an excuse to learn it a little. This patch will need to be adjusted to work with all versions.
--- old_helpers.py 2013-07-06 11:44:38.524514791 +0100
+++ helpers.py 2013-07-06 16:03:06.241003837 +0100
@@ -338,6 +338,46 @@
+ def _filter_redundant_regexes(rs):
+ Given a list of regular expressions to be joined via "|".join(),
+ removes those branches that have non-unique suffixes:
+ ['a$', 'ba$'] --> ['a$']
+ for the purpose of improving efficiency.
+ This is used in _prepare_search_regex to make filename matching
+ more efficient.
+ @type rs: list
+ @param rs: file path regular expressions each ending with '$'
+ @rtype: list
+ @return: list with possibly fewer branches
+ @raise ValueError: if a regular expression does not end with '$'
+ # sort in place
+ rs.sort(key= len, reverse= True)
+ rs_filtered = 
+ # keep a list item only if no suffix of it is present
+ for i in range(len(rs)):
+ found = False
+ for j in range(i + 1, len(rs)):
+ if rs[i].endswith(rs[j]):
+ found = True
+ if not found:
+ if not rs[i].endswith('$'):
+ raise ValueError(
+ "_filter_redundant_regexes needs all regexes to end with '$'")
+ # we assume a search with very few hits; sort order is unimportant
+ return rs_filtered
def _prepare_search_regex(self, queries):
"""Create a regex out of the queries"""
@@ -357,7 +397,9 @@
query = "/%s$" % re.escape(query)
- result = "|".join(result)
+ # remove same redundancy here, 9% improvement in run time with "equery belongs"
+ result = "|".join( self._filter_redundant_regexes(result) )
Please attach your patch to the bug.
Created attachment 353022 [details, diff]
patch to speed up regex file search, written for Python 3
There's no need to credit me for this in the Changelog, Gentoo owns it now.