After profiling emerge, I noticed that count() was being called quite a few times. Changing two lines in update_dbentry() seems to produce a noticeable speedup.

In portage_update.py, in update_dbentry(), change (both occurrences):

if mycontent.count(old_value):

to

if old_value in mycontent:

I'm using portage 2.1.2.2, but this appears to be relevant to the current SVN as well. The new code should behave identically to the existing code.

Reproducible: Always

Steps to Reproduce:
1. time emerge -p file
2. apply patch
3. time emerge -p file

Actual Results:
Emerge is particularly slow on my system, but the change improved dependency checking time considerably, from 35 seconds to 25 seconds. It may help on faster systems as well.

Python 2.4.
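For illustration, the equivalence of the two checks can be sketched as below. This is not portage code: the sample strings and the helper names uses_count() and uses_in() are made up for the example, and the timings it prints will vary by machine and Python version.

```python
import timeit

# Hypothetical sample data standing in for an ebuild's dependency string.
mycontent = "dev-libs/foo sys-apps/bar " * 2000 + "app-misc/old-name"
old_value = "dev-libs/foo"


def uses_count(content, value):
    # count() scans the whole string and tallies every occurrence.
    return bool(content.count(value))


def uses_in(content, value):
    # "in" only needs to find the first occurrence.
    return value in content


# Both forms agree on a hit and on a miss.
assert uses_count(mycontent, old_value) == uses_in(mycontent, old_value)
assert uses_count(mycontent, "no/such-pkg") == uses_in(mycontent, "no/such-pkg")

if __name__ == "__main__":
    for fn in (uses_count, uses_in):
        elapsed = timeit.timeit(lambda: fn(mycontent, old_value), number=1000)
        print(fn.__name__, elapsed)
```

The gap is largest when the value occurs early and often, since count() still has to walk the rest of the string.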
Offhand, this code path should only run as needed for emerge ops. Testing-wise, you also need to verify that the speedup you're seeing isn't due to kernel caching: pre-run the ops until you get a stable result, then patch and rerun, averaging the timings. Either way, the count usage there is daft, as jason pointed out...
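A minimal sketch of that measurement procedure, assuming a modern Python and a hypothetical helper named average_runtime(); nothing here is part of portage. The untimed warm-up calls let the kernel and disk caches settle before any sample is recorded.

```python
import time


def average_runtime(func, warmup=3, runs=5):
    """Call func `warmup` times untimed, then return the mean of `runs` timed calls."""
    for _ in range(warmup):
        func()  # prime caches; these runs are not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)
```

To compare before and after the patch, func could wrap something like subprocess.call(["emerge", "-p", "file"]), run once per code version with the same warmup and runs values.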
It turns out that the "str in str" construct is only included in Python 2.3 and above. However, "if mycontent.find(old_value) != -1:" (the -1 is important!) works on Python 2.2 and seems to run even faster than "str in str", at least for me.
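Why the -1 matters, as a short sketch (the sample string and the contains() helper are made up for the example): find() returns the index of the first match, so a match at position 0 is falsy, and a miss returns -1, which is truthy.

```python
mycontent = "dev-libs/foo sys-apps/bar"

# Match at index 0: find() returns 0, which is falsy.
assert mycontent.find("dev-libs/foo") == 0
assert not mycontent.find("dev-libs/foo")    # a bare truth test misses the match
assert mycontent.find("dev-libs/foo") != -1  # the explicit comparison is correct

# No match: find() returns -1, which is truthy.
assert mycontent.find("no/such-pkg") == -1
assert bool(mycontent.find("no/such-pkg"))   # a bare truth test reports a false hit


def contains(content, value):
    # Equivalent to "value in content" for strings.
    return content.find(value) != -1
```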
(In reply to comment #2)
> It turns out that the "str in str" construct is only included in Python 2.3 and
> above.

Prior being has_key, rather than __contains__. I hate has_key. :)

Meanwhile, portage requires Python >=2.3; if you look through the code, there's quite a bit of 'in' usage, so I wouldn't worry about that.

Meanwhile, that didn't answer the question of why this codepath is getting invoked multiple times. There should be some form of mtime check so it only runs when required, iirc.
(In reply to comment #0)
> In portage_update.py in update_dbentry() change (both occurrences):
> if mycontent.count(old_value):
> to
> if old_value in mycontent:

Thanks, that's in svn r6560.

(In reply to comment #3)
> Meanwhile, didn't answer the question of why this codepath is getting invoked
> multiple times- should be some form of mtime checks to do it only when required
> iirc.

As a temporary workaround for bug 122089, update_dbentry is used for ebuilds that do not exist in the portage tree. It's done in memory, so the updates are not persistent. Generally, it's only a corner case because usually there is a live ebuild in the portage tree that the metadata is pulled from.

In the future, we'll use metadata directly from the vdb, so the "in memory" approach will cease to be feasible. I was planning to handle *DEPEND updates in the vdb via an emaint module, but now that *DEPEND from the vdb is cached in /var/cache/edb/vdb_metadata.pickle, it may be feasible to update the whole vdb on the fly (based on the $PORTDIR/profiles/updates mtime checks). I'll try that and see how the performance measures up.
This has been released in 2.1.2.8.