The recent addition of these lines resulted in a big performance boost: lib/portage/dep/__init__.py 20:from functools import lru_cache 402:@lru_cache(1024) lib/portage/versions.py 13:from functools import lru_cache 111:@lru_cache(1024) 309:@lru_cache(10240) The values 1024 and 10240 were chosen for some trade-off of speed and ram usage. However, with 64 gigs of ram, my trade-offs might be different than your trade-offs. Therefore, I propose a user-configurable scaling knob, PORTAGE_LRU_CACHE_SCALING_FACTOR. These would then be used like: @lru_cache(1024 * PORTAGE_LRU_CACHE_SCALING_FACTOR) This way, users with excess ram could trade it for a performance boost. Reproducible: Always
We can hook this into the portage.data._init(settings) function which is used to initialize things with user configuration settings.
In https://bugs.python.org/issue24969 there's a rejected proposal to allow runtime resize of lru_cache instances, with rationale give that a lru_cache instance has a __wrapped__ attribute which can be used to unwrap and re-wrap the wrapped function. A disadvantage of the function re-wrapping approach is that it would not be possible to replace all previous imports of the wrapped function with new lru_cache instance, so we would need to add an additional decorator to serve as a layer of indirection for cases when the wrapped function has been imported into another module. However, I expect the performance impact of this extra decorator layer to be negligible.
I did some measurements to try to determine whether it's worth to fix this or not. Procedure --------- - I found all the calls to ``lru_cache`` in portage (6, at the time of this test) and parametrized their ``maxsize`` argument with a global constant. See https://github.com/palao/portage/tree/spike/display-dependencies-cache - I prepared a local ebuild repo with a subslot bumped perl ebuild (to trigger a large amount of rebuilds) - the test consists in running ``emerge -p perl`` (using that ebuild) Results ------- I measured the wall-clock time with ``sys-process/time-1.9-r1``. Each cache ``maxsize`` was tested 3 times. The table summarizes the results: +----------------+--------------+ | cache maxsize | wall time [s]| +================+==============+ | no cache | 62(6) | +----------------+--------------+ | 16 | 44(2) | +----------------+--------------+ | 512 | 41(2) | +----------------+--------------+ | 1024 | 40.2(1.1) | +----------------+--------------+ | 2048 | 41.4(5) | +----------------+--------------+ | 4096 | 39.6(1.9) | +----------------+--------------+ | 8192 | 40(2) | +----------------+--------------+ | 16384 | 36.6(8) | +----------------+--------------+ | 32768 | 39(2) | +----------------+--------------+ | 65536 | 39(3) | +----------------+--------------+ The values are the average of the three runs. The errors are the standard deviations. The syntax is: 40.2(1.1) == 40.2 +- 1.1 The system used for the tests is a x1 carbon (8th gen): - Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz - 16 GB RAM Conclusion ---------- This quick test seems to show a very limited improvement in runtime with large cache sizes beyond a clear initial gain. Unless there are good indications against this result, I think this test tells us that it is not worth to add yet another global constant to portage to parametrize the size of the caches: it has a limited benefit for a high maintenance cost. Maybe more extensive tests are required to take a more solid decision though.