747016 – sys-apps/portage: allow configurable lru_cache scaling

Bug 747016 - sys-apps/portage: allow configurable lru_cache scaling

Summary: sys-apps/portage: allow configurable lru_cache scaling

Status:	CONFIRMED

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Core (show other bugs)
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Portage team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	835380
	Show dependency tree

Reported:	2020-10-07 08:36 UTC by Jason A. Donenfeld
Modified:	2023-12-29 18:24 UTC (History)
CC List:	4 users (show)

See Also:	https://github.com/python/cpython/issues/69157
Package list:
Runtime testing required:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jason A. Donenfeld gentoo-dev

2020-10-07 08:36:12 UTC

The recent addition of these lines resulted in a big performance boost:

lib/portage/dep/__init__.py
20:from functools import lru_cache
402:@lru_cache(1024)
lib/portage/versions.py
13:from functools import lru_cache
111:@lru_cache(1024)
309:@lru_cache(10240)

The values 1024 and 10240 were chosen for some trade-off of speed and ram usage. However, with 64 gigs of ram, my trade-offs might be different than your trade-offs. Therefore, I propose a user-configurable scaling knob, PORTAGE_LRU_CACHE_SCALING_FACTOR. These would then be used like:

    @lru_cache(1024 * PORTAGE_LRU_CACHE_SCALING_FACTOR)

This way, users with excess ram could trade it for a performance boost.


Reproducible: Always

Comment 1 Zac Medico gentoo-dev

2020-10-10 06:38:26 UTC

We can hook this into the portage.data._init(settings) function which is used to initialize things with user configuration settings.

Comment 2 Zac Medico gentoo-dev

2020-12-21 00:18:43 UTC

In https://bugs.python.org/issue24969 there's a rejected proposal to allow runtime resize of lru_cache instances, with rationale give that a lru_cache instance has a __wrapped__ attribute which can be used to unwrap and re-wrap the wrapped function.

A disadvantage of the function re-wrapping approach is that it would not be possible to replace all previous imports of the wrapped function with new lru_cache instance, so we would need to add an additional decorator to serve as a layer of indirection for cases when the wrapped function has been imported into another module. However, I expect the performance impact of this extra decorator layer to be negligible.

Comment 3 David Palao 2023-12-29 18:24:24 UTC

I did some measurements to try to determine whether it's worth to fix this or not.


Procedure
---------
 
- I found all the calls to ``lru_cache`` in portage (6, at the time of this test) and parametrized their ``maxsize`` argument with a global constant. See https://github.com/palao/portage/tree/spike/display-dependencies-cache
- I prepared a local ebuild repo with a subslot bumped perl ebuild (to trigger a large amount of rebuilds)
- the test consists in running ``emerge -p perl`` (using that ebuild)


Results
-------

I measured the wall-clock time with ``sys-process/time-1.9-r1``. Each cache ``maxsize`` was tested 3 times. The table summarizes the results:


+----------------+--------------+
| cache maxsize  | wall time [s]|
+================+==============+
|     no cache   |   62(6)      |
+----------------+--------------+
|           16   |   44(2)      |
+----------------+--------------+
|          512   |   41(2)      |
+----------------+--------------+
|         1024   |   40.2(1.1)  |
+----------------+--------------+
|         2048   |   41.4(5)    |
+----------------+--------------+
|         4096   |   39.6(1.9)  |
+----------------+--------------+
|         8192   |   40(2)      |
+----------------+--------------+
|        16384   |   36.6(8)    |
+----------------+--------------+
|        32768   |   39(2)      |
+----------------+--------------+
|        65536   |   39(3)      |
+----------------+--------------+

The values are the average of the three runs. The errors are the standard deviations. The syntax is: 40.2(1.1) == 40.2 +- 1.1

The system used for the tests is a x1 carbon (8th gen):
- Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
- 16 GB RAM


Conclusion
----------

This quick test seems to show a very limited improvement in runtime with large cache sizes beyond a clear initial gain.

Unless there are good indications against this result, I think this test tells us that it is not worth to add yet another global constant to portage to parametrize the size of the caches: it has a limited benefit for a high maintenance cost.

Maybe more extensive tests are required to take a more solid decision though.