It would be useful to mmap package metadata cache entries in order to eliminate package metadata from the heap.

For parallelization of dependency calculations (bug 660860), we'll probably want to use a DB with mmap support to store the dependency calculation where concurrent processes can efficiently collaborate on it.

LMDB is a candidate since it supports mmap, but we should beware that writemap=True with an undersized map_size value will trigger SIGBUS in concurrent processes:

https://github.com/jnwatson/py-lmdb/issues/269#issuecomment-729750375
> What we have here is two bugs, one in py-lmdb and one in lmdb. The
> first bug is that a non-zero default value of map_size on Environment is
> inappropriate. Passing 0 is generally the right answer most of the time.
>
> That bug triggers a second bug in the underlying lmdb where opening a
> database with write_map=True and explicitly specifying a map_size too
> small will ftruncate the file out from underneath another open process.
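As a rough illustration of the first idea (reading a cache entry through mmap instead of loading it onto the heap), here is a minimal sketch using only the Python standard library. It assumes a cache entry is a plain text file of KEY=value lines; the function name read_cache_value is hypothetical and is not an existing portage API.

```python
import mmap


def read_cache_value(path, key):
    """Return the value stored for `key` in a KEY=value file, or None.

    Hypothetical helper: the file is mapped read-only, the search runs
    against the mapping, and only the matched value is copied to the heap.
    """
    needle = key.encode() + b"="
    with open(path, "rb") as f:
        # Mapping an empty file raises ValueError, so handle it up front.
        if f.seek(0, 2) == 0:
            return None
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = 0
            while True:
                pos = m.find(needle, pos)
                if pos == -1:
                    return None
                # Only accept a match that starts a line.
                if pos == 0 or m[pos - 1] == 0x0A:
                    break
                pos += 1
            start = pos + len(needle)
            end = m.find(b"\n", start)
            if end == -1:
                end = len(m)
            # Slicing copies just the value bytes out of the mapping.
            return m[start:end].decode()
```

The rest of the file is never copied into a Python object; whether the pages are served from RAM is left to the OS page cache.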
SQLite may also have some support for this: https://sqlite.org/mmap.html
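For reference, the linked SQLite page describes a mmap_size pragma. A minimal sketch of requesting it from Python's sqlite3 module follows; the helper name connect_with_mmap and the 256 MiB default are assumptions for illustration, and the effective size may come back as 0 if the SQLite build has memory-mapped I/O disabled.

```python
import sqlite3


def connect_with_mmap(path, mmap_bytes=256 * 1024 * 1024):
    """Open a database and ask SQLite to use memory-mapped I/O.

    Hypothetical helper. Returns (connection, effective_mmap_size).
    The build may cap the size, or ignore the pragma entirely.
    """
    conn = sqlite3.connect(path)
    row = conn.execute(f"PRAGMA mmap_size={int(mmap_bytes)}").fetchone()
    # Treat "no row returned" as memory-mapped I/O being unavailable.
    effective = row[0] if row else 0
    return conn, effective
```

Callers should treat the returned size as advisory: queries work the same either way, only the I/O path differs.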
I want to create something like memcached or redis that's entirely based on files and uses zero copy. I'm not sure if portage will use it or not, but it's a related zero-copy / mmap idea.

This is the related discussion from #gentoo-portage today:

> [13:25:51] <zmedico> adelks: are you making heavy use of mmap? I want portage to use mmap more...
> [13:26:16] <adelks> zmedico: not at all as I don't even know what it is
> [13:26:18] <zmedico> heaps are overrated and mmap is awesome
> [13:26:27] <adelks> xD
> [13:27:46] <zmedico> adelks: see bug 787770
> [13:27:48] <willikins> https://bugs.gentoo.org/787770 "[TRACKER] sys-apps/portage: use database(s) with mmap support to reduce memory footprint"; Portage Development, Core; CONF; zmedico:dev-portage
> [13:29:30] <zmedico> adelks: the idea is that you try to access most things in a zero-copy mmap sort of way, which is basically as efficient as you can get
> [13:30:44] <adelks> zmedico: I am aware of memoryviews in Cython, does this concept exist in compiled languages ?
> [13:30:53] <adelks> I will read the bug report
> [13:31:23] <zmedico> I actually want to implement a something like memcached or redis that's entirely based on files and uses zero copy
> [13:31:33] <adelks> But in any case, when I will implement multi-threading, the database will for sure be shared in memory
> [13:32:23] <adelks> zmedico: that could improve emerge without even touching its code right ?
> [13:32:40] <adelks> would it be the same if the repository be loaded in ramfs ?
> [13:32:59] <adelks> I suppose not, although it would help ?
> [13:33:50] <zmedico> ramfs won't help
> [13:34:27] <adelks> got it zmedico
> [13:34:35] <zmedico> adelks: yeah if we change the underlying portage APIs to utilize mmeap then we don't have to touch a lot of code
> [13:34:40] <zmedico> *mmap
> [13:38:22] <zmedico> mmap effectivly offloads more of the memory management to the OS
> [13:39:01] <adelks> zmedico: in my code, I want to read the files from the disk exactly once
> [13:39:37] <adelks> I was thinking of the RAM usage for that, it shouldn't be much ? I mean at most there are like 50 000 packages ? 100kb of metadata each ?
> [13:40:13] <zmedico> if you use mmap, access files just like they're RAM
> [13:40:33] <adelks> I was reding the doc https://docs.python.org/3/library/mmap.html now I understand better
> [13:40:53] <zmedico> an the OS *will* cache them in RAM when appropriate
> [13:41:19] <zmedico> so mmap gives you similar results to caching in RAM
> [13:41:34] <zmedico> but without eating heap memory
> [13:42:17] <zmedico> and it's zero-copy, which is *faster* than reading things into RAM!!!
> [13:42:49] <adelks> I wonder how filesystems intervene into this grand scheme
> [13:43:52] <zmedico> filesystems can trigger... SIGBUS ad noted here: https://bugs.gentoo.org/787770#c0
> [13:44:51] <zmedico> so you don't want to shrink mmaped files
> [13:45:17] <zmedico> just don't, it's easy ;-P
> [13:46:36] <zmedico> portage overwrites cache files via atomic rename
> [13:47:15] <zmedico> so hopefully you'd never see a SIGBUS for a mmaped portage cache file
> [13:48:05] <zmedico> and if you saw one one day you'd be surprised :-D
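To make the atomic-rename point from the log concrete, here is a small sketch of why replacing a file by rename (rather than truncating it in place) leaves existing mappings readable. It assumes POSIX semantics, where the old inode stays alive while it is still mapped. The helper names are hypothetical and this is not portage's actual cache-writing code.

```python
import mmap
import os
import tempfile


def map_readonly(path):
    """Map a whole file read-only (hypothetical helper)."""
    with open(path, "rb") as f:
        # mmap keeps its own duplicate of the descriptor, so f can close.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def replace_atomically(path, data):
    """Write data to a temp file in the same directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The path now points at a new inode; the old file is never shrunk.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def demo():
    """Return (bytes seen by the old mapping, bytes seen by a new one)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "entry")
        replace_atomically(path, b"EAPI=7\nSLOT=0\nKEYWORDS=amd64\n")
        old = map_readonly(path)
        # The replacement is shorter, but the old mapping is unaffected.
        replace_atomically(path, b"EAPI=8\n")
        new = map_readonly(path)
        try:
            return old[:], new[:]
        finally:
            old.close()
            new.close()
```

A reader holding the old mapping keeps seeing the old contents until it remaps, so any caller that needs fresh data has to reopen the path.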