It would be useful to mmap package metadata cache entries in order to eliminate package metadata from the heap.
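As a rough illustration of the idea, here is a minimal sketch that looks up one key in a cache entry through a read-only mapping, so only the matched value is copied onto the heap. It assumes a simplified `KEY=value`-per-line entry format; `lookup_key` and the sample entry are hypothetical and not part of the portage API.

```python
import mmap
import os
import tempfile


def lookup_key(path, key):
    """Return the value stored for `key` in a KEY=value file, or None.

    The file is mapped read-only; only the matched value is copied out.
    Note: mmap raises ValueError for an empty file.
    """
    needle = key.encode() + b"="
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = 0
            size = len(mm)
            while pos < size:
                end = mm.find(b"\n", pos)
                if end == -1:
                    end = size
                # Compare only the start of the current line with the key.
                if mm[pos:pos + len(needle)] == needle:
                    return mm[pos + len(needle):end].decode()
                pos = end + 1
    return None


# Hypothetical sample entry, written to a temporary file for the demo.
entry_path = os.path.join(tempfile.mkdtemp(), "entry")
with open(entry_path, "wb") as f:
    f.write(b"EAPI=8\nSLOT=0\nDEPEND=dev-lang/python\n")

slot = lookup_key(entry_path, "SLOT")
missing = lookup_key(entry_path, "RDEPEND")
```

The pages backing the mapping live in the OS page cache rather than in the process heap, which is the point of the proposal.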
For parallelization of dependency calculations (bug 660860), we'll probably want to use a DB with mmap support to store the dependency calculation state, so that concurrent processes can efficiently collaborate on it.
LMDB is a candidate since it supports mmap, but we should beware that writemap=True with an undersized map_size value will trigger SIGBUS in concurrent processes:
> What we have here is two bugs, one in py-lmdb and one in lmdb. The
> first bug is that a non-zero default value of map_size on Environment is
> inappropriate. Passing 0 is generally the right answer most of the time.
> That bug triggers a second bug in the underlying lmdb where opening a
> database with write_map=True and explicitly specifying a map_size too
> small will ftruncate the file out from underneath another open process.
SQLite possibly has some mmap support as well:
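A minimal sketch of how this could be probed from Python's standard `sqlite3` module, using SQLite's `PRAGMA mmap_size`. The helper name, table layout, and package version string are hypothetical. The granted size may be smaller than requested, or zero if the SQLite build has memory-mapped I/O disabled.

```python
import os
import sqlite3
import tempfile


def open_with_mmap(path, requested_bytes=64 * 1024 * 1024):
    """Open a SQLite database and ask for memory-mapped I/O.

    Returns (connection, granted_bytes). granted_bytes is 0 when the
    SQLite build does not support mmap or ignores the request.
    """
    conn = sqlite3.connect(path)
    row = conn.execute(f"PRAGMA mmap_size={int(requested_bytes)}").fetchone()
    granted = row[0] if row else 0
    return conn, granted


db_path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
conn, granted = open_with_mmap(db_path)

# Hypothetical table layout, for illustration only.
conn.execute("CREATE TABLE metadata (cpv TEXT PRIMARY KEY, slot TEXT)")
conn.execute(
    "INSERT INTO metadata VALUES (?, ?)", ("sys-apps/portage-3.0.30", "0")
)
conn.commit()
slot = conn.execute(
    "SELECT slot FROM metadata WHERE cpv = ?", ("sys-apps/portage-3.0.30",)
).fetchone()[0]
conn.close()
```

Note that values returned through `sqlite3` are still materialized as Python objects, so this reduces copying inside SQLite rather than giving zero-copy access from Python.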
I want to create something like memcached or redis that's entirely based on files and uses zero copy. I'm not sure if portage will use it or not, but it's a related zero-copy / mmap idea.
This is the related discussion from #gentoo-portage today:
> [13:25:51] <zmedico> adelks: are you making heavy use of mmap? I want portage to use mmap more...
> [13:26:16] <adelks> zmedico: not at all as I don't even know what it is
> [13:26:18] <zmedico> heaps are overrated and mmap is awesome
> [13:26:27] <adelks> xD
> [13:27:46] <zmedico> adelks: see bug 787770
> [13:27:48] <willikins> https://bugs.gentoo.org/787770 "[TRACKER] sys-apps/portage: use database(s) with mmap support to reduce memory footprint"; Portage Development, Core; CONF; zmedico:dev-portage
> [13:29:30] <zmedico> adelks: the idea is that you try to access most things in a zero-copy mmap sort of way, which is basically as efficient as you can get
> [13:30:44] <adelks> zmedico: I am aware of memoryviews in Cython, does this concept exist in compiled languages ?
> [13:30:53] <adelks> I will read the bug report
> [13:31:23] <zmedico> I actually want to implement a something like memcached or redis that's entirely based on files and uses zero copy
> [13:31:33] <adelks> But in any case, when I will implement multi-threading, the database will for sure be shared in memory
> [13:32:23] <adelks> zmedico: that could improve emerge without even touching its code right ?
> [13:32:40] <adelks> would it be the same if the repository be loaded in ramfs ?
> [13:32:59] <adelks> I suppose not, although it would help ?
> [13:33:50] <zmedico> ramfs won't help
> [13:34:27] <adelks> got it zmedico
> [13:34:35] <zmedico> adelks: yeah if we change the underlying portage APIs to utilize mmeap then we don't have to touch a lot of code
> [13:34:40] <zmedico> *mmap
> [13:38:22] <zmedico> mmap effectively offloads more of the memory management to the OS
> [13:39:01] <adelks> zmedico: in my code, I want to read the files from the disk exactly once
> [13:39:37] <adelks> I was thinking of the RAM usage for that, it shouldn't be much ? I mean at most there are like 50 000 packages ? 100kb of metadata each ?
> [13:40:13] <zmedico> if you use mmap, access files just like they're RAM
> [13:40:33] <adelks> I was reading the doc https://docs.python.org/3/library/mmap.html now I understand better
> [13:40:53] <zmedico> and the OS *will* cache them in RAM when appropriate
> [13:41:19] <zmedico> so mmap gives you similar results to caching in RAM
> [13:41:34] <zmedico> but without eating heap memory
> [13:42:17] <zmedico> and it's zero-copy, which is *faster* than reading things into RAM!!!
> [13:42:49] <adelks> I wonder how filesystems intervene into this grand scheme
> [13:43:52] <zmedico> filesystems can trigger... SIGBUS as noted here: https://bugs.gentoo.org/787770#c0
> [13:44:51] <zmedico> so you don't want to shrink mmaped files
> [13:45:17] <zmedico> just don't, it's easy ;-P
> [13:46:36] <zmedico> portage overwrites cache files via atomic rename
> [13:47:15] <zmedico> so hopefully you'd never see a SIGBUS for a mmaped portage cache file
> [13:48:05] <zmedico> and if you saw one one day you'd be surprised :-D
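To illustrate the last point from the discussion, here is a minimal sketch of replacing a cache file via atomic rename while a reader still has the old version mapped. It assumes POSIX rename semantics; `atomic_replace` is a hypothetical helper, not portage's actual implementation. The old inode is never truncated, so the existing mapping stays valid even though the new file is much smaller.

```python
import mmap
import os
import tempfile


def atomic_replace(path, data):
    """Write `data` to a temp file in the same directory, then rename it
    over `path`. Readers see either the old file or the new one."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


cache_path = os.path.join(tempfile.mkdtemp(), "entry")
atomic_replace(cache_path, b"SLOT=0\n" * 1000)

# A reader maps the current version of the file.
with open(cache_path, "rb") as f:
    old_map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# A writer replaces it with a much smaller file; nothing is truncated.
atomic_replace(cache_path, b"SLOT=1\n")

old_len = len(old_map)
old_tail = old_map[-7:]  # last line of the old contents, still readable
with open(cache_path, "rb") as f:
    new_bytes = f.read()
old_map.close()
```

Shrinking the file in place instead (for example with `truncate`) is the case that can raise SIGBUS in a process that touches pages past the new end of file.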