Summary: | [TRACKER] sys-apps/portage: use database(s) with mmap and zero-copy support to reduce memory footprint | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Zac Medico <zmedico> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | CONFIRMED --- | ||
Severity: | enhancement | CC: | flx.bier, gentoo, kingjon3377, mattst88, sam |
Priority: | Normal | Keywords: | Tracker |
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=660860 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 835380 |
Description
Zac Medico
2021-05-02 20:22:33 UTC
There is possibly some support in SQLite as well: https://sqlite.org/mmap.html
I want to create something like memcached or redis that's entirely based on files and uses zero-copy. I'm not sure whether portage will use it, but it's a related zero-copy / mmap idea.
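As a rough illustration of the SQLite option, the sketch below asks SQLite for memory-mapped I/O through `PRAGMA mmap_size`, using Python's standard `sqlite3` module. The helper name `open_with_mmap`, the 64 MiB request, and the table layout in the usage note are assumptions made for this example, not anything portage does today. SQLite may grant less than requested, or 0 if the build has mmap disabled, so the granted value is returned to the caller.

```python
import sqlite3


def open_with_mmap(path, mmap_bytes=64 * 1024 * 1024):
    """Open an SQLite database and request memory-mapped I/O.

    Hypothetical helper for illustration. Returns (connection, granted),
    where `granted` is the mmap limit SQLite reports after the request;
    it is 0 when this SQLite build does not support mmap.
    """
    conn = sqlite3.connect(path)
    # PRAGMA mmap_size=N sets the limit and returns the resulting value.
    row = conn.execute(f"PRAGMA mmap_size={int(mmap_bytes)}").fetchone()
    granted = row[0] if row else 0
    return conn, granted
```

A caller could then run ordinary queries, for example against a hypothetical `metadata(cpv, slot)` table, and SQLite decides internally which pages are served from the mapping.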
This is the related discussion from #gentoo-portage today:
> [13:25:51] <zmedico> adelks: are you making heavy use of mmap? I want portage to use mmap more...
> [13:26:16] <adelks> zmedico: not at all as I don't even know what it is
> [13:26:18] <zmedico> heaps are overrated and mmap is awesome
> [13:26:27] <adelks> xD
> [13:27:46] <zmedico> adelks: see bug 787770
> [13:27:48] <willikins> https://bugs.gentoo.org/787770 "[TRACKER] sys-apps/portage: use database(s) with mmap support to reduce memory footprint"; Portage Development, Core; CONF; zmedico:dev-portage
> [13:29:30] <zmedico> adelks: the idea is that you try to access most things in a zero-copy mmap sort of way, which is basically as efficient as you can get
> [13:30:44] <adelks> zmedico: I am aware of memoryviews in Cython, does this concept exist in compiled languages ?
> [13:30:53] <adelks> I will read the bug report
> [13:31:23] <zmedico> I actually want to implement something like memcached or redis that's entirely based on files and uses zero copy
> [13:31:33] <adelks> But in any case, when I will implement multi-threading, the database will for sure be shared in memory
> [13:32:23] <adelks> zmedico: that could improve emerge without even touching its code right ?
> [13:32:40] <adelks> would it be the same if the repository be loaded in ramfs ?
> [13:32:59] <adelks> I suppose not, although it would help ?
> [13:33:50] <zmedico> ramfs won't help
> [13:34:27] <adelks> got it zmedico
> [13:34:35] <zmedico> adelks: yeah if we change the underlying portage APIs to utilize mmeap then we don't have to touch a lot of code
> [13:34:40] <zmedico> *mmap
> [13:38:22] <zmedico> mmap effectively offloads more of the memory management to the OS
> [13:39:01] <adelks> zmedico: in my code, I want to read the files from the disk exactly once
> [13:39:37] <adelks> I was thinking of the RAM usage for that, it shouldn't be much ? I mean at most there are like 50 000 packages ? 100kb of metadata each ?
> [13:40:13] <zmedico> if you use mmap, access files just like they're RAM
> [13:40:33] <adelks> I was reading the doc https://docs.python.org/3/library/mmap.html now I understand better
> [13:40:53] <zmedico> and the OS *will* cache them in RAM when appropriate
> [13:41:19] <zmedico> so mmap gives you similar results to caching in RAM
> [13:41:34] <zmedico> but without eating heap memory
> [13:42:17] <zmedico> and it's zero-copy, which is *faster* than reading things into RAM!!!
> [13:42:49] <adelks> I wonder how filesystems intervene into this grand scheme
> [13:43:52] <zmedico> filesystems can trigger... SIGBUS as noted here: https://bugs.gentoo.org/787770#c0
> [13:44:51] <zmedico> so you don't want to shrink mmaped files
> [13:45:17] <zmedico> just don't, it's easy ;-P
> [13:46:36] <zmedico> portage overwrites cache files via atomic rename
> [13:47:15] <zmedico> so hopefully you'd never see a SIGBUS for a mmaped portage cache file
> [13:48:05] <zmedico> and if you saw one one day you'd be surprised :-D
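
To make the "access files just like they're RAM" point from the log concrete, here is a minimal sketch using Python's `mmap` module. It assumes a small text file of `KEY=VALUE` lines; the function names `map_readonly` and `lookup` are hypothetical and are not part of any portage API. The lookup returns a `memoryview` slice of the mapping, so the value bytes are not copied onto the heap until the caller explicitly asks for a copy.

```python
import mmap


def map_readonly(path):
    """Map a whole file read-only and return the mmap object.

    Note: mmap raises ValueError for an empty file.
    """
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the mapping stays valid
        # after the file descriptor is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(mm, key):
    """Return a zero-copy memoryview of the value for `key`, or None.

    Assumes lines of the form KEY=VALUE (an assumption for this sketch).
    """
    needle = key + b"="
    pos = 0
    while True:
        start = mm.find(needle, pos)
        if start == -1:
            return None
        # Accept the match only at the beginning of a line.
        if start == 0 or mm[start - 1] == 0x0A:
            break
        pos = start + 1
    begin = start + len(needle)
    end = mm.find(b"\n", begin)
    if end == -1:
        end = len(mm)
    # Slicing a memoryview references the mapped pages without copying.
    return memoryview(mm)[begin:end]
```

Any views handed out this way must be released (`view.release()`) before the mapping is closed, otherwise `mm.close()` raises `BufferError`.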
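
The SIGBUS remark in the log hinges on never shrinking a file in place while it is mapped. The sketch below shows one way to replace a file through a temporary file plus `os.replace`, so an existing mapping keeps pointing at the old, full-length inode. It assumes POSIX rename semantics; `atomic_write` is a hypothetical helper written for this example and is not portage's actual implementation.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` without truncating the existing file.

    Hypothetical helper: writes to a temp file in the same directory,
    then renames it over the target. Readers that already mapped the
    old file keep the old contents at their original length.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX when source and target share a filesystem.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

With this pattern a process that mapped the old file can still read every byte of it after the replacement, while new opens of the same path see the new contents.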