Some opportunities for parallelization might include:

* parallel construction of package instances
* parallel backtracking
* parallel resolution of independent parts of the dependency graph

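As a rough illustration of the last opportunity, here is a minimal sketch that splits a dependency graph into groups that share no edges and hands each group to a worker. The graph representation (a dict mapping a package name to the set of names it depends on) and the function names are assumptions made for this example, and resolve_component is a hypothetical placeholder rather than a real resolver.

```python
from concurrent.futures import ThreadPoolExecutor


def independent_components(graph):
    """Split a dependency graph into groups that share no edges.

    `graph` maps each package name to the set of names it depends on
    (an assumed representation, chosen only for this sketch).
    """
    # Build an undirected view: a dependency ties both endpoints together.
    neighbors = {}
    for pkg, deps in graph.items():
        neighbors.setdefault(pkg, set()).update(deps)
        for dep in deps:
            neighbors.setdefault(dep, set()).add(pkg)

    seen = set()
    components = []
    # Sorted start nodes keep the output order deterministic.
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(component)
    return components


def resolve_component(component):
    """Hypothetical placeholder: a real resolver would do the work here."""
    return sorted(component)


def resolve_in_parallel(graph, max_workers=4):
    """Resolve each independent component in its own worker."""
    components = independent_components(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `components`.
        return list(pool.map(resolve_component, components))
```

Threads are used here only to keep the sketch short. For CPU-bound resolution in CPython, a ProcessPoolExecutor would likely be the better fit, provided the worker function is importable at module level.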
Storing the dependency calculation in a shared DB might be a useful way to enable parallelization, since the DB would act as a means for concurrent processes to collaborate on a calculation.
Also, a DB can make dependency calculations more scalable by removing the need to store the whole calculation in memory at once.
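To make the shared-DB idea concrete, here is a minimal sketch of workers claiming pending packages from a common store. It uses sqlite3 from the Python standard library purely as a stand-in for whatever DB is eventually chosen. The table layout, state names, and function names are all hypothetical.

```python
import sqlite3


def open_store(path):
    """Open (or create) the shared store. The schema is hypothetical."""
    conn = sqlite3.connect(path, timeout=30)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resolution ("
        " package TEXT PRIMARY KEY,"
        " state TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    conn.commit()
    return conn


def add_pending(conn, packages):
    """Queue packages for resolution; already-known packages are ignored."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO resolution (package) VALUES (?)",
            [(p,) for p in packages],
        )


def claim_one(conn, worker):
    """Claim one pending package for `worker`, or return None if none remain."""
    while True:
        row = conn.execute(
            "SELECT package FROM resolution"
            " WHERE state = 'pending' ORDER BY package LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            # The state guard makes the claim safe if another worker got
            # there first: rowcount is then 0 and the loop retries.
            cur = conn.execute(
                "UPDATE resolution SET state = ?"
                " WHERE package = ? AND state = 'pending'",
                ("claimed:" + worker, row[0]),
            )
        if cur.rowcount == 1:
            return row[0]


def record_result(conn, package, result):
    """Store the outcome so other workers can read it instead of recomputing."""
    with conn:
        conn.execute(
            "UPDATE resolution SET state = 'done', result = ? WHERE package = ?",
            (result, package),
        )
```

Each worker process would open its own connection to the same file and loop on claim_one until it returns None. Only the rows being worked on need to be held in memory, which is the scalability point made above.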
LMDB is a candidate since it supports mmap, but beware that writemap=True with an undersized map_size value will trigger SIGBUS in concurrent processes:

https://github.com/jnwatson/py-lmdb/issues/269#issuecomment-729750375

> What we have here is two bugs, one in py-lmdb and one in lmdb. The
> first bug is that a non-zero default value of map_size on Environment is
> inappropriate. Passing 0 is generally the right answer most of the time.
>
> That bug triggers a second bug in the underlying lmdb where opening a
> database with write_map=True and explicitly specifying a map_size too
> small will ftruncate the file out from underneath another open process.