Bug 890907 - Unpack and prepare binpkgs for merging in parallel
Summary: Unpack and prepare binpkgs for merging in parallel
Status: UNCONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Binary packages support
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 835380
Reported: 2023-01-15 06:52 UTC by Michael Jones
Modified: 2023-05-20 08:00 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Description Michael Jones 2023-01-15 06:52:09 UTC
Consider systems that have very slow single-core performance but lots of cores, such as aarch64 systems.

I have such a system, and have portage build binpkgs of all installed packages.

If I run `emerge --emptytree @world`, I get around 200 binpkgs queued for installation. This is great coverage of binary packages, but what I observe is that this takes a *very long time*, hours, even though we're only dealing with a few GB worth of extracted files.

Watching the list of processes in htop, I see that portage is scheduling the decompression of these binary packages in about the same order that it would do the installation of packages from source.

I propose that binpkgs should have the extraction/decompression done without regard to the installation order, and in parallel up to the number of --jobs, and that the decompression/extraction should proceed ahead of the installation of the decompressed/extracted packages. Installation of the extracted/decompressed files can still happen in the normal order.
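
A minimal sketch of the proposed scheduling, not Portage code: `unpack`, `merge_all`, and the staging path are hypothetical names made up for illustration, and the sleep stands in for real decompression. All unpack jobs are submitted up front, at most `jobs` run at once, and the merge loop consumes the results strictly in the original order.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def unpack(pkg):
    """Hypothetical stand-in for decompressing and extracting one binpkg."""
    time.sleep(0.01)  # pretend this is the slow decompression step
    return f"/tmp/staging/{pkg}"  # hypothetical staging directory


def merge_all(merge_order, jobs):
    """Unpack up to `jobs` packages at once; merge strictly in merge_order."""
    merged = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Queue every unpack immediately so extraction runs ahead of merging.
        futures = [(pkg, pool.submit(unpack, pkg)) for pkg in merge_order]
        for pkg, future in futures:
            staged = future.result()  # wait only for this package's unpack
            merged.append((pkg, staged))  # stand-in for the serial merge step
    return merged


if __name__ == "__main__":
    order = ["sys-libs/zlib", "app-arch/xz-utils", "dev-libs/openssl"]
    for pkg, staged in merge_all(order, jobs=2):
        print(f"merged {pkg} from {staged}")
```

In a real implementation the unpack step would presumably be an external decompressor process rather than a Python thread, so the parallelism would come from the subprocesses themselves. The point of the sketch is only the ordering: unpacking is unordered and parallel, merging stays serial.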



Reproducible: Always
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-01-15 11:55:10 UTC
What compression are you using? Even on slower machines, often you can use e.g. zstd and get very low resource usage & fast decompression

.... but also, you could just use a format for binpkgs which supports parallel compression and decompression, like xz?
Comment 2 Fabian Groffen gentoo-dev 2023-01-15 14:15:34 UTC
I think it's much more that portage should try to unpack X binpkgs at a time (jobs?) and only serialise the actual "merge to live-fs and VDB-update".  I wonder if the existing parallel emerge code would actually achieve this.  With the vast number of cores available, the disk IO should be completely saturated, since creating and moving files should be the main bottleneck here.  With the final merge and VDB update requiring serial execution, you want to burn as many IOs as you can in the preparatory steps (unpacking).
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-01-15 14:20:03 UTC
That's a fair point.
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-01-15 14:20:42 UTC
(In reply to Sam James from comment #3)
> That's a fair point.

(+ qmerge being so much faster for doing this kind of shows the potential here, even though it's not parallel IIRC).
Comment 5 Fabian Groffen gentoo-dev 2023-01-15 14:24:02 UTC
(In reply to Sam James from comment #4)
> (In reply to Sam James from comment #3)
> > That's a fair point.
> 
> (+ qmerge being so much faster for doing this kind of shows the potential
> here, even though it's not parallel IIRC).

Correct, qmerge is completely single-threaded at this point.  If it is faster than Portage, the overhead likely lives in filesystem interaction and mask calculations.
Comment 6 Michael Jones 2023-01-15 17:48:23 UTC
(In reply to Fabian Groffen from comment #2)
> I think it's much more that portage should try to unpack X binpkgs at a time
> (jobs?) and only serialise the actual "merge to live-fs and VDB-update".

This is what I'm suggesting, yes.
Comment 7 Michael Jones 2023-01-15 17:49:43 UTC
(In reply to Sam James from comment #1)
> What compression are you using? Even on slower machines, often you can use
> e.g. zstd and get very low resource usage & fast decompression

I've tried different compression formats and settings. You're right that zstd is faster, but it still bottlenecks on the same package installation order; it just gets through it faster.
Comment 8 Sheng Yu 2023-01-16 00:19:02 UTC
The current implementation decompresses the binpkg after pkg_setup, and I don't think that step is easy to move to another queue.
You can try the "parallel-install" feature to see whether it does this in parallel (see also the issue in the above link).