Bug 890979 - emerge should optionally provide explanation of why a package is compiled from source instead of installed via binpkg
Status: UNCONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Binary packages support
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: Portage team
Reported: 2023-01-15 20:37 UTC by Michael Jones
Modified: 2024-02-17 06:43 UTC
Description Michael Jones 2023-01-15 20:37:14 UTC
I have a Gentoo image generator that runs on Jenkins. This generator builds the same set of packages every time and stores the binpkgs for the next installation.

I see about a 50% binpkg hit rate.

Some of that can be explained by using the latest Gentoo overlay, so package updates would invalidate the binpkgs' dependencies. This is expected, and it is the behavior I want.

But I don't think that this explains all of the skipped binpkgs.

When I run emerge, I use --verbose --tree --unordered-display to see a nice printout of the dependency relationships between the packages being built.

But this printout doesn't tell me why a package that I know I have a binpkg for is being built from source.

I wish there were some flag I could pass to emerge that would make it report this information as part of the "--tree" display.
Comment 1 John Helmert III archtester Gentoo Infrastructure gentoo-dev Security 2023-01-15 20:53:25 UTC
I suppose you're not running with --getbinpkgonly or --usepkgonly? That will get you around issues like the tree state being out of sync between binpkg producers/consumers by making the consumers use the available binpkgs rather than requiring a binpkg for the current tree state.
Comment 2 Michael Jones 2023-01-16 02:24:05 UTC
The point of the image builder running in Jenkins is to produce a fully up-to-date image. The binpkgs are only there as an optimization, to be used if and only if a particular package has no changed dependencies or USE flags. So --getbinpkgonly and --usepkgonly wouldn't fit the goal of an up-to-date image.
Comment 3 John Helmert III archtester Gentoo Infrastructure gentoo-dev Security 2023-01-16 02:29:59 UTC
In that case, I'm not sure I see the utility in the output you're asking for for an automated binary package generation system?
Comment 4 Alec Warner (RETIRED) archtester gentoo-dev Security 2023-01-16 04:10:12 UTC
(In reply to John Helmert III from comment #3)
> In that case, I'm not sure I see the utility in the output you're asking for
> for an automated binary package generation system?

OP is basically saying he has an expensive task (an image build with lots of packages in it).
OP uses binpkgs to reduce task time and cost (binpkgs are cheaper and faster).
OP observes a 50% binpkg cache hit rate between similar image builds.
OP is expecting a higher hit rate, but doesn't really have tools to debug why the hit rate is so low, because portage does not really tell him *why* a binary package was not suitable to use. (Here OP notes that some miss rate is expected due to ebuild churn invalidating entries.)

The "automated binary package generation system" is just a tool OP is using to reduce image time, and obviously if OP can get the cache hit rate higher, they can reduce the time further.

I wouldn't actually expect "emerge" to do this task, but it might be nice to have a tool that did. This has nothing to do with merging packages at all, and instead is just debugging visibility, IMHO.

`<sometool> --show-visibility <atom>` for example might show you all Pkg objects from all repos (vdb-installed, bin-pkg, repos.conf) that match atom.
`<sometool> --best-visible <atom> --verbose` for example might do what op wants, and display all the available package objects for atom, and try to explain why a given atom was 'best'; and this tool would then help OP improve their hit rate.

Then they could do something like:
  Run emerge -evtp --unordered to get a list of atoms to be installed and their type (ebuild vs. binpkg). Save that list.
  Iterate over the list and run `<sometool> --best-visible <atom> --verbose` for each atom in the list, and record the reason the ebuild is selected.
  Sort and count the reasons to know how to attack their caching problem.

All of this is dependent on some tooling and sane output.
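
As a rough illustration of the first step above (collecting the merge list and its types), here is a minimal Python sketch. It assumes the captured `emerge --pretend --verbose --tree` output marks each planned merge with a leading `[ebuild ...]` or `[binary ...]` tag followed by the package; the function names `classify_merge_list` and `hit_rate` are made up for this example and are not part of portage.

```python
import re
from collections import Counter

# Assumed line shape in the pretend output, for example:
#   [binary   R    ] dev-libs/foo-1.0::gentoo
#   [ebuild  N     ]  dev-libs/bar-2.0::gentoo  USE="..."
MERGE_LINE = re.compile(r"^\[(ebuild|binary)\s[^\]]*\]\s+(\S+)")


def classify_merge_list(output):
    """Return {package: 'ebuild' | 'binary'} for every merge line found."""
    kinds = {}
    for line in output.splitlines():
        match = MERGE_LINE.match(line)
        if match:
            kinds[match.group(2)] = match.group(1)
    return kinds


def hit_rate(kinds):
    """Fraction of planned merges that would use a binpkg."""
    counts = Counter(kinds.values())
    total = sum(counts.values())
    return counts["binary"] / total if total else 0.0
```

The packages classified as `ebuild` are the ones worth investigating further; the sketch only measures the hit rate and does not explain any individual miss.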
Comment 5 Michael Jones 2023-01-16 05:17:16 UTC
(In reply to Alec Warner from comment #4)

This is an accurate elaboration, for the most part.

I want diagnostic tools.

> I wouldn't actually expect "emerge" to do this task, but it might be nice to
> have a tool that did. This has nothing to do with merging packages at all,
> and instead is just debugging visibility, IMHO.

Emerge has output explaining why a package rebuild is occurring, including explaining which dependency package is causing the package rebuild.

Emerge also has output telling me that it's merging a binary package.

When I ask emerge to install packages, I want some `--ultraverbosebinpkg` option that I can enable that will say "Compiling X from source instead of installing candidate binpkg because the dependencies are different" or something similar.

If it were done as a separate tool, then that separate tool would need to perform all of the same dependency and USE flag computations that emerge already has to do.

Further, if it were a separate tool that I had to feed each package name to, it would be unusable. This system is automated (it runs on a Jenkins server), and no human has the ability to log into the worker agent and run a diagnostic program manually. So I'd be forced to write some kind of parser for the output of emerge that then runs the separate diagnostic tool on each package. Better that emerge just do it all in one go, as it's already doing all the same work.
Comment 6 Michael Jones 2023-01-16 05:17:58 UTC
You can see for yourself what I'm dealing with here: https://ci.genpi64.com/job/GenPi64/job/Build.Dist/job/master/32/
Comment 7 Michael Jones 2024-02-17 06:38:17 UTC
I recently built a new machine with a modern CPU, and have shifted from letting each of my machines compile its packages locally to rsyncing the /var/cache/binpkg folder from the more powerful machine before running emerge.

I keep all my machines on identical /etc/portage settings via a git repo hosted on GitHub.

And the new "binhost" machine was installed by rsyncing the hard drive of one of my other machines onto it.

My hit rate for installing binpkgs rather than compiling locally is abysmal. Right now I'm looking at 20 binpkgs out of 500 ;/
Comment 8 Michael Jones 2024-02-17 06:39:02 UTC
Clarification: This is separate from the raspberry pi work I spoke about in previous comments.
Comment 9 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-17 06:43:39 UTC
>  When I ask emerge to install packages, i want some `--ultraverbosebinpkg` that i can enable that'll say "Compiling X from source instead of installing candidate binpkg because the dependencies are different" or something.

We already do partly have that. The default value of --binpkg-changed-deps and --binpkg-respect-use will tell you if a binpkg cannot be used for either of those reasons, but if you set either, then it will suppress the messages. Some of it may be guarded behind --verbose too.

Could you clarify which cases you're interested in which don't fall into those two?
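
For an automated job, the existing messages could be pulled out of a captured emerge log with something like the sketch below. It assumes the warnings look like a header line containing "binary packages have been ignored due to <reason>:" followed by indented `=category/package-version` lines and a `NOTE:` paragraph; that wording is an assumption about emerge's output and should be checked against a real log. The function name `ignored_binpkgs` is made up for this example.

```python
def ignored_binpkgs(output):
    """Map each 'ignored due to <reason>' header to the packages listed under it."""
    marker = "binary packages have been ignored due to "
    reasons = {}
    current = None
    for line in output.splitlines():
        if marker in line:
            # Start a new section keyed by the reason text after the marker.
            current = line.split(marker, 1)[1].rstrip(": ")
            reasons[current] = []
        elif current is not None and line.strip().startswith("="):
            # Indented '=cat/pkg-ver ...' entry: keep the package, drop the '='.
            reasons[current].append(line.split()[0].lstrip("="))
        elif current is not None and line.strip() and not line.startswith(" "):
            # Any other unindented text (such as 'NOTE:') ends the section.
            current = None
    return reasons
```

Counting the entries per reason gives a first breakdown of misses; binpkgs skipped for any other reason would not show up here.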

I can think of one: on IRC, we had an example earlier where someone couldn't get the qtwebengine binpkg installed because the binpkg was built against an older libvpx, but they had libvpx in package.accept_keywords on the consumer side (but not producer). Any others we need to cater for?
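
A producer/consumer mismatch like the package.accept_keywords case above can be spotted by diffing the relevant config text from both machines. This is a minimal sketch that compares the contents of two such files as plain text; the helper names `config_entries` and `config_drift` are made up for this example, and it does not resolve profiles or directories of config files.

```python
def config_entries(text):
    """Return the set of non-comment entries, with whitespace normalised."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if line:
            entries.add(" ".join(line.split()))
    return entries


def config_drift(producer_text, consumer_text):
    """Return (only_on_producer, only_on_consumer) as sorted lists."""
    producer = config_entries(producer_text)
    consumer = config_entries(consumer_text)
    return sorted(producer - consumer), sorted(consumer - producer)
```

An entry that appears on only one side is a candidate explanation for a binpkg being rejected on the consumer.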
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-17 06:43:56 UTC
(also, I agree that nonetheless, a "please tell me why any binpkg couldn't be used" would be useful, just explaining that there is some offering here already)