Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 586658

Summary: app-portage/gentoolkit: let eclean handle binpkgs with .xpak suffix
Product: Portage Development Reporter: Manuel Mommertz <2kmm>
Component: ToolsAssignee: Portage Tools Team <tools-portage>
Status: RESOLVED FIXED    
Severity: enhancement CC: dan, esigra
Priority: Normal Keywords: InVCS, PATCH
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 346443    
Attachments: Patch for eclean
Keep only packages with BUILD_TIME equal to installed one.
Keep only packages with BUILD_TIME equal to installed one.
optimize with portage binarytree API
optimize with portage binarytree API

Description Manuel Mommertz 2016-06-22 06:17:04 UTC
Created attachment 438390 [details, diff]
Patch for eclean

With the attached patch eclean can handle binary packges with .xpak suffix which are created with FEATURES=binpkg-multi-instance. The patched eclean is used in a production environment since several weeks now without any problems.
Comment 1 Zac Medico gentoo-dev 2016-06-23 16:33:28 UTC
(In reply to Manuel Mommertz from comment #0)
> Created attachment 438390 [details, diff] [details, diff]
> Patch for eclean

Looks good.

However, it seems that the -d/--deep will preserve multiple xpak files for the same package version. It would be nice if it would delete all but the one that is currently installed. The xpak metadata includes a BUILD_TIME variable which could be compared with the BUILD_TIME of the installed package.
Comment 2 Manuel Mommertz 2016-06-23 19:54:43 UTC
The assumption that the newest file is the right is not always correct. Think of a shared package dir where another system can add a second package with differing useflags. One has to match based on useflags.

Possible implementation:

if dbapi.cpv_exists(cpv) and (not destructive or dbapi.aux_get(cpv, "USE/IUSE/IUSE_EFFECTIVE (what to use here?) == get_same_from_binpkg:
    # exclusion because pkg still exists (in porttree or vartree)

Though this would change behavior for normal tbz2 too (currently binpkg with differing use-flags is kept)

I will try to implement and test this when I find some time (~1 week maybe)
Comment 3 Paul Varner (RETIRED) gentoo-dev 2016-06-23 20:05:21 UTC
Thanks for the second set of eyes, Zac, I had reviewed and thought it was good as well. I've gone ahead and pushed this patch to the repository and it can be tested with gentoolkit-9999

If there are any more patches to handle the --deep option better, I will review and add them as well.
Comment 4 Zac Medico gentoo-dev 2016-06-23 20:17:54 UTC
(In reply to Manuel Mommertz from comment #2)
> The assumption that the newest file is the right is not always correct.

I was not suggesting to use the "newest" file. I was suggesting to compare the BUILD_TIME metadata of the binary package to that of the installed packages. The BUILD_TIME metadata can be used to check if the binary package and the installed package have the same identity. It's also possible to use BINPKGMD5, however the md5 of a binary package changes when package moves are applied. Therefore, BUILD_TIME is more useful because it remains constant regardless of package moves.

> Think of a shared package dir where another system can add a second package
> with differing useflags. One has to match based on useflags.

Well, using -d/--deep does not make sense, unless we provide a way for eclean to see what packages are currently installed on those other systems.

> Possible implementation:
> 
> if dbapi.cpv_exists(cpv) and (not destructive or dbapi.aux_get(cpv,
> "USE/IUSE/IUSE_EFFECTIVE (what to use here?) == get_same_from_binpkg:
>     # exclusion because pkg still exists (in porttree or vartree)
> 
> Though this would change behavior for normal tbz2 too (currently binpkg with
> differing use-flags is kept)
> 
> I will try to implement and test this when I find some time (~1 week maybe)

In the context of -d/--deep, I don't think it makes any sense for eclean to look at USE flags, given that it's much simpler to compare the BUILD_TIME metadata. I suppose it could compare both, but that seems redundant.

(In reply to Paul Varner from comment #3)
> Thanks for the second set of eyes, Zac, I had reviewed and thought it was
> good as well. I've gone ahead and pushed this patch to the repository and it
> can be tested with gentoolkit-9999

Nice! This is a big improvement over the previous lack of binpkg-multi-instance support.

> If there are any more patches to handle the --deep option better, I will
> review and add them as well.

I might create a patch to implement my BUILD_TIME idea sometime this week.
Comment 5 Manuel Mommertz 2016-06-24 07:17:35 UTC
(In reply to Zac Medico from comment #4)
> (In reply to Manuel Mommertz from comment #2)
> > The assumption that the newest file is the right is not always correct.
> 
> I was not suggesting to use the "newest" file. I was suggesting to compare
> the BUILD_TIME metadata of the binary package to that of the installed
> packages.

Ok, that could work partly. Though it breaks if installation and binpkg creation are done in separate steps (maybe because the binpkg is created on an other machine).

> > Think of a shared package dir where another system can add a second package
> > with differing useflags. One has to match based on useflags.
> 
> Well, using -d/--deep does not make sense, unless we provide a way for
> eclean to see what packages are currently installed on those other systems.

We use a script that runs eclean -pdq on build systems with a shared pkg-dir to remove only packages that all systems want to clear.
 
> > Possible implementation:
> > 
> > if dbapi.cpv_exists(cpv) and (not destructive or dbapi.aux_get(cpv,
> > "USE/IUSE/IUSE_EFFECTIVE (what to use here?) == get_same_from_binpkg:
> >     # exclusion because pkg still exists (in porttree or vartree)
> > 
> > Though this would change behavior for normal tbz2 too (currently binpkg with
> > differing use-flags is kept)
> > 
> > I will try to implement and test this when I find some time (~1 week maybe)
> 
> In the context of -d/--deep, I don't think it makes any sense for eclean to
> look at USE flags, given that it's much simpler to compare the BUILD_TIME
> metadata. I suppose it could compare both, but that seems redundant.

I don't see a big difference in comparing BUILD_TIME files vs. USE files. Please tell if it isn't that simple.
Comment 6 Zac Medico gentoo-dev 2016-06-24 07:44:39 UTC
(In reply to Manuel Mommertz from comment #5)
> (In reply to Zac Medico from comment #4)
> > (In reply to Manuel Mommertz from comment #2)
> > > The assumption that the newest file is the right is not always correct.
> > 
> > I was not suggesting to use the "newest" file. I was suggesting to compare
> > the BUILD_TIME metadata of the binary package to that of the installed
> > packages.
> 
> Ok, that could work partly. Though it breaks if installation and binpkg
> creation are done in separate steps (maybe because the binpkg is created on
> an other machine).

No, it doesn't break, because the BUILD_TIME is only assigned once at build time. It remains constant, forever.

> > > Think of a shared package dir where another system can add a second package
> > > with differing useflags. One has to match based on useflags.
> > 
> > Well, using -d/--deep does not make sense, unless we provide a way for
> > eclean to see what packages are currently installed on those other systems.
> 
> We use a script that runs eclean -pdq on build systems with a shared pkg-dir
> to remove only packages that all systems want to clear.

My understanding of -d/--deep is that it removes everything except the specific packages that are installed. So running it on the shared pkg-dir from multiple systems would result in each one of those removing some packages that are not installed on the current system but might be installed on one of the other systems.

> > > Possible implementation:
> > > 
> > > if dbapi.cpv_exists(cpv) and (not destructive or dbapi.aux_get(cpv,
> > > "USE/IUSE/IUSE_EFFECTIVE (what to use here?) == get_same_from_binpkg:
> > >     # exclusion because pkg still exists (in porttree or vartree)
> > > 
> > > Though this would change behavior for normal tbz2 too (currently binpkg with
> > > differing use-flags is kept)
> > > 
> > > I will try to implement and test this when I find some time (~1 week maybe)
> > 
> > In the context of -d/--deep, I don't think it makes any sense for eclean to
> > look at USE flags, given that it's much simpler to compare the BUILD_TIME
> > metadata. I suppose it could compare both, but that seems redundant.
> 
> I don't see a big difference in comparing BUILD_TIME files vs. USE files.
> Please tell if it isn't that simple.

You can think of the BUILD_TIME as being a way to uniquely identify a particular build of a particular package. The USE flags are nothing like that. You can rebuild a package many times with the same USE flags, but the BUILD_TIME will be different for each build.
Comment 7 Manuel Mommertz 2016-06-24 12:09:30 UTC
(In reply to Zac Medico from comment #6)
> (In reply to Manuel Mommertz from comment #5)
> > (In reply to Zac Medico from comment #4)
> > > (In reply to Manuel Mommertz from comment #2)
> > > > The assumption that the newest file is the right is not always correct.
> > > 
> > > I was not suggesting to use the "newest" file. I was suggesting to compare
> > > the BUILD_TIME metadata of the binary package to that of the installed
> > > packages.
> > 
> > Ok, that could work partly. Though it breaks if installation and binpkg
> > creation are done in separate steps (maybe because the binpkg is created on
> > an other machine).
> 
> No, it doesn't break, because the BUILD_TIME is only assigned once at build
> time. It remains constant, forever.

What I meant was like:
emerge <package>; wait some time; emerge -B <package>

For this, eclean would delete the generated package.
 
> > > > Think of a shared package dir where another system can add a second package
> > > > with differing useflags. One has to match based on useflags.
> > > 
> > > Well, using -d/--deep does not make sense, unless we provide a way for
> > > eclean to see what packages are currently installed on those other systems.
> > 
> > We use a script that runs eclean -pdq on build systems with a shared pkg-dir
> > to remove only packages that all systems want to clear.
> 
> My understanding of -d/--deep is that it removes everything except the
> specific packages that are installed. So running it on the shared pkg-dir
> from multiple systems would result in each one of those removing some
> packages that are not installed on the current system but might be installed
> on one of the other systems.

Yes that should happen. Therefore we use -p to only print the list of files without cleaning them. Then our script removes the files which are on all lists.

> > > > Possible implementation:
> > > > 
> > > > if dbapi.cpv_exists(cpv) and (not destructive or dbapi.aux_get(cpv,
> > > > "USE/IUSE/IUSE_EFFECTIVE (what to use here?) == get_same_from_binpkg:
> > > >     # exclusion because pkg still exists (in porttree or vartree)
> > > > 
> > > > Though this would change behavior for normal tbz2 too (currently binpkg with
> > > > differing use-flags is kept)
> > > > 
> > > > I will try to implement and test this when I find some time (~1 week maybe)
> > > 
> > > In the context of -d/--deep, I don't think it makes any sense for eclean to
> > > look at USE flags, given that it's much simpler to compare the BUILD_TIME
> > > metadata. I suppose it could compare both, but that seems redundant.
> > 
> > I don't see a big difference in comparing BUILD_TIME files vs. USE files.
> > Please tell if it isn't that simple.
> 
> You can think of the BUILD_TIME as being a way to uniquely identify a
> particular build of a particular package. The USE flags are nothing like
> that. You can rebuild a package many times with the same USE flags, but the
> BUILD_TIME will be different for each build.

Tested with new patch. You are right, it works as expected. While with looking at USE I now have 11 binpkgs of gentoolkit which are not cleaned as all have the same useflags. I thought that emerge itself would overwrite a binpkg if USE is identical. It does not. :)
Comment 8 Manuel Mommertz 2016-06-24 12:16:45 UTC
Created attachment 438640 [details, diff]
Keep only packages with BUILD_TIME equal to installed one.

Functional works but I'm not exactly sure if the source formating is as good as it can be. Maybe some python-native might have a nicer version ;)
Comment 9 Zac Medico gentoo-dev 2016-06-24 15:45:09 UTC
Created attachment 438670 [details, diff]
Keep only packages with BUILD_TIME equal to installed one.

(In reply to Manuel Mommertz from comment #8)
> Created attachment 438640 [details, diff] [details, diff]
> Keep only packages with BUILD_TIME equal to installed one.
> 
> Functional works but I'm not exactly sure if the source formating is as good
> as it can be. Maybe some python-native might have a nicer version ;)

Looks good, but fails to apply to latest git because a slight context mismatch. I've attached your patch with context updated so it applies cleanly.
Comment 10 Brian Dolbec (RETIRED) gentoo-dev 2016-06-24 18:57:55 UTC
Zac, if your happy with it, go ahead and push it.

Thank you for the work on this everyone...
Comment 11 Paul Varner (RETIRED) gentoo-dev 2016-06-24 21:16:23 UTC
(In reply to Brian Dolbec from comment #10)
> Zac, if your happy with it, go ahead and push it.
> 
> Thank you for the work on this everyone...

Patch looks good to me as well. Zac, if you are happy with it, go ahead and push it.
Comment 13 Zac Medico gentoo-dev 2016-06-24 21:41:57 UTC
(In reply to Zac Medico from comment #12)
> Pushed:
> 
> https://gitweb.gentoo.org/proj/gentoolkit.git/commit/
> ?id=78a446d0859fd1d3c2922d821fe0664099287312

Opps, fixed to use my @gentoo.org address as commiter:

https://gitweb.gentoo.org/proj/gentoolkit.git/commit/?id=824953dd70d650ee0b2c057b0dfb44efb8f56a9b
Comment 14 Zac Medico gentoo-dev 2016-06-25 22:38:00 UTC
We could optimize this to get the binary package BUILD_TIME via the bindbapi.aux_get method, which pulls it from $PKGDIR/Packages. Using portage.xpak.tbz2(path).getfile('BUILD_TIME') is relatively slow, because it forces BUILD_TIME to be read from each and every xpak file.
Comment 15 Zac Medico gentoo-dev 2016-06-26 00:06:02 UTC
Created attachment 438818 [details, diff]
optimize with portage binarytree API

Use the portage binarytree API to optimize binary package access,
so that metadata is read from $PKGDIR/Packages instead of from the
individual binary packages. Symlinks will now be ignored, since
portage hasn't used symlinks for years, and there's no harm
in ignoring them now. The APIs used are compatible with very old
portage, though they internally support binpkg-multi-instance
in recent versions of portage.
Comment 16 Manuel Mommertz 2016-06-26 07:27:09 UTC
I first wanted to do the patch like this, but I didn't found a way to distinguish between different packages of the same version. I still don't understand how this works. Your patch does what it should but when I implement debugging output:

for cpvs in clean_me[cpv]: print(cpv + ' ' + cpvs + ' ' + bin_dbapi.aux_get(cpvs, ['BUILD_TIME'])[0], file=sys.stderr)

I get lines like this:

net-fs/autofs-5.1.1-r1 net-fs/autofs-5.1.1-r1 1466490418
net-fs/autofs-5.1.1-r1 net-fs/autofs-5.1.1-r1 1466490580

How can this be that aux_get outputs different results for the same input? Are there hidden characters in the string of cpvs or something?
Comment 17 Zac Medico gentoo-dev 2016-06-26 07:54:41 UTC
Each string returned from cpv_all is actually an instance of _pkg_str, which is unicode string with various attributes that describe the specific package instance:

https://gitweb.gentoo.org/proj/portage.git/tree/pym/portage/versions.py?h=portage-2.3.0#n354
Comment 18 Manuel Mommertz 2016-06-26 08:10:26 UTC
(In reply to Zac Medico from comment #17)
> Each string returned from cpv_all is actually an instance of _pkg_str, which
> is unicode string with various attributes that describe the specific package
> instance:
> 
> https://gitweb.gentoo.org/proj/portage.git/tree/pym/portage/versions.
> py?h=portage-2.3.0#n354

Ah, thanks. With this in mind I understand how it works and it looks good. But I would suggest to maybe not localy redefine variables like in 'for cpv in clean_me[cpv]' to avoid confusion when just skimming through the code.
Comment 19 Zac Medico gentoo-dev 2016-06-26 08:37:09 UTC
Created attachment 438834 [details, diff]
optimize with portage binarytree API

Renamed some cpv to pkg in list comprehensions.
Comment 20 Zac Medico gentoo-dev 2016-07-01 06:21:27 UTC
(In reply to Zac Medico from comment #19)
> Created attachment 438834 [details, diff] [details, diff]
> optimize with portage binarytree API
> 
> Renamed some cpv to pkg in list comprehensions.

Pushed:

https://gitweb.gentoo.org/proj/gentoolkit.git/commit/?id=1952ecb9f2912968dd2f4487089fa50ed3d3bad1
Comment 21 Zac Medico gentoo-dev 2018-04-28 11:36:23 UTC
This was fixed in gentoolkit-0.3.2.