Bug 150031 - Add ability to store/distribute different binpkgs built from the same ebuild in a package repository
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Binary packages support (show other bugs)
Hardware: All All
Importance: High enhancement
Assignee: Portage team
URL: http://thread.gmane.org/gmane.linux.g...
Whiteboard:
Keywords: InVCS
Duplicates: 380187 493744 511172
Depends on:
Blocks: 142579 484436
Reported: 2006-10-04 01:40 UTC by Arne Babenhauserheide
Modified: 2018-02-08 23:28 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Description Arne Babenhauserheide 2006-10-04 01:40:19 UTC
It might be useful if the USE flags set for a package were appended to the package name, so one could easily keep and test packages with different USE flag combinations. 

This would also make it possible to create repositories of binary packages, for example for managing a set of office PCs that mostly share the same configuration but differ in a couple of USE flags (for example, you could have a "scan" PC and a "printer" PC, and both could get their packages from the same repository). 

Architectural and compiler specific differences can easily be taken into account by using a directory-structure.
Comment 1 Marius Mauch (RETIRED) gentoo-dev 2006-10-08 07:18:09 UTC
While I agree that we need more info about a binpackage in its name, simply appending active USE flags isn't the way to go.
Comment 2 Arne Babenhauserheide 2006-10-10 06:58:50 UTC
What is another way? 
Comment 3 Jakub Moc (RETIRED) gentoo-dev 2008-02-17 21:04:07 UTC
How about WONTFIX pending a more sane suggestion?
Comment 4 Arne Babenhauserheide 2008-02-25 08:08:00 UTC
I just changed the summary, so that this bug now simply reads 

"Include more info about a binpkg". 

How this should be achieved doesn't seem to be clear yet, but as I understand it, the three who answered here agree that binpkgs should somehow contain more information. 
Comment 5 Arne Babenhauserheide 2008-02-25 08:17:06 UTC
One more idea: 

Store binpkgs in a directory which contains a file with information about their associated useflags. 

Binpkgs which fit more than one directory, because their specific useflags didn't change, could just be created in the first one and then hardlinked into the later created directories. 

The directory name could just be the date when the useflags were last changed (and binpkgs created). 

This would mean that portage would have to check on binpkg creation whether the file with information in the most recent binpkg dir contains the current useflags. 

If it does, it could just use the directory. 

If it doesn't, it would have to check inside the file for the useflags of each package. If they aren't affected, it could simply hardlink the package into the new directory. 
If they are affected, it simply wouldn't hardlink them in. 

That way there would always be a directory with current binpkgs which would still look quite clean. 
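
The carry-over step described above could be sketched roughly as follows. This is only an illustration in Python, not Portage code: the function name `carry_over` and the per-package USE mappings (standing in for the information file kept in each directory) are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def carry_over(old_dir, new_dir, old_use, new_use):
    """Hardlink binpkgs whose USE flags did not change into new_dir.

    old_use and new_use map a binpkg file name to its set of active USE
    flags (hypothetical stand-ins for the per-directory information file).
    """
    new_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for filename, flags in sorted(old_use.items()):
        if new_use.get(filename) == flags:
            # Unaffected package: share the file instead of rebuilding it.
            os.link(old_dir / filename, new_dir / filename)
            linked.append(filename)
    return linked


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        old_dir = Path(tmp) / "2008-02-01"
        old_dir.mkdir()
        for name in ("python-2.5.2-r8.tbz2", "cvs-1.12.12-r4.tbz2"):
            (old_dir / name).write_bytes(b"binpkg")
        old_use = {"python-2.5.2-r8.tbz2": {"ssl"},
                   "cvs-1.12.12-r4.tbz2": {"doc"}}
        new_use = {"python-2.5.2-r8.tbz2": {"ssl"},
                   "cvs-1.12.12-r4.tbz2": {"crypt", "doc"}}
        return carry_over(old_dir, Path(tmp) / "2008-02-25", old_use, new_use)


print(demo())
```

Packages whose flags changed (cvs in the demo) are simply not linked, so they would have to be rebuilt into the new directory.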

The problem I see with this approach is, though, that creating binpkgs for several sets of different machines wouldn't be solved by this. 


One alternative I see would be, to take a hash of the useflags as directory name and to set a symlink on the most recent directory for easy reference. 

What do you think? 

Is bugzilla the right place to discuss this? 
Comment 6 Arne Babenhauserheide 2009-01-01 19:57:28 UTC
How about saving useflags in a directory structure like the following: 

machine_type/
	active_package_useflags/
		packages with same active useflags

	other_active_package_useflags/
		packages with these active useflags. 

If the length of the list of active useflags exceeds a certain threshold (i.e. 32 chars - the length of a sha1 hash in base32 encoding), it could be replaced by a sha1 hash over the full list. 

To check if we have a binpackage for a certain package, we only need to get the active useflags for the package (for example sorted alphabetically) and join them to a string. If the string is longer than 32 chars, we then compute the sha1 of the string in base32. 

Then we just check if the dir 
machine_type/use_string/
contains a binpackage we need. 

The full path to a binpackage could then look like this: 
/var/packages/x86_64-pc-linux-gnu/2BFXV4EC5B25XH6B7X7OXJ34GOCM2KM2/dev-lang/python-2.5.2-r8.tbz2

(2BFXV4EC5B25XH6B7X7OXJ34GOCM2KM2 is the Uppercase sha1 hash of "berkdb doc examples expat gdbm ipv6 ncurses readline sqlite ssl threads tk", and I used the CHOST for the machine type)
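
A minimal sketch of that lookup key, assuming the flags are sorted alphabetically and joined with single spaces (the separator is an assumption; the proposal only says "join them to a string"). The function name `use_key` is made up for the example.

```python
import base64
import hashlib

HASH_LEN = 32  # a 160-bit SHA-1 digest is exactly 32 Base32 characters


def use_key(active_flags):
    """Return the directory name for a set of active USE flags."""
    use_string = " ".join(sorted(active_flags))
    if len(use_string) <= HASH_LEN:
        # Short enough: keep the human-readable form.
        return use_string
    digest = hashlib.sha1(use_string.encode("utf-8")).digest()
    # b32encode emits uppercase letters and the digits 2-7.
    return base64.b32encode(digest).decode("ascii")


print(use_key(["ssl", "ipv6"]))
print(use_key("berkdb doc examples expat gdbm ipv6 ncurses "
              "readline sqlite ssl threads tk".split()))
```

The first call keeps the readable string, the second falls back to a 32-character hash. A package with no active flags would yield an empty name here, so a real layout would need a placeholder for that case.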

Should the machine type also include the gcc version, the glib version or similar? 
Comment 7 Zac Medico gentoo-dev 2009-01-01 22:02:11 UTC
The layout should be more like $PKGDIR/$CATEGORY/$PN/$SLOTS_HASH/$USE_HASH/python-2.5.2-r8.tbz2 since we should support an arbitrary number of "slots". Slotable variables include things like, CHOST, multilib ABI, python version, perl version, and any other $LANGUAGE for which slotting makes sense. In addition to slotables, ideally the new layout should account for subpackages as well. Subpackages will allow you to have a binary package that's split into an arbitrary number of subpackages that separate the package into parts that can be installed separately.
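
As a rough illustration of this layout (not actual Portage code), the path could be assembled like this. How the slotable variables are serialized before hashing is not specified above; the sorted `KEY=value` string, the Base32 SHA-1 encoding borrowed from comment 6, and the names `short_hash` and `binpkg_path` are all assumptions.

```python
import base64
import hashlib
from pathlib import PurePosixPath


def short_hash(text):
    """SHA-1 of text, Base32-encoded (32 uppercase characters)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def binpkg_path(pkgdir, category, pn, pf, slotables, use_flags):
    """Build $PKGDIR/$CATEGORY/$PN/$SLOTS_HASH/$USE_HASH/$PF.tbz2."""
    # Sort both inputs so the same configuration always gives the same path.
    slots_string = " ".join(f"{k}={v}" for k, v in sorted(slotables.items()))
    use_string = " ".join(sorted(use_flags))
    return PurePosixPath(pkgdir, category, pn,
                         short_hash(slots_string), short_hash(use_string),
                         pf + ".tbz2")


print(binpkg_path("/usr/portage/packages", "dev-lang", "python",
                  "python-2.5.2-r8",
                  {"CHOST": "x86_64-pc-linux-gnu", "ABI": "amd64"},
                  ["ssl", "ipv6"]))
```

Adding another slotable later only means adding a key to the dictionary; the directory depth stays the same.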
Comment 8 Arne Babenhauserheide 2009-01-01 22:56:47 UTC
Do you have an idea how the binpackage can support subpackages? 

Could that be done via symlinks and install-information? 
Or just install information with the info which files from which binpackage are needed? 

Do binpackages have to be safe or can they be identified via an ID file which contains binpackage and script hashes (or something different) for safe installation? 

But why do we need subpackage support? 

Every binpackage fits an ebuild, and ebuilds can be as small as needed. So subpackage support can simply be done via meta-ebuilds or similar. 

We don't need packages which are smaller than ebuilds, I think. 
Comment 9 Arne Babenhauserheide 2009-01-28 10:45:34 UTC
From a discussion in IRC (#gentoo-kde) we found one advantage of putting the hashes into the filenames: 

$PKGDIR/$CATEGORY/$PN/python-2.5.2-r8-$SLOTS_HASH-$USE_HASH.tbz2

Advantages: 
* The files can just be shared without having to preserve the directory structure. 

Also metadata could be added to the tail of the tar archives, so all data is preserved even when the filenames get lost. 
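
A small sketch of the filename variant, assuming both parts are always Base32 hashes (which never contain a hyphen), so the name can be split from the right even though package versions such as `-r8` contain hyphens themselves. The function names are hypothetical.

```python
SUFFIX = ".tbz2"


def binpkg_filename(pf, slots_hash, use_hash):
    """Compose e.g. python-2.5.2-r8-<SLOTS_HASH>-<USE_HASH>.tbz2."""
    return f"{pf}-{slots_hash}-{use_hash}{SUFFIX}"


def parse_binpkg_filename(name):
    """Split a file name back into (pf, slots_hash, use_hash)."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a binpkg file name: {name!r}")
    stem = name[:-len(SUFFIX)]
    # Split from the right: the two hashes are the last two fields.
    pf, slots_hash, use_hash = stem.rsplit("-", 2)
    return pf, slots_hash, use_hash


name = binpkg_filename("python-2.5.2-r8", "A" * 32, "B" * 32)
print(parse_binpkg_filename(name))
```

If readable USE strings were allowed in the name instead of hashes, flags containing hyphens would break this simple split.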
Comment 10 Arne Babenhauserheide 2009-01-28 11:46:08 UTC
We just found out that xpak already stores metadata in the binpackage, so we'd only need to add the filename or path changes and Gentoo could use a transparent binary layer :) 

(sounds a bit too hypeable, but I currently find no better word for that feature ;) )
Comment 11 Arne Babenhauserheide 2009-01-28 12:10:07 UTC
I just learned why subpackages would be nice: to separate packages into sets for different USE flags. 

But to me it looks like that would be a lot harder than just the simple changes to allow for a simple binpackage structure needed to allow for separate binpackages with different useflags and other SLOTs. 
Comment 12 Fpemud 2009-01-30 06:51:07 UTC
I'd like this problem to be solved too. 
I don't like to compile one package twice; it is time-consuming.


As far as I know, a binpkg depends not only on USE flags, but also on CFLAGS, CXXFLAGS, LDFLAGS, etc.

I think the structure of /var/db/pkg is good for this.
/var/db/pkg is the database to track installed packages.

/var/db/pkg's structure:
  /var/db/pkg
    |----dev-util
          |----cvs-1.12.12-r4
                |----CBUILD
                |----CFLAGS
                |----CHOST
                |----CONTENTS
                |----COUNTER
                |----CXXFLAGS
                |----DEPEND
                |----EAPI
                |----FEATURES
                |----IUSE
                |----KEYWORDS
                |----LDFLAGS
                |----cvs-1.12.12-r4.ebuild
                ...

I think the binpkg repository's structure would be like:
  binpkg_repository
   |--dev-util
      |--cvs-1.12.12-r4
         |--binpkg1
            |--CBUILD                 // i686-pc-linux-gnu
            |--CHOST                  // i686-pc-linux-gnu
            |--CFLAGS                 // -O2 -march=i686 -pipe
            |--CXXFLAGS               //
            |--KEYWORDS               // alpha ~amd64 ...
            |--IUSE                   // crypt doc emacs ...
            |--LDFLAGS                // -Wl,-O1
            |--cvs-1.12.12-r4.tbz2    // binpkg file
         |--binpkg2
            |--CBUILD                 // i386-pc-linux-gnu
            |--CHOST                  // i386-pc-linux-gnu
            |--CFLAGS                 // -O2 -march=i386 -pipe
            |--CXXFLAGS               //
            |--KEYWORDS               // alpha ~amd64 ...
            |--IUSE                   // crypt  ...
            |--LDFLAGS                // -Wl,-O2
            |--cvs-1.12.12-r4.tbz2    // binpkg file

note:
1. the name of "binpkg1","binpkg2" is to be discussed, i don't know what it should be currently.
2. i don't know which of CHOST and CBUILD represents the destination architecture, the dest arch should be put in the binpkg repository, we don't need src arch 

Comment 13 Fpemud 2009-01-30 07:01:25 UTC
When emerging a package, portage first checks whether the current settings are the same as the settings in the "binpkg1", "binpkg2" directories. 
If they are, portage uses the tbz2 file directly; otherwise portage will compile the package, create a new "binpkg3" dir, and put the newly generated tbz2 file and the current settings in it.
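
The lookup described here might look roughly like the sketch below. It is not Portage code: the set of compared settings (`SETTINGS`, including a `USE` file that the tree above does not list), the helper names, and the sequential `binpkgN` numbering are assumptions, and the actual build step is left out.

```python
import tempfile
from pathlib import Path

# Settings compared between the running system and a stored binpkg (assumed).
SETTINGS = ("CHOST", "CFLAGS", "CXXFLAGS", "LDFLAGS", "USE")


def read_settings(binpkg_dir):
    """Read the one-value-per-file settings stored next to a binpkg."""
    return {name: (binpkg_dir / name).read_text().strip()
            for name in SETTINGS if (binpkg_dir / name).exists()}


def find_binpkg(pkg_dir, current):
    """Return the first binpkgN dir whose settings equal ours, else None."""
    for candidate in sorted(pkg_dir.glob("binpkg*")):
        if read_settings(candidate) == current:
            return candidate
    return None


def record_binpkg(pkg_dir, current):
    """Create the next binpkgN dir and store the current settings in it."""
    number = len(list(pkg_dir.glob("binpkg*"))) + 1
    new_dir = pkg_dir / f"binpkg{number}"
    new_dir.mkdir(parents=True)
    for name, value in current.items():
        (new_dir / name).write_text(value + "\n")
    return new_dir


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "dev-util" / "cvs-1.12.12-r4"
        pkg_dir.mkdir(parents=True)
        settings = {"CHOST": "i686-pc-linux-gnu", "CFLAGS": "-O2 -march=i686",
                    "CXXFLAGS": "", "LDFLAGS": "-Wl,-O1", "USE": "crypt doc"}
        miss = find_binpkg(pkg_dir, settings)             # nothing stored yet
        created = record_binpkg(pkg_dir, settings).name   # after a "build"
        hit = find_binpkg(pkg_dir, settings).name         # now it is reused
        return miss, created, hit


print(demo())
```

Note that every lookup reads every stored directory, which is the cost the hash-based proposals in the other comments try to avoid.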
Comment 14 Arne Babenhauserheide 2009-01-30 08:06:09 UTC
I think that binpackages should be easily shareable, and that wouldn't be the case with all the single files floating around. 

But the files in /var/db/pkg show all the necessary information which has to be included (except for information which is in the ebuild. We emerge stuff via ebuilds, so any information which is already in the ebuild can be left out). 

For this information we need to be able to do the following two separable actions: 
- Check the environment for one given binpackage 
- Find a binpackage which fits our environment. 

The first can already be done via the xpak tail of the binpackage tar archives, so there's no need to change anything for that. 

In the second one we don't need to be able to read the information. We just need to be able to check if it fits our system. To achieve that, we can just store a hash of the environment in the filename or path of each binpackage. 

To optimize this, I would separate it into two parts: One which doesn't change very often and can be used to identify one type of system (i.e. amd64 with standard optimizations) and one which varies from user to user (i.e. USE flags). (This very clean idea comes from Zac.)

The first part gives the "SLOT", the second part the active USE flags of the package. 

To make it more human readable, I'd first turn the part (i.e. "SLOT" or USE flags) into a string and only hash it, if that string is longer than the hash would be. 

As Hash I'd use sha1 encoded as Base32, uppercase, since sha1 is quite safe against collisions and Base32 doesn't contain any characters which have special meanings in package names. 
Comment 15 Philipp Riegger 2009-03-24 22:42:07 UTC
I would prefer to have the USE-flags on a per-package basis, because they really depend on a package. Therefore some hierarchy like .../category/package/use-flags/package-version.tar.bz2 or .../category/package/package-version.use-flags.tar.bz2 would be nice. I would prefer the last one.

The USE-flags could be managed in the following way: Require a revision bump if a package changes IUSE (might be hard if an eclass changes something, but definitely doable). Sort the USE-flags and create a binary string with a 1 for every enabled flag and a 0 for every disabled one. Well, this is not a string, it's a binary number with possibly leading zeroes; compress those into hex or base64 or whatever and use it to indicate the USE-flags. For most packages the number of USE-flags is small; for some with lots of USE-flags the filename length would be increased by 1 for every 6 USE-flags. Should be ok, I think.
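
A sketch of such a packed representation, assuming the URL-safe Base64 alphabet (6 bits per character) and zero padding on the right; `pack_use` and `unpack_use` are hypothetical names. Both directions need the package's sorted IUSE list, which is why a change to IUSE would require the revision bump mentioned above.

```python
import string

# URL-safe Base64 alphabet: one character encodes six flags.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def pack_use(iuse, enabled):
    """Encode which flags of iuse are enabled, 6 flags per character."""
    bits = "".join("1" if flag in enabled else "0" for flag in sorted(iuse))
    padded_len = -(-len(bits) // 6) * 6  # round up to a multiple of 6
    bits = bits.ljust(padded_len, "0")
    return "".join(ALPHABET[int(bits[i:i + 6], 2)]
                   for i in range(0, len(bits), 6))


def unpack_use(iuse, packed):
    """Recover the set of enabled flags from the packed form."""
    bits = "".join(format(ALPHABET.index(char), "06b") for char in packed)
    return {flag for flag, bit in zip(sorted(iuse), bits) if bit == "1"}


iuse = ["berkdb", "doc", "ipv6", "ssl"]
packed = pack_use(iuse, {"doc", "ssl"})
print(packed, sorted(unpack_use(iuse, packed)))
```

The alphabet includes "-", which would clash with a hyphen-delimited file name, so a real scheme would probably pick a different 64-character set.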

Another thing that might be needed is a special binary revision. If you have package foo, that depends on lib bar, and a new version of the lib hits the tree with a different ABI (or anything else that requires foo to be recompiled but not changed), it would be nice to indicate that foo should be reinstalled, by increasing the binary revision.
Comment 16 Arne Babenhauserheide 2009-03-25 08:42:20 UTC
The problem I see with bitwise use string is that it will never be human readable. 

The binary revision is something else, because this kind of dependency doesn't get hand-edited by the user, so it doesn't benefit from being user-readable. 


I think when including this we'd have three separate elements: 
* Local system settings: Active USE-flags. 
* Binary compatibility requirements: The necessary libs and dependencies. 
* Host SLOT: Hardware requirements. 

A package with wrong USE flags can be installed anyway, a package with wrong binary requirements just won't work, though, so these should be kept separate. 

The Host SLOT signifies a category which fits for all packages created by a certain computer and shouldn't change as long as no libraries change in a way which affects the whole system (like the glibc major version), the user doesn't play with CFLAGS, and similar (like installing a new CPU :) ). 


Does this separation sound useful, or did I overlook/misunderstand something? 


Besides: From what I see in a random binpackage (fretsonfire), the IUSE part of the xpak looks like the right part for the USE flag info of the name, so we could just use that directly without having to do much error-prone conversion. 

Remember that the binpackage name just needs to have an ID with which it can be found; it doesn't need to include anything which can be found in the ebuild, since anyone installing a binpackage should also have the corresponding ebuild. 

The requirement for each ID element of the name is, that it can be generated directly by any Gentoo installation which only knows the ebuild and the already installed libraries. 


Question: What do we do if we don't already have one of the binary requirements? How can we then find the correct binpackage? 

Does the info contained in a binpackage suffice? What I see (in openarena-0.8.1.tbz via vim) is: 


CXX: x86_64-pc-linux-gnu-g++

NEEDED: /usr/games/bin/openarena-ded libdl.so.2,libm.so.6,libc.so.6
/usr/games/bin/openarena libSDL-1.2.so.0,libpthread.so.0,libGL.so.1,libvorbisfile.so.3,libvorbis.so.0,libogg.so.0,libdl.so.2,libm.so.6,libc.so.6

CFLAGS: -O2 -pipe -march=k8

NEEDED.ELF.2: X86_64;/usr/games/bin/openarena-ded;;;libdl.so.2,libm.so.6,libc.so.6
X86_64;/usr/games/bin/openarena;;;libSDL-1.2.so.0,libpthread.so.0,libGL.so.1,libvorbisfile.so.3,libvorbis.so.0,libogg.so.0,libdl.so.2,libm.so.6,libc.so.6


Also there seems to be some binary "DEPEND". I don't know what it contains, though, so I can't judge if it would be needed for binary compatibility. 

Can we get these values from our own system (fast enough -> without building the package ourselves), so we can just include a hash with which we can find a fitting binpackage and the right libs for it? 

And if yes: Which of these values have to be included as hash in the package name for binary compatibility, and which are systemwide, so they can be included in the Host SLOT? 
Comment 17 Philipp Riegger 2009-03-25 12:53:05 UTC
I'm not sure what you mean by "user readable" if you consider a hash over a long string more readable than a packed representation of that string. The latter could be converted to the user-readable format; the former could not.

If you write about CFLAGS: Maybe it would be wise to leave out those CFLAGS that don't do anything (at least concerning the binary). Something like "-march=prescott -mmmx" and "-march=prescott" are the same; "-pipe" only changes the behaviour of gcc, not the created binary.
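
A minimal sketch of such a normalization, under the assumption that only "-pipe" is dropped. Detecting flags that are already implied by others (the "-march=prescott -mmmx" case) would need knowledge from the compiler and is not attempted here; `normalize_cflags` is a made-up name.

```python
import shlex

# Flags that change how gcc runs but not the binary it produces (assumed list).
NO_EFFECT_ON_OUTPUT = {"-pipe"}


def normalize_cflags(cflags):
    """Drop flags without effect on the binary, keeping the original order."""
    kept = [flag for flag in shlex.split(cflags)
            if flag not in NO_EFFECT_ON_OUTPUT]
    return " ".join(kept)


print(normalize_cflags("-O2 -pipe -march=k8"))
```

The order of the remaining flags is kept on purpose, since later flags can override earlier ones.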

I usually try to make gentoo behave like other binary distros when it comes to binary packages. And they only have the version and this "binary revision" or whatever and maybe the right dependencies. But to make it usable, we don't even need dependencies and which LIBS are needed and all that stuff. Portage checks that USE-flags and all that stuff matches and then the package with the biggest binary revision is installed. That works without lots of magic and foo, and before we discuss more and more here, we should maybe implement a simple working system instead of trying to create a complicated, error-prone one.
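
The selection rule in this comment ("USE flags match, then take the biggest binary revision") could be sketched as follows. The `BinPkg` record and its `binrev` field are hypothetical; no such field exists in the thread beyond the idea itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinPkg:
    cpv: str          # category/package-version
    use: frozenset    # USE flags the package was built with
    binrev: int       # hypothetical binary revision


def pick(candidates, cpv, wanted_use):
    """Return the matching package with the highest binary revision."""
    wanted = frozenset(wanted_use)
    matching = [pkg for pkg in candidates
                if pkg.cpv == cpv and pkg.use == wanted]
    return max(matching, key=lambda pkg: pkg.binrev, default=None)


repo = [
    BinPkg("app-misc/foo-1.0", frozenset({"ssl"}), 1),
    BinPkg("app-misc/foo-1.0", frozenset({"ssl"}), 2),  # rebuilt for new lib
    BinPkg("app-misc/foo-1.0", frozenset(), 3),
]
print(pick(repo, "app-misc/foo-1.0", {"ssl"}))
```

Here the package with binary revision 2 is chosen: revision 3 exists, but it was built with different USE flags.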
Comment 18 Arne Babenhauserheide 2009-03-25 13:15:20 UTC
By user-readable I mean the default string representations of the active useflags, which are only hashed if they get longer than the hash (that's what I proposed above). 

That way a package with three active USE flags just has these USE flags in the package name, while a package with many active USE flags (like mplayer) has a simple hash. 

The advantage here is that only the active USE flags are tracked - for most packages simply no USE flag, for most others only one or two. 

For the Host SLOT I honestly don't care much, but I think that a hash over the string will be far easier to write and especially to maintain. Also a hash has a guaranteed length, which isn't true for a bitmask (though the difference in length might make that difference fall away). 

I think that CFLAGS should be in, though, since some users have kinda crazy settings in there which only work for their specific setup. 

The problem with binary dependencies is, that the ebuild might state that a lib is compatible, since it is recognized in the configure process, but it can well happen that you have to revdep-rebuild to make your programs work again after a library update. 

If you do a binary installation, you can't just rebuild with the new lib, and the two binary packages (one for old lib one for new lib) must not be mixed - so they should have different names and the names should make it possible for portage to decide which binary package to install. 

Gentoo isn't just a binary distro with a fixed set of packages which all use the same libs. Different from Ubuntu and similar the basic libraries can get updated while all the rest stays version-fixed. 
Comment 19 Philipp Riegger 2009-03-25 13:34:16 UTC
Your hash-system has 2 disadvantages: Binary packages of the same source package are scattered all over the tree (directory tree). Also, if you have the hash, the name is completely unreadable for the user. For my system, you write 2 simple conversion functions and can have command line tools, web interfaces, everything you need. It's all the same, no special cases. Also, I don't see any advantage in only tracking active USE-flags. It makes a packed representation harder.

CFLAGS are a story of their own, that's for sure. Maybe one should create an assembler file with -fverbose-asm and record those CFLAGS and not the ones from the make.conf.

Last about the library stuff. If you update a library and therefore need to update an application, we proposed 2 ways: You want to save library dependencies in the file (which might not be enough, since the ABI might change without the filename, which really sucks) and then you have to calculate, download several packages or the metadata of them, check which is usable, install it. In my version the build system would create the new package for the lib, check which application is broken (ok, lots of apps need to be installed here) and then just build new packages for them. It would then increase the binary revision by one and commit all the packages in 1 transaction to the server. So, whenever you get the chance to install a lib that might break some application, either you installed the app from a binary and therefore you get a new binary package, because the binary version is higher, or you compiled the package on your own and have to reinstall it on your own.

And, last but not least, there is still the preserved libs feature of portage which can help here.
Comment 20 Arne Babenhauserheide 2009-03-25 15:33:17 UTC
> In my version the build system would create the new package for the
lib, check which application is broken (ok, lots of apps need to be installed
here) and then just build new packages for them. It would then increase the
binary revision by one and commit all the packages in 1 transaction to the
server. 

How do you make that work with multiple unconnected servers? 

What happens if you and I rebuild at the same time and send the files to different servers? 

And what do you do if you don't have all applications installed which depend on the lib? 

I'd like this system to be robust enough to support community-built binpackages (with some trust system to ensure that you can always find the responsible person if something breaks). 


For the USE flags I doubt that you would get any real gain by doing USE flags bitwise. From what "eix -I" and some grepping and seding tells me, most packages have at least two USE flags, but none of them is active. So using only the active USE flags has no cost here, using a bitmask takes one char. 

Also the simple advantage is, that people can look at the binpackages and *see* with which USE flags they were built. 


Last to the hash system: There is one binpackage per installed ebuild, just as in the current system. 
The only change is that the name of the binpackages gets two extra parts appended: USE flags and a Host SLOT. (Three with binary compatibility)

So it's the current system + a way to find matching binpackages for different configurations. The binpackages themselves already contain ways to check, if they can be used, but we currently can't search for this information efficiently. 

Did you read the whole discussion in this bug? 


I am currently leaning more towards adding the USE hash and SLOTs hash to the filenames, since these could be shared more easily, but using a directory structure is cleaner. 
Comment 21 Philipp Riegger 2009-03-25 16:07:53 UTC
(In reply to comment #20)
> How do you make that work with multiple unconnected servers? 

With some kind of locking, it can work.

> What happens if you and I rebuild at the same time and send the files to
> different servers? 

Nothing happens, because the servers are different. If you want to sync them, you have to think of something.

> And what do you do if you don't have all applications installed which depend on
> the lib? 

You figure out, if you usually build binpackages for them and if you do, you install your latest binpackage and check it.

> I'd like this system to be robust enough to support community-built binpackages
> (with some trust system to ensure that you can always find the responsible
> person if something breaks). 

The thing is, I discuss some kind of build server while you discuss the distributed system. Each has different applications, but creating a binary package that only supports one would be stupid. Therefore I suggest adding the binary revision and handling the packages built by your distributed system like live ebuilds, with -b9999 or something like that. Those would be unmasked with a FEATURE or some other flag in the make.conf. Therefore one can decide to only use the packages built on the trustworthy server or also use the packages built by the distributed system.

> For the USE flags I doubt that you would get any real gain by doing USE flags
> bitwise. From what "eix -I" and some grepping and seding tells me, most
> packages have at least two USE flags, but none of them is active. So using only
> the active USE flags has no cost here, using a bitmask takes one char. 
> 
> Also the simple advantage is, that people can look at the binpackages and *see*
> with which USE flags they were built. 

It's difficult to talk about cost here, since it does not really exist. We get a problem if filenames and paths get too long to be supported by http, the filesystem or whatever. This happens with none of our representations.

You propose a mixed representation based on the combination of USE flags and a hash. I don't like it because it's not clean: Representation depends on what is represented, binary packages for the same ebuild are scattered all over the place. If a package gets a new USE-flag or loses one, the next binary package might end up in a completely different directory.

How is that easily usable? How can a user find out if a new version is available without the help of some tools? How much time and effort would it be to delete binary packages because of security issues, license problems or other reasons? How many different directories would you end up with and are they supported on currently used file systems? Is the access fast enough to be practical? 

> Last to the hash system: There is one binpackage per installed ebuild, just as
> in the current system. 
> The only change is that the name of the binpackages gets two extra parts
> appended: USE flags and a Host SLOT. (Three with binary compatibility)

How do you handle packages being built against different library versions? They would end up with the same filename. And not every user can update every library just to overcome a problem in the naming scheme.

> So it's the current system + a way to find matching binpackages for different
> configurations. The binpackages themselves already contain ways to check, if
> they can be used, but we currently can't search for this information
> efficiently. 

You could provide a cache file on the server. RPM uses some kind of index file. 

> Did you read the whole discussion in this bug? 

I guess so. I skipped uninteresting parts and parts where the same was told again and again. If you think I miss a special comment, please tell me which it is. One big problem with this discussion is that it was scattered between bugzilla and the soc mailing list. I tried to keep it separated, but it does not seem to be possible. You refer to your distributed system before mentioning it once in this bug.

> I am currently leaning more towards adding the USE hash and SLOTs hash to the
> filenames, since these could be shared more easily, but using a directory
> structure is cleaner. 

Other distributions use different directories for different SLOTS, as you call them. I would stick to this. Gentoo also uses different directories for different architectures. I would also prefer to have the USE-flag in the filename, since that describes the package itself and not the "linux distribution" (replace this with a better name, if you want) which it belongs to.
Comment 22 Arne Babenhauserheide 2009-03-25 17:11:37 UTC
> If you think I miss a special comment, please tell me which it is. 

It's not one post but the three different approaches: 
* SLOT and USE in directory structure
* SLOT in directory, USE in filename
* Both in filename. 

And the reasons for using readable USE flags in the filename. 

> One big problem with this discussion is, that it was scattered between
> bugzilla and the soc mailing list. 

I didn't know that there was discussion on this in the soc list. 

Could you give me a link? 

> I tried to keep it separated, but it does not
> seem to be possible. You refer to your distributed system before mentioning it
> once in this bug.

That is a long-term goal we talked about in IRC a few years ago and I seem to have missed writing it here before... sorry for that. 

In short: It would be great if we had a way for users to get trusted binpackage providers and for them to tell portage to use binpackages whenever possible and to create and upload new ones, where the binpackages don't exist, yet. 

As soon as this works, it could be extended to a trusted p2p network in which people simply share their binpackages and download those binpackages they need. 


The idea comes from the experience that sharing the distfiles in Gnutella led to many people downloading them. 


> Other distributions use different directories for different SLOTS, as you call
> them. I would stick to this. Gentoo also uses different directories for
> different architectures. I would also prefer to have the USE-flag in the
> filename, since that describes the package itself and not the "linux
> distribution" (replace this with a better name, if you want) which it belongs
> to.

The name SLOT comes from zmedico, though I'd love to be able to claim that it was my idea :) 

By using a Hash, the SLOT can contain many different host type definitions, and the system won't have to change a bit when the included information changes - it's still just a hash over stuff we look for. 

Using a SLOT dir for machine type and USE flags in the name sounds also good to me. 

With USE flags in the filename the files don't get scattered, by the way. 

It would look similar to this (needs to be checked against package naming scheme, if it's compatible): 

SLOT1/
  portage-2.2_rc25-epydoc.tbz
  portage-2.2_rc27-epydoc.tbz
  python-2.5.4-r2-GYFBQKRKWSM67J6GO63XUE3JPDIJUSRA.tbz
  ...
SLOT2/
  ...
Comment 23 Philipp Riegger 2009-03-25 18:52:34 UTC
(In reply to comment #22)
> > One big problem with this discussion is, that it was scattered between
> > bugzilla and the soc mailing list. 
> 
> I didn't know that there was discussion on this in the soc list. 
> 
> Could you give me a link? 

I somehow had identified you with the person I'm writing with on gentoo-soc. You can find the thread at <http://archives.gentoo.org/gentoo-soc/>.
Comment 24 Philipp Riegger 2009-03-26 14:04:58 UTC
(In reply to comment #22)
> In short: It would be great if we had a way for users to get trusted binpackage
> providers and for them to tell portage to use binpackages whenever possible and
> to create and upload new ones, where the binpackages don't exist, yet. 
> 
> As soon as this works, it could be extended to a trusted p2p network in which
> people simply share their binpackages and download those binpackages they need. 

Ok, but this is beyond the scope of this bug. Here we should discuss things that would make sense to be added to the binpackage as metadata which would enable everything we want to do with it.

> By using a Hash, the SLOT can contain many different host type definitions, and
> the system won't have to change a bit when the included information changes -
> it's still just a hash over stuff we look for. 

The problem I see is that it will become hard to find out which binpackages for your system are available. Maybe you don't care about USE-flags or only about one special CFLAG, then you might want to find not the same SLOT, but almost the same one. Would that be possible?
Comment 25 Arne Babenhauserheide 2009-03-26 15:23:08 UTC
> Maybe you don't care about USE-flags or only
> about one special CFLAG, then you might want to find not the same SLOT, but
> almost the same one. Would that be possible?

If you don't care about USE flags, you just ignore the USE hash part of the binpackages, so this is easy. 

But I see an advantage of your approach here: It would be easier to just compare bitwise "does the package have the USE flags I need, I don't care about additional capabilities". As much as I dislike losing readable USE flags in the binpackage, this is a major advantage, since you can then do more complex checks from the names without having to download the binpackages and checking the xpak. 
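
The bitwise check mentioned here can be sketched as a simple mask comparison. The sketch assumes both masks are built against the same sorted IUSE list of the package; `to_mask` and `provides` are hypothetical helper names.

```python
def to_mask(iuse, enabled):
    """Integer bitmask with one bit per flag of the sorted IUSE list."""
    return sum(1 << index
               for index, flag in enumerate(sorted(iuse))
               if flag in enabled)


def provides(pkg_mask, needed_mask):
    """True if every needed flag is enabled in the package (extras allowed)."""
    return (pkg_mask & needed_mask) == needed_mask


iuse = ["doc", "ipv6", "ssl", "tk"]
built_with = to_mask(iuse, {"ipv6", "ssl", "tk"})
print(provides(built_with, to_mask(iuse, {"ssl"})))   # extras are fine
print(provides(built_with, to_mask(iuse, {"doc"})))   # needed flag missing
```

This only answers whether the flags fit; it says nothing about the additional dependencies the extra flags may pull in.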

But at the same time doing this on the binpackage level kills dependency tracking, since a new USE flag can imply a new dependency which is needed for getting the binpackage to run. This means you'll have to recalculate dependencies for every alternate USE flag combination, which is so expensive that any string conversion or hash algorithm pales in comparison. 

Just saying "I don't care about USE flags" only works for mostly selfcontained packages, else you need to enable the right USE flags. 

For CHOST and similar (the Host SLOT): How do you decide what is unimportant? 

Anything which isn't a Hash is in danger of becoming arbitrarily long if you need to include more information, and if you leave stuff out, you take away the option of using that information - and in fact force users to use packages with non-fitting settings. 

With the Hash it is far easier to change the included information later on without breaking earlier versions: Just change the information and hash again and all resolution will still work, even for old versions (they will just not see the new packages as being compatible, but they won't get false information). 

The only thing we need is being able to find the files based on known strings. 

To allow for more complex comparisons with hashes we'd just have to hash each combination of active USE flags -> since most packages have just two or three USE flags there are only a few possible combinations. And since we'll have to recalculate dependencies anyway if we use different USE flags, the cost of doing multiple hashes is negligible in comparison. 
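
A minimal sketch of the "hash each combination" idea, under one reading of it: the client knows which flags it needs and which optional flags the package offers, and hashes every superset of the needed flags so it can query for any of them by name. The function names, the space-joined canonical form, and the base32 SHA-1 encoding are assumptions for illustration, not anything Portage does.

```python
from base64 import b32encode
from hashlib import sha1
from itertools import combinations


def use_hash(flags):
    """Return a base32 SHA-1 over the sorted, space-joined flag names."""
    canonical = " ".join(sorted(flags))
    return b32encode(sha1(canonical.encode("utf-8")).digest()).decode("ascii")


def candidate_hashes(required, available):
    """Map the hash of every superset of `required` within `available`
    to the flag combination it stands for."""
    required = set(required)
    optional = sorted(set(available) - required)
    table = {}
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            combo = tuple(sorted(required | set(extra)))
            table[use_hash(combo)] = combo
    return table


# A package with three flags, where only "nls" is really needed:
candidates = candidate_hashes({"nls"}, {"examples", "net", "nls"})
print(len(candidates))  # 4 acceptable combinations
```

With n optional flags this produces 2**n names to probe, which stays small for packages with two or three flags but grows quickly for packages with many.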
Comment 26 Arne Babenhauserheide 2010-02-21 22:53:40 UTC
Question: What would be a good (uri and filesystem safe) delimiter for USE flags? 

Space isn't URL safe, - and _ are being used in some flags. 
Comment 27 Zac Medico gentoo-dev 2010-02-21 23:15:58 UTC
Given that some packages support extremely large numbers of USE flags, I suspect that it's going to be much more practical to use an id number or hash to guarantee unique file names.
Comment 28 Arne Babenhauserheide 2010-02-22 22:34:03 UTC
My idea is still to create a hash, if the string would be longer than the hash (that way package names for packages with few useflags remain human readable). 

But if there's no useful USE-flag delimiter, always using the hash would be a useful alternative. 
Comment 29 Zac Medico gentoo-dev 2010-06-02 04:52:41 UTC
There's some discussion about adding hashes to filenames in this thread:

http://archives.gentoo.org/gentoo-portage-dev/msg_6ead086db61f438bfbac01c97d3da390.xml
Comment 30 Arne Babenhauserheide 2010-06-02 15:29:27 UTC
Thanks for the info! 

Could you maybe post a link to this bug there? It might help, because files are then unique for each configuration and can be hardlinked to specific layouts. 
Comment 31 Joe Pelkey 2010-08-26 18:11:58 UTC
Since I've also wanted this for a while, here are my two cents:

Each binpkg file could be placed in following path:
${PKGDIR}/${CATEGORY}/${PF}/<computed directory name>/${PF}.tbz2

The computed directory name could be
.${CHOST}.<USE flags>.<date/time>

where date/time is a modified ISO 8601 representation, where USE flags are concatenated by some character that's safe everywhere (like . or +), and where the leading . is likewise whatever character is deemed safe.  The full length of the computed directory name could be limited to 255 characters, or whatever's deemed sane, by truncating the "${CHOST}.<USE flags>" part (since it's only for human eyes anyway).

There could also be a variable added to portage, e.g. BINPKG_REBUILD_CHANGED_VARIABLES, which could contain a list of a valid subset of filenames within /var/db/pkg/${PF} (sorry, don't know the general term for those variables), that when any of them are different than what's already defined in a preexisting binpkg when installing or creating binaries, the package will be rebuilt.

The date/time component obviates any need to hash, only one extra level of the directory structure is needed, the most desired to know parts of the variables are within the directory structure itself, the leading . (or whatever char) in the computed dir part allows preexisting utilities to easily ignore this proposed directory structure (which is also fully backward compatible with the current structure), and the binpkg files don't need to be modified in filename or xpak content from their current form.  The only detriment I can see is that new binaries can't be placed in ${PKGDIR}/All (but I'm not even sure newer portages would write to ${PKGDIR}/All anyway, so it may be moot).
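
A rough sketch of how such a computed directory name could be built. The compact UTC timestamp format, the 255-character limit as a constant, and the function name are assumptions chosen for illustration; the comment above only says "modified ISO 8601".

```python
from datetime import datetime, timezone

MAX_NAME = 255  # assumed limit for a single path component


def computed_dir_name(chost, use_flags, built_at, sep="."):
    """Return <sep><CHOST><sep><USE flags><sep><timestamp>, truncating only
    the human-readable CHOST/USE part so the timestamp always survives."""
    stamp = built_at.strftime("%Y%m%dT%H%M%SZ")  # ISO 8601 basic form, no ':'
    readable = sep.join([chost] + sorted(use_flags))
    room = MAX_NAME - len(stamp) - 2 * len(sep)
    return sep + readable[:room] + sep + stamp


when = datetime(2010, 8, 26, 18, 11, 58, tzinfo=timezone.utc)
print(computed_dir_name("x86_64-pc-linux-gnu", ["flag2", "flag1"], when))
# .x86_64-pc-linux-gnu.flag1.flag2.20100826T181158Z
```

Because the timestamp is part of the name, a client cannot compute this name in advance; it would have to list the directory and inspect the entries.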
Comment 32 Arne Babenhauserheide 2010-08-27 05:27:54 UTC
Hi Joe, 

The problem I see with the proposed directory name with date/time is that it might no longer be simple to compute without additional knowledge. 

The reason for the hash is, that portage can then easily compute the name for any binpackage, local ones as well as foreign ones: Just get your own CHOST and the active useflags for the package and you can query for it. 
Comment 33 Joe Pelkey 2010-08-27 23:14:38 UTC
Hey Arne,

The reasons I was leaning away from hashes are:

1. The only variables that could sanely be incorporated into the hash would have to be relatively constant and absolutely relevant ones, like CHOST, and ones integral to the package installation definition, like SLOT and USE.  While I personally would need no more than that, some have suggested filtering binpkgs on more criteria than those variables (like CFLAGS/CXXFLAGS/LDFLAGS) is desirable, and one binpkg per hash dir would preclude the capability of creating packages with the same CHOST+USE+SLOT but with the other variables differing.  Incorporating those *FLAGS or other variables into the hash would be difficult since they would have to be converted to a canonical form before hashing, with irrelevant flags (like -pipe) removed and with flags that mean the same thing but specified in a different manner (-O3 and -O6 for example) to be converted to the same thing.  More importantly, if someone didn't want to differentiate builds based on any of those variables, they would nevertheless be forced to in order to generate the correct hash.

2. There's always the chance, albeit extremely slim, of hash collisions.

3. From a convenience standpoint, if the filepath feature that differentiates builds is a hash, then users would need to use qtbz2/qxpak (or some other program) to find out which variables the binpkgs were built with.
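
To illustrate the canonicalization problem from point 1, here is a deliberately simplified sketch. The set of "irrelevant" flags, the treatment of optimization levels above 3 as -O3 (as the comment suggests for -O6), and the sorting of flags are all assumptions; in particular, real compiler flags are order-sensitive, so sorting is a simplification and not a safe general rule.

```python
import re

# Flags assumed not to influence the generated code (illustrative only).
IRRELEVANT = {"-pipe"}


def canonical_cflags(cflags):
    """Reduce a CFLAGS string to a comparable form before hashing."""
    kept = []
    for flag in cflags.split():
        if flag in IRRELEVANT:
            continue
        level = re.fullmatch(r"-O(\d+)", flag)
        if level and int(level.group(1)) > 3:
            flag = "-O3"  # assumption: levels above 3 behave like -O3
        if flag not in kept:
            kept.append(flag)
    return " ".join(sorted(kept))


print(canonical_cflags("-pipe -O6 -march=core2"))  # -O3 -march=core2
```

Even this toy version shows the difficulty: every rule in it encodes a judgment about which differences matter, and every user who disagrees with a rule ends up with a different hash.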

That being said, the O(1) path lookup that hashes give would be advantageous, but using hashes in a way that pleases the most Gentoo users would need a fallback of doing a directory search if the hash failed to match.  For example:

I compile sys-apps/somepack-5.6.7 with SLOT=0, USE="flag1 flag2 flag3", CHOST=x86_64-pc-linux-gnu, and CFLAGS="-march=core2 -O2".  The USE+CHOST+SLOT hash to 12345, and the binpkg is stored in
${PKGDIR}/sys-apps/somepack-5.6.7/12345/somepack-5.6.7.tbz2 .

Then I compile sys-apps/somepack-5.6.7 with the same USE, CHOST, and SLOT, but CFLAGS="-march=athlon64 -O2".  The hash would still be 12345, so the fallback would have to be to place it in a directory like 12345.2, or with a date appended, or something.

Since portage could check the xpak to see if the binpkg in the quickly-found hashed dir matches the user criteria or not, portage could fallback to checking the xpak of the binpkgs in dirs that start with the hash to find a more suitable one.
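
The lookup with fallback described above could look roughly like this. The directory listing is modelled as a plain dict from directory name to the metadata that would be read from the xpak; that data structure and the function name are hypothetical stand-ins, not Portage APIs.

```python
def find_binpkg(dirs, key_hash, wanted):
    """Return the directory whose metadata satisfies `wanted`, or None.

    dirs     -- mapping of directory name to xpak-like metadata (hypothetical)
    key_hash -- hash over USE+CHOST+SLOT, e.g. "12345"
    wanted   -- extra criteria the user cares about, e.g. CFLAGS
    """
    def matches(meta):
        return all(meta.get(key) == value for key, value in wanted.items())

    # Fast path: the directory named exactly after the hash.
    meta = dirs.get(key_hash)
    if meta is not None and matches(meta):
        return key_hash

    # Fallback: sibling directories such as "12345.2" that share the hash.
    for name in sorted(dirs):
        if name.startswith(key_hash + ".") and matches(dirs[name]):
            return name
    return None


dirs = {
    "12345": {"CFLAGS": "-march=core2 -O2"},
    "12345.2": {"CFLAGS": "-march=athlon64 -O2"},
}
print(find_binpkg(dirs, "12345", {"CFLAGS": "-march=athlon64 -O2"}))  # 12345.2
```

A user who does not care about CFLAGS passes an empty `wanted` and always gets the fast path.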
Comment 34 Arne Babenhauserheide 2010-08-30 00:20:01 UTC
Hi Joe, 

I think there’s a misunderstanding about why I want to use hashes. It’s not because I don’t want cleartext names (I would prefer having cleartext as default and only hashing when the cleartext would be longer than the hash), but because I want to avoid arbitrarily long filepaths and still allow for arbitrary information being added.¹ 

The idea of also differentiating binpackages by information that isn’t essentially incompatible (like not quite relevant CFLAGS) sounds interesting, but I think it can be done easily by just computing the SLOT hashes for the different combinations of safe flags. That moves the task of distinguishing safely changeable flags from unsafe ones from the storing server to the portage which does the install. 

And I would make all CFLAGS part of the SLOT and then let portage decide which changed SLOTs to try. This would result in some unnecessary tests (costing far less than the time to actually get the package), but make the whole storage system independent of the question of which inconsistencies are considered safe (like -O3 instead of -O2) or unsafe (like -mcpu=cell instead of -march=k8). 

Additionally, hashing allows for easily changing the information which is included in the SLOT without having to create a new parser. When you add information, the old binpackages won’t be found as compatible, but you still just check if the hash matches. 

¹: My idea for hashing is something like this: 

if len(RAW_SLOT) < hash_len:
    SLOT = RAW_SLOT
else:
    SLOT = hash(RAW_SLOT)

Same for USE. 

It would then look like this: 

${PKGDIR}/${CATEGORY}/${PN}/${PN}-${PV}-${SLOT}-${USE}.tbz2

Besides: I think "," should be safe for useflags, but I’m not perfectly sure about that. + would be problematic in a filename, because it gets parsed to a space when accessing via HTTP (if I recall that correctly). 

And I would prefer separating the hashes from the rest of the filename by "-", because that’s already used to separate the version from the PN. A problem with "-" is that it exists in some useflags and in CFLAGS. For the intended one-way usage (create a config to check → generate SLOT and USE → check for the filename) this would be no problem, though, and the pattern recognition in humans is sophisticated enough to resolve the name despite the dual usage of the separator in the separated values. 

PS: For CFLAGS without hashing, we have to replace the spaces, since they are not URL-safe (and a pain to use in the shell). 
Comment 35 Arne Babenhauserheide 2010-08-30 01:06:15 UTC
… "," would be a bad candidate. See http://www.ietf.org/rfc/rfc2396.txt

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

‘The "reserved" syntax class above refers to those characters … which may not be allowed within a particular component of the generic URI syntax.’ (line 439)


These would be OK for URIs: 

unreserved  = alphanum | mark

mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Since !,*,',(,) are not nice to use in the shell, we’re left with -,_,. or ~. ~ would be misleading (“why my home?”) so I’d leave it out, too. . is used in the version and can be used multiple times, so using it here would border on being consistent. 
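
The elimination above can be written down as a small check. The character sets are taken from the RFC 2396 excerpt and the shell remark in this comment; the function name and the extra step of excluding characters that already occur in flag names are assumptions for illustration.

```python
# "mark" characters that RFC 2396 lists as unreserved (see excerpt above).
RFC2396_MARK = set("-_.!~*'()")
# Characters this comment rules out as awkward in the shell.
SHELL_UNFRIENDLY = set("!*'()")


def delimiter_candidates(flags):
    """Return URI- and shell-friendly delimiters that occur in no flag name."""
    usable = RFC2396_MARK - SHELL_UNFRIENDLY
    used = set("".join(flags))
    return sorted(usable - used)


print(delimiter_candidates(["berkdb", "ipv6", "wide-unicode"]))
# ['.', '_', '~']
```

For the flags shown, "-" drops out because "wide-unicode" contains it; a flag containing "_" would remove that candidate as well, leaving "." and "~".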


The unhashed ${USE} part (only active use flags) for my current python installation would then be 

berkdb.examples.gdbm.ipv6.ncurses.readline.sqlite.ssl.threads.tk.wide-unicode.xml

Since this is longer than a base32 encoded sha1 hash, it would be hashed to 

SE3L3SXRLCPCX7OQN7TLKXHK6TF6T5EI


Bash on the other hand only has three active use flags on my system, so its ${USE} part would look like this: 

examples.net.nls


PS: code for the selective hashing: 

from hashlib import sha1
from base64 import b32encode

def conditional_hash(data):
    """Hash the string if it is at least as long as a b32 encoded sha1 hash (32 chars)."""
    if len(data) >= 32:
        # sha1 works on bytes, and b32encode returns bytes: convert both ways
        s = sha1()
        s.update(data.encode("utf-8"))
        data = b32encode(s.digest()).decode("ascii")
    return data
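
Putting the pieces from the last two comments together, a sketch of how a client could compute the file name to look for. The path layout follows the pattern from comment 34 with "." as the USE delimiter; the function name `binpkg_path`, the package version, and the use of CHOST alone as the raw SLOT string are hypothetical and only serve the example.

```python
from base64 import b32encode
from hashlib import sha1
import posixpath

HASH_LEN = 32  # length of a base32-encoded SHA-1 digest


def conditional_hash(text):
    """Keep short strings readable; hash anything of HASH_LEN chars or more."""
    if len(text) >= HASH_LEN:
        text = b32encode(sha1(text.encode("utf-8")).digest()).decode("ascii")
    return text


def binpkg_path(category, pn, pv, raw_slot, active_use):
    """Build CATEGORY/PN/PN-PV-SLOT-USE.tbz2, relative to PKGDIR."""
    use_part = conditional_hash(".".join(sorted(active_use)))
    slot_part = conditional_hash(raw_slot)
    filename = f"{pn}-{pv}-{slot_part}-{use_part}.tbz2"
    return posixpath.join(category, pn, filename)


# Hypothetical version and host string, for illustration only.
print(binpkg_path("app-shells", "bash", "4.1", "x86_64-pc-linux-gnu",
                  ["nls", "examples", "net"]))
# app-shells/bash/bash-4.1-x86_64-pc-linux-gnu-examples.net.nls.tbz2
```

Since the name depends only on values the client already knows, it can be requested directly from a local directory or a remote host without listing anything first.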
Comment 36 Leho Kraav (:macmaN @lkraav) 2011-04-07 08:06:22 UTC
has anyone prototyped any of this, or will it be left for gsoc?
Comment 37 Arne Babenhauserheide 2011-04-07 16:52:39 UTC
I don’t know of any prototype.

GSoC would be cool!
Comment 38 Sebastian Luther (few) 2011-04-07 19:33:41 UTC
(In reply to comment #37)
> I don’t know of any prototype.
> 
> GSoC would be cool!

Does that mean you would apply as a student or you know someone who would?
Comment 39 Arne Babenhauserheide 2011-04-21 07:49:40 UTC
Sebastian: I already applied for GNU Hurd. It was a close tie between Gentoo, Mercurial, Freenet and the Hurd, but in the end the Hurd won because it’s about Python bindings to low-level code: Something I wanted to use in real-life code for years.

If you give me 2 weeks with a primary Portage developer whom I can regularly pester for stuff like “where is this in the code” (and no other stuff to do) I *think* I can implement this request. After all, most of it is already in place: Only the filepaths and filenames need changing.
Comment 40 David J Cozatt 2011-06-18 22:37:42 UTC
On #gentoo IRC, azazello mentioned what seemed a nice suggestion, which we discussed. 

Instead of category/-based binpkgs, do it the way /distfiles is done: one directory for all built packages. 
After reading the above thread I'd propose this, plus a single flat file with a string for each binpackage and at least a digest or md5sum.
I would speculate that this would let us do away with running the fixpackages script locally: since a package's category would no longer be determined by the directory it is stored in, it would just change with the portage tree.

Might ease maintenance of tools and time running them.
Comment 41 Zac Medico gentoo-dev 2011-08-22 17:07:59 UTC
*** Bug 380187 has been marked as a duplicate of this bug. ***
Comment 42 Leho Kraav (:macmaN @lkraav) 2011-11-30 14:16:23 UTC
anyone perhaps capable of taking this on with a bounty of some sort?
Comment 43 Nirbheek Chauhan (RETIRED) gentoo-dev 2011-12-15 12:40:46 UTC
Changing subject to reflect the actual goal of this bug, the implementation detail itself is currently under debate.
Comment 44 Leho Kraav (:macmaN @lkraav) 2012-07-29 16:07:57 UTC
I wonder if a minimal viable product for this would be to somehow allow me to have both USE=minimal and USE=-minimal binpkgs built without overwriting each other. Or does that pretty much require solving the whole problem scope at once anyway?
Comment 45 Dennis Schridde 2012-07-29 18:06:27 UTC
(In reply to comment #44)
> I wonder if a minimal viable product for this would be to somehow allow me
> to have both USE=minimal and USE=-minimal binpkgs built without overwriting
> each other. Or does that pretty much require solving the whole problem scope
> at once anyway?
I think that was exactly the original reason for creating this bugreport: A solution to the useflag problem.
Comment 46 Leho Kraav (:macmaN @lkraav) 2012-07-29 19:04:27 UTC
Yes. My main motivation was bumping this to get some mindshare here again and perhaps try to leave out stuff like handling different architectures, Prefix, and who knows what else (who = portage hackers). I don't have enough deep-portage know-how yet to correctly pick the pain points here.
Comment 47 Sebastian Luther (few) 2013-12-09 20:15:14 UTC
*** Bug 493744 has been marked as a duplicate of this bug. ***
Comment 48 Arne Babenhauserheide 2014-11-03 13:45:23 UTC
(In reply to David J Cozatt from comment #40)
> on #gentoo irc azazello mentioned what seemed a nice suggestion that we
> discussed. 
> 
> Instead of category/ based binpkg's do it as /distfile is done. one
> directory for built packages 
> after reading the above thread I'd propose this and a single flatfile with a
> string for each binpackage and at least a digest or md5sum
> Would speculate would be able to do away with running the fixpackages script
> locally since the category names are no longer determined by its stored
> directory it would just change with the portage tree.
> 
> Might ease maintenance of tools and time running them.

So you mean just using 

${PKGDIR}/${PN}-${PV}-${SLOT}-${USE}.tbz2

with ${SLOT} and ${USE} hashed if they are too long?

Additionally a file with the hashes for the package files?

I could imagine that this could create a problem with the number of files per directory. I already have 3.9k binpackages, and at 10k files even `ls` can get pretty slow.

Including the category as directory would be as close as possible to the current scheme, and should never clash with the current method (since the SLOT hash should never be part of a version-string).
Comment 49 Zac Medico gentoo-dev 2014-11-03 20:31:17 UTC
*** Bug 511172 has been marked as a duplicate of this bug. ***
Comment 50 Zac Medico gentoo-dev 2014-12-24 02:09:00 UTC
I've posted a proposal here:

http://thread.gmane.org/gmane.linux.gentoo.portage.devel/5031
Comment 51 Zac Medico gentoo-dev 2015-01-16 03:12:41 UTC
I have an experimental implementation in the following branch:

	https://github.com/zmedico/portage/tree/binpkg-support-integration

For anyone who is interested in experimenting with it, I've created an overlay containing a sys-apps/portage-9999 ebuild that pulls from the above branch:

	https://github.com/zmedico/portage-binpkg-support-overlay

The overlay includes a README.md file with additional information.
Comment 52 Zac Medico gentoo-dev 2015-02-17 08:39:40 UTC
I've posted patches for review here:

http://thread.gmane.org/gmane.linux.gentoo.portage.devel/5239
Comment 54 Leho Kraav (:macmaN @lkraav) 2015-06-28 20:25:07 UTC
Heya Zac. Tyvm for your work here. Has it been decided yet which portage release version is going to ship this feature?
Comment 55 Leho Kraav (:macmaN @lkraav) 2015-06-28 20:27:35 UTC
Ah, I see it's already in 2.2.19 NEWS file. Lovely.
Comment 56 Zac Medico gentoo-dev 2015-06-28 23:01:41 UTC
Yes, this is fixed since 2.2.19.
Comment 57 marco 2015-06-29 08:07:11 UTC
Thanks for adding this feature
Marco
Comment 58 Christian Affolter 2015-07-01 12:21:30 UTC
Thank you so much for adding this feature! Finally, it is possible to combine the flexibility and individuality provided by the USE flags together with the comfort of binary-packages within mass-deployments. Thanks again!
Comment 59 waynedpj 2015-07-03 14:59:17 UTC
(In reply to Zac Medico from comment #56)
> Yes, this is fixed since 2.2.19.

thanks for implementing this feature, what a long road!

if this makes it possible to have USE flags with binary packages, is there a BINHOST that supports this setup yet?

thanks again.
Comment 60 Harald Weiner 2015-07-09 12:59:36 UTC
As portage-2.2.20 is now stable I have tried out the new binpkg option. After setting FEATURES=binpkg-multi-instance in make.conf it works like a charm. Thank you a lot for implementing this feature, it will save us a lot of unnecessary re-compilation time :-).
Comment 61 Leho Kraav (:macmaN @lkraav) 2015-07-09 14:08:11 UTC
Really interested how this will affect --binpkg-respect-use and friends.
Comment 62 Zac Medico gentoo-dev 2015-07-09 19:44:59 UTC
(In reply to Leho Kraav (:macmaN @lkraav) from comment #61)
> Really interested how this will affect --binpkg-respect-use and friends.

It will search all of the available packages (including remote packages if you use --getbinpkg) until it finds one with the desired USE settings.
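
As a rough illustration of that search (not Portage's actual implementation), assume each available instance of a package is described by a dict with a hypothetical `build_id` and the set of USE flags it was built with; the first instance whose flags equal the desired set is selected.

```python
def pick_instance(instances, wanted_use):
    """Return the first binary package instance built with exactly the
    desired USE flags, or None if every instance differs."""
    wanted = frozenset(wanted_use)
    for instance in instances:
        if frozenset(instance["use"]) == wanted:
            return instance
    return None


# Hypothetical instances of one package, local and remote mixed together.
instances = [
    {"build_id": 1, "use": ["minimal"]},
    {"build_id": 2, "use": ["nls", "readline"]},
]
match = pick_instance(instances, {"readline", "nls"})
print(match["build_id"] if match else "no match, build from source")  # 2
```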
Comment 63 Leho Kraav (:macmaN @lkraav) 2015-07-11 09:21:04 UTC
(In reply to Zac Medico from comment #62)
> (In reply to Leho Kraav (:macmaN @lkraav) from comment #61)
> > Really interested how this will affect --binpkg-respect-use and friends.
> 
> It will search all of the available packages (including remote packages if
> you use --getbinpkg) until it finds one with the desired USE settings.

Yep, looks like works great. So now the next challenge is how to auto-build all the desired USE flag configurations on the binhost. Feels like we'd want to enumerate a set of package.use.N files for hints and build all variations of the binpkg in one go?
Comment 64 Jacob Godserv 2015-07-11 12:36:58 UTC
(In reply to Leho Kraav (:macmaN @lkraav) from comment #63)
> (In reply to Zac Medico from comment #62)
> > (In reply to Leho Kraav (:macmaN @lkraav) from comment #61)
> > > Really interested how this will affect --binpkg-respect-use and friends.
> > 
> > It will search all of the available packages (including remote packages if
> > you use --getbinpkg) until it finds one with the desired USE settings.
> 
> Yep, looks like works great. So now the next challenge is how to auto-build
> all the desired USE flag configurations on the binhost. Feels like we'd want
> to enumerate a set of package.use.N files for hints and build all variations
> of the binpkg in one go?

That's a bit out of scope for this bug.
Comment 65 Leho Kraav (:macmaN @lkraav) 2015-07-11 13:18:37 UTC
(In reply to Jacob Godserv from comment #64)
> > Yep, looks like works great. So now the next challenge is how to auto-build
> > all the desired USE flag configurations on the binhost. Feels like we'd want
> > to enumerate a set of package.use.N files for hints and build all variations
> > of the binpkg in one go?
> 
> That's a bit out of scope for this bug.

Not trying to start working on it in this bug. But this bug has a bunch of interested people already subscribed. Because resources are stretched for everybody, trying to blindly follow policy with an immediate new bug is of lower value than just looking to first calibrate a potentially valuable follow-up idea in an existing bug, then branch off to a new bug. If Zac says yes, it looks like the right thing to do (regardless of who finally executes), let's kick off the new bug then.
Comment 66 Zac Medico gentoo-dev 2015-07-11 19:10:41 UTC
(In reply to Leho Kraav (:macmaN @lkraav) from comment #63)
> (In reply to Zac Medico from comment #62)
> > (In reply to Leho Kraav (:macmaN @lkraav) from comment #61)
> > > Really interested how this will affect --binpkg-respect-use and friends.
> > 
> > It will search all of the available packages (including remote packages if
> > you use --getbinpkg) until it finds one with the desired USE settings.
> 
> Yep, looks like works great. So now the next challenge is how to auto-build
> all the desired USE flag configurations on the binhost. Feels like we'd want
> to enumerate a set of package.use.N files for hints and build all variations
> of the binpkg in one go?

That would be a job for a tinderbox tool, or something like that. Currently, portage doesn't provide any tools like that, and it's questionable whether such a tool should be included with portage (rather than being a separate project that makes use of portage).
Comment 67 Leho Kraav (:macmaN @lkraav) 2015-07-11 19:13:46 UTC
(In reply to Zac Medico from comment #66)
> 
> That would be a job for a tinderbox tool, or something like that. Currently,
> portage doesn't provide any tools like that, and it's questionable whether
> such a tool should be included with portage (rather than being a separate
> project that makes use of portage).

What's the optimal binpkg-multi-instance usage automation pattern until then? How do you use it, Zac? Just manually managing USE flags per individual package build however you need it?
Comment 68 Zac Medico gentoo-dev 2015-07-11 19:28:58 UTC
(In reply to Leho Kraav (:macmaN @lkraav) from comment #67)
> What's the optimal binpkg-multi-instance usage automation pattern until
> then?

There are many possible applications. For example, maybe a user just wants to have an easy way to roll back to one of the previous builds of a package. On a larger scale, it's possible to mix packages from multiple hosts/profiles in a shared PKGDIR, possibly shared via NFS.

> How do you use it, Zac?

I use it as an easy way to do roll-backs.

> Just manually managing USE flags per
> individual package build however you need it?

Since I only use it for the roll-back ability, the packages are built with whatever my USE/package.use settings are at the time.
Comment 69 waynedpj 2015-07-11 19:38:15 UTC
(In reply to Zac Medico from comment #68)
> (In reply to Leho Kraav (:macmaN @lkraav) from comment #67)
> > What's the optimal binpkg-multi-instance usage automation pattern until
> > then?
> 
> There are many possible applications. For example, maybe a user just wants
> to have an easy way to roll back to one of the previous builds of a package. On
> a larger scale, it's possible to mix packages from multiple hosts/profiles
> in a shared PKGDIR, possibly shared via NFS.
> 

or possibly with http://ipfs.io ?  :)
Comment 70 Daniel M. Weeks 2015-07-14 17:10:47 UTC
I don't want to clutter this bug report, especially now that it's closed, but I have been working on a solution for automated building of multiple binary packages. I have a small write-up on my blog: http://danweeks.net/p-blog?id=6