It might be useful if USE flags were appended to the package names (those which are set for the package), so one could easily keep and test packages with different USE flag combinations. This would also make it possible to create repositories of binary packages, for example for managing a set of office PCs which mostly have the same configuration but differ in a couple of USE flags (for example, you could have a "scan" PC and a "printer" PC, but both could get their packages from the same repository). Architecture- and compiler-specific differences can easily be taken into account by using a directory structure.
While I agree that we need more info about a binpackage in its name, simply appending active USE flags isn't the way to go.
What is another way?
How about WONTFIX pending a more sane suggestion?
I just changed the summary so that this bug now simply reads "Include more info about a binpkg". How this should be achieved doesn't seem to be clear yet, but as I understand it, there is consensus among the three who answered here that binpkgs should somehow contain more information.
One more idea: store binpkgs in a directory which contains a file with information about their associated USE flags. Binpkgs which fit more than one directory, because their specific USE flags didn't change, could just be created in the first one and then hardlinked into the directories created later. The directory name could just be the date when the USE flags were last changed (and the binpkgs created). This would mean that portage would have to check on binpkg creation whether the information file in the most recent binpkg directory contains the current USE flags. If it does, it can just use that directory. If it doesn't, it has to check the file for the USE flags of each package: if they aren't affected, it can simply hardlink the package into the new directory; if they are affected, it simply doesn't hardlink it. That way there would always be a directory with current binpkgs which would still look quite clean. The problem I see with this approach, though, is that it doesn't solve creating binpkgs for several sets of different machines. One alternative I see would be to use a hash of the USE flags as the directory name and to put a symlink on the most recent directory for easy reference. What do you think? Is bugzilla the right place to discuss this?
How about saving USE flags in a directory structure like the following:
machine_type/
    active_package_useflags/
        packages with these active USE flags
    other_active_package_useflags/
        packages with these active USE flags
If the length of the list of active USE flags exceeds a certain threshold (e.g. 32 chars, the length of a SHA-1 hash in Base32 encoding), it could be replaced by a SHA-1 hash over the full list. To check if we have a binpackage for a certain package, we only need to get the active USE flags for the package (for example sorted alphabetically) and join them into a string. If the string is longer than 32 chars, we compute the SHA-1 of the string and encode it in Base32. Then we just check if the directory machine_type/use_string/ contains the binpackage we need. The full path to a binpackage could then look like this:
/var/packages/x86_64-pc-linux-gnu/2BFXV4EC5B25XH6B7X7OXJ34GOCM2KM2/dev-lang/python-2.5.2-r8.tbz2
(2BFXV4EC5B25XH6B7X7OXJ34GOCM2KM2 is the uppercase SHA-1 hash of "berkdb doc examples expat gdbm ipv6 ncurses readline sqlite ssl threads tk", and I used the CHOST for the machine type.) Should the machine type also include the gcc version, the glibc version or similar?
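A minimal Python sketch of the lookup key described above, assuming the sorted active flags are joined with single spaces (as in the hashed example string) and that the 32-character threshold applies to the joined string. The helpers `use_string` and `binpkg_path` are hypothetical, and how an empty flag set should be named is left open, as in the comment.

```python
import base64
import hashlib

# A SHA-1 digest is 20 bytes = 160 bits, which Base32 encodes as exactly
# 32 characters without "=" padding, hence the threshold.
MAX_PLAIN_LEN = 32


def use_string(active_flags):
    """Directory name for a set of active USE flags (hypothetical helper).

    The flags are sorted and joined with spaces; if the result is longer than
    32 characters it is replaced by the uppercase Base32 SHA-1 of that string.
    """
    joined = " ".join(sorted(active_flags))
    if len(joined) <= MAX_PLAIN_LEN:
        return joined
    digest = hashlib.sha1(joined.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")  # Base32 output is uppercase


def binpkg_path(root, machine_type, active_flags, category, pf):
    """Assemble root/machine_type/use_string/category/pf.tbz2."""
    return "/".join([root, machine_type, use_string(active_flags), category, pf + ".tbz2"])


print(use_string(["ssl", "ipv6"]))  # → ipv6 ssl
print(binpkg_path("/var/packages", "x86_64-pc-linux-gnu",
                  "berkdb doc examples expat gdbm ipv6 ncurses readline sqlite ssl threads tk".split(),
                  "dev-lang", "python-2.5.2-r8"))
```

Because only the joined string is hashed, any Gentoo box that knows the ebuild and its own active flags can recompute the same directory name without downloading anything.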
The layout should be more like $PKGDIR/$CATEGORY/$PN/$SLOTS_HASH/$USE_HASH/python-2.5.2-r8.tbz2, since we should support an arbitrary number of "slots". Slottable variables include things like CHOST, multilib ABI, python version, perl version, and any other $LANGUAGE for which slotting makes sense. In addition to slottables, the new layout should ideally account for subpackages as well. Subpackages would allow a binary package to be split into an arbitrary number of parts that can be installed separately.
Do you have an idea how the binpackage could support subpackages? Could that be done via symlinks and install information? Or just via install information saying which files from which binpackage are needed? Do binpackages have to be safe on their own, or can they be identified via an ID file which contains binpackage and script hashes (or something different) for safe installation? But why do we need subpackage support at all? Every binpackage corresponds to an ebuild, and ebuilds can be as small as needed. So subpackage support can simply be done via meta-ebuilds or similar. I don't think we need packages which are smaller than ebuilds.
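To illustrate how an arbitrary number of slottable variables can collapse into one $SLOTS_HASH path component, here is a small Python sketch. It assumes the variables are serialized as sorted `KEY=value` lines before hashing; the variable names in the example (`ABI`, `PYTHON_VERSION`) and the `USEHASH` placeholder are hypothetical.

```python
import base64
import hashlib


def slots_hash(slot_vars):
    """Hash an arbitrary mapping of slottable variables into one path component.

    Keys are sorted so the result does not depend on the order in which the
    variables were collected; adding a new slottable simply yields a new hash.
    """
    canonical = "\n".join(f"{key}={slot_vars[key]}" for key in sorted(slot_vars))
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def binpkg_path(pkgdir, category, pn, pf, slot_vars, use_hash):
    """$PKGDIR/$CATEGORY/$PN/$SLOTS_HASH/$USE_HASH/$PF.tbz2"""
    return "/".join([pkgdir, category, pn, slots_hash(slot_vars), use_hash, pf + ".tbz2"])


# Hypothetical slottables; the real set is still open in the discussion.
host = {"CHOST": "x86_64-pc-linux-gnu", "ABI": "amd64", "PYTHON_VERSION": "2.5"}
print(binpkg_path("/usr/portage/packages", "dev-lang", "python",
                  "python-2.5.2-r8", host, "USEHASH"))
```

Sorting the keys is the important design choice: two machines with identical settings must produce the identical hash regardless of how they gathered them.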
From a discussion in IRC (#gentoo-kde) we found one advantage of putting the hashes into the filenames:
$PKGDIR/$CATEGORY/$PN/python-2.5.2-r8-$SLOTS_HASH-$USE_HASH.tbz2
Advantages:
* The files can just be shared without having to preserve the directory structure.
Also, metadata could be added to the tail of the tar archives, so all data is preserved even when the filenames get lost.
We just found out that xpak already stores metadata in the binpackage, so we'd only need to add the filename or path changes and Gentoo could have a transparent binary layer :) (that sounds a bit too much like hype, but I currently can't find a better word for that feature ;) )
I just learned why subpackages would be nice: to separate packages into sets for different USE flags. But it looks to me like that would be a lot harder than the simple changes to the binpackage structure needed to allow separate binpackages with different USE flags and other SLOTs.
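A Python sketch of the filename variant, showing that a bare filename is enough to recover the package and both hashes even after the directory structure is lost. It assumes both hashes are Base32 (A-Z, 2-7), so they never contain `-` and the last two dashes are unambiguous; the helper names are hypothetical.

```python
SUFFIX = ".tbz2"


def binpkg_filename(pf, slots_hash, use_hash):
    """python-2.5.2-r8 + hashes -> python-2.5.2-r8-<SLOTS_HASH>-<USE_HASH>.tbz2"""
    return f"{pf}-{slots_hash}-{use_hash}{SUFFIX}"


def split_binpkg_filename(filename):
    """Recover (pf, slots_hash, use_hash) from a bare filename.

    The version part itself contains dashes, so we split from the right:
    this only works because the two hashes are assumed to be dash-free.
    """
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a binpkg: {filename!r}")
    stem = filename[: -len(SUFFIX)]
    pf, slots_hash, use_hash = stem.rsplit("-", 2)
    return pf, slots_hash, use_hash
```

If readable USE strings (which may contain dashes) were allowed instead of hashes, a different separator would be needed.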
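To make "xpak already stores metadata in the binpackage" concrete, here is a small Python reader and writer for the xpak trailer. It is written from my understanding of the layout described in portage's xpak(5) man page: the compressed tarball is followed by an `XPAKPACK` segment (index length, data length, index, data, `XPAKSTOP`), then that segment's length as a 4-byte big-endian integer, then `STOP`. The function names are hypothetical and are not portage's API; portage's own `portage.xpak` module is the authoritative implementation.

```python
import struct


def _encode_int(value):
    # xpak stores every integer as a 4-byte big-endian unsigned value
    return struct.pack(">I", value)


def _decode_int(buf, offset):
    return struct.unpack_from(">I", buf, offset)[0]


def build_xpak(metadata):
    """Pack {key: bytes} into an xpak segment (XPAKPACK ... XPAKSTOP)."""
    index, data = b"", b""
    for key, value in metadata.items():
        name = key.encode("utf-8")
        # index entry: name length, name, offset into the data area, value length
        index += _encode_int(len(name)) + name + _encode_int(len(data)) + _encode_int(len(value))
        data += value
    return (b"XPAKPACK" + _encode_int(len(index)) + _encode_int(len(data))
            + index + data + b"XPAKSTOP")


def append_xpak(tarball, metadata):
    """tbz2 = compressed tarball + xpak segment + 4-byte segment length + b"STOP"."""
    xpak = build_xpak(metadata)
    return tarball + xpak + _encode_int(len(xpak)) + b"STOP"


def read_xpak(tbz2):
    """Return the metadata stored at the tail of a .tbz2 as {key: bytes}."""
    if tbz2[-4:] != b"STOP":
        raise ValueError("no xpak trailer")
    end = len(tbz2) - 8  # the length field sits just before "STOP"
    start = end - _decode_int(tbz2, end)
    if start < 0:
        raise ValueError("xpak length points before start of file")
    xpak = tbz2[start:end]
    if xpak[:8] != b"XPAKPACK" or xpak[-8:] != b"XPAKSTOP":
        raise ValueError("corrupt xpak segment")
    index_len, data_len = _decode_int(xpak, 8), _decode_int(xpak, 12)
    index = xpak[16:16 + index_len]
    data = xpak[16 + index_len:16 + index_len + data_len]
    metadata, pos = {}, 0
    while pos < index_len:
        name_len = _decode_int(index, pos)
        name = index[pos + 4:pos + 4 + name_len].decode("utf-8")
        offset = _decode_int(index, pos + 4 + name_len)
        length = _decode_int(index, pos + 8 + name_len)
        metadata[name] = data[offset:offset + length]
        pos += 12 + name_len
    return metadata
```

Since the trailer is found by reading backwards from the end of the file, the metadata survives renaming, which is what makes a purely name-based index safe to add on top.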
I'd like this problem to be solved too. I don't like to compile one package twice; it is time-consuming. As far as I know, a binpkg depends not only on USE flags, but also on CFLAGS, CXXFLAGS, LDFLAGS, etc. I think the structure of /var/db/pkg is good for this. /var/db/pkg is the database that tracks installed packages. Its structure:
/var/db/pkg
|----dev-util
     |----cvs-1.12.12-r4
          |----CBUILD
          |----CFLAGS
          |----CHOST
          |----CONTENTS
          |----COUNTER
          |----CXXFLAGS
          |----DEPEND
          |----EAPI
          |----FEATURES
          |----IUSE
          |----KEYWORDS
          |----LDFLAGS
          |----cvs-1.12.12-r4.ebuild
          ...
I think the binpkg repository's structure could be like:
binpkg_repository
|--dev-util
   |--cvs-1.12.12-r4
      |--binpkg1
      |  |--CBUILD               // i686-pc-linux-gnu
      |  |--CHOST                // i686-pc-linux-gnu
      |  |--CFLAGS               // -O2 -march=i686 -pipe
      |  |--CXXFLAGS             //
      |  |--KEYWORDS             // alpha ~amd64 ...
      |  |--IUSE                 // crypt doc emacs ...
      |  |--LDFLAGS              // -Wl,-O1
      |  |--cvs-1.12.12-r4.tbz2  // binpkg file
      |--binpkg2
         |--CBUILD               // i386-pc-linux-gnu
         |--CHOST                // i386-pc-linux-gnu
         |--CFLAGS               // -O2 -march=i386 -pipe
         |--CXXFLAGS             //
         |--KEYWORDS             // alpha ~amd64 ...
         |--IUSE                 // crypt ...
         |--LDFLAGS              // -Wl,-O2
         |--cvs-1.12.12-r4.tbz2  // binpkg file
Notes:
1. The names "binpkg1", "binpkg2" are to be discussed; I don't know yet what they should be.
2. I don't know which of CHOST and CBUILD represents the destination architecture. The destination arch should be recorded in the binpkg repository; we don't need the source arch.
When emerging a package, portage first checks if the current settings are the same as the settings in the "binpkg1", "binpkg2", ... directories. If they are, portage uses the tbz2 file directly; otherwise portage compiles the package, creates a new "binpkg3" directory and puts the newly generated tbz2 file and the current settings into it.
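The lookup just described can be sketched in Python with in-memory records instead of real directories. Which settings are compared is an assumption here: the tree above stores IUSE, but the decision presumably needs the active USE flags, so `USE` is used; `BinpkgSlot` and `find_or_allocate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Settings compared before a stored binpkg is reused (USE is an assumption).
COMPARED_KEYS = ("CHOST", "CFLAGS", "CXXFLAGS", "LDFLAGS", "USE")


@dataclass
class BinpkgSlot:
    name: str                 # "binpkg1", "binpkg2", ... (naming still open)
    settings: Dict[str, str]  # contents of the CHOST, CFLAGS, ... files
    tbz2: Optional[str]       # None until the package has been built


def relevant(settings):
    """Only the compared keys count; a missing file is treated as empty."""
    return {key: settings.get(key, "") for key in COMPARED_KEYS}


def find_or_allocate(slots: List[BinpkgSlot], current: Dict[str, str]) -> Tuple[BinpkgSlot, bool]:
    """Return (slot, reusable): reuse a matching binpkgN entry or add a new one."""
    wanted = relevant(current)
    for slot in slots:
        if slot.tbz2 is not None and relevant(slot.settings) == wanted:
            return slot, True
    # No match: the caller compiles the package and stores the tbz2 here.
    new_slot = BinpkgSlot(f"binpkg{len(slots) + 1}", wanted, None)
    slots.append(new_slot)
    return new_slot, False
```

Naming new entries by count is only safe as long as entries are never deleted; a real layout would need a stable naming scheme, which is the open question in note 1.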
I think that binpackages should be easily shareable, and that wouldn't be the case with all those single files floating around. But the files in /var/db/pkg show all the necessary information which has to be included (except information which is in the ebuild: we emerge stuff via ebuilds, so any information which is already in the ebuild can be left out). For this information we need to be able to do the following two separable actions:
- Check the environment for one given binpackage.
- Find a binpackage which fits our environment.
The first can already be done via the xpak tail of the binpackage tar archives, so there's no need to change anything for that. For the second one we don't need to be able to read the information; we just need to be able to check whether it fits our system. To achieve that, we can just store a hash of the environment in the filename or path of each binpackage. To optimize this, I would separate it into two parts: one which doesn't change very often and can be used to identify one type of system (e.g. amd64 with standard optimizations), and one which varies from user to user (e.g. USE flags). (This very clean idea comes from Zac.) The first part gives the "SLOT", the second part the active USE flags of the package. To make it more human-readable, I'd first turn each part (i.e. the "SLOT" or the USE flags) into a string and only hash it if that string is longer than the hash would be. As the hash I'd use SHA-1 encoded as uppercase Base32, since SHA-1 is quite safe against collisions and Base32 doesn't contain any characters which have special meanings in package names.
I would prefer to have the USE flags on a per-package basis, because they really depend on the package. Therefore some hierarchy like .../category/package/use-flags/package-version.tar.bz2 or .../category/package/package-version.use-flags.tar.bz2 would be nice. I would prefer the latter. The USE flags could be managed in the following way: require a revision bump if a package changes IUSE (might be hard if an eclass changes something, but definitely doable). Sort the USE flags and create a binary string with a 1 for every enabled flag and a 0 for every disabled one. Strictly speaking this is not a string but a binary number with possibly leading zeroes; compress it into hex or base64 or whatever and use it to indicate the USE flags. For most packages the number of USE flags is small; for packages with lots of USE flags the filename would grow by 1 character for every 6 USE flags (with base64). That should be OK, I think. Another thing that might be needed is a special binary revision. If you have package foo that depends on lib bar, and a new version of the lib hits the tree with a different ABI (or anything else where foo doesn't need to be changed but does need to be recompiled), it would be nice to indicate that foo should be reinstalled by increasing the binary revision.
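A Python sketch of the packed representation: one bit per flag of sorted IUSE, written with a 64-symbol alphabet so that each character carries 6 flags and the width is fixed at ceil(len(IUSE) / 6), which keeps the leading zeroes. The alphabet and the helper names are assumptions; the comment leaves "hex or base64 or whatever" open.

```python
import string

# 64 filename-safe symbols: 10 digits + 26 upper + 26 lower + "_" + "+".
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase + "_+"
assert len(ALPHABET) == 64


def encode_use(iuse, enabled):
    """Bit i is set if the i-th flag of sorted(IUSE) is enabled."""
    flags = sorted(set(iuse))
    bits = 0
    for position, flag in enumerate(flags):
        if flag in enabled:
            bits |= 1 << position
    width = max(1, -(-len(flags) // 6))  # ceil(len / 6) characters, at least one
    chars = []
    for _ in range(width):
        chars.append(ALPHABET[bits & 0b111111])  # lowest 6 bits first
        bits >>= 6
    return "".join(reversed(chars))  # most significant character first


def decode_use(iuse, encoded):
    """Inverse of encode_use for the same IUSE."""
    bits = 0
    for char in encoded:
        bits = (bits << 6) | ALPHABET.index(char)
    return {flag for position, flag in enumerate(sorted(set(iuse))) if (bits >> position) & 1}


iuse = ["crypt", "doc", "emacs", "kerberos", "nls", "pam", "server"]
print(encode_use(iuse, {"crypt", "nls"}))  # → 0H
```

Decoding only works against the same IUSE, which is why the comment asks for a revision bump whenever IUSE changes.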
The problem I see with a bitwise USE string is that it will never be human-readable. The binary revision is something else, because this kind of dependency doesn't get hand-edited by the user, so it doesn't benefit from being user-readable. I think when including this we'd have three separate elements:
* Local system settings: active USE flags.
* Binary compatibility requirements: the necessary libs and dependencies.
* Host SLOT: hardware requirements.
A package with wrong USE flags can be installed anyway; a package with wrong binary requirements just won't work, though, so these should be kept separate. The Host SLOT signifies a category which fits all packages created by a certain computer and shouldn't change as long as no libraries change in a way which affects the whole system (like the glibc major version), the user doesn't play with CFLAGS, and similar (like installing a new CPU :) ). Does this separation sound useful, or did I overlook/misunderstand something? Besides: from what I see in a random binpackage (fretsonfire), the IUSE part of the xpak looks like the right part for the USE flag info in the name, so we could just use that directly without much error-prone conversion. Remember that the binpackage name just needs to contain an ID with which it can be found; it doesn't need to include anything which can be found in the ebuild, since anyone installing a binpackage should also have the corresponding ebuild. The requirement for each ID element of the name is that it can be generated directly by any Gentoo installation which only knows the ebuild and the already installed libraries. Question: what do we do if we don't already have one of the binary requirements? How can we then find the correct binpackage? Does the info contained in a binpackage suffice?
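The three-way split can be sketched in Python as a small ID type plus a rule that encodes the distinction made above: a USE mismatch still leaves the package installable, while a host or binary-compatibility mismatch does not. The field names and result strings are hypothetical.

```python
from typing import NamedTuple


class BinpkgId(NamedTuple):
    host_slot: str      # hash over system-wide settings (CHOST, CFLAGS, glibc major, ...)
    binary_compat: str  # hash over the binary requirements (needed libraries)
    use: str            # active USE flags, plain or hashed


def classify(candidate: BinpkgId, local: BinpkgId) -> str:
    """Decide how a candidate binpkg relates to the local system."""
    if candidate.host_slot != local.host_slot:
        return "unusable: different host"
    if candidate.binary_compat != local.binary_compat:
        return "unusable: different binary requirements"
    if candidate.use != local.use:
        return "installable: different USE flags"
    return "exact match"
```

Keeping the three parts as separate fields, rather than one combined hash, is what lets portage tell "wrong but installable" apart from "will not run".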
What I see (in openarena-0.8.1.tbz via vim) is:
CXX: x86_64-pc-linux-gnu-g++
NEEDED:
/usr/games/bin/openarena-ded libdl.so.2,libm.so.6,libc.so.6
/usr/games/bin/openarena libSDL-1.2.so.0,libpthread.so.0,libGL.so.1,libvorbisfile.so.3,libvorbis.so.0,libogg.so.0,libdl.so.2,libm.so.6,libc.so.6
CFLAGS: -O2 -pipe -march=k8
NEEDED.ELF.2:
X86_64;/usr/games/bin/openarena-ded;;;libdl.so.2,libm.so.6,libc.so.6
X86_64;/usr/games/bin/openarena;;;libSDL-1.2.so.0,libpthread.so.0,libGL.so.1,libvorbisfile.so.3,libvorbis.so.0,libogg.so.0,libdl.so.2,libm.so.6,libc.so.6
Also there seems to be some binary "DEPEND". I don't know what it contains, though, so I can't judge whether it would be needed for binary compatibility. Can we get these values from our own system fast enough (i.e. without building the package ourselves), so we can just include a hash with which we can find a fitting binpackage and the right libs for it? And if yes: which of these values have to be included as a hash in the package name for binary compatibility, and which are system-wide, so they can be included in the Host SLOT?
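A Python sketch of extracting the needed libraries from a NEEDED.ELF.2 entry like the one above. It assumes the semicolon-separated layout seen in the dump (architecture; object path; soname; runpath; comma-separated needed libraries), which I believe matches portage's format; any further trailing fields are ignored. The helper names are hypothetical.

```python
def parse_needed_elf2(text):
    """Map each ELF object to (arch, [needed sonames]).

    Assumed field layout per line: arch;object;soname;rpath;needed[;...]
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        arch, obj, _soname, _rpath, needed = line.split(";")[:5]
        result[obj] = (arch, needed.split(",") if needed else [])
    return result


def needed_sonames(text):
    """All sonames the package links against, de-duplicated and sorted."""
    return sorted({lib for _arch, libs in parse_needed_elf2(text).values() for lib in libs})


sample = (
    "X86_64;/usr/games/bin/openarena-ded;;;libdl.so.2,libm.so.6,libc.so.6\n"
    "X86_64;/usr/games/bin/openarena;;;libSDL-1.2.so.0,libpthread.so.0,libGL.so.1,"
    "libvorbisfile.so.3,libvorbis.so.0,libogg.so.0,libdl.so.2,libm.so.6,libc.so.6"
)
print(needed_sonames(sample))
```

A binary-compatibility hash could be taken over this sorted list, but as noted later in the thread, a library's ABI can change without its filename changing, so matching sonames is necessary rather than sufficient.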
I'm not sure what you mean by "user-readable" if you consider a hash over a long string more readable than a packed representation of that string. The latter could be converted back to the user-readable format; the former could not. Regarding CFLAGS: maybe it would be wise to leave out CFLAGS that don't do anything (at least concerning the binary). Something like "-march=prescott -mmmx" and "-march=prescott" are the same, and "-pipe" only changes the behaviour of gcc, not the created binary. I usually try to make Gentoo behave like other binary distros when it comes to binary packages. And they only have the version and this "binary revision" or whatever, and maybe the right dependencies. But to make it usable, we don't even need dependencies, lists of needed libs and all that stuff. Portage checks that the USE flags and everything else match, and then the package with the biggest binary revision is installed. That works without lots of magic, and before we discuss more and more here, we should maybe implement a simple working system instead of trying to create a complicated, error-prone one.
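The selection rule proposed here (match version and USE flags, then take the biggest binary revision) fits in a few lines of Python. `Candidate` and `binary_revision` are hypothetical; no such field exists in today's binpkgs.

```python
from typing import Iterable, NamedTuple, Optional


class Candidate(NamedTuple):
    cpv: str              # category/package-version, e.g. "app-misc/foo-1.0"
    use: frozenset        # active USE flags the binpkg was built with
    binary_revision: int  # hypothetical counter, bumped on rebuilds against new libs


def pick_binpkg(candidates: Iterable[Candidate], cpv: str, use: frozenset) -> Optional[Candidate]:
    """Among binpkgs whose version and USE flags match, take the highest binary revision."""
    matching = [c for c in candidates if c.cpv == cpv and c.use == use]
    return max(matching, key=lambda c: c.binary_revision, default=None)
```

Returning `None` when nothing matches leaves the fallback (compiling from the ebuild) to the caller.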
By user-readable I mean the default string representation of the active USE flags, which is only hashed if it gets longer than the hash (that's what I proposed above). That way a package with three active USE flags just has these USE flags in the package name, while a package with many active USE flags (like mplayer) has a simple hash. The advantage here is that only the active USE flags are tracked: for most packages that's simply no USE flag, for most others only one or two. For the Host SLOT I honestly don't care much, but I think that a hash over the string will be far easier to write and especially to maintain. Also, a hash has a guaranteed length, which isn't true for a bitmask (though in practice the length difference might be negligible). I think that CFLAGS should be in, though, since some users have kinda crazy settings in there which only work for their specific setup. The problem with binary dependencies is that the ebuild might state that a lib is compatible, since it is recognized in the configure process, but it can well happen that you have to revdep-rebuild to make your programs work again after a library update. If you do a binary installation, you can't just rebuild against the new lib, and the two binary packages (one for the old lib, one for the new lib) must not be mixed up, so they should have different names, and the names should make it possible for portage to decide which binary package to install. Gentoo isn't just a binary distro with a fixed set of packages which all use the same libs. Unlike Ubuntu and similar, the basic libraries can get updated while all the rest stays at a fixed version.
Your hash system has 2 disadvantages: binary packages of the same source package are scattered all over the (directory) tree. Also, once you have the hash, the name is completely unreadable for the user. For my system, you write 2 simple conversion functions and can have command line tools, web interfaces, everything you need. It's all the same, no special cases. Also, I don't see any advantage in only tracking active USE flags. It makes a packed representation harder. CFLAGS are a story of their own, that's for sure. Maybe one should create an assembler file with -fverbose-asm and record those CFLAGS and not the ones from make.conf. Lastly, about the library stuff. If you update a library and therefore need to update an application, we have proposed 2 ways. You want to save library dependencies in the file (which might not be enough, since the ABI might change without the filename changing, which really sucks), and then you have to calculate, download several packages or their metadata, check which is usable, and install it. In my version the build system would create the new package for the lib, check which applications are broken (OK, lots of apps need to be installed here) and then just build new packages for them. It would then increase the binary revision by one and commit all the packages to the server in 1 transaction. So whenever you get the chance to install a lib that might break some application, either you installed the app from a binary package, and then you get a new binary package because the binary revision is higher, or you compiled the package on your own and have to reinstall it on your own. And last but not least, there is still the preserved-libs feature of portage which can help here.
> In my version the build system would create the new package for the lib, check which application is broken (ok, lots of apps need to be installed here) and then just build new packages for them. It would then increase the binary revision by one and commit all the packages in 1 transaction to the server.
How do you make that work with multiple unconnected servers? What happens if you and I rebuild at the same time and send the files to different servers? And what do you do if you don't have all applications installed which depend on the lib? I'd like this system to be robust enough to support community-built binpackages (with some trust system to ensure that you can always find the responsible person if something breaks). For the USE flags I doubt that you would get any real gain by encoding USE flags bitwise. From what "eix -I" and some grepping and sed-ing tells me, most packages have at least two USE flags, but none of them is active. So using only the active USE flags has no cost here, while a bitmask takes one char. Also, the simple advantage is that people can look at the binpackages and *see* which USE flags they were built with. Last, to the hash system: there is one binpackage per installed ebuild, just as in the current system. The only change is that the names of the binpackages get two extra parts appended: USE flags and a Host SLOT (three with binary compatibility). So it's the current system plus a way to find matching binpackages for different configurations. The binpackages themselves already contain ways to check if they can be used, but we currently can't search for this information efficiently. Did you read the whole discussion in this bug? I am currently leaning more towards adding the USE hash and SLOTs hash to the filenames, since these could be shared more easily, but using a directory structure is cleaner.
(In reply to comment #20)
> How do you make that work with multiple unconnected servers?
With some kind of locking, it can work.
> What happens if you and I rebuild at the same time and send the files to
> different servers?
Nothing happens, because the servers are different. If you want to sync them, you have to think of something.
> And what do you do if you don't have all applications installed which depend on
> the lib?
You figure out whether you usually build binpackages for them, and if you do, you install your latest binpackage and check it.
> I'd like this system to be robust enough to support community-built binpackages
> (with some trust system to ensure that you can always find the responsible
> person if something breaks).
The thing is, I'm discussing some kind of build server while you're discussing the distributed system. Each has different applications, but creating a binary package format that only supports one of them would be stupid. Therefore I suggest adding the binary revision and handling the packages built by your distributed system like live ebuilds, with -b9999 or something like that. Those could be unmasked with a FEATURE or some other flag in make.conf. That way one can decide to only use the packages built on the trustworthy server, or to also use the packages built by the distributed system.
> For the USE flags I doubt that you would get any real gain by doing USE flags
> bitwise. From what "eix -I" and some grepping and seding tells me, most
> packages have at least two USE flags, but none of them is active. So using only
> the active USE flags has no cost here, using a bitmask takes one char.
>
> Also the simple advantage is, that people can look at the binpackages and *see*
> with which USE flags they were built.
It's difficult to talk about cost here, since it does not really exist. We only get a problem if filenames and paths get too long to be supported by HTTP, the filesystem or whatever. That happens with neither of our representations.
You propose a mixed representation based on a combination of USE flags and a hash. I don't like it because it's not clean: the representation depends on what is represented, and binary packages for the same ebuild are scattered all over the place. If a package gets a new USE flag or loses one, the next binary package might end up in a completely different directory. How is that easily usable? How can a user find out if a new version is available without the help of some tools? How much time and effort would it take to delete binary packages because of security issues, license problems or other reasons? How many different directories would you end up with, and can currently used file systems handle that many? Is the access fast enough to be practical?
> Last to the hash system: There is one binpackage per installed ebuild, just as
> in the current system.
> The only change is that the name of the binpackages gets two extra parts
> appended: USE flags and a Host SLOT. (Three with binary compatibility)
How do you handle packages being built against different library versions? They would end up with the same filename. And not every user can update every library just to work around a problem in the naming scheme.
> So it's the current system + a way to find matching binpackages for different
> configurations. The binpackages themselves already contain ways to check, if
> they can be used, but we currently can't search for this information
> efficiently.
You could provide a cache file on the server. RPM uses some kind of index file.
> Did you read the whole discussion in this bug?
I guess so. I skipped uninteresting parts and parts where the same thing was said again and again. If you think I missed a particular comment, please tell me which it is. One big problem with this discussion is that it was scattered between bugzilla and the soc mailing list. I tried to keep it separated, but that does not seem to be possible. You refer to your distributed system without ever having mentioned it in this bug.
> I am currently leaning more towards adding the USE hash and SLOTs hash to the
> filenames, since these could be shared more easily, but using a directory
> structure is cleaner.
Other distributions use different directories for different SLOTs, as you call them. I would stick to this. Gentoo also uses different directories for different architectures. I would also prefer to have the USE flags in the filename, since they describe the package itself and not the "Linux distribution" (replace this with a better name if you want) it belongs to.
> If you think I miss a special comment, please tell me which it is.
It's not one post but the three different approaches:
* SLOT and USE in the directory structure
* SLOT in the directory, USE in the filename
* Both in the filename
And the reasons for using readable USE flags in the filename.
> One big problem with this discussion is, that it was scattered between
> bugzilla and the soc mailing list.
I didn't know that there was discussion on this in the soc list. Could you give me a link?
> I tried to keep it separated, but it does not
> seem to be possible. You refer to your distributed system before mentioning it
> once in this bug.
That is a long-term goal we talked about in IRC a few years ago, and I seem to have missed writing it here before... sorry for that. In short: it would be great if we had a way for users to get trusted binpackage providers, and for them to tell portage to use binpackages whenever possible and to create and upload new ones where the binpackages don't exist yet. As soon as this works, it could be extended to a trusted p2p network in which people simply share their binpackages and download the binpackages they need. The idea comes from the experience that sharing the distfiles in Gnutella led to many people downloading them.
> Other distributions use different directories for different SLOTS, how you call
> them. I would stick to this. Gentoo also uses different directories for
> different architectures. I would also prefer to have the USE-flag in the
> filename, since that describes the package itself and not the "linux
> distribution" (replace this with a better name, if you want) which it belongs
> to.
The name SLOT comes from zmedico, though I'd love to be able to claim that it was my idea :) By using a hash, the SLOT can contain many different host type definitions, and the system won't have to change a bit when the included information changes: it's still just a hash over the stuff we look for.
Using a SLOT directory for the machine type and USE flags in the name also sounds good to me. With USE flags in the filename the files don't get scattered, by the way. It would look similar to this (it still needs to be checked against the package naming scheme for compatibility):
SLOT1/
    portage-2.2_rc25-epydoc.tbz
    portage-2.2_rc27-epydoc.tbz
    python-2.5.4-r2-GYFBQKRKWSM67J6GO63XUE3JPDIJUSRA.tbz
    ...
SLOT2/
    ...
(In reply to comment #22)
> > One big problem with this discussion is, that it was scattered between
> > bugzilla and the soc mailing list.
>
> I didn't know that there was discussion on this in the soc list.
>
> Could you give me a link?
I had somehow mixed you up with the person I'm writing with on gentoo-soc. You can find the thread at <http://archives.gentoo.org/gentoo-soc/>.
(In reply to comment #22)
> In short: It would be great if we had a way for users to get trusted binpackage
> providers and for them to tell portage to use binpackages whenever possible and
> to create and upload new ones, where the binpackages don't exist, yet.
>
> As soon as this works, it could be extended to a trusted p2p network in which
> people simply share their binpackages and download those binpackages they need.
OK, but this is beyond the scope of this bug. Here we should discuss things that would make sense to add to the binpackage as metadata, enabling everything we want to do with it.
> By using a Hash, the SLOT can contain many different host type definitions, and
> the system won't have to change a bit when the included information changes -
> it's still just a hash over stuff we look for.
The problem I see is that it will become hard to find out which binpackages are available for your system. Maybe you don't care about USE flags, or about one special CFLAG; then you might want to find not the same SLOT, but almost the same one. Would that be possible?
> Maybe you don't care about USE-flags or one
> about one special CFLAG, then you might want to find not the same SLOT, but
> almost the same one. Would that be possible?
If you don't care about USE flags, you just ignore the USE hash part of the binpackage names, so this is easy. But I see an advantage of your approach here: it would be easier to just compare bitwise: "does the package have the USE flags I need? I don't care about additional capabilities." As much as I dislike losing readable USE flags in the binpackage names, this is a major advantage, since you can then do more complex checks from the names without having to download the binpackages and check the xpak. But at the same time, doing this on the binpackage level kills dependency tracking, since a new USE flag can imply a new dependency which is needed to get the binpackage to run. This means you'll have to recalculate dependencies for every alternate USE flag combination, which is so expensive that any string conversion or hash algorithm pales in comparison. Just saying "I don't care about USE flags" only works for mostly self-contained packages; otherwise you need to enable the right USE flags. For CHOST and similar (the Host SLOT): how do you decide what is unimportant? Anything which isn't a hash is in danger of becoming arbitrarily long if you need to include more information, and if you leave stuff out, you take away the option of using that information, and in fact force users to use packages with non-fitting settings. With the hash it is far easier to change the included information later on without breaking earlier versions: just change the information and hash again, and all resolution will still work, even for old versions (they will just not see the new packages as being compatible, but they won't get false information). The only thing we need is to be able to find the files based on known strings.
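The bitwise "does the package have the USE flags I need?" check mentioned above reduces to a subset test on masks built over the same sorted IUSE. A Python sketch with hypothetical helper names; flags outside IUSE are simply ignored, which is an assumption.

```python
def use_mask(iuse, flags):
    """Bitmask over sorted(IUSE); flags not in IUSE are ignored."""
    return sum(1 << position for position, flag in enumerate(sorted(set(iuse))) if flag in flags)


def has_required_flags(iuse, built_with, required):
    """True if every required flag is enabled in the binpkg; extra flags are fine."""
    have = use_mask(iuse, built_with)
    need = use_mask(iuse, required)
    return (need & ~have) == 0  # no required bit is missing from the package
```

Passing this check says nothing about the dependencies that the extra flags pull in; as argued above, those still have to be resolved separately.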
To allow for more complex comparisons with hashes, we'd just have to hash each combination of active USE flags: since most packages have just two or three USE flags, there are only a few possible combinations. And since we'll have to recalculate dependencies anyway if we use different USE flags, the cost of computing multiple hashes is negligible in comparison.
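A Python sketch of hashing every combination of active USE flags, producing a table that maps each possible hash back to the flag set it stands for. It assumes every combination is hashed (unlike the plain-string-if-short rule proposed earlier) and uses hypothetical helper names; the table has 2^n entries for n flags in IUSE, so it is only cheap for the small IUSE sizes the comment has in mind.

```python
import base64
import hashlib
from itertools import combinations


def use_hash(flags):
    """Uppercase Base32 SHA-1 over the sorted, space-joined active flags."""
    joined = " ".join(sorted(flags))
    return base64.b32encode(hashlib.sha1(joined.encode("utf-8")).digest()).decode("ascii")


def hash_table(iuse):
    """Map the hash of every possible active-flag combination back to the flags."""
    flags = sorted(set(iuse))
    table = {}
    for size in range(len(flags) + 1):
        for combo in combinations(flags, size):
            table[use_hash(combo)] = frozenset(combo)
    return table
```

With such a table, a USE hash read from a binpkg filename can be turned back into a flag set and then compared as loosely or strictly as needed.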