Summary: | [PATCH] prevent ebuilds of installed packages from being deleted in emerge --sync | ||
---|---|---|---|
Product: | Portage Development | Reporter: | vicaya <gentoobugs> |
Component: | Enhancement/Feature Requests | Assignee: | Portage team <dev-portage> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | aleksei.romanenko, bothie, bugs+gentoo, dagurasu15, drescherjm, jakub, le.petit.fou, sam, tightcode, zmc |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=349719 https://bugs.gentoo.org/show_bug.cgi?id=662070 https://bugs.gentoo.org/show_bug.cgi?id=722868 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | Prototype script to backup ebuilds of installed packages; prototype script with experimental manifest support; Portage patch to fix some manifest bugs triggered by the preserve script | ||
Description
vicaya
2006-03-13 11:27:48 UTC
What is your goal with this? If you need to read the ebuild, it's still present in /var/db/pkg. If portage is giving you crap about missing ebuilds/packages (aside from revdep-rebuild, which I think still whines), where is that happening?

All installed ebuilds can be found in /var/db/pkg. But I don't know whether all necessary files from the files directory, like patches or some start and configuration scripts, will be preserved that way too. So maybe it would be enough to copy them to /var/db/pkg too, so that it "would" be possible to restore a complete "old" portage tree that reflects the live system.

This is probably another case where a generic RSYNC_OPTIONS var could help (together with some bash magic).

Alec, I meant the digest and the files (patches etc. referred to in the digest) should be preserved as well as the ebuilds.

I can write an emerge wrapper to generate RSYNC_EXCLUDEFROM from /var/db/pkg as a stopgap measure. But I'd rather not maintain the script, because portage might decide to change the /var/db/pkg layout/format...

I think emerge sync should just do the Right Thing in this case.

i don't think this is really doable ... you may be able to calculate the ebuilds/digests to keep, but there's no way to figure out what should be left behind in $FILESDIR ... you'll also hit plenty of "errors" from emerge because ebuilds/digests are being found that aren't recorded in the Manifest

(In reply to comment #5)
> I can write an emerge wrapper to generate RSYNC_EXCLUDEFROM from /var/db/pkg as a stopgap measure. But I'd rather not maintain the script, because portage might decide to change the /var/db/pkg layout/format...
>
> I think emerge sync should just do the Right Thing in this case.

I'm looking for the rationale more than the implementation: what you are trying to do, versus how to achieve it. At present, portage doesn't need all of those things (manifests, digests, FILESDIR) once the program is installed. What are you trying to do with them?
For example, Company A has a default server install that has pkgFoo on it, and pkgFoo is no longer in the tree, so new server installs fail. Company A can now either fetch all the relevant files from the attic, or redo their server install to not use pkgFoo, which could be time consuming. In that case the files are useful to store on Company A's servers in order to keep installing pkgFoo. However, I wonder why emerge needs to support this at all. At best I can see some auxiliary script/program/thing.

The rationale: being able to recompile/reinstall installed packages without much hassle (what if a library got upgraded/glsaed? revdep-rebuild can pretend that the "obsolete" packages don't exist, but that's just wrong). This feature is essential for gentoo to be used in a production environment. Being able to easily build (and rebuild) from source is one of the important reasons we chose gentoo. We should make gentoo in production easy, not hard. The problem with Manifest shouldn't be too hard to fix, BTW.

As Mike said, an rsync exclude list alone won't help. The best solution I can think of is this: before each sync, generate a list of CPVs to preserve, copy the related ebuilds and _all_ MISC and AUX files from portdir into an overlay, and regenerate the Manifest (eventually reusing entries from the original one). Still not perfect though, as that way you will miss any ebuild updates (updates without revbump) for installed packages. Not sure if this is something to integrate into emerge or do via a hook (for this we'd have to add a presync hook).

I have run into this problem as well, though not so much with a package that no longer exists as with a version that is no longer in portage. It becomes impossible to rebuild the package (required to update dependencies, gcc, any other reason to rebuild) and thus forces in some cases an undesirable upgrade.
Being forced to upgrade instead of rebuild, solely because the ebuild and required files for an installed package version have been removed from portage, has caused me a bit of grief in the past. I see it as vital that any installed package version can be rebuilt at the user's whim.

Created attachment 107529 [details]
Prototype script to backup ebuilds of installed packages
This script implements the basic idea I outlined in my last comment, except that it doesn't copy/generate the Manifest file yet (a bit tricky, and the related portage code has a few bugs that need to be fixed first).
Also it doesn't maintain the generated backup overlay (= won't remove ebuilds of unmerged packages/versions) and isn't limited to deleted ebuilds.
Feedback appreciated.
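
For readers without access to the attachment, the idea described above can be sketched roughly as follows. This is not the attached prototype; it is a minimal illustration under stated assumptions: the installed-package database is a directory tree of the form `<vdb>/<category>/<PF>/`, the repository is laid out as `<portdir>/<category>/<PN>/<PF>.ebuild`, and the function names `installed_versions` and `backup_installed` are hypothetical. Like the first prototype, it leaves Manifest files alone and never prunes the overlay.

```python
import shutil
from pathlib import Path


def installed_versions(vdb: Path):
    """Yield (category, PF) for each package directory under a VDB-style tree."""
    for cat_dir in sorted(p for p in vdb.iterdir() if p.is_dir()):
        for pkg_dir in sorted(p for p in cat_dir.iterdir() if p.is_dir()):
            yield cat_dir.name, pkg_dir.name


def backup_installed(vdb: Path, portdir: Path, overlay: Path) -> list:
    """Copy ebuilds of installed versions, plus support files, into overlay.

    Returns the "category/PF" entries that were backed up. Manifest files
    are skipped on purpose; regenerating them is out of scope here.
    """
    saved = []
    for category, pf in installed_versions(vdb):
        cat_src = portdir / category
        # Locate the ebuild by file name so no version parsing is needed.
        matches = sorted(cat_src.glob(f"*/{pf}.ebuild")) if cat_src.is_dir() else []
        if not matches:
            continue  # version already gone from the tree; nothing to copy
        ebuild = matches[0]
        src_dir = ebuild.parent
        dst_dir = overlay / category / src_dir.name
        dst_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(ebuild, dst_dir / ebuild.name)
        # Copy everything that is not an ebuild or the Manifest
        # (files/ directory, metadata.xml, and similar support files).
        for entry in src_dir.iterdir():
            if entry.suffix == ".ebuild" or entry.name == "Manifest":
                continue
            if entry.is_dir():
                shutil.copytree(entry, dst_dir / entry.name, dirs_exist_ok=True)
            else:
                shutil.copy2(entry, dst_dir / entry.name)
        saved.append(f"{category}/{pf}")
    return saved
```

Because the overlay ends up without a Manifest, portage would presumably still complain about it, which is exactly the gap the later attachment tries to close.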
Created attachment 107548 [details]
prototype script with experimental manifest support
This version has experimental support for regenerating the Manifest files in backups. I've limited it to Manifest2, so if a package doesn't yet support Manifest2 (about 20% of the tree) and doesn't have all SRC_URI files fetched, the script can't regenerate the Manifest.
It also needs the attached portage patch to fix some previously undetected bugs.
Created attachment 107549 [details, diff]
Portage patch to fix some manifest bugs triggered by the preserve script
Please note that bug 48195 has been fixed before considering this feature request as a workaround for that bug.

There are a few issues with this request that come to mind:

1) distfiles may cease to be fetchable for ebuilds that have been removed from the live tree (though persistence of previously fetched files in $DISTDIR can help avoid this issue).

2) It's possible for conflicts to occur if multiple ebuilds (multiple versions and/or slots of a given package) from different snapshots of the main tree are merged into a single backup overlay.

3) The backup isn't truly complete unless it includes a snapshot of the eclasses.

Rather than make a backup at sync time, perhaps it would make more sense to do it when the package is initially installed?

First, I have to agree that this is a very important issue. At the very least, if it is not fixed, then the package.mask/unmask documentation should clearly explain that freezing versions will eventually fail if the package is not in the overlay. Second, documentation should clearly state that emerge -U foo without a -D will in fact still upgrade dependencies that don't necessarily need upgrading once they are not in portage (possibly a good thing, but not what one would necessarily expect, and it creates more updating than you might think you asked for).

Third, barefootcoder and I have a much more advanced script here to handle this: http://forums.gentoo.org/viewtopic-t-533794-highlight-.html Maybe I should post it here when I get another minute. It uses hard links to save lots of disk space. Hard links are slightly controversial because of the whole multiple-filesystem thing, but the overlay could always be put in the portage directory with a corresponding rsync exclude, and then the only requirement is that the portage directory is on one filesystem, which seems like a reasonable requirement anyway.
The idea is that by hard linking all the installed parts of the portage tree into an overlay, no extra disk space is used. When the files are removed from the portage tree, the hard links automagically keep a copy in the overlay, and only at that point is the disk space used, so space only gets used for installed packages that aren't in portage. No more space is needed, even temporarily. Manifest cleanup is handled neatly and only as needed. Digests are copied and manifests are rebuilt as needed. There is no problem retrieving most files, patches etc. (except for distfiles stuff, and that can probably be worked on a bit, and anyway distfiles aren't deleted by portage by default) because everything is linked into the overlay BEFORE it is removed from portage in the first place.

> 1) distfiles may cease to be fetchable for ebuilds that have been removed from the live tree (though persistence of previously fetched files in $DISTDIR can help avoid this issue).

If you want to keep a package, then don't delete the distfiles. Portage doesn't do this by default, but I'm not against the idea of doing more for this issue.

> 2) It's possible for conflicts to occur if multiple ebuilds (multiple versions and/or slots of a given package) from different snapshots of the main tree are merged into a single backup overlay.

I don't understand this. Don't all ebuilds have unique version numbers? It sounds like there probably is a detail here that I'm missing.

> 3) The backup isn't truly complete unless it includes a snapshot of the eclasses.

OK, people who are really against this stuff always mention this, but isn't the eclass problem a separate, well-known bug in and of itself, and doesn't it strike even if you simply don't upgrade often enough?
Isn't backward compatibility supposed to be required (at least in principle) for all eclasses until a better system is in place, and isn't a better system already under development? (Or is this just for uninstalling, and it's a different issue for installing?) It does not seem like a system to avoid unwanted obsolescence should be required to fix every other bug in portage that could interact with it, though. These are not rhetorical questions, as I'm not an expert on the eclass problems.

> Rather than make a backup at sync time, perhaps it would make more sense to do it when the package is initially installed?

I don't see that it matters much. The good thing about the way this script does before- and after-sync operations is that only packages that are installed and no longer in portage stay in the overlay. I guess if you're using hard links anyway, it probably doesn't matter if you have redundancy between the overlay and the portage tree, does it? I think what should be avoided is having true copies, not hard links, of ALL installed packages. Unfortunately, if you use copies (which I'm sure the mainstream would prefer), it's hard to know which ones to copy until they are already removed from portage, at which point you can't copy them, so it's hard to get around backing up ALL installed packages in that case, which is the whole bummer about using copies. I guess you have to be more clever than me is all, which shouldn't be too hard. There may be some difference in the amount of manifest rebuilding required for the two plans, but that depends on many things.

You could also have a scheme where only things in package.unmask get backed up. That would not be unreasonable behavior. It fixes the issue of being able to explicitly freeze certain packages with one config change.
It doesn't do as much to save on general unneeded updates of stuff with emerge -U foo commands though, but that's a bit of a separate issue and probably harder to sell (although I like it and I'll keep arguing why).

p.s. I meant "emerge -u" of course. Also, I should probably be careful saying this script is "more advanced". I think it certainly has some nice features that have been thought about for a little while now, but it sounds like Marius Mauch may know a thing or two to worry about in Manifest rebuilding that may or may not be handled well in the safesync script.

Good discussions here. It seems the easiest non-intrusive workaround would be:

1. Back up (hardlinks are unnecessary, as we're only backing up a fraction of the portage tree) the portage dirs (e.g. dev-libs/something) of the installed packages to the "save" overlay.

2. To maintain the overlay on every emerge, copy the portage dirs of the just-emerged packages to the "save" overlay *after* they're successfully installed.

3. 1 and 2 can be done with an external util/wrapper for portage. However, to make things clean, portage needs to maintain a separate "save" overlay, in addition to the usual local overlay, which is typically for experiments.

I propose "save-emerged" as a portage FEATURE. When it's turned on, emerge would do 1 and 2.

Sounds pretty good. Actually, I just tested the safesync script with copying and no post-sync removal of files still in portage, just so I could measure the space used. On my KDE system with probably a typical amount of workstation software it was in fact only 32 M, so it does seem like that's not so horrible. Maybe if you've installed the whole portage tree it's a bit, but then I guess you'll already be prepared to need some disk space anyway. Just a couple of minor things that will have to go in the details: when you emerge something and rewrite or add stuff to the saved package directory, you need a provision for the possibility that there may be a second slotted version installed.
So you can't just blindly delete the old saved directory. If you append to it, you need to fix the Manifest afterwards (not the digests, I think, so long as you copy them in, unless there's some version compatibility issue with mixing different digests in one directory). Not a big deal. But it's probably best to remove cruft too (unless the user wants to keep the ability to reinstall anything that was ever installed before... which can be nice, but sounds like an optional feature to me, because this CAN get big for people who update often; maybe a "protect for n days" after uninstall behavior would be cool down the road?). So maybe check for the slotted installs in advance and, if there are none, clean the old contents before copying, or else always append and then fix the Manifest.

Also, in my test with copying, the one thing I did notice is that it is MUCH slower than hard linking. On the scale of emerge sync times it's significant, but still much less than the whole sync time. It would be simple enough to keep the hard links optional, but that might cause more bug reports than it's worth, so I'd understand leaving it out. Oh, and the slowness with copying is only an issue on the first backup anyway, since in vicaya's plan they'll just be copied at emerge time after that.

Is it just me, or did the latest version of portage make things better? I'm now running portage 2.1.2-r9. I'm not sure how recently the improvements came. It seems like now if you directly try to update an installed package that you don't have an ebuild for, or for which all the available builds are masked, it will of course complain, but if you do something like emerge -uDp world, it won't get bent out of shape over it even if it's a top-level package. In fact it will even still update the dependencies of said package even if they aren't in the world file. I verified that the dependencies are actually pulled in through the package in question by using the --tree option.
It seems that emerge is of course still taking into account the fact that the package is still installed, which is exactly what we'd like. However, just for fun I tried adding a completely bogus package to the world file. Even then emerge managed to continue, albeit with an error that my world file was messed up.

This isn't a complete solution. First, one probably should be careful about updating dependencies of a package that you no longer have an updated ebuild for, but that's somewhat controllable and anyway is an issue for any solution. More to the point, though, is the simple fact that you still end up losing the ebuild and related files. If you need to recompile the outdated package, maybe because you changed some use flags or did a major compiler upgrade, then you'll still be stuck without your ebuild. Otherwise, though, you should be in good shape and won't be forced to upgrade. So this is the same behavior my original savebuilds script produced, and I was mostly happy with that. I think it's a nice improvement.

Ok, the main questions left here are:

a) how to deal with eclasses? This isn't completely fixable, as two installed packages could have been installed with different versions of the same eclass, so a compromise is needed here (I guess an MRU policy would be best), unless we wait for the implementation of glep 33

b) should ebuilds be removed from the backup overlay on unmerges?

c) how many old versions should be kept on updates? all of them, or should there be a limit? (e.g. in the update sequence a1->a2->a3->a4->a5, should we keep a1-4, or just a4, or a3-4, or ...)

d) do we need a way to check for ebuild updates (= ebuild in $PORTDIR is newer than the same version in the backup overlay), or is that the job of the user?

e) do we have to care about "global updates", e.g. if a dependency is renamed in the tree? Or do we assume that all deps are also kept in the backup overlay and leave it to the user to sort things out? Are slotmoves covered by d)?
(In reply to comment #22)
> Ok, the main questions left here are:
> a) how to deal with eclasses? This isn't completely fixable as two installed packages could have been installed with different versions of the same eclass,

I'm no expert on this at all, but I like the idea of fixing the real problem, i.e. glep 33. If there's a way to get around it for now, great.

> so a compromise is needed here (I guess a MRU policy would be best here), unless we wait for the implementation of glep 33
> b) should ebuilds be removed from the backup overlay on unmerges?

In my/barefootcoder's safesync script (http://forums.gentoo.org/viewtopic-p-3857340.html#3857340), this is optional. It's controlled by a configuration variable, but what I've decided I like better is to be able to call it with a cleanup option from the command line. Then, whenever you're happy with the state of your system and aren't planning to go back, you can call the cleanup option.

> c) how many old versions should be kept on updates? all of them, or should there be a limit? (e.g. in the update sequence a1->a2->a3->a4->a5, should we keep a1-4, or just a4, or a3-4, or ...)

See answer to a, but maybe a maximum number of backups is still reasonable too.

> d) do we need a way to check for ebuild updates (=ebuild in $PORTDIR is newer than the same version in the backup overlay), or is that the job of the user?

I remember playing with this once. What you want is an "underlay" instead of an overlay. Pretty sure my unorthodox solution was to declare the portage tree itself as a higher-priority overlay. This probably wastes a small amount of time somewhere, but I believe it worked. One could make a less hackish solution, I'm sure (could just delete redundant overlay ebuilds). But I don't think this issue is a very big concern anyway.

> e) do we have to care about "global updates", e.g. if a dependency is renamed in the tree?
> Or do we assume that all deps are also kept in the backup overlay and leave it to the user to sort things out? Are slotmoves covered by d)?

If the dependency is renamed, then the package of the original name will no longer exist in portage. Lucky for us, we're talking about making a system to keep around installed things that aren't in portage, so it will still work just fine. The problem is that if you go trying to solve d, then you could cause problem e to force an upgrade to the renamed dependency as listed in the newer ebuild, which may not be the end of the world, but maybe isn't desired. I don't mind simply not solving d. Also, if the user (or another package) goes and updates the dependency to the newly named one, thus breaking the old dependent ebuild, maybe the user has to sort that out. That's kind of a corner case and to be expected when not keeping something updated. The safesync script removes redundant overlay ebuilds, thus solving d, but possibly forcing the upgrade mentioned in e.

Overall I think this stuff is all MUCH less critical now that portage recognizes old installed packages when checking dependency sanity. OK, you don't have your old source code and ebuilds when you want to recompile, but at least portage doesn't complain now that the package isn't installed when it is.

Oh god please no... someone tell me this is a joke and please kill this bug.

Use the stuff in VDB and/or http://sources.gentoo.org and/or anonymous CVS if you can't live without unsupported wiped cruft - if a maintainer doesn't want to support something and nukes the ebuild, they do it for a reason, go grab and maintain such things in your overlay yourself, not in the official tree.

We already get more than enough bugs about 'missing' distfiles for removed stuff, go ask release folks about this.

(In reply to comment #24)
> Oh god please no... someone tell me this is a joke and please kill this bug.
>
> Use the stuff in VDB and/or http://sources.gentoo.org and/or anonymous CVS if you can't live without unsupported wiped cruft - if a maintainer doesn't want to support something and nukes the ebuild, they do it for a reason, go grab and maintain such things in your overlay yourself, not in the official tree.
>
> We already get more than enough bugs about 'missing' distfiles for removed stuff, go ask release folks about this.

If you don't want to accommodate someone's desires, that is OK. Obviously things that aren't maintained could be out of date and obsolete. This is true. There are many reasons to have this functionality and many serious reasons why not having it breaks things... it even costs real cash in the real world (one reason I do not run gentoo on my mission-critical work computers, nor does anyone I know in my entire industry... which does use linux almost exclusively on such computers). The good news is that this is largely fixed now anyway (but not entirely). Anyway, the fact that you like it the way it is doesn't mean other people don't want options (the whole idea of gentoo once upon a time). The fact that people want options doesn't mean you have to provide them. So please do not feel the need to tell me what to maintain; I did not tell you to do anything. I don't really know if God cares much either way though, nor am I sure why you bring him up, but I do know that he loves you. Peace.

and by the way, nobody asked for anything to be kept in the official tree. You may want to read the thread. We only asked about what the official tree should do to the stuff that is already on our computers.

*** Bug 232488 has been marked as a duplicate of this bug. ***

Hi all, and sorry to come back to this very old thread, but it's a very critical one for using Gentoo in real life.
Possibly the solution brought is not complete, since I hit this problem: I have virtualbox-4.0.12 and co (app-emulation/virtualbox-additions, app-emulation/virtualbox-guest-additions, app-emulation/virtualbox-modules, x11-drivers/xf86-video-virtualbox), all 4.0.12, and I masked upgrades in /etc/portage/package.mask with >xxx-4.0.12

Now, here is what an emerge -aNuDv world brings me:

------------------------------------------------------------------------
app-emulation/virtualbox-modules:0
  (app-emulation/virtualbox-modules-4.0.12::gentoo, installed) pulled in by
    ~app-emulation/virtualbox-modules-4.0.12 required by (app-emulation/virtualbox-4.0.12::gentoo, installed)
  (app-emulation/virtualbox-modules-1.6.4::dev-zero, ebuild scheduled for merge) pulled in by
    (no parents that aren't satisfied by other packages in this slot)

emerge: there are no ebuilds to satisfy "~x11-drivers/xf86-input-virtualbox-4.0.12".
(dependency required by "app-emulation/virtualbox-guest-additions-4.0.12" [installed])
--------------------------------------------------------------------------

The 4.0.12 ebuilds are no longer in portage, but I can find them in /var/db/pkg, which, from my understanding, might be what you put in place to solve the issue raised by this thread. But the ebuild for the dependency ~x11-drivers/xf86-input-virtualbox-4.0.12 is nowhere. I am puzzled, because it is not installed even though the ebuild requires it:

-----------------------------------------------------------------------------
# grep -C 4 xf86-input-virtualbox /var/db/pkg/app-emulation/virtualbox-guest-additions-4.0.12/virtualbox-guest-additions-4.0.12.ebuild
KEYWORDS="amd64 x86"
IUSE="X"
RDEPEND="X? ( ~x11-drivers/xf86-video-virtualbox-${PV}
    ~x11-drivers/xf86-input-virtualbox-${PV}
    x11-apps/xrandr
    x11-apps/xrefresh
    x11-libs/libXmu
    x11-libs/libX11
-----------------------------------------------------------------------------

Of course, one can say this is not a portage problem but a virtualbox packaging problem.
Still, I consider that portage should not complain about what is installed and working, even if some ebuilds or their maintenance are faulty. Why the hell does portage want to re-emerge that? I have masked upgrades, and my USE changes don't affect virtualbox! I even tried to use package.provided, but it says:

Invalid package name in package.provided: app-emulation/virtualbox
Invalid package name in package.provided: app-emulation/virtualbox-modules
Invalid package name in package.provided: app-emulation/virtualbox-guest-additions
Invalid package name in package.provided: app-emulation/virtualbox-additions
Invalid package name in package.provided: x11-drivers/xf86-video-virtualbox
Invalid package name in package.provided: x11-drivers/xf86-input-virtualbox

I'm not asking for a solution to this virtualbox problem here. Moreover, I think I can solve it by modifying the ebuild to suppress this useless dependency. But my concern here is: I should be able to make portage proceed and stop complaining when I emerge N|D|u world with obsolete packages. It may warn about what it cannot do, possibly why, possibly propose workarounds, block emerge, but it shall be possible to force it to proceed.

(In reply to comment #28)
> I don't ask a solution for this virtualbox problem here - Moreover I think I can solve it by modifying the ebuild to suppress this useless dependency.

You need to edit /var/db/pkg/app-emulation/virtualbox-guest-additions-4.0.12/RDEPEND, since that's where portage reads the dependency from. After you edit the file, you need to bump the timestamp of the parent directories, like this:

  touch /var/db/pkg/app-emulation/virtualbox-guest-additions-4.0.12/
  touch /var/db/pkg/app-emulation/
  touch /var/db/pkg/

When portage sees that the timestamp of the parent directories has changed, it will invalidate the cache for that package (which is stored in /var/cache/edb/vdb_metadata.pickle).

Thanks very much Zac Medico.
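
The three touch commands from the reply to comment #28 follow a simple pattern: update the modification time of the package directory and of every parent up to the VDB root. A minimal sketch of that pattern is below. The function name `bump_vdb_timestamps` is hypothetical, and whether portage then invalidates its cache is taken from the comment above, not checked by this code.

```python
import os
from pathlib import Path


def bump_vdb_timestamps(pkg_dir: Path, vdb_root: Path) -> list:
    """Set mtime to "now" on pkg_dir and each parent up to and including vdb_root.

    Returns the directories that were touched, innermost first.
    """
    pkg_dir = pkg_dir.resolve()
    vdb_root = vdb_root.resolve()
    if pkg_dir != vdb_root and vdb_root not in pkg_dir.parents:
        raise ValueError(f"{pkg_dir} is not inside {vdb_root}")
    touched = []
    current = pkg_dir
    while True:
        os.utime(current, None)  # same effect as `touch` on an existing path
        touched.append(current)
        if current == vdb_root:
            break
        current = current.parent
    return touched
```

Called as `bump_vdb_timestamps(Path("/var/db/pkg/app-emulation/virtualbox-guest-additions-4.0.12"), Path("/var/db/pkg"))`, it would touch the same three directories as the commands quoted above. It needs the same privileges as the touch commands, typically root.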
(In reply to comment #15)
> > 2) It's possible for conflicts to occur if multiple ebuilds (multiple version and/or slots of a given package) from different snapshots of the main tree are merged into a single backup overlay.
>
> I don't understand this. Don't all ebuilds have unique version numbers. It sounds like there probably is a detail here that I'm missing.

Sometimes ebuilds are updated without a revision bump, and also there can be conflicts in the support files that are hosted in the "files" subdirectory. I think the best solution would be to use a special repository format, like the one described here:

http://wiki.gentoo.org/wiki/Google_Summer_of_Code/2012/Ideas#Repository_of_self-contained_ebuild_source_packages

Using an approach like that also solves bug #349719.

Since the issue with preservation of ebuilds has not been resolved so far, I'd like to share my approach to handling this situation. The synopsis of the discussion above is as follows. We want to preserve the ebuild files of installed packages, so that they can be rebuilt if a use flag change is requested without a version bump, even if the original ebuild has been removed from the main gentoo repo. A simple backup of the ebuild, however, is not guaranteed to work due to:

a) the need for additional files that are stored in the repos, such as configuration files and patches;

b) ebuilds being dependent on eclasses, which are neither frozen nor fully versioned, which creates the risk of ebuilds getting broken if a parent eclass is patched in or removed from the main repository.

My solution:

1) I use the BTRFS filesystem and have the git repository of gentoo cloned onto it. For me it is my root partition, but it does not have to be.

2) Whenever I do an emerge --sync that triggers a need to install or update some package, e.g. due to a GLSA, I also copy the latest stored repository folder and pull the latest revision, so that the repository contents match my updated tree.
Thus I have multiple git repositories featuring the state of the main gentoo repository at fixed dates. The copying and updating has little overhead due to the COW properties of BTRFS, so I can keep a reasonable number of repository versions "open" at a time without wasting physical drive space.

3) For every version of the main repository "open" in this way, I change the name in ./profiles/repo_name to gentoo-[date] and add a minimal repository config in /etc/portage/repos.conf. For example:

[gentoo-28092021]
location = /var/db/repos/gentoo-28092021
------------------------

With steps 2 and 3 carried out every time, emerge seems to be able to locate individual old ebuilds, appropriate eclasses and any necessary supplementary files from the correct version of the repository, without causing any forward compatibility issues beyond those that are naturally caused by software rot. Additionally, the repositories and individual packages can be masked/unmasked manually to reduce the impact on emerge performance, to resolve conflicts between ebuilds that have been modified without a proper revision/version increment, and to provide a path for the eventual retirement of individual old versions.

(In reply to Aleksei Romanenko from comment #32)
> Since the issue with preservation of ebuilds has not been resolved so far I'd like to share my approach to handling this situation.

To an extent, it has been in bug 662070.
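
The naming convention from step 3 of comment #32 can be generated rather than typed by hand. The sketch below is only an illustration of that convention: the helper names are hypothetical, the DDMMYYYY date format is inferred from the single example `gentoo-28092021`, and it assumes the snapshot directory already exists under `/var/db/repos`.

```python
from datetime import date
from pathlib import Path


def snapshot_name(snapshot: date) -> str:
    """Repository name for a dated snapshot, e.g. gentoo-28092021 (DDMMYYYY assumed)."""
    return f"gentoo-{snapshot:%d%m%Y}"


def repos_conf_stanza(snapshot: date, base: str = "/var/db/repos") -> str:
    """Minimal repos.conf section pointing at the snapshot directory."""
    name = snapshot_name(snapshot)
    return f"[{name}]\nlocation = {base}/{name}\n"


def write_repo_name(repo_dir: Path, snapshot: date) -> Path:
    """Write the snapshot name into <repo_dir>/profiles/repo_name and return that path."""
    target = repo_dir / "profiles" / "repo_name"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(snapshot_name(snapshot) + "\n")
    return target
```

The stanza returned by `repos_conf_stanza(date(2021, 9, 28))` matches the example in the comment; where to write it (a file under /etc/portage/repos.conf) is left to the caller.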