We use Gentoo on production servers. It's very annoying that portage deletes "obsolete" ebuilds that are still installed on a production system, where the mantra is: if it's working, don't mess with it. The fix should be pretty easy: generate an rsync exclude list from /var/db/pkg. I believe it's the right thing to do. Thanks!
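As an illustration of the exclude-list idea, here is a minimal Python sketch. It assumes the installed-package database is laid out as <vdb>/<category>/<package>-<version>/ and emits one rsync exclude pattern per installed version. The function name build_exclude_patterns is hypothetical, and the wildcard in the middle of each pattern is a shortcut to avoid splitting package names from versions. Note that this only protects the .ebuild files themselves; it does nothing for the files directory or the Manifest.

```python
import os


def build_exclude_patterns(vdb_root):
    """Return sorted rsync exclude patterns, one per installed package version.

    Assumes vdb_root is laid out as <category>/<package>-<version>/ (as
    /var/db/pkg is described in this thread). Hypothetical helper, not part
    of portage.
    """
    patterns = []
    for category in sorted(os.listdir(vdb_root)):
        cat_dir = os.path.join(vdb_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for pf in sorted(os.listdir(cat_dir)):
            if not os.path.isdir(os.path.join(cat_dir, pf)):
                continue
            # Skip temporary directories left behind by an in-progress merge.
            if pf.startswith("-MERGING-"):
                continue
            # "*" stands in for the package directory, so the version string
            # never has to be separated from the package name.
            patterns.append("/%s/*/%s.ebuild" % (category, pf))
    return patterns
```

The resulting lines could be written to a file and handed to rsync as an exclude list, for example through the RSYNC_EXCLUDEFROM variable mentioned later in this thread.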
What is your goal with this? If you need to read the ebuild, it's still present in /var/db/pkg. If portage is giving you crap about missing ebuilds/packages (aside from revdep-rebuild, which I think still whines), where is that happening?
All installed ebuilds can be found in /var/db/pkg. But I don't know whether all the necessary files from the files directory, like patches or start and configuration scripts, are preserved that way too. So maybe it would be enough to copy them to /var/db/pkg as well, so that it would be possible to restore a complete "old" portage tree that reflects the live system.
This is probably another case where a generic RSYNC_OPTIONS var could help (together with some bash magic).
Alec, I meant that the digest and the files (patches etc. referred to in the digest) should be preserved as well as the ebuilds.
I can write an emerge wrapper to generate RSYNC_EXCLUDEFROM from /var/db/pkg as a stop gap measure. But I'd rather not maintain the script, because portage might decide to change the /var/db/pkg layout/format... I think emerge sync should just do the Right Thing in this case.
I don't think this is really doable ... you may be able to calculate the ebuilds/digests to keep, but there's no way to figure out what should be left behind in $FILESDIR ... you'll also hit plenty of "errors" from emerge because ebuilds/digests are being found that aren't recorded in the Manifest
(In reply to comment #5)
> I can write an emerge wrapper to generate RSYNC_EXCLUDEFROM from /var/db/pkg as
> a stop gap measure. But I'd rather not maintain the script, because portage
> might decide to change the /var/db/pkg layout/format...
>
> I think emerge sync should just do the Right Thing in this case.

I'm looking for the rationale more than the implementation: what you are trying to do, versus how to achieve it. At present, portage doesn't need all of those things (Manifests, digests, FILESDIR) once the program is installed. What are you trying to do with them?

For example, Company A has a default server install that has pkgFoo on it, and pkgFoo is no longer in the tree, thus new server installs fail. Company A can now either fetch all the relevant files from the attic, or redo their server install to not use pkgFoo, which could be time consuming. In that case the files are useful to store on Company A's servers in order to keep installing pkgFoo.

However, I wonder why emerge needs to support this at all. At best I can see some auxiliary script/program/thing.
The rationale: being able to recompile/reinstall installed packages without much hassle. (What if a library got upgraded or GLSA'd? revdep-rebuild can pretend that the "obsolete" packages don't exist, but that's just wrong.) This feature is essential for Gentoo to be used in a production environment. Being able to easily build (and rebuild) from source is one of the important reasons we chose Gentoo. We should make Gentoo in production easy, not hard. The problem with the Manifest shouldn't be too hard to fix, BTW.
As Mike said, an rsync exclude list alone won't help. The best solution I can think of is this: before each sync, generate a list of CPVs to preserve, copy the related ebuilds and _all_ MISC and AUX files from portdir into an overlay, and regenerate the Manifest (possibly reusing entries from the original one). Still not perfect though, as that way you will miss any ebuild updates (updates without revbump) for installed packages. Not sure if this is something to integrate into emerge or do via a hook (for this we'd have to add a presync hook).
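The copy step of that pre-sync idea could look roughly like the following Python sketch. It is not the attached prototype script. The function name preserve_package and its arguments are hypothetical, the tree is assumed to be laid out as <portdir>/<category>/<package>/, and Manifest regeneration is deliberately left out (the Manifest is simply not copied).

```python
import glob
import os
import shutil


def preserve_package(portdir, overlay, category, pf):
    """Copy one installed version's ebuild plus all non-ebuild files of its
    package directory (files/, metadata.xml, ...) from portdir into overlay.

    pf is "<package>-<version>". Returns False if the ebuild is already gone
    from portdir. The Manifest is skipped because it would have to be
    regenerated for the overlay, which this sketch does not attempt.
    """
    pattern = os.path.join(glob.escape(portdir), category, "*", pf + ".ebuild")
    matches = glob.glob(pattern)
    if not matches:
        return False  # nothing left in the tree to copy
    src_dir = os.path.dirname(matches[0])
    dst_dir = os.path.join(overlay, category, os.path.basename(src_dir))
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copy2(matches[0], dst_dir)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if name == "Manifest" or name.endswith(".ebuild"):
            continue  # other versions' ebuilds are not preserved
        if os.path.isdir(src):
            # Merge into an existing copy instead of deleting it first, so
            # files kept for another preserved version survive.
            shutil.copytree(src, os.path.join(dst_dir, name), dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst_dir)
    return True
```

A presync hook would call this once per installed CPV. As noted above, an ebuild that is later updated in the tree without a revbump would not be noticed by a copy made this way.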
I have run into this problem as well, though not so much with a package that no longer exists as with a version that is no longer in portage. It becomes impossible to rebuild the package (required after updated dependencies, a new gcc, or any other reason to rebuild), which in some cases forces an undesirable upgrade. Being forced to upgrade instead of rebuild solely because the ebuild and required files for an installed package version have been removed from portage has caused me a bit of grief in the past. I see it as vital that any installed package version can be rebuilt at the user's whim.
Created an attachment (id=107529)
Prototype script to backup ebuilds of installed packages

This script implements the basic idea I outlined in my last comment, except that it doesn't copy/generate the Manifest file yet (a bit tricky, and the related portage code has a few bugs that need to be fixed first). Also, it doesn't maintain the generated backup overlay (= won't remove ebuilds of unmerged packages/versions) and isn't limited to deleted ebuilds. Feedback appreciated.
Created an attachment (id=107548)
Prototype script with experimental manifest support

This version has experimental support for regenerating the Manifest files in backups. I've limited it to Manifest2, so that if a package doesn't yet support Manifest2 (about 20% of the tree) and doesn't have all SRC_URI files fetched, the script can't regenerate the Manifest. It also needs the attached portage patch to fix some previously undetected bugs.
Created an attachment (id=107549)
Portage patch to fix some manifest bugs triggered by the preserve script
Please note that bug 48195 has been fixed before considering this feature request as a workaround for that bug.

There are a few issues with this request that come to mind:

1) distfiles may cease to be fetchable for ebuilds that have been removed from the live tree (though persistence of previously fetched files in $DISTDIR can help avoid this issue).

2) It's possible for conflicts to occur if multiple ebuilds (multiple versions and/or slots of a given package) from different snapshots of the main tree are merged into a single backup overlay.

3) The backup isn't truly complete unless it includes a snapshot of the eclasses.

Rather than make a backup at sync time, perhaps it would make more sense to do it when the package is initially installed?
First, I have to agree that this is a very important issue. At the very least, if it is not fixed, the package.mask/unmask documentation should clearly explain that freezing versions will eventually fail if the package is not in the overlay.

Second, the documentation should clearly state that emerge -U foo without a -D will in fact still upgrade dependencies that don't necessarily need upgrading once they are no longer in portage (possibly a good thing, but not what one would necessarily expect, and it creates more updating than you asked for).

Third, barefootcoder and I have a much more advanced script to handle this: http://forums.gentoo.org/viewtopic-t-533794-highlight-.html Maybe I should post it here when I get another minute. It uses hard links to save lots of disk space. Hard links are slightly controversial because of the whole multiple-filesystem thing, but the overlay could always be put in the portage directory with a corresponding rsync exclude, and then the only requirement is that the portage directory is on one filesystem, which seems like a reasonable requirement anyway.

The idea is that by hard linking all the installed parts of the portage tree to an overlay, no extra disk space is used. When the files are removed from the portage tree, the hard links automagically keep a copy in the overlay, and only at that point is the disk space used. So space only gets used for installed packages that aren't in portage. No more space is needed, even temporarily. Manifest cleanup is handled neatly and only as needed. Digests are copied and Manifests are rebuilt as needed. There is no problem retrieving most files, patches etc. (except for distfiles, which can probably be worked on a bit, and anyway distfiles aren't deleted by portage by default), because everything is linked to the overlay BEFORE it is removed from portage in the first place.
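To make the hard-link mechanism concrete, here is a small Python sketch. It is not the safesync script itself, and the function name hardlink_tree is hypothetical. It mirrors one package directory into an overlay using os.link, which only works when both paths are on the same filesystem, as discussed above.

```python
import os


def hardlink_tree(src_dir, dst_dir):
    """Mirror src_dir into dst_dir using hard links instead of copies.

    Returns the number of new links created. Files already present in
    dst_dir are left alone. Raises OSError if the two directories are on
    different filesystems.
    """
    linked = 0
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            dst = os.path.join(target_root, name)
            if not os.path.exists(dst):
                # A second name for the same inode: no extra data blocks
                # are used while the original file still exists.
                os.link(os.path.join(root, name), dst)
                linked += 1
    return linked
```

Because deleting a file only removes one of its names, the data stays reachable through the overlay after the sync deletes the original from the tree, which is the behavior described above.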
> 1) distfiles may cease to be fetchable for ebuilds that have been removed from
> the live tree (though persistence of previously fetched files in $DISTDIR can help
> avoid this issue).

If you want to keep a package, then don't delete the distfiles. Portage doesn't do this by default, but I'm not against the idea of doing more for this issue.

> 2) It's possible for conflicts to occur if multiple ebuilds (multiple versions
> and/or slots of a given package) from different snapshots of the main tree are
> merged into a single backup overlay.

I don't understand this. Don't all ebuilds have unique version numbers? It sounds like there is probably a detail here that I'm missing.

> 3) The backup isn't truly complete unless it includes a snapshot of the
> eclasses.

Ok, people who are really against this stuff always mention this, but isn't the eclass problem a separate, well-known bug in and of itself, and doesn't it strike even if you simply don't upgrade often enough? Isn't backward compatibility supposed to be required (at least in principle) for all eclasses until a better system is in place, and isn't a better system already under development? (Or is this just for uninstalling, and it's a different issue for installing?) It does not seem like a system to avoid unwanted obsolescence should be required to fix every other bug in portage that could interact with it. These are not rhetorical questions, as I'm not an expert on the eclass problems.

> Rather than make a backup at sync time, perhaps it would make more sense to do
> it when the package is initially installed?

I don't see that it matters much. The good thing about the way this script does its before-sync and after-sync operations is that only packages that are installed and no longer in portage stay in the overlay. I guess if you're using hard links anyway, it probably doesn't matter if you have redundancy between the overlay and the portage tree, does it?
I think what should be avoided is having true copies, not hard links, of ALL installed packages. Unfortunately, if you use copies (which I'm sure the mainstream would prefer), it's hard to know which ones to copy until they have already been removed from portage, at which point you can't copy them. So it's hard to get around backing up ALL installed packages in that case, which is the whole bummer about using copies. I guess you have to be more clever than me is all, which shouldn't be too hard. There may be some difference in the amount of Manifest rebuilding required for the two plans, but that depends on many things.

You could also have a scheme where only things in package.unmask get backed up. That would not be unreasonable behavior. It fixes the issue of being able to explicitly freeze certain packages with one config change. It doesn't do as much to save on general unneeded updates of stuff with emerge -U foo commands, but that's a bit of a separate issue and probably harder to sell (although I like it and I'll keep arguing why).
p.s. I meant "emerge -u" of course.
Also, I should probably be careful saying this script is "more advanced". I think it certainly has some nice features that have been thought about for a little while now, but it sounds like Marius Mauch may know a thing or two to worry about in Manifest rebuilding that may or may not be handled well in the safesync script.
Good discussions here. It seems the easiest non-intrusive workaround would be:

1. Back up the portage dirs (e.g. dev-libs/something) of the installed packages to the "save" overlay. Hard links are unnecessary, as we're only backing up a fraction of the portage tree.

2. To maintain the overlay on every emerge, copy the portage dirs of the just-emerged packages to the "save" overlay *after* they're successfully installed.

3. Steps 1 and 2 can be done with an external util/wrapper for portage. However, to keep things clean, portage needs to maintain a separate "save" overlay in addition to the usual local overlay, which is typically for experiments.

I propose "save-emerged" as a portage FEATURE. When it's turned on, emerge would do steps 1 and 2.
Sounds pretty good. Actually, I just tested the safesync script with copying and no post-sync removal of files still in portage, just so I could measure the space used. On my KDE system, with probably a typical amount of workstation software, it was in fact only 32 MB, so that doesn't seem so horrible. Maybe if you've installed the whole portage tree it's a bit more, but then I guess you'll already be prepared to need some disk space anyway.

Just a couple of minor things that will have to go in the details: when you emerge something and rewrite or add stuff to the saved package directory, you need a provision for the possibility that a second slotted version is installed. So you can't just blindly delete the old saved directory. If you append to it, you need to fix the Manifest afterwards (not the digests, I think, so long as you copy them in, unless there's some version compatibility issue with mixing different digests in one directory). Not a big deal. But it's probably best to remove cruft too (unless the user wants to keep the ability to reinstall anything that was ever installed before, which can be nice but sounds like an optional feature to me, because this CAN get big for people who update often; maybe a "protect for n days" after uninstall behavior would be cool down the road?). So maybe check for slotted installs in advance and, if there are none, clean the old contents before copying, or else always append and then fix the Manifest.

Also, in my test with copying, the one thing I did notice is that it is MUCH slower than hard linking. On the scale of emerge sync times it's significant, but still much less than the whole sync time. It would be simple enough to keep the hard links optional, but maybe that would cause more bug reports than it's worth, so I'd understand leaving it out.
Oh and the slowness with copying is only an issue on the first backup anyway, since in vicaya's plan they'll just be copied at emerge time after that.
Is it just me, or did the latest version of portage make things better? I'm now running portage 2.1.2-r9. I'm not sure how recently the improvements came. It seems like now if you directly try to update an installed package that you don't have an ebuild for, or for which all the available ebuilds are masked, it will of course complain, but if you do something like emerge -uDp world, it won't get bent out of shape over it even if it's a top-level package. In fact, it will even still update the dependencies of said package even if they aren't in the world file. I verified that the dependencies are actually pulled in through the package in question by using the --tree option. It seems that emerge is still taking into account the fact that the package is installed, which is exactly what we'd like. However, just for fun I tried to add a completely bogus package to the world file. Even then emerge managed to continue, albeit with an error that my world file was messed up.

This isn't a complete solution. First, one should probably be careful about updating dependencies of a package that you no longer have an updated ebuild for, but that's somewhat controllable and is an issue for any solution anyway. More to the point is the simple fact that you still end up losing the ebuild and related files. If you need to recompile the outdated package, maybe because you changed some USE flags or did a major compiler upgrade, then you'll still be stuck without your ebuild. Otherwise, though, you should be in good shape and won't be forced to upgrade. So this is the same behavior my original savebuilds script produced, and I was mostly happy with that. I think it's a nice improvement.
Ok, the main questions left here are:

a) how to deal with eclasses? This isn't completely fixable as two installed packages could have been installed with different versions of the same eclass, so a compromise is needed here (I guess a MRU policy would be best here), unless we wait for the implementation of glep 33

b) should ebuilds be removed from the backup overlay on unmerges?

c) how many old versions should be kept on updates? all of them, or should there be a limit? (e.g. in the update sequence a1->a2->a3->a4->a5, should we keep a1-4, or just a4, or a3-4, or ...)

d) do we need a way to check for ebuild updates (=ebuild in $PORTDIR is newer than the same version in the backup overlay), or is that the job of the user?

e) do we have to care about "global updates", e.g. if a dependency is renamed in the tree? Or do we assume that all deps are also kept in the backup overlay and leave it to the user to sort things out? Are slotmoves covered by d) ?
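Question c) can be made concrete with a few lines of Python. This is only an illustration of the options listed there, not a proposal for what the limit should be. The function name split_history and the keep_old parameter are hypothetical, and the version list is assumed to be ordered oldest to newest with the currently installed version last.

```python
def split_history(versions, keep_old=None):
    """Split the superseded versions of an update sequence into (kept, dropped).

    versions is ordered oldest -> newest; the last entry is the currently
    installed version and is never part of either list. keep_old=None keeps
    every superseded version; otherwise only the keep_old most recent ones.
    """
    if keep_old is not None and keep_old < 0:
        raise ValueError("keep_old must be non-negative or None")
    older = list(versions[:-1])
    if keep_old is None:
        return older, []
    # Compute the cut point explicitly; a negative slice would misbehave
    # for keep_old == 0.
    cut = max(len(older) - keep_old, 0)
    return older[cut:], older[:cut]
```

With the sequence from c), keep_old=None corresponds to keeping a1-4, keep_old=1 to keeping just a4, and keep_old=2 to keeping a3-4.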
(In reply to comment #22)
> Ok, the main questions left here are:
> a) how to deal with eclasses? This isn't completely fixable as two installed
> packages could have been installed with different versions of the same eclass,

I'm no expert on this at all, but I like the idea of fixing the real problem, i.e. glep 33. If there's a way to get around it for now, great.

> so a compromise is needed here (I guess a MRU policy would be best here),
> unless we wait for the implementation of glep 33
> b) should ebuilds be removed from the backup overlay on unmerges?

In my/barefootcoder's safesync script (http://forums.gentoo.org/viewtopic-p-3857340.html#3857340), this is optional. It's controlled by a configuration variable, but what I've decided I like better is being able to call it with a cleanup option from the command line. Then, whenever you're happy with the state of your system and aren't planning to go back, you can call the cleanup option.

> c) how many old versions should be kept on updates? all of them, or should
> there be a limit? (e.g. in the update sequence a1->a2->a3->a4->a5, should we
> keep a1-4, or just a4, or a3-4, or ...)

See answer to a, but maybe a maximum number of backups is still reasonable too.

> d) do we need a way to check for ebuild updates (=ebuild in $PORTDIR is newer
> than the same version in the backup overlay), or is that the job of the user?

I remember playing with this once. What you want is an "underlay" instead of an overlay. I'm pretty sure my unorthodox solution was to declare the portage tree itself as a higher-priority overlay. This probably wastes a small amount of time somewhere, but I believe it worked. One could make a less hackish solution, I'm sure (it could just delete redundant overlay ebuilds). But I don't think this issue is a very big concern anyway.

> e) do we have to care about "global updates", e.g. if a dependency is renamed
> in the tree? Or do we assume that all deps are also kept in the backup overlay
> and leave it to the user to sort things out? Are slotmoves covered by d) ?

If the dependency is renamed, then the package of the original name will no longer exist in portage. Lucky for us, we're talking about making a system to keep around installed things that aren't in portage, so it will still work just fine. The problem is that if you try to solve d, you could cause problem e to force an upgrade to the renamed dependency as listed in the newer ebuild, which may not be the end of the world, but maybe isn't desired. I don't mind simply not solving d. Also, if the user (or another package) updates the dependency to the newly named one, thus breaking the old dependent ebuild, maybe the user has to sort that out. That's kind of a corner case and to be expected when not keeping something updated. The safesync script removes redundant overlay ebuilds, thus solving d, but possibly forcing the upgrade mentioned in e.

Overall I think this stuff is all MUCH less critical now that portage recognizes old installed packages when checking dependency sanity. Ok, you don't have your old source code and ebuilds when you want to recompile, but at least portage doesn't complain now that the package isn't installed when it is.
Oh god please no... someone tell me this is a joke and please kill this bug. Use the stuff in VDB and/or http://sources.gentoo.org and/or anonymous CVS if you can't live without unsupported wiped cruft. If a maintainer doesn't want to support something and nukes the ebuild, they do it for a reason; go grab and maintain such things in your own overlay, not in the official tree. We already get more than enough bugs about 'missing' distfiles for removed stuff, go ask the release folks about this.