distfiles-clean is a small script that removes all old/unused files from the distfiles directory. I found it on a mailing list (can't remember which one), and I think it would be a very useful addition to gentoolkit.
Created attachment 20965 [details] distfiles-clean
Created attachment 25211 [details] yadc.py Attached is another script with a similar goal. yadc stands for "yet another distfiles cleaner", but feel free to name it differently (it is in fact also distfiles-clean on my box :)). The cleaning heuristic is a bit different from the script above: it doesn't clean sources for packages that are not installed, but rather the sources for packages that are not in portage anymore. That means it cleans less, but won't force you to redownload if you downgrade something. For performance reasons, I grab SRC_URI from the cache, not from the ebuilds. It is not 100% safe in theory, but it is enough in practice (files in distfiles come from packages you've emerged, and when you've emerged something, a cache entry is created, even if there was no metadata for it). You can still find some really pathological cases where you may delete files that still have an ebuild somewhere. I don't think that's a real issue, but if you do, I can easily switch to real ebuild parsing (between 5 and 10 times slower, though). The script comes from this forum post: http://forums.gentoo.org/viewtopic.php?p=815507#815507 (with credit to share between "far", "eric.swanson" and me). I can try to write a manpage for it if you're interested in including it in gentoolkit.
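The heuristic described above can be sketched roughly as follows. This is only an illustration and not the code from yadc.py: the function name `stale_distfiles` and the shape of the `cache` mapping (package version to the list of file names from its SRC_URI) are assumptions made for the example.

```python
def stale_distfiles(present, cache):
    """Return distfiles that no package still known to portage refers to.

    present: iterable of file names found in the distfiles directory.
    cache:   hypothetical mapping "category/package-version" -> list of
             distfile names taken from that package's SRC_URI.
    """
    referenced = set()
    for filenames in cache.values():
        referenced.update(filenames)
    # A file is kept as long as *any* version in the tree needs it,
    # installed or not, so a downgrade does not force a new download.
    return sorted(name for name in present if name not in referenced)


if __name__ == "__main__":
    on_disk = ["foo-1.0.tar.gz", "foo-1.1.tar.gz", "bar-2.0.tar.bz2"]
    tree = {
        "app-misc/foo-1.1": ["foo-1.1.tar.gz"],
        "app-misc/bar-2.0": ["bar-2.0.tar.bz2"],
    }
    print(stale_distfiles(on_disk, tree))
```

Here foo-1.0 is no longer in the tree, so only its tarball is reported as removable.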
I have been using TGL's version of the script and I can say that I love it. In fact, I like it so much that I suggested he submit it to you for inclusion in gentoolkit. =]
There's a bug report for my script on the forum :/ So don't include it now; I will try to come back with a fixed version soon.
Created attachment 25290 [details] yadc.py Only part of the reported issues were real bugs in the script (it did not handle a closing parenthesis in SRC_URI when there was no space character before it). The others were ebuild bugs, which I can't do anything about (empty SRC_URI for some fetch-restricted packages - bug #41003). This version corrects the regexp for the parenthesis bug, and also fixes a few other small things (error if the user is not root, and better display in pretend mode when there is nothing to clean).
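To illustrate the parenthesis problem, here is a minimal sketch of a tokenizer that does not rely on whitespace around ")". It is not the regexp used in yadc.py; the pattern, the function name `distfile_names` and the example URLs are assumptions made for the example.

```python
import re

# A token is a run of characters that are neither whitespace nor parentheses,
# so "foo.patch)" yields "foo.patch" even with no space before the ")".
_TOKEN = re.compile(r"[^\s()]+")


def distfile_names(src_uri):
    """Extract the distfile names from a SRC_URI string."""
    names = []
    for token in _TOKEN.findall(src_uri):
        if token.endswith("?"):
            # USE-conditional such as "ssl?" or "!ssl?", not a file.
            continue
        # Keep only the last path component of the URI.
        names.append(token.rsplit("/", 1)[-1])
    return names


if __name__ == "__main__":
    uri = "http://example.org/foo-1.0.tar.gz ssl? ( http://example.org/foo-ssl.patch)"
    print(distfile_names(uri))
```

Splitting on whitespace alone would produce "foo-ssl.patch)" for this input, which then never matches the file on disk.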
Created attachment 25298 [details] packages-clean.py Here is a similar script to clean binary packages. The heuristic is the same: remove binary packages for which there is no ebuild in portage anymore. I've made it because I use the "buildsyspkg" feature of the new portage to keep tarballs of my system packages, and this should be a convenient way to remove the ones that are too old. In fact, there is a lot of code sharing with the previous script (I would say it is 50% copy/paste). I'm thinking of merging both into a single tool, let's say "eclean", with a syntax like this: % eclean [options...] <action> [<action>...] Actions would for now be either "distfiles", or "packages", or both. I'm thinking of doing something for overlay directories too, to remove ebuilds that are older than the ones visible in portage (and also clean directories with no ebuilds left). It would be useful when you bump some package on your side, then the update comes into portage, a bit later another update follows, and finally your overlay ebuild is outdated and can be safely removed. What do you think? Would something like this have its place in gentoolkit?
I like the idea of merging the packages and distfiles cleaning scripts into a single script. Personally, I think an overlay pruning script is a bad idea. Especially considering it is possible to have multiple overlays. There are many instances where people put older ebuilds in their overlays because they do NOT want to upgrade. I tend to feel that the overlays are something that portage and its utilities shouldn't touch.
> Personally, I think an overlay pruning script is a bad idea. Yeah, I was not really sure about that one. In general, I put my bump ebuilds in a separate overlay made only for this purpose, and in this kind of configuration, assuming you can choose to clean only this particular overlay, I think it can be useful. But it's a rather specific need, and I agree it may be too error-prone in many cases. I will code that separately, and probably share it only as a forum tip, at least at first, to see if there is some interest in something in that direction. Thanks for your comments.
Created attachment 25498 [details] eclean Here is a script that can do both packages and distfiles cleaning. I've tested it for a few days and so far it seems okay. Please let me know if you think it has its place in gentoolkit, so that I can write a manpage, the bash completion, things like that. Thanks.
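The proposed command-line syntax could look something like the sketch below. It uses the modern argparse module purely for illustration and is not how eclean itself parses its arguments; `parse_args` is a hypothetical helper and --pretend is the only option shown.

```python
import argparse


def parse_args(argv):
    """Parse: eclean [options...] <action> [<action>...]"""
    parser = argparse.ArgumentParser(prog="eclean")
    parser.add_argument(
        "--pretend",
        action="store_true",
        help="only show what would be removed",
    )
    # One or more actions, each of which must be a known action name.
    parser.add_argument(
        "actions",
        nargs="+",
        choices=["distfiles", "packages"],
        metavar="action",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--pretend", "distfiles", "packages"])
    print(args.pretend, args.actions)
```

Accepting several actions at once keeps the syntax compact, but it becomes awkward as soon as an option only makes sense for one of the actions.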
*** Bug 25108 has been marked as a duplicate of this bug. ***
I think it would be a nice addition to gentoolkit, but it would need some changes for that (moving/using some code to/from gentoolkit.py, adding the portage python path to sys.path, maybe reusing code from equery), but a manpage too :)
Created attachment 29493 [details] eclean.1 Sorry to answer so late, I have had almost no time to allocate for gentoo work these last few weeks :/ Anyway, here is a manpage, and I'm starting to do some code refactoring right now.
Created attachment 32982 [details, diff] Patch for eclean to work with Portage 2.0.51 eclean didn't find the dependency cache after changing to Portage 2.0.51, because the structure of the config class changed. (And, hence, cleaned out all my distfiles.) This patch fixes it.
How about an option for eclean to use the more aggressive algorithm for cleaning the distfiles?
Come to think of it, how about an option to specify how many older versions to leave around? While true I might downgrade, it's unlikely I'll downgrade more than one version, so I'd use this option to keep just the most recent source and the source from one version prior.
This script is good, BUT. Portage is getting a spanking fresh API very soon, called portageapi. I suggest we move over this script to that API once it's stable and released. Hopefully, that will be with portage 2.0.51 release or perhaps 2.0.52. For now, talk to jstubbs or check out portage-mod in the gentoo-src module.
I'll look at this for gentoolkit-0.2.1, the API is currently redesigned from scratch and won't be available for some time.
*** Bug 67414 has been marked as a duplicate of this bug. ***
The script is great, although it would be good if it also removed old versions of packages and source files.
Actually, I think the point was that it did not do that. The reason is if you (like myself) have multiple architectures that all have different versions in testing and stable, and other such things. Now, it might be nice to have an *option* to clean out all but the newest version, but the default of only removing things not in the tree at all, seems to me to be the best default choice.
> Now, it might be nice to have an *option* to clean out all but the newest version I have a version of the script with a --destructive option that does that (well, it's not really "save latest versions only", but actually "save currently installed versions only", which is probably close in most cases). But if I recall correctly, I think it uses an old portageapi version. Marius, what is the status of the portageapi redesign you were talking about? Is there something that I could download somewhere, or should I port the script to what is available in plain portage-2.0.51, or... well, what API should I use if I want to rewrite that once and for all? Thanks.
*** Bug 74031 has been marked as a duplicate of this bug. ***
Created attachment 53397 [details] eclean Here is the version I'm currently using. It adds a "--destructive" option for people who want more cleaning: it will then only keep files corresponding to the package versions that are actually installed (that means you will keep the required distfiles for revision bumps, and will only miss those for downgrades, but that doesn't happen that often). It doesn't use raw portage cache access anymore (meaning it will be a bit slower than the previous version), but the portage API (which I know is not really a stable API, but it works with both 2.0.51.x and HEAD).
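The difference between the default mode and --destructive can be sketched as below. This is an illustration only and does not use the portage API; the function name `files_to_keep` and the two input structures are assumptions made for the example.

```python
def files_to_keep(tree, installed, destructive=False):
    """Compute the set of distfile names that must not be deleted.

    tree:        hypothetical mapping "category/package-version" -> list of
                 distfile names, for every version available in portage.
    installed:   set of "category/package-version" strings actually installed.
    destructive: if True, protect only files of installed versions.
    """
    keep = set()
    for cpv, names in tree.items():
        if destructive and cpv not in installed:
            continue
        keep.update(names)
    return keep


if __name__ == "__main__":
    tree = {
        "app-misc/foo-1.0": ["foo-1.0.tar.gz"],
        "app-misc/foo-1.1": ["foo-1.1.tar.gz"],
    }
    installed = {"app-misc/foo-1.1"}
    print(sorted(files_to_keep(tree, installed)))
    print(sorted(files_to_keep(tree, installed, destructive=True)))
```

In the default mode both tarballs stay, because foo-1.0 is still in the tree. In destructive mode only the tarball of the installed foo-1.1 is protected.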
*** Bug 85143 has been marked as a duplicate of this bug. ***
There is yet another distfiles_cleanup script in the torpage package. It's probably more suited to distfile mirrors, though. Its heuristic is slightly different: it generates a list of all distfiles from the digest-* files from /usr/portage and all overlays, and then checks for every file in ${DISTDIR} whether it's in the list. If it is, nothing happens to the file; if it's not in the list, it gets rm'ed. It's written in bash and is surprisingly efficient. It actually cleaned up the 50GB standard distfiles mirror to about 25GB (which just goes to show how much unneeded stuff is still resident in the mirror).
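The digest-based approach can be sketched in Python as follows (the script mentioned above is written in bash, and this is not a port of it). The sketch assumes that each digest-* file lives in a package's "files" directory and contains lines of the form "MD5 <hash> <filename> <size>"; the function names are hypothetical.

```python
import os


def names_from_digests(trees):
    """Collect distfile names from all digest-* files under the given trees."""
    names = set()
    for tree in trees:
        for dirpath, _dirnames, filenames in os.walk(tree):
            if os.path.basename(dirpath) != "files":
                continue
            for fname in filenames:
                if not fname.startswith("digest-"):
                    continue
                with open(os.path.join(dirpath, fname)) as handle:
                    for line in handle:
                        fields = line.split()
                        # Assumed layout: MD5 <hash> <filename> <size>
                        if len(fields) == 4:
                            names.add(fields[2])
    return names


def unlisted(distdir, names):
    """Return regular files in distdir that no digest mentions."""
    return sorted(
        entry
        for entry in os.listdir(distdir)
        if entry not in names and os.path.isfile(os.path.join(distdir, entry))
    )
```

A caller would pass the main tree and every overlay in `trees`, then review the result of `unlisted()` before deleting anything.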
25GB isn't the correct distfiles size; it clocks out (currently) around 30gb. Regarding unneeded crap on the mirrors, well aware. Automated cleansing of the mirrors starts sunday, week after it'll be down from 58gb to around 30gb.
Great tool! Will it be available through gentoolkit? I could not find such functionality in app-portage/gentoolkit-0.2.1_pre4
Created attachment 62807 [details] yacleaner-0.2 TGL: It would be nice to see the functionality/CLI/options of yacleaner in your python/portageAPI based script. Right now yacleaner is in bash (like other gentoolkit tools, don't despise it :P) so it's not API dependent, seems to be slightly faster than eclean and could be another option.
= eclean =          = yacleaner =
real 0m8.021s       real 0m3.193s
user 0m6.646s       user 0m2.052s
sys  0m0.918s       sys  0m0.858s
There is a forum thread for yacleaner: http://forums.gentoo.org/viewtopic-p-2553143.html
Created attachment 62908 [details] yacleaner-0.3 http://gentoo.org.mx/yacleaner/
2003-11-19. Are we going to wait until the 2 year anniversary to get this thing in testing? I've got to say this is vital for those of us with huge distfiles. Honestly, what's the holdup with this thing?
mmm yacleaner is a great tool! (a must-have) but I also use enotice (another must-have), which logs emerge messages, usually in /var/tmp/portage/enotice. I think it would be great if yacleaner did not remove that directory when cleaning :) bye
Created attachment 67046 [details] /usr/bin/eclean (0.4) Here is a new eclean version. Changes:
- added exclusion files support
- added --time-limit option
- added --size-limit option (for distfiles only)
- added optional protection of fetch-restricted distfiles
- added --package-names option for protection of all versions of installed packages
- removed support for multiple actions on the command line. That would have been hell with action-specific options.
Created attachment 67047 [details] /usr/share/man/man1/eclean.1 And the updated man page. Some proofreading by a native English speaker would be more than welcome :)
Created attachment 67081 [details] /usr/bin/eclean (0.4.1) Here is eclean-0.4.1. Changes:
- added support for "eclean-dist" and "eclean-pkg" symlinks to "eclean" (and thus refactored the command-line parsing and help screen code)
- accept file names in exclude files for protection of specific distfiles (useful to protect the OOo i18n files for instance, which are not in $SRC_URI but put there manually)
- minor rewrite of some findDistfiles() code
- added the /usr/lib/portage/pym python path, just to be sure it comes first (after all, "output" is a pretty generic name for a python module...)
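As a rough sketch of how time and size limits can narrow down a deletion list: the semantics assumed here are that --time-limit protects files modified more recently than the limit and --size-limit protects files bigger than the limit, which should be checked against the man page. The function name and the tuple layout are hypothetical.

```python
def apply_limits(candidates, now, time_limit=None, size_limit=None):
    """Filter deletion candidates by age and size.

    candidates: list of (name, mtime, size) tuples, mtime in seconds since
                the epoch and size in bytes.
    now:        current time in seconds since the epoch.
    time_limit: assumed meaning: protect files modified less than this many
                seconds ago.
    size_limit: assumed meaning: protect files bigger than this many bytes.
    """
    doomed = []
    for name, mtime, size in candidates:
        if time_limit is not None and now - mtime < time_limit:
            continue  # modified too recently, keep it
        if size_limit is not None and size > size_limit:
            continue  # too expensive to download again, keep it
        doomed.append(name)
    return doomed


if __name__ == "__main__":
    day = 24 * 3600
    files = [
        ("old-small.tar.gz", 0, 10_000),
        ("old-huge.tar.bz2", 0, 500_000_000),
        ("recent.tar.gz", 99 * day, 10_000),
    ]
    print(apply_limits(files, now=100 * day, time_limit=30 * day,
                       size_limit=100_000_000))
```

Only the old, small file is left on the deletion list in this example.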
Created attachment 67082 [details] /usr/share/man/man1/eclean.1 (0.4.1) Updated manpage.
eclean is in the subversion repository. It will be available in the next release of gentoolkit.
In the man page for eclean.1 (0.4.1) there is probably a very small typo:
EXCLUSION FILES
o if a line contains a package name with an exclamation mark in front ("!sys-apps/portage"), then this package will be excluded from protection. Sure, this is only useful is the category itself was protected.
Shouldn't it be "if" instead of "is"? Like:
Sure, this is only useful if the category itself was protected.
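The rule quoted from the man page can be sketched like this. It is a simplified illustration, not eclean's parser: it only handles category lines, package lines and "!" lines, ignores the file-name entries that exclude files also accept, and treating "#" lines as comments is an assumption. Both function names are hypothetical.

```python
def parse_exclusions(lines):
    """Split exclusion-file lines into categories, packages and negations."""
    categories, packages, negated = set(), set(), set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        if line.startswith("!"):
            negated.add(line[1:])
        elif "/" in line:
            packages.add(line)
        else:
            categories.add(line)
    return categories, packages, negated


def is_protected(cat_pkg, exclusions):
    """Tell whether "category/package" is protected from cleaning."""
    categories, packages, negated = exclusions
    if cat_pkg in negated:
        # "!category/package" removes protection granted by its category.
        return False
    return cat_pkg in packages or cat_pkg.split("/", 1)[0] in categories


if __name__ == "__main__":
    rules = parse_exclusions(["sys-apps", "!sys-apps/portage", "app-editors/vim"])
    for name in ("sys-apps/sed", "sys-apps/portage", "app-editors/vim"):
        print(name, is_protected(name, rules))
```

With these rules everything in sys-apps is protected except sys-apps/portage, which is why the "!" form is only useful when the category itself is listed.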
The man page typo is fixed in subversion.
Is this in a released version of gentoolkit now?
eclean is in gentoolkit-0.2.1_pre8. I will close the bug once gentoolkit-0.2.1 goes stable
eclean behaves in a way that is not expected, at least by me. I would expect it to print all distfiles that are unneeded, such as distfiles for packages that are not installed (compare the output of `eclean --pretend distfiles` and the output of this script: http://dev.gentoo.org/~nelchael/unneeded_distfiles.py ).
Ehh... it is much more flexible than that. To get what you expect, you use: eclean --destructive distfiles
Fixed with gentoolkit-0.2.1
Feature request for eclean: I would like to clean up distfiles completely and leave only the fetch-restricted files in there. "eclean -d distfiles -f" leaves over 100 files in distfiles. I cannot do that manually, because fetch-restricted files cannot be excepted that way. Such an option should be very easy to implement; unfortunately I don't have enough Python experience to do it on my own.
File a new bug. This bug was for inclusion, which was done *long* ago.
*** Bug 182308 has been marked as a duplicate of this bug. ***
*** Bug 188915 has been marked as a duplicate of this bug. ***
Could you add a default cron job with safe cleanup ?
I added this cron job: /etc/cron.daily/clean-distfiles
---
#!/bin/sh
rm -fv $(find /usr/portage/distfiles -type f -ctime +200)
eclean -d distfiles
---
Of course you could put a safer behavior by default.
Please quit adding comments to this bug. This bug was to have the tool included, which was done a long time ago. If you have any requests or problems, file a new bug.