
Bug 33877

Summary: include distfiles-clean into gentoolkit
Product: Portage Development Reporter: Hanno Böck <hanno>
Component: Tools Assignee: Portage Tools Team <tools-portage>
Status: RESOLVED FIXED    
Severity: enhancement CC: 1723542c42148b2fe4af9f7ad1e382b30d4b7fd7, alan.schmitt, alexander.huemer, alonbl, avuton, bickerdyke, bugz07, bugzilla-gentoo, chutzpah, cloos, dguido, flash3001, flexx, gentoo.20.calle2003, helman, hongqn, jerome.bouat, jrmalaq, kronenpj, magnade, mal, Martin.vGagern, miolinux, nelchael, nikolaymetchev, pauling, phajdan.jr, pinihadad, radek, Ricardo.Cordeiro, rick, rockoo, rossi.f, sascha-gentoo-bugzilla, schulz.benjamin, sean, sebastian, smithj, tacvbo, thomas.bettler, tom.gl, uzytkownik2, wolf31o2, yvasilev
Priority: High Keywords: InVCS
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: distfiles-clean
yadc.py
yadc.py
packages-clean.py
eclean
eclean.1
Patch for eclean to work with Portage 2.0.51
eclean
yacleaner-0.2
yacleaner-0.3
/usr/bin/eclean (0.4)
/usr/share/man/man1/eclean.1
/usr/bin/eclean (0.4.1)
/usr/share/man/man1/eclean.1 (0.4.1)

Description Hanno Böck gentoo-dev 2003-11-19 17:03:45 UTC
distfiles-clean is a small script that removes all old/unused files from the distfiles directory.
I found it on a mailing list (I can't remember which one), and I think it would be a very useful addition to gentoolkit.
Comment 1 Hanno Böck gentoo-dev 2003-11-19 17:04:09 UTC
Created attachment 20965 [details]
distfiles-clean
Comment 2 TGL 2004-02-08 13:13:47 UTC
Created attachment 25211 [details]
yadc.py

Attached is another script with a similar goal. yadc stands for "yet another
distfiles cleaner", but feel free to name it differently (it is also
distfiles-clean on my box in fact :)).

The heuristic for cleaning is a bit different from the script above: it doesn't
clean sources for packages that are not installed, but rather the sources for
packages that are not in portage anymore. That means that it cleans less, but
won't force you to redownload if you downgrade something.
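
The difference between the two heuristics can be sketched with plain sets. This is only an illustration, not the code of either attached script: the function names and the sample file names are hypothetical, and the lists of files are assumed to have been collected elsewhere.

```python
def clean_not_installed(distfiles, installed_sources):
    """More aggressive heuristic: drop every file no installed package needs."""
    return sorted(set(distfiles) - set(installed_sources))


def clean_not_in_tree(distfiles, tree_sources):
    """yadc-style heuristic: drop only files no ebuild in the tree references."""
    return sorted(set(distfiles) - set(tree_sources))


# Hypothetical sample data: foo-1.0 is installed, foo-0.9 is still in the
# tree, and bar-0.1 has been removed from the tree entirely.
distfiles = ["foo-1.0.tar.gz", "foo-0.9.tar.gz", "bar-0.1.tar.gz"]
installed = ["foo-1.0.tar.gz"]
in_tree = ["foo-1.0.tar.gz", "foo-0.9.tar.gz"]

print(clean_not_installed(distfiles, installed))
print(clean_not_in_tree(distfiles, in_tree))
```

With the second heuristic the foo-0.9 tarball survives, so a downgrade to foo-0.9 would not need a new download.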

For performance reasons, I grab SRC_URI from the cache, not from the ebuilds.
It is not 100% safe in theory, but is enough in practice (files in distfiles
come from packages you've emerged, and when you've emerged something, a cache
entry is created, even if there was no metadata for it). But you can find some
really pathological cases where you may delete files that still have an ebuild
somewhere. I don't think that's a real issue, but if you do, I can easily
switch to real ebuild parsing (between 5 and 10 times slower, though).

The script comes from this forum post:
http://forums.gentoo.org/viewtopic.php?p=815507#815507
(with credit to share between "far", "eric.swanson" and me)

I can try to write a manpage for it if you're interested in including it in gentoolkit.
Comment 3 Chris Gianelloni (RETIRED) gentoo-dev 2004-02-08 13:25:35 UTC
I have been using TGL's version of the script and I can say that I love it.  In fact, I like it so much that I suggested he submit it to you for inclusion in gentoolkit.  =]
Comment 4 TGL 2004-02-08 15:23:45 UTC
There's a bug report for my script on the forum :/
So don't include it now; I will try to come back with a fixed version soon.
Comment 5 TGL 2004-02-09 13:12:44 UTC
Created attachment 25290 [details]
yadc.py

Only part of the reported issues were real bugs in the script (it did not handle
a closing parenthesis in SRC_URI when there was no space character before it).
The others were ebuild bugs, which I can't do anything about (empty SRC_URI for
some fetch-restricted packages - bug #41003).

This version corrects the regexp for the parenthesis bug, and also fixes a few
other small things (error if the user is not root, and better display in pretend
mode if there is nothing to clean).
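
The parenthesis problem can be illustrated with a small tokenizer. This is a sketch of one way to handle it, not the regexp used in the attached script: the function name and the example URLs are hypothetical, and only the simple "flag? ( uri ... )" form of SRC_URI is assumed.

```python
import re


def distfile_names(src_uri):
    """Return the file names referenced by a SRC_URI string."""
    # Pad every parenthesis with spaces, so that "foo.tar.gz)" splits into
    # "foo.tar.gz" and ")" even when the ebuild omitted the whitespace.
    padded = re.sub(r"([()])", r" \1 ", src_uri)
    names = []
    for token in padded.split():
        # Skip grouping parentheses and USE conditionals such as "doc?".
        if token in ("(", ")") or token.endswith("?"):
            continue
        # Keep only the last path component of the URI.
        names.append(token.rsplit("/", 1)[-1])
    return names


example = "http://example.org/foo-1.0.tar.gz doc? (http://example.org/foo-doc-1.0.tar.gz)"
print(distfile_names(example))
```

Without the padding step, the last token would be "foo-doc-1.0.tar.gz)", which matches no file in distfiles, so the real tarball would look unreferenced.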
Comment 6 TGL 2004-02-09 15:14:37 UTC
Created attachment 25298 [details]
packages-clean.py

Here is a similar script to clean binary packages. The heuristic is the same:
remove binary packages for which there is no longer an ebuild in portage. I've
made it because I use the "buildsyspkg" feature of new portage to keep tarballs
of my system packages, and this should be a convenient way to remove the ones
that are too old.

In fact, there is a lot of code sharing with the previous script (I would say
it is 50% copy/paste). I'm thinking of merging both into a single tool, let's
say "eclean", with a syntax like this:
  % eclean [options...] <action> [<action>...]
Actions would be for now either "distfiles", or "packages", or both.
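
The proposed syntax maps naturally onto a standard argument parser. The following is a minimal sketch using Python's argparse, not the parsing code of the attached scripts; only the two action names come from this comment, and the --pretend option is borrowed from the pretend mode mentioned elsewhere in this bug.

```python
import argparse


def build_parser():
    """Parser for: eclean [options...] <action> [<action>...]"""
    parser = argparse.ArgumentParser(prog="eclean")
    parser.add_argument(
        "--pretend",
        action="store_true",
        help="only show what would be removed",
    )
    # One or more actions; each must be one of the two known names.
    parser.add_argument(
        "actions",
        nargs="+",
        choices=["distfiles", "packages"],
        help="what to clean",
    )
    return parser


args = build_parser().parse_args(["--pretend", "distfiles", "packages"])
print(args.pretend, args.actions)
```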

I'm thinking of doing something for overlay directories too, to remove ebuilds
that are older than the ones visible in portage (and also clean directories with
no ebuilds left). It would be useful when you bump some package on your side,
and then the update comes into portage, and a bit later another update, and
finally your overlay ebuild is outdated and can be safely removed.

What do you think? Would something like this have its place in gentoolkit?
Comment 7 Chris Gianelloni (RETIRED) gentoo-dev 2004-02-10 06:18:18 UTC
I like the idea of merging the packages and distfiles cleaning scripts into a single script.

Personally, I think an overlay pruning script is a bad idea.  Especially considering it is possible to have multiple overlays.  There are many instances where people put older ebuilds in their overlays because they do NOT want to upgrade.  I tend to feel that the overlays are something that portage and its utilities shouldn't touch.
Comment 8 TGL 2004-02-10 06:44:03 UTC
> Personally, I think an overlay pruning script is a bad idea. 

Yeah, I was not really sure about that one. In general, I put my bump ebuilds in a separate overlay made only for this purpose, and in this kind of configuration, assuming you can choose to only clean this particular overlay, I think it can be useful. But it's a rather specific need, and I agree it may be too error-prone in many cases. I will code that separately, and probably share it only as a forum tip, at least at first, to see if there is some interest in something in that direction.
Thanks for your comments.
Comment 9 TGL 2004-02-12 14:37:42 UTC
Created attachment 25498 [details]
eclean

Here is a script that can do both packages and distfiles cleaning.  I've tested
it for a few days and so far it seems okay.  Please let me know if you think it
has its place in gentoolkit, so that I can write a manpage, the bash completion,
things like that. Thanks.
Comment 10 Marius Mauch (RETIRED) gentoo-dev 2004-03-13 02:25:14 UTC
*** Bug 25108 has been marked as a duplicate of this bug. ***
Comment 11 Marius Mauch (RETIRED) gentoo-dev 2004-03-13 02:33:37 UTC
I think it would be a nice addition to gentoolkit, but it would need some changes for that (moving/using some code to/from gentoolkit.py, adding the portage python path to sys.path, maybe reusing code from equery), but a manpage too :)
Comment 12 TGL 2004-04-17 07:55:42 UTC
Created attachment 29493 [details]
eclean.1

Sorry to answer so late, I've had almost no time to allocate to gentoo work
these last few weeks :/

Anyway, here is a manpage, and I'm starting to do some code refactoring right
now.
Comment 13 Benjamin Braatz 2004-06-09 09:48:59 UTC
Created attachment 32982 [details, diff]
Patch for eclean to work with Portage 2.0.51

eclean didn't find the dependency cache after changing to Portage 2.0.51,
because the structure of the config class changed. (And, hence, cleaned out all
my distfiles.)

This patch fixes it.
Comment 14 Trebor A. Rude 2004-07-15 06:15:17 UTC
How about an option for eclean to use the more aggressive algorithm for cleaning the distfiles?
Comment 15 Trebor A. Rude 2004-07-15 06:34:56 UTC
Come to think of it, how about an option to specify how many older versions to leave around? While it's true I might downgrade, it's unlikely I'll downgrade more than one version, so I'd use this option to keep just the most recent source and the source from one version prior.
Comment 16 Karl Trygve Kalleberg (RETIRED) gentoo-dev 2004-07-16 17:58:31 UTC
This script is good, BUT.

Portage is getting a spanking fresh API very soon, called portageapi. I suggest
we move this script over to that API once it's stable and released. Hopefully,
that will be with the portage 2.0.51 release or perhaps 2.0.52.

For now, talk to jstubbs or check out portage-mod in the gentoo-src module.
Comment 17 Marius Mauch (RETIRED) gentoo-dev 2004-10-10 16:21:13 UTC
I'll look at this for gentoolkit-0.2.1, the API is currently redesigned from scratch and won't be available for some time.
Comment 18 SpanKY gentoo-dev 2004-10-13 16:03:10 UTC
*** Bug 67414 has been marked as a duplicate of this bug. ***
Comment 19 Sebastian Baldovino 2004-12-24 09:16:09 UTC
the script is great, although it would be good if it also removed old versions of packages and source files.
Comment 20 Sebastian Baldovino 2004-12-24 09:16:54 UTC
the script is great, although it would be good if it also removed old versions of packages and source files.
Comment 21 Chris Gianelloni (RETIRED) gentoo-dev 2004-12-26 07:18:57 UTC
Actually, I think the point was that it did not do that.  The reason is that you might (like myself) have multiple architectures that all have different versions in testing and stable, and other such things.  Now, it might be nice to have an *option* to clean out all but the newest version, but only removing things not in the tree at all seems to me to be the best default choice.
Comment 22 TGL 2004-12-28 15:52:19 UTC
> Now, it might be nice to have an *option* to clean out all but the 
> newest version

I have a version of the script with a --destructive option that does that (well, it's not really "save latest versions only", but actually "save currently installed versions only", which is probably close in most cases). But if I recall correctly, I think it uses an old portageapi version.

Marius, what is the status of the portageapi redesign you were talking about? Is there something that I could download somewhere, or should I port the script to what is available in plain portage-2.0.51, or... well, what API should I use if I want to rewrite that once and for all? Thanks.
Comment 23 SpanKY gentoo-dev 2005-01-02 17:49:28 UTC
*** Bug 74031 has been marked as a duplicate of this bug. ***
Comment 24 TGL 2005-03-14 01:52:10 UTC
Created attachment 53397 [details]
eclean

Here is the version I'm currently using. It adds a "--destructive" option for
people who want more cleaning: it will then only keep files corresponding to
the package versions that are actually installed (that means you will keep the
required distfiles for revision bumps, and will only miss those for downgrades,
but that doesn't happen that often).

It doesn't use raw portage cache access anymore (meaning it will be a bit
slower than the previous version), but the portage API (which I know is not
really a stable API, but it works with both 2.0.51.x and HEAD).
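
The effect of the option can be sketched as follows. This is an illustration with hypothetical names and data, not the code of the attached eclean: the mapping from package versions to their distfiles is assumed to have been built beforehand (by whatever means, for example the portage API).

```python
def files_to_remove(distfiles, sources_by_version, installed, destructive=False):
    """Return the distfiles that would be cleaned.

    sources_by_version maps "category/package-version" to the file names
    that version needs; installed is the set of installed versions.
    """
    keep = set()
    for version, files in sources_by_version.items():
        # Default mode protects every version known to the tree;
        # destructive mode protects installed versions only.
        if not destructive or version in installed:
            keep.update(files)
    return sorted(name for name in distfiles if name not in keep)


# Hypothetical data: foo-1.0 is installed, and its revision bump foo-1.0-r1
# uses the same tarball.
tree = {
    "app-misc/foo-0.9": ["foo-0.9.tar.gz"],
    "app-misc/foo-1.0": ["foo-1.0.tar.gz"],
    "app-misc/foo-1.0-r1": ["foo-1.0.tar.gz"],
}
installed = {"app-misc/foo-1.0"}
distfiles = ["foo-0.9.tar.gz", "foo-1.0.tar.gz", "old-0.1.tar.gz"]

print(files_to_remove(distfiles, tree, installed))
print(files_to_remove(distfiles, tree, installed, destructive=True))
```

In destructive mode the foo-1.0 tarball is still kept, so the revision bump needs no download, while the foo-0.9 tarball needed for a downgrade is removed.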
Comment 25 Marius Mauch (RETIRED) gentoo-dev 2005-03-14 07:18:51 UTC
*** Bug 85143 has been marked as a duplicate of this bug. ***
Comment 26 Marius Mauch (RETIRED) gentoo-dev 2005-03-14 08:19:23 UTC
*** Bug 85143 has been marked as a duplicate of this bug. ***
Comment 27 Jaco Kroon 2005-04-27 02:25:04 UTC
There is yet another distfiles_cleanup script in the torpage package.  It's probably more suited to distfile mirrors though, as its heuristic is slightly different: it generates a list of all distfiles from the digest-* files from /usr/portage and all overlays, and then checks for every file in ${DISTDIR} whether it's in the list. If it is, nothing happens to the file; if it's not in the list, it gets rm'ed.  It's written in bash and is surprisingly efficient.  It actually cleaned up the 50GB standard distfiles mirror to about 25GB (which just goes to show how much unneeded stuff is still resident in the mirror).
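
The digest-based heuristic can be sketched in a few lines. This is a Python illustration, not the bash script from the torpage package; the function names are hypothetical, and each digest line is assumed to have the layout "MD5 <checksum> <file name> <size>". The demo at the end works on a throwaway temporary directory.

```python
import os
import tempfile


def digest_filenames(tree_roots):
    """Collect the file names listed in digest-* files below the given roots."""
    names = set()
    for root in tree_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if not filename.startswith("digest-"):
                    continue
                with open(os.path.join(dirpath, filename)) as digest:
                    for line in digest:
                        fields = line.split()
                        # Assumed layout: MD5 <checksum> <file name> <size>
                        if len(fields) == 4:
                            names.add(fields[2])
    return names


def unlisted_distfiles(distdir, tree_roots):
    """Return the files in distdir that no digest file mentions."""
    listed = digest_filenames(tree_roots)
    return sorted(
        name
        for name in os.listdir(distdir)
        if os.path.isfile(os.path.join(distdir, name)) and name not in listed
    )


# Demo on a temporary tree with one digest file and two distfiles.
with tempfile.TemporaryDirectory() as tmp:
    files_dir = os.path.join(tmp, "portage", "app-misc", "foo", "files")
    distdir = os.path.join(tmp, "distfiles")
    os.makedirs(files_dir)
    os.makedirs(distdir)
    with open(os.path.join(files_dir, "digest-foo-1.0"), "w") as digest:
        digest.write("MD5 0123456789abcdef0123456789abcdef foo-1.0.tar.gz 1024\n")
    for name in ("foo-1.0.tar.gz", "bar-0.1.tar.gz"):
        open(os.path.join(distdir, name), "w").close()
    stale = unlisted_distfiles(distdir, [os.path.join(tmp, "portage")])

print(stale)
```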
Comment 28 Brian Harring (RETIRED) gentoo-dev 2005-04-27 21:23:15 UTC
25GB isn't the correct distfiles size; it clocks in (currently) at around 30GB.
Regarding unneeded crap on the mirrors, well aware.  Automated cleansing of the mirrors starts Sunday; the week after, it'll be down from 58GB to around 30GB.
Comment 29 Alon Bar-Lev (RETIRED) gentoo-dev 2005-06-30 08:57:59 UTC
Great tool!
Will it be available through gentoolkit?
I could not find such functionality in app-portage/gentoolkit-0.2.1_pre4
Comment 30 Octavio Ruiz (Ta^3) 2005-07-06 16:50:44 UTC
Created attachment 62807 [details]
yacleaner-0.2

TGL: It would be nice to see the functionality/CLI/options of yacleaner in your
python/portageAPI based script.

Right now yacleaner is in bash (like other gentoolkit tools, don't despise it
:P) so it's not API dependent, seems to be slightly faster than eclean and
could be another option.

          eclean      yacleaner
real      0m8.021s    0m3.193s
user      0m6.646s    0m2.052s
sys       0m0.918s    0m0.858s

There is a forum thread for yacleaner:
http://forums.gentoo.org/viewtopic-p-2553143.html
Comment 31 Octavio Ruiz (Ta^3) 2005-07-08 04:52:23 UTC
Created attachment 62908 [details]
yacleaner-0.3

http://gentoo.org.mx/yacleaner/
Comment 32 Avuton Olrich 2005-07-09 16:03:35 UTC
2003-11-19. Are we going to wait until the 2 year anniversary to get this thing
in testing? I've got to say this is vital for those of us with huge distfiles.
Honestly, what's the holdup with this thing?
Comment 33 miolinux 2005-08-16 15:34:35 UTC
mmm yacleaner is a great tool! (a must-have)

but I also use enotice (another must-have), which logs emerge messages, usually in
/var/tmp/portage/enotice.

I think it would be great if yacleaner would not remove that directory when
cleaning :)

bye
Comment 34 TGL 2005-08-27 21:53:39 UTC
Created attachment 67046 [details]
/usr/bin/eclean (0.4)

Here is a new eclean version.

Changes:
 - added exclusion files support
 - added --time-limit option
 - added --size-limit option (for distfiles only)
 - added optional protection of fetch-restricted distfiles
 - added --package-names option for protection of all versions of installed packages
 - removed support for multiple actions on the command line. That would have been hell with action-specific options.
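
How the two limit options could filter the cleaning candidates is sketched below. The changelog above does not spell out their semantics, so this sketch assumes that --time-limit protects files modified more recently than the limit and that --size-limit protects files bigger than the limit; the function name and the tuple layout are hypothetical as well.

```python
import time


def apply_limits(candidates, time_limit=None, size_limit=None, now=None):
    """Filter (name, mtime, size) tuples down to the names still removable.

    time_limit is an age in seconds, size_limit a size in bytes
    (both assumptions made for this sketch).
    """
    now = time.time() if now is None else now
    removable = []
    for name, mtime, size in candidates:
        if time_limit is not None and now - mtime < time_limit:
            continue  # modified too recently: keep it
        if size_limit is not None and size > size_limit:
            continue  # too big to be worth downloading again: keep it
        removable.append(name)
    return removable


now = 1_000_000
candidates = [
    ("recent.tar.gz", now - 10, 100),
    ("old.tar.gz", now - 10_000, 100),
    ("old-and-huge.tar.gz", now - 10_000, 10**9),
]
print(apply_limits(candidates, time_limit=3600, size_limit=10**6, now=now))
```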
Comment 35 TGL 2005-08-27 21:55:19 UTC
Created attachment 67047 [details]
/usr/share/man/man1/eclean.1

And the updated man page. Some proofreading by a native English speaker would
be more than welcome :)
Comment 36 TGL 2005-08-28 08:42:29 UTC
Created attachment 67081 [details]
/usr/bin/eclean (0.4.1)

Here is eclean-0.4.1. Changes:
 - added support for "eclean-dist" and "eclean-pkg" symlinks to "eclean" (and thus refactored the command-line parsing and help screen code)
 - accept file names in exclude files for protection of specific distfiles (useful to protect the OOo i18n files for instance, which are not in $SRC_URI but put there manually)
 - minor rewrite of some findDistfiles() code
 - added the /usr/lib/portage/pym python path, just to be sure it comes first (after all, "output" is a pretty generic name for a python module...)
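
The symlink support and the python path change can be sketched as below. This is not the code of the attached eclean: the function and constant names are hypothetical, and the mapping of "eclean-dist" to the "distfiles" action and "eclean-pkg" to "packages" is an assumption based on the names.

```python
import os
import sys

# Put portage's module directory first, so that its generic module names
# win over any other module with the same name.
PORTAGE_PYM = "/usr/lib/portage/pym"
if PORTAGE_PYM not in sys.path:
    sys.path.insert(0, PORTAGE_PYM)

# Assumed mapping from symlink name to the action it implies.
ACTION_BY_PROGRAM = {
    "eclean-dist": "distfiles",
    "eclean-pkg": "packages",
}


def implied_action(argv0):
    """Return the action implied by the program name, or None for plain eclean."""
    return ACTION_BY_PROGRAM.get(os.path.basename(argv0))


print(implied_action("/usr/bin/eclean-dist"))
print(implied_action("/usr/bin/eclean"))
```

When the function returns None, the action still has to be read from the command line, as in "eclean distfiles".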
Comment 37 TGL 2005-08-28 08:43:32 UTC
Created attachment 67082 [details]
/usr/share/man/man1/eclean.1 (0.4.1)

Updated manpage.
Comment 38 Paul Varner (RETIRED) gentoo-dev 2005-09-08 14:36:08 UTC
eclean is in the subversion repository.  It will be available in the next
release of gentoolkit.
Comment 39 Wiktor Wandachowicz 2005-10-14 07:06:49 UTC
In the man page for eclean.1 (0.4.1) there is probably a very small typo:

EXCLUSION FILES
o   if a line contains a package name with an exclamation mark in front
("!sys-apps/portage"), then this package will be excluded from protection.
Sure, this is only useful is the category itself was protected.
                          ^^

Shouldn't it be "if" instead of "is"? Like:

Sure, this is only useful if the category itself was protected.
                          ^^
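
The rule quoted from the man page can be sketched as follows. This is an illustration, not eclean's parser: the function names are hypothetical, and the sketch assumes that a category is written as a bare name ("sys-apps"), a package as "category/name", and that lines starting with "#" are comments. File-name entries are left out.

```python
def parse_exclusions(lines):
    """Split exclusion-file lines into categories, packages and '!' entries."""
    categories, packages, unprotected = set(), set(), set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("!"):
            unprotected.add(line[1:])
        elif "/" in line:
            packages.add(line)
        else:
            categories.add(line)
    return categories, packages, unprotected


def is_protected(package, categories, packages, unprotected):
    """Tell whether 'category/name' is protected from cleaning."""
    if package in unprotected:
        return False  # "!category/name" overrides a protected category
    if package in packages:
        return True
    return package.split("/", 1)[0] in categories


rules = parse_exclusions(["sys-apps", "!sys-apps/portage", "app-misc/foo"])
print(is_protected("sys-apps/baselayout", *rules))
print(is_protected("sys-apps/portage", *rules))
```

As the man page sentence says, the "!" entry only changes anything because the sys-apps category is protected in the first place.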
Comment 40 Paul Varner (RETIRED) gentoo-dev 2005-10-14 09:04:17 UTC
The man page typo is fixed in subversion.
Comment 41 Chris Gianelloni (RETIRED) gentoo-dev 2005-10-17 19:22:37 UTC
Is this in a released version of gentoolkit now?
Comment 42 Paul Varner (RETIRED) gentoo-dev 2005-10-17 21:38:35 UTC
eclean is in gentoolkit-0.2.1_pre8.  I will close the bug once gentoolkit-0.2.1
goes stable
Comment 43 Krzysztof Pawlik (RETIRED) gentoo-dev 2005-12-29 15:37:48 UTC
The eclean behaves in a way that is not expected, at least by me. I would expect it to print all distfiles that are unneeded - like distfiles for packages that are not installed (compare the output of `eclean --pretend distfiles` and the output of this script: http://dev.gentoo.org/~nelchael/unneeded_distfiles.py ).
Comment 44 Chris Gianelloni (RETIRED) gentoo-dev 2005-12-30 07:32:16 UTC
Ehh... it is much more flexible than that.  To get what you expect, you use:

eclean --destructive distfiles
Comment 45 Paul Varner (RETIRED) gentoo-dev 2006-01-17 19:08:34 UTC
Fixed with gentoolkit-0.2.1
Comment 46 Alexander Huemer 2007-02-13 10:53:52 UTC
feature request to eclean:

I would like to clean up distfiles completely and leave only files with fetch restrictions in there.
"eclean -d distfiles -f" leaves over 100 files in distfiles.
I cannot do that manually, because fetch-restricted files cannot be excluded that way.
Such an option should be very easy to implement; unfortunately I do not have enough python experience to do that on my own.
Comment 47 Chris Gianelloni (RETIRED) gentoo-dev 2007-02-13 20:45:13 UTC
File a new bug.

This bug was for inclusion, which was done *long* ago.
Comment 48 Jakub Moc (RETIRED) gentoo-dev 2007-06-17 09:04:46 UTC
*** Bug 182308 has been marked as a duplicate of this bug. ***
Comment 49 Jakub Moc (RETIRED) gentoo-dev 2007-08-14 21:43:09 UTC
*** Bug 188915 has been marked as a duplicate of this bug. ***
Comment 50 Jerome 2007-08-14 22:00:27 UTC
Could you add a default cron job with safe cleanup?
Comment 51 Jerome 2007-08-14 22:23:55 UTC
I added this cron job:
/etc/cron.daily/clean-distfiles
---
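#! /bin/sh
# Remove distfiles whose status last changed more than 200 days ago.
# Using -exec avoids word splitting on unusual file names.
find /usr/portage/distfiles -type f -ctime +200 -exec rm -fv {} +
# Then run eclean in destructive mode on the remaining distfiles.
eclean -d distfiles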
---

Of course you could use a safer behavior by default.
Comment 52 Chris Gianelloni (RETIRED) gentoo-dev 2007-08-17 00:32:10 UTC
Please quit adding comments to this bug.  This bug was to have the tool included, which was done a long time ago.  If you have any requests or problems, file a new bug.