It would be nice to be able to manage the size of the /usr/portage/distfiles directory. I was thinking of a variable setting in make.conf such as "DISTFILE_SPACE=500kb". When portage is about to download, it first checks the size of the distfiles directory. If the directory is over the limit, portage removes the oldest files until the size of the directory is below DISTFILE_SPACE.
500kb is VERY limited of course, but your idea is OK (as long as it is possible to turn off the behaviour).
I'd say that Portage should not be concerned with this. This sounds like a task for a nightly cron job. p.s. My ${DEITY}, 500K of distfile space is not going to get you very far. ;^)
I created a small bash script that could be run from a crontab. Not too pretty, but it should work.

#!/bin/bash
DISTFILES_SPACE="20gb"

## convert $DISTFILES_SPACE to kilobytes (the unit du -k reports)
DISTFILES_SUFFIX=`echo $DISTFILES_SPACE | sed 's/[0-9]//g'`
DISTFILES_PREFIX=`echo $DISTFILES_SPACE | sed 's/[a-z]//g'`
if [ "$DISTFILES_SUFFIX" == "m" ] || [ "$DISTFILES_SUFFIX" == "mb" ]
then
    DISTFILES_SPACE2=`echo "$DISTFILES_PREFIX * 1024" | bc`
elif [ "$DISTFILES_SUFFIX" == "g" ] || [ "$DISTFILES_SUFFIX" == "gb" ]
then
    DISTFILES_SPACE2=`echo "$DISTFILES_PREFIX * 1024 * 1024" | bc`
elif [ "$DISTFILES_SUFFIX" == "b" ]
then
    ## plain bc does integer division, so the result stays a whole number
    DISTFILES_SPACE2=`echo "$DISTFILES_PREFIX / 1024" | bc`
fi
echo $DISTFILES_SPACE2

## current size of distfiles directory (-s gives a single summary line)
CURRENT_SIZE=`du -sk /usr/portage/distfiles | awk '{print $1}'`
if [ $CURRENT_SIZE -gt $DISTFILES_SPACE2 ]
then
    ## make an array of files in /usr/portage/distfiles, oldest first
    filename=( `ls -1tr /usr/portage/distfiles` )
    ## matching sizes in kilobytes; NR > 1 skips the "total" line of ls -l
    file_size=( `ls -ltr /usr/portage/distfiles | awk 'NR > 1 {print int($5 / 1024)}'` )
    I=0
    until [ $CURRENT_SIZE -le $DISTFILES_SPACE2 ]
    do
        let CURRENT_SIZE=CURRENT_SIZE-${file_size[$I]}
        let I=I+1
    done
fi
Might work even better if it actually deleted files too :-( Add "rm -f /usr/portage/distfiles/${filename[0]}" after the "do" and before the two "let" lines.
I must stop coding this early in the morning, change ${filename[0]} to ${filename[$I]}
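For reference, here is a minimal sketch of the script with both corrections folded in, restructured as a function. It is not the original poster's exact code: it sums file sizes in bytes with wc -c instead of relying on du, and the name prune_oldest and its two arguments are made up for illustration. It also assumes that file names in the directory contain no newlines.

```shell
#!/bin/bash
# prune_oldest DIR LIMIT_BYTES  (hypothetical helper, not part of Portage)
# Deletes the oldest regular files in DIR until the total size of the
# regular files that remain is at or below LIMIT_BYTES.
prune_oldest() {
    local dir limit total f name size
    dir=$1
    limit=$2

    # Sum the byte sizes of the regular files directly inside DIR.
    total=0
    for f in "$dir"/*; do
        [ -f "$f" ] && total=$((total + $(wc -c < "$f")))
    done

    # ls -1tr lists entries oldest first (by modification time).
    ls -1tr "$dir" | while IFS= read -r name; do
        [ "$total" -le "$limit" ] && break
        [ -f "$dir/$name" ] || continue
        size=$(wc -c < "$dir/$name")
        rm -f -- "$dir/$name"
        total=$((total - size))
    done
}
```

From a crontab entry this could be called as prune_oldest /usr/portage/distfiles $((20 * 1024 * 1024 * 1024)) for a 20 GB limit.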
Keeping a size limit? I can see the reasons behind this since disks are limited in size, but I figure giving people a choice in distfiles mount points is a better solution to beating the size problem. Read bug report 4950. http://bugs.gentoo.org/show_bug.cgi?id=4950
I have recently mentioned something similar on the gentoo-user list. I was suggesting an option on emerge to clean the distfiles of old files. Many people felt strongly that this should be only an option. It has also been suggested to me that the old files could be moved to a different directory so that, for example, they could be burnt to CD. So how about having all old files (checks similar to emerge clean would be needed) copied to /usr/portage/distfiles.old? The user could then either delete that directory when required or move it elsewhere for safe keeping. This could either be done on the command line, as I had thought, or tied to the size limit as suggested above.
*** Bug 9957 has been marked as a duplicate of this bug. ***
*** Bug 12907 has been marked as a duplicate of this bug. ***
*** Bug 9858 has been marked as a duplicate of this bug. ***
Read some of the descriptions in the "duplicate" bugs listed in this thread to get an idea of the multiple ways in which this unrestricted directory growth is a serious issue and why a cron job is simply not good enough. Bug 9858 (http://bugs.gentoo.org/show_bug.cgi?id=9858) is a good example of a time when cron would not work. Sean
That is why I always thought emerge should be adjustable to use multiple distfiles directories and configurable in a way to allow NFS and Samba file shares, for example /usr/portage/distfiles-local, /usr/portage/distfiles-samba and /usr/portage/distfiles-nfs. Then people could adjust around the space problem. But people hated my idea. Marcel
*** Bug 13560 has been marked as a duplicate of this bug. ***
Just being able to point to network drives isn't a good solution either, since many people may not have network drives available to them. I still lean towards a solution like the one I mentioned in Bug 9858: "There should be a way to tell emerge to empty these directories ( --cleandist --cleantmp ) after it finishes with each package. It seems to only do this at the end of the whole emerge command currently. This way it would emerge bash, clean those files, and then emerge X11, clean those files, etc., etc." This refers to both /usr/portage/distfiles and /var/tmp/portage. Sean
I didn't say emerge didn't need a --clean option. A --clean option is needed for people with limited disk space and single computers. But people with networks and more disk space available, and people who handle networks with 5+ Gentoo machines, need to have NFS and Samba distfiles capabilities. The reason why I am so vehement on this subject is to prevent Gentoo from setting --clean as a DEFAULT. It would cause a lot more trouble for people who like to keep their source because they pay their ISP for every MB. It's a lot cheaper for me to buy an extra disk than it is to pay for downloading the whole stuff 2 or 3 times. Marcel
*** Bug 13765 has been marked as a duplicate of this bug. ***
*** Bug 14273 has been marked as a duplicate of this bug. ***
In Bug 14273, I had suggested that a distfile should at least be cleaned out when someone runs emerge clean. For example, if I were upgrading a package and cleaning out the old version, the old distfile should be deleted as well. Keep in mind that this wouldn't apply to packages with an "-rX" tagged on the end (like foo-1.2.13-r6), because r1 through r6 would all use the distfile foo-1.2.13.tar.gz. However, if I did an emerge update on foo, and it went from foo-1.2.13 to foo-1.3.12, then foo-1.2.13.tar.gz would be deleted, as it has been replaced with foo-1.3.12.tar.gz, the latest tarball of package foo. The downfall of this scheme is when a Gentoo package maintainer decides that a package needs to be downgraded, due to bugs or otherwise. To cover this case, a user could specify that the X latest versions of each package be kept in distfiles. So set SRC_HISTORY=2, and when emerge cleans, it would keep foo-1.2.13.tar.gz and foo-1.3.12.tar.gz, the two latest tarballs of package foo. In my distfiles dir, there are more packages than I care to count with 5 or more versions sitting there. I think it's safe to assume a package won't be downgraded by 5 versions. There really ought to be some way of keeping excess version tarballs in check. This would be a very effective scheme in many situations. It'd be great for the situation described in comment #15, for example.
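A rough sketch of how the "keep the X latest tarballs" idea could be approximated outside of Portage. SRC_HISTORY is only the setting proposed in this comment, not an existing Portage variable, and the function name old_tarballs is made up. The sketch assumes tarballs are named like foo-1.2.13.tar.gz and relies on the GNU sort -V version ordering, which is a simplification of how Portage itself compares versions. It only prints the candidates for removal and deletes nothing.

```shell
#!/bin/bash
# old_tarballs PKG KEEP DIR  (hypothetical helper)
# Prints the tarballs of PKG in DIR that fall outside the KEEP newest
# versions, newest first. KEEP plays the role of the proposed SRC_HISTORY.
old_tarballs() {
    local pkg keep dir f
    pkg=$1
    keep=$2
    dir=$3

    # "$pkg"-[0-9]* keeps foo-bar-1.0.tar.gz from matching package "foo".
    for f in "$dir/$pkg"-[0-9]*; do
        [ -e "$f" ] && basename "$f"
    done |
        sort -rV |                   # newest version first (GNU sort)
        tail -n +"$((keep + 1))"     # skip the KEEP newest entries
}
```

For example, old_tarballs foo 2 /usr/portage/distfiles would list every foo tarball except the two newest, and the output could be reviewed before being passed to rm.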
IMO this is a job that should be tackled with FETCH_COMMAND / RESUME_COMMAND:
- check size of DISTDIR
- if it is above a given limit either delete old files or fail (depending on user preference)
- start the normal fetch process after that
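The three steps above could be sketched as a small wrapper that runs whatever fetch command it is given only after the size check passes. Everything here is an assumption for illustration: the function name guarded_fetch, the byte-based limit, and the "fail" / "delete" policy names are invented, and how exactly such a wrapper would be wired into FETCH_COMMAND in make.conf is left open.

```shell
#!/bin/bash
# guarded_fetch DIR LIMIT_BYTES POLICY CMD [ARGS...]  (hypothetical wrapper)
# POLICY "fail":   refuse to fetch while DIR is over LIMIT_BYTES.
# POLICY "delete": remove the oldest files until DIR fits, then fetch.
guarded_fetch() {
    local dir limit policy used f name
    dir=$1
    limit=$2
    policy=$3
    shift 3

    # Step 1: check the size of the distfiles directory (regular files only).
    used=0
    for f in "$dir"/*; do
        [ -f "$f" ] && used=$((used + $(wc -c < "$f")))
    done

    # Step 2: over the limit, so either fail or delete old files.
    if [ "$used" -gt "$limit" ]; then
        case $policy in
            fail)
                echo "distfiles over limit: $used > $limit bytes" >&2
                return 1
                ;;
            delete)
                ls -1tr "$dir" | while IFS= read -r name; do
                    [ "$used" -le "$limit" ] && break
                    [ -f "$dir/$name" ] || continue
                    used=$((used - $(wc -c < "$dir/$name")))
                    rm -f -- "$dir/$name"
                done
                ;;
            *)
                echo "unknown policy: $policy" >&2
                return 2
                ;;
        esac
    fi

    # Step 3: start the normal fetch process.
    "$@"
}
```

A call might look like guarded_fetch /usr/portage/distfiles 21474836480 fail wget -P /usr/portage/distfiles "$url", where $url stands for whatever the fetcher is asked to download.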
*** Bug 29422 has been marked as a duplicate of this bug. ***
In addition to my comment #18, when "emerge sync" is run, and portage is updated, are there not ebuilds from old versions of packages that get cleaned out? When this happens, emerge should check for that version's tarball in distfiles. If the ebuild is being removed, then there's certainly no need for the tarball.
Bugger, just missed the 500 day anniversary for this bug. Oh well, I'm sure we'll get another chance at 1000. Anyway, I'm with comment #19. Two simple constants telling the system how much space you want to set aside (0 could mean no restrictions) and what to do at the limit (halt emerge, or remove the oldest files until below the limit or empty). No system should be allowed to eat disk space without having some scheme for capping growth; that much is pretty obvious if you've run a few systems before. Btw, the 'remove the binary when removing the ebuild' theory won't work, as there are plenty of ebuilds that reuse the binary, and not just the -r* ones. Most of the kernel packages, for example, consist of the base kernel plus patches.
I'd prefer this be a script that could possibly hook into portage at some point rather than adding this logic. Tools can take this if they want.
Why work with a size limit, which then forces portage to "choose" which distfile to remove? What I propose is that each package look after everything, including its own distfiles. Distfiles aren't recorded when they're downloaded, from what I can see. First, create a FEATURE called deldistfile, or something like that, which acts as a switch for this behaviour. Secondly, an emerge would write the distfiles to a file in /var/db/pkg/foo-bar/pkgname/CONTENTS with the prefix DISTFILE, so an example would be DISTFILE-kdebase-3.4.0.tar.bz2 (obviously the distfile names are taken from SRC_URI). Then, using the same process that `qpkg -f /path/to/file` uses, determine whether or not the distfile (in the example: DISTFILE-kdebase-3.4.0.tar.bz2) is used by any other installed package and, if not, delete the distfile. That way, if you have both ck-sources-2.6.99 and gentoo-sources-2.6.99, removing only ck-sources will still keep the vanilla kernel tarball (as distributed by kernel.org) but will remove the ck-patchset tarball. Removing both would remove the vanilla kernel tarball and both patchset tarballs. The benefit of this is that portage isn't selectively choosing tarballs to delete in an effort to get down to size. Portage is aware of distfiles, and when they're no longer used by a package after `emerge -C [pkg]` they're removed. One quick look in /usr/portage/distfiles for me shows a bunch of kde[...]-3.3* tarballs.... I don't have kde 3.3
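To make the "is this distfile still used by another installed package?" check concrete, here is a sketch under a simplified, made-up layout: each installed package directory under a database root holds a plain file named DISTFILES with one tarball name per line. That file, the function name unused_distfiles, and the tarball names in the usage note are all hypothetical, and this is not how Portage actually records anything.

```shell
#!/bin/bash
# unused_distfiles DB PKG  (hypothetical helper)
# DB is a root like /var/db/pkg, PKG a path below it such as
# sys-kernel/ck-sources-2.6.99. Prints the distfiles recorded for PKG
# that no other installed package lists, i.e. those that would be safe
# to delete when PKG is unmerged.
unused_distfiles() {
    local db pkg name other used
    db=$1
    pkg=$2

    while IFS= read -r name; do
        used=no
        for other in "$db"/*/*/DISTFILES; do
            [ "$other" = "$db/$pkg/DISTFILES" ] && continue
            [ -f "$other" ] || continue
            # -x matches whole lines only, -F treats the name literally.
            if grep -qxF -- "$name" "$other"; then
                used=yes
                break
            fi
        done
        [ "$used" = no ] && echo "$name"
    done < "$db/$pkg/DISTFILES"
    return 0
}
```

With ck-sources-2.6.99 and gentoo-sources-2.6.99 both recording a shared linux-2.6.99.tar.bz2 plus their own patchset tarball, calling this for ck-sources would print only the ck patchset, matching the behaviour described in the comment.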