Http-replicator is a proxy cache written in Python. It is best suited to maintaining a local cache of Gentoo packages on a LAN or WAN. Requests for a package are served from the local cache, if available. If not, replicator simultaneously downloads the package to the cache AND to multiple clients.

The author designed the proxy for Debian. I contacted him and worked with him to make sure it would work with Gentoo. He has made most of the mods I requested, but not all. He has promised to make some more changes in the next version :-)

I have patched the current version of replicator to work with the default Gentoo install. By default it uses /usr/portage/distfiles as the cache directory, which ensures all previously downloaded files prime the cache. It listens on port 8080. Edit /etc/http-replicator.conf to change the defaults and to see short activation help.

Minor changes to /etc/make.conf are required to make portage aware of the proxy. I made the changes there so that only portage is aware of the proxy and it won't interfere with any other proxy in use.

I suggest net-misc as the category. This package is important to Gentoo because it will not only ease the load on mirrors, but will also transfer packages to the clients at LAN speeds.

Author's description from Freshmeat: Replicator is a replicating HTTP proxy server. Files that are downloaded through the proxy are transparently stored in a private cache, so an exact copy of accessed remote files is created on the local machine. It is, in essence, a general purpose proxy server, but especially suited for maintaining a cache of Debian or Gentoo packages.
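
For readers who want a picture of the behaviour described in this report (serve from the cache on a hit, otherwise download once and stream to the cache and every waiting client), here is a minimal Python sketch. It is not replicator's actual code: the `serve` function, the `fetch_chunks` callable and the `.part` suffix for unfinished downloads are all hypothetical names chosen for illustration.

```python
import os


def serve(name, cache_dir, fetch_chunks, clients):
    """Send file `name` to every client, filling the cache on a miss.

    fetch_chunks is a hypothetical callable that returns an iterable of
    byte chunks from the upstream mirror. clients is a list of writable
    binary streams. Returns 'hit' or 'miss'.
    """
    path = os.path.join(cache_dir, name)

    if os.path.exists(path):
        # Cache hit: stream the stored copy, never touch the mirror.
        with open(path, 'rb') as cached:
            for chunk in iter(lambda: cached.read(65536), b''):
                for client in clients:
                    client.write(chunk)
        return 'hit'

    # Cache miss: one upstream download feeds the cache and all clients.
    # Writing to a temporary name (an assumption of this sketch) keeps
    # incomplete files from being mistaken for finished ones.
    partial = path + '.part'
    with open(partial, 'wb') as cache_file:
        for chunk in fetch_chunks(name):
            cache_file.write(chunk)
            for client in clients:
                client.write(chunk)
    os.replace(partial, path)
    return 'miss'
```

The point of the sketch is only that a miss costs a single download from the mirror no matter how many clients asked for the file.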
Created attachment 31285 [details] http-replicator ebuild
Created attachment 31286 [details, diff] config file patch for gentoo Changes default to gentoo specific
Created attachment 31288 [details] Gentoo specific init script Written by me
Created attachment 31289 [details, diff] patch for replicators daemon code to work with my init script replicator has separate daemonizing code, this patches that code to work with the gentoo specific init script I wrote.
I should add that python 2.3 is only needed for an experimental timeout() to try and work with cranky, overloaded mirrors. If most users haven't upgraded yet, I can remove that experimental feature.
Comment on attachment 31286 [details, diff] config file patch for gentoo
>--- http-replicator-2.0/http-replicator.orig/http-replicator.conf 2004-05-01 12:16:06.000000000 -0500
>+++ http-replicator-2.0/http-replicator.conf 2004-05-07 14:56:52.000000000 -0500
>@@ -1,3 +1,29 @@
>+# ************README-Gentoo Http-Replicator *******************
>+# The defaults in Http-Replicator have been changed to work with the
>+# default Gentoo install and shouldn't have to be changed. The only
>+# changes required to activate Http-Replicator are in /etc/make.conf
>+# on the clients and the server itself.
>+#
>+# Find the Default fetch command section in /etc/make.conf. If you are
>+# already using one of these alternate fetch commands apply the changes
>+# to your section. Otherwise, make the following changes:
>+#
>+# 1. Add the PROXY="http_proxy=http://YourProxyHere.com:8080" Line
>+# replacing YourProxyHere.com with your proxy hostname or IP address.
>+# 2. Uncomment (remove the leading '#') from FETCHCOMMAND and RESUMECOMMAND
>+# 3. Remove the -c from the RESUMECOMMAND
>+# 4. Add $PROXY to the beginning of the lines.
>+#
>+# It will look like this when complete:
>+#
>+# Default fetch command (5 tries, passive ftp for firewall compatibility)
>+# PROXY="http_proxy=http://YourMirrorHere.com:8080"
>+# FETCHCOMMAND="$PROXY /usr/bin/wget -t 5 \${URI} -P \${DISTDIR}"
>+# RESUMECOMMAND="$PROXY /usr/bin/wget -t 5 \${URI} -P \${DISTDIR}"
>+#
>+#
>+# ************END README-Gentoo Http-Replicator *******************
>+
> # This is the configuration file for the replicator proxy server.
> # Settings from this file will apply to the server in daemon mode and also to the cache cleaner script, if used.
> #
>@@ -14,36 +40,36 @@
> # * flat: save files in a single directory
> # * debug: crash on exceptions
>
>-FLAGS = []
>+FLAGS = ['static','flat']
>
> # For security reasons the hosts for which access to the proxy is granted should be specified in the [IP] list.
> # A '?' can be used as wildcard for a single digit and a '*' for a multiple digits.
> # For example '10.0.?.*' grants access from 10.0.1.25 but not from 10.0.15.25.
>
>-IP = ['127.0.0.1']
>+IP = ['127.0.0.1','192.168.*.*','10.*.*.*']
>
> # The proxy server can be monitored via telnet on port [TELNET].
> # This is disabled by entering a zero value.
> # Otherwise make sure the port is available or replicator will not start.
>
>-TELNET = 8081
>+TELNET = 0
>
> # The process user id is set to [USER].
> # The daemon must be started as root because no other user can change into another.
> # Not even [USER] can change into itself!
>
>-USER = 'proxy'
>+USER = 'portage'
>
> # All cached files ar saved in directory [DIR].
> # The [USER] should of course have write permission in this directory.
> # Where in this directory the files are actually put depends on if the server is in flat mode.
> # By default the entire directory structure is copied.
>
>-DIR = '/home/cache'
>+DIR = '/usr/portage/distfiles'
>
> # The process id of the running process is saved in [PID].
> # As this is done before changing into [USER], write permission for [USER] for this file is not needed.
>-
>+# Changes here also have to be make in the init script.
> PID = '/var/run/http-replicator.pid'
>
> # All messages on stdout and stderr are in daemon mode written to the [LOG].
>@@ -55,5 +81,5 @@
> # The value of [KEEP] sets the maximum number of versions of each package to be kept.
> # For example a value of one will delete all versions but the most recent.
> # The script is disabled by setting this value to zero.
>-
>+# Not implemented in Gentoo yet!
> KEEP = 2
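
The [IP] wildcard rule quoted in the config comments ('?' for a single digit, '*' for multiple digits) can be illustrated with a short Python sketch. This is an assumption about how such matching might work, not replicator's own implementation: the helper names are hypothetical, and treating '*' as "one or more digits" is a guess based on the comment's wording.

```python
import re


def pattern_to_regex(pattern):
    """Translate an [IP] pattern into a compiled regular expression.

    '?' becomes a single digit and '*' becomes one or more digits
    (the latter is an assumption of this sketch); everything else,
    including the dots, is matched literally.
    """
    parts = []
    for char in pattern:
        if char == '?':
            parts.append(r'\d')
        elif char == '*':
            parts.append(r'\d+')
        else:
            parts.append(re.escape(char))
    return re.compile(''.join(parts))


def is_allowed(address, patterns):
    """Return True if the address matches any pattern in full."""
    return any(pattern_to_regex(p).fullmatch(address) for p in patterns)


# The example from the config comment:
print(is_allowed('10.0.1.25', ['10.0.?.*']))    # single-digit third octet
print(is_allowed('10.0.15.25', ['10.0.?.*']))   # two-digit third octet
```

With the patched default of IP = ['127.0.0.1','192.168.*.*','10.*.*.*'], any client in the usual private 192.168.x.x and 10.x.x.x ranges would be accepted under this interpretation.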
Created attachment 31393 [details, diff] config file patch for gentoo: spelling corrected updated spelling error
I just tried this out and it friggin rocks. I've been waiting for something like this for ages. In other words: works for me :)
I'm going to try this in the next couple of days, this is really cool!
Created attachment 31571 [details] http-replicator ebuild -r1 -r1 - added patch for replicator/emerge partial files fix
Created attachment 31572 [details, diff] replicator partial files fix
Created attachment 31573 [details, diff] config file patch for gentoo: simplify activation simplify the activation instructions
Fixed a couple of minor bugs and simplified the activation changes. There is more info at http://forums.gentoo.org/viewtopic.php?t=173226
The new fix needs more testing, sorry. Use the original ebuild/patches or the instructions on the forum for now. The problem isn't replicator; it's just that my patch has side effects.
Tools-portage might consider picking this up when it's stable. Otherwise I'd suggest the script-repo once it gets going.
Http-Replicator install HOWTO updated to version 1.3. Http-replicator is still at 2.0. I will post individual files after I get some feedback on the forum at: http://forums.gentoo.org/viewtopic.php?t=173226
I've been using this for a while now and it's working great. The only problem I've seen, aside from resuming, is that FTP downloads aren't cached. This happens when an ebuild has an FTP SRC_URI and also RESTRICT="nomirror", as alsa-libs and utils do, for example. Setting ftp_proxy doesn't help either, since replicator doesn't understand FTP, I guess.
Thanks for the report, Bret. You're right that http-replicator doesn't support FTP. You can, of course, copy any file to replicator's cache directory and it will be available. I've even written a tool, repcacheman, to automatically import files into the cache. The latest version is still available on the forums only.
Here is the tricky part: portage will ignore replicator UNLESS you create a file called /etc/portage/mirrors containing:
# Http-Replicator override for FTP and RESTRICT="nomirror" packages
local http://gentoo.oregonstate.edu/distfiles
With this file, portage WILL check replicator for those RESTRICTED and FTP files!
For now, replicator doesn't support resuming. Resuming has dubious value on the LAN side; it's probably faster just to download the whole package at 11 MB/s. Resuming on the internet side is in the planning stage.
Created attachment 32673 [details] http-replicator ebuild -r2
Created attachment 32674 [details] config file patch for gentoo
Created attachment 32675 [details] Gentoo specific init script
Created attachment 32676 [details] patch for replicators daemon code to work with my init script
Created attachment 32677 [details] My cache manager script: installs, and maintains cache, imports new files. Not strictly necessary for replicator to work, but makes it nice for Gentoo. On a new install, this will create the cache directory with proper permissions and import distfiles into replicator's cache, checking that they are complete and not corrupt (md5 check). After install, this script will delete the duplicate files on the server and import any new files, such as the ones that were downloaded on the server using ftp.
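
The import step described for the cache manager (accept a file into the cache only if its md5 matches) can be sketched roughly as follows. This is not the attached repcacheman script: the function names are hypothetical, and where the expected checksum comes from (the attachment reads it from portage's digests) is left to the caller.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=65536):
    """Return the hex md5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def import_if_valid(path, cache_dir, expected_md5):
    """Move `path` into `cache_dir` when its checksum matches.

    Returns True if the file was imported, False if it was rejected
    as corrupt or incomplete (the file is then left where it was).
    """
    if md5_of(path) != expected_md5.lower():
        return False
    os.makedirs(cache_dir, exist_ok=True)
    shutil.move(path, os.path.join(cache_dir, os.path.basename(path)))
    return True
```

Rejecting on a checksum mismatch is what keeps partial downloads left over in distfiles from being served to every client on the LAN.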
This is the latest http-replicator ebuild and files. Has been working with no major problems for users on the forum for over 1 week.
Works seamlessly on my lan with 4 computers. I am extremely pleased with it.
Received reports confirming http-replicator works fine on amd64 - Added amd64 to keywords.
Tom: epatch is not working right now. There's a missing line if you want this to work properly:
inherit eutils
I have added it locally just before DESCRIPTION and now it's working properly. Best regards
I've also noticed that the ebuild doesn't create the proxy user. That leads to an error when executing repcacheman for the first time, just after a fresh install. It might be a good idea to create it, as the default configuration uses that user. Best regards
Sorry, I was checking the application without the patches, once the patches get applied I have noticed you use the portage user... forget about the last comment.
Jose, Http-Replicator should be working for you now? I guess something changed with epatch, as it was working before... I'll add inherit eutils to the ebuild. I used the user 'portage' because I assumed it is present on all Gentoo systems.
Yes, it's working great with the added inherit eutils. I think using the portage user is a sensible choice. Some gentoo developer should give her opinion on this. Maybe I would change the default port, as 8080 is a port typically used by generic http proxies... what do you think?
http-replicator confirmed working on alpha
Created attachment 37148 [details] http-replicator including proxy support Tarball with all the necessary files to install http-replicator.
The previous attachment works perfectly in our environment (x86). Going through the forum, I see a lot of people who greatly benefit from this package. What needs to be done to add this to the tree?
Works also here very nicely on x86. We had long been waiting for something like this, hope it will be officially available soon. Great job!
If anyone is interested in binary packages and proxies... From Bug #61845 Comment #2 Try this patch. http://zarquon.twobit.net/gentoo/portage/getbinpkg.py.diff
confirmed working on ppc
Hi, I am trying to set up this promising tool, and the initial run of repcacheman fails (see below). Any idea what is wrong here?
Found 16432 ebuilds.
Extracting the checksums....
Missing digest: net-mail/gml-0.5
Done!
Verifying checksum's....
/usr/portage/distfiles/mailx-support-20030215.tar.bz2
Traceback (most recent call last):
  File "/usr/bin/repcacheman", line 204, in ?
    if t[0]:
KeyError: 0
I forgot to mention that this exception occurs with http-replicator-2.1_rc3 / Python 2.3.4 / Portage 2.0.51-r2 / x86 profile. Anyway, I know next to *nothing* about Python coding, but as far as I can tell, the problem is that '0' is not a key in 't'. Replacing t[0] with t['MD5'] (lines 204 and 205) *seems* to solve the problem. repcacheman final output:
SUMMARY:
Found 0 duplicate file(s).
Deleted 0 dupe(s).
Found 663 new file(s).
Added 626 of those file(s) to the cache.
Rejected 0 corrupt or incomplete file(s).
37 Unknown file(s) that are not listed in portage
You may want to delete them yourself....
I may be all wrong here because... how can I be the only one with this problem???
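
To illustrate the fix described above: the traceback suggests 't' is a dictionary keyed by checksum type, so indexing it with 0 raises KeyError. A minimal sketch of a lookup that tolerates both shapes follows. The helper name is hypothetical, and the idea that older portage versions returned a sequence with the md5 first is only an inference from the original t[0] code, not something confirmed in this bug.

```python
def md5_from_entry(entry):
    """Return the md5 string from a digest entry, or None if absent.

    Handles a dict keyed by checksum type (the shape implied by the
    t['MD5'] fix) as well as a plain sequence with the md5 in first
    position (assumed here to be the older layout).
    """
    if isinstance(entry, dict):
        return entry.get('MD5')
    return entry[0] if entry else None


# The failure from the traceback, reproduced on a plain dict:
entry = {'MD5': 'd41d8cd98f00b204e9800998ecf8427e'}
try:
    entry[0]
except KeyError as error:
    print('KeyError:', error)

print(md5_from_entry(entry))
```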
Yoann, you're not the only one with this problem. You're also probably not the only one who hasn't already read the solution in the original Http-Replicator forum thread: http://forums.gentoo.org/viewtopic.php?t=173226 You probably only want the solution at: http://forums.gentoo.org/viewtopic.php?t=173226&start=187 But yes, you've already solved the problem :-)
Oops... sorry, and thanks for pointing me to this thread. So I just tried http://www.updatedlinux.com/replicator/portagefix/repcacheman (v0.31, according to its header) and got:
Extracting the checksums....
Traceback (most recent call last):
  File "/usr/bin/repcacheman", line 167, in ?
    portage_util.writemsg("Missing digest: %s\n" % mycpv)
NameError: name 'portage_util' is not defined
So I switched back to portage.writemsg(), which appears to work fine here. (Sorry if this is a duplicate. I did not read the whole forum thread (8 pages!).)
That was no dup!! Thanks, I missed that. That's what I get for blindly copying code...
Created attachment 43011 [details] http-replicator ebuild 2.1
Created attachment 43012 [details] standard init script Standard start-stop daemon
Created attachment 43013 [details] standard /etc/conf.d/ config file
Created attachment 43014 [details] My cache manager script: installs, and maintains cache, imports new files. Portage 2.0.51
The Http-Replicator ebuild has surpassed 15,000 downloads. Http-Replicator has worked perfectly, saving users and Gentoo infrastructure significant bandwidth while speeding up emerges. My cache manager script helps new users install and maintain the cache and has been upgraded for portage 2.0.51.
Is there any chance that this makes it into portage? It's working like a charm here but i don't like keeping an eye on updates myself.
I'm considering picking this one up, because it doesn't seem like it will get into the tree any time soon if I don't. Tom, is this the most recent version of your ebuild?
Thanks for looking into this! I kept this bug updated for a while, but I had just about given up hope of getting this into portage. Too many changes also make this bug look more complex than it really is. All but one of the 18 attachments are upgrades with new features, but some people may think they are "bugs". I think having so many attachments on this bug makes http-replicator look unstable when it is actually _very_ stable. I have been keeping the original forum post up to date and it still has a lively following even today. The thread has 315 posts and 30634 views. See http://forums.gentoo.org/viewtopic-t-173226.html for the latest ebuild and HOWTO plus complete instructions. I will update this bug with the latest ebuild if you think it will help....
My 3 week vacation starts this week, but when I get back I will look into this and get back to you. Thanks.
Too late, it's mine now ;)
http-replicator 3.0 has been added to portage. Please open new bug reports for any problems you may encounter.