While working on several enhancements to `emerge sync` I realized that it had to be refactored, or it would become harder and harder to add new features and bug fixes. This bug tracks the progress of the rewrite; the other `emerge sync` bugs I was working on will depend on this one.
*** Bug 35540 has been marked as a duplicate of this bug. ***
Created attachment 22067 [details, diff] patch for emerge
This patch makes emerge use the new sync module when you run `emerge --sync`. Other modifications are support for overlay syncs and moving the cache update to its own function.
Overlay sync works as follows: you create a file /etc/portage/overlays that contains lines in the format
name syncuri overlay
where name is an alphanumeric identifier (excluding the special keyword all), syncuri is a valid SYNC url (see the comment on sync.py) and overlay is the directory to use for that url. The directory does not need to be listed in the PORTDIR_OVERLAY variable.
Then you can run `emerge --sync name`, where name is an entry from the overlays file or the special keyword "all", which syncs all repositories, including the default SYNC/PORTDIR. `emerge --sync` without an argument will behave as it does currently, syncing SYNC/PORTDIR.
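To make the overlays file format concrete, here is a minimal sketch of how such a file could be parsed and how the `--sync` argument could be resolved. It is not the code from the attached patch: the function names `parse_overlays` and `select_targets` are hypothetical, and skipping blank lines and `#` comments is an assumption that the description above does not state.

```python
def parse_overlays(text):
    """Parse 'name syncuri overlay' lines into {name: (syncuri, overlay)}."""
    overlays = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("line %d: expected 'name syncuri overlay'" % lineno)
        name, syncuri, overlay = fields
        # The name must be alphanumeric and must not be the keyword 'all'.
        if not name.isalnum() or name == "all":
            raise ValueError("line %d: invalid name %r" % (lineno, name))
        overlays[name] = (syncuri, overlay)
    return overlays


def select_targets(overlays, arg, default):
    """Resolve the argument of `emerge --sync` to a list of (syncuri, directory).

    `default` stands for the (SYNC, PORTDIR) pair.
    """
    if arg is None:
        # No argument: behave as before and sync only SYNC/PORTDIR.
        return [default]
    if arg == "all":
        # 'all' syncs the default repository plus every listed overlay.
        return [default] + list(overlays.values())
    return [overlays[arg]]
```

With this sketch, an unknown name raises `KeyError` from the dictionary lookup; a real implementation would presumably report that to the user instead.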
Created attachment 22068 [details] new sync module
This is the new sync module used by the patched emerge. It features classes for cvs, rsync and snapshot syncs, and a factory class that creates a Connection instance for a given syncuri by looking up the protocol part in a table (http:// and ftp:// are used for snapshots) and instantiating the matching subclass. All Connection classes provide a setup() method that checks the creation parameters and a sync() method that does the actual sync. Most of the code for the RsyncConnection and CvsConnection classes was copied from the current emerge code; the SnapshotConnection class is basically a ported version of emerge-webrsync.
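The design described above can be sketched roughly as follows. This is an illustration only, not the attached module: the class names RsyncConnection, CvsConnection and SnapshotConnection and the setup()/sync() methods come from the description, while `ConnectionFactory`, `PROTOCOLS`, the constructor arguments and the placeholder sync() bodies are assumptions.

```python
class Connection:
    """Base class: setup() checks the parameters, sync() does the work."""

    def __init__(self, syncuri, directory):
        self.syncuri = syncuri
        self.directory = directory

    def setup(self):
        # Minimal parameter check; the real module presumably checks more.
        return bool(self.syncuri) and bool(self.directory)

    def sync(self):
        raise NotImplementedError


class RsyncConnection(Connection):
    def sync(self):
        return "rsync"  # placeholder for the actual rsync call


class CvsConnection(Connection):
    def sync(self):
        return "cvs"  # placeholder for the actual cvs update


class SnapshotConnection(Connection):
    def sync(self):
        return "snapshot"  # placeholder for download and unpack


class ConnectionFactory:
    """Maps the protocol part of a syncuri to a Connection subclass."""

    # http:// and ftp:// both select the snapshot handler.
    PROTOCOLS = {
        "rsync": RsyncConnection,
        "cvs": CvsConnection,
        "http": SnapshotConnection,
        "ftp": SnapshotConnection,
    }

    def create(self, syncuri, directory):
        protocol, sep, _rest = syncuri.partition("://")
        if not sep or protocol not in self.PROTOCOLS:
            raise ValueError("unsupported sync uri: %r" % syncuri)
        return self.PROTOCOLS[protocol](syncuri, directory)
```

A caller would create the connection, check `setup()`, and only then call `sync()`, which keeps parameter validation separate from the network work.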
Created attachment 22069 [details, diff] patch for emerge missed a return value check
I should say that this is alpha code and missing several checks and documentation.
*** Bug 28128 has been marked as a duplicate of this bug. ***
Created attachment 23407 [details] auxiliary functions some functions not directly related to sync code
Created attachment 23408 [details] rsync module
Created attachment 23409 [details] cvs module
Created attachment 23410 [details] snapshot module This is still lacking some support for 3rd-party snapshots for overlays
Created attachment 23411 [details] new sync module using separated files
This code scans /usr/lib/portage/pym/sync/*.py for available sync modules and provides some primitive register/unregister functions for protocol handlers. (The emerge patch needs a trivial change: `connection` is now `Connection`.)
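As a rough illustration of the scan-and-register idea, a sketch along these lines would work. The names `register`, `unregister`, `lookup` and `find_module_names` are hypothetical and not taken from the attachment, and this version only discovers module names; it does not import them.

```python
import glob
import os

# Registry mapping a protocol name (e.g. "rsync") to its handler class.
_handlers = {}


def register(protocol, handler):
    """Register a handler for a protocol; refuse to overwrite silently."""
    if protocol in _handlers:
        raise KeyError("handler already registered for %r" % protocol)
    _handlers[protocol] = handler


def unregister(protocol):
    """Remove a handler; unknown protocols are ignored."""
    _handlers.pop(protocol, None)


def lookup(protocol):
    """Return the handler registered for a protocol (KeyError if none)."""
    return _handlers[protocol]


def find_module_names(directory):
    """Return the sorted names of the *.py files in a directory."""
    names = []
    for path in sorted(glob.glob(os.path.join(directory, "*.py"))):
        name = os.path.splitext(os.path.basename(path))[0]
        if name != "__init__":  # the package marker is not a sync module
            names.append(name)
    return names
```

Each discovered module would then be imported and would call `register()` for the protocols it handles; how the attached code does this step is not shown in this thread.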
*** Bug 50785 has been marked as a duplicate of this bug. ***
Just as a notice: I've hacked together two alternative emerge-webrsync replacements, called 'synctool':

A) rsync-over-http-alike: In this setup, one tar.bz2 for each package in each category is kept up-to-date in a repository (i.e., some directories) on a static web server. A manifest file containing md5sums for each package is kept in the root directory. An md5sum of the manifest is kept in a separate file in the root. Upon sync, the client first calculates an md5sum for each of its local packages (this can be, and is, easily cached). One of three cases then applies:
1) If no files have changed locally since the last sync (i.e., nobody touched /usr/portage), and the last sync was done with synctool, the md5sum of the manifest is downloaded from the server and compared against the md5sum of the local manifest. If they differ, proceed to (2).
2) Download the manifest from the server, and compare each local package's md5sum (either cached, or calculated on the spot, then cached). For each package that's different, get the server tar.bz2.
3) No local packages exist; download all packages referenced in the server manifest.

B) incremental update over http: A cron job on the server keeps track of which files are created and removed in PORTDIR. Every day a tar.bz2 (called a 'daily delta') is created of all the changes in the previous 24 hours, along with a complete manifest of all files that are supposed to be in the tree, with their md5sums. When a client syncs against the server for the first time, it takes note of the time. Subsequent syncs will only need to get sufficient daily deltas to bring the client up-to-date. The server is free to collapse daily deltas into weekly deltas and monthly deltas; typically, the server will keep daily deltas back one-two weeks, then weekly deltas four-six weeks, then monthly deltas for two months. The client is able to pick the correct combination of past daily, weekly and monthly deltas to bring itself back into sync.
The requirement is that none of the files in /usr/portage change between calls to synctool. If that happens, a fallback to method (A) is necessary.

In both cases A and B, the server can be a stupid http server serving static pages. This will allow any old P500 to serve thousands of clients at the same time; all the logic is in the client. Furthermore, all traffic goes across port 80, so firewalls are practically not a problem at all (and it can easily be proxied and cached with squid).

Is this something I should squeeze into gentoolkit (or an app-portage/synctool package), or should I work with you on trying to integrate it into portage proper?
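The client-side decision for method (A) could look roughly like the sketch below. It is not the actual synctool code, which is not posted in this thread: `packages_to_fetch`, `plan_sync` and the `fetch_server_manifest` callback are hypothetical names, and manifests are assumed to be dictionaries mapping "category/package" to an md5 hex digest.

```python
def packages_to_fetch(local, server):
    """Return the packages whose server md5sum differs from the local one.

    Packages missing locally are included because local.get() returns None.
    """
    return sorted(pkg for pkg, digest in server.items()
                  if local.get(pkg) != digest)


def plan_sync(local, local_manifest_md5, server_manifest_md5,
              fetch_server_manifest):
    """Decide which package tarballs to download.

    local                  -- {package: md5} for the local tree (may be empty)
    local_manifest_md5     -- md5 of the local manifest, or None
    server_manifest_md5    -- md5 of the server manifest (small download)
    fetch_server_manifest  -- callable returning the server {package: md5}
    """
    # Case 1: manifests match, so the full manifest is never downloaded.
    if local and local_manifest_md5 == server_manifest_md5:
        return []
    # Cases 2 and 3: compare against the full server manifest. With an
    # empty local tree every package in the manifest is returned.
    return packages_to_fetch(local, fetch_server_manifest())
```

Passing the manifest download as a callable keeps the cheap md5 comparison first, so the larger manifest is only requested when something actually changed.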
Karl, I think you could write these as sync modules, like rsync.py and snapshot.py. Could you post your tool or mail it to me, please?
"import sync" should be moved inside if myaction == "sync" block.
Stylistically, you should import everything in the global space. If it has a major time impact, you are probably running code in the global space and should correct that.
Hmmm... this is already in cvs head. I took a slightly different approach in writing it: mainly, it's less dynamic in determining which sync protocol maps to which class (you have to add them to an intermediate func), but it's implemented and in cvs for the next major release.
*** Bug 105261 has been marked as a duplicate of this bug. ***
*** Bug 110753 has been marked as a duplicate of this bug. ***
> Hmmm... this is already in cvs head
> ...
> but it's implemented/incvs for next major release.

Is this in Portage now? If so, this bug could be marked as resolved.
(In reply to comment #20)
> > Hmmm... this is already in cvs head
> > ...
> > but it's implemented/incvs for next major release.
>
> Is this in Portage now? If so, this bug could be marked as resolved.

It's in the old 2.1-experimental branch, which was abandoned:
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=tree;h=refs/heads/2.1-experimental;hb=refs/heads/2.1-experimental
Given we now have sync modules and proper support for additional repositories, let's call this obsolete.