Bug 35535 - [PATCH] `emerge sync` refactoring
Summary: [PATCH] `emerge sync` refactoring
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: All All
Importance: High major
Assignee: Portage team
URL:
Whiteboard:
Keywords: Inclusion
Duplicates: 28128 35540 50785 105261 110753
Depends on:
Blocks: 240187 15990 25499 28128 28796 57887 472746
Reported: 2003-12-10 10:15 UTC by Marius Mauch (RETIRED)
Modified: 2013-08-19 14:49 UTC (History)
13 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
patch for emerge (emerge-sync.diff, 11.47 KB, patch)
2003-12-11 19:57 UTC, Marius Mauch (RETIRED)
new sync module (sync.py, 10.34 KB, text/plain)
2003-12-11 20:03 UTC, Marius Mauch (RETIRED)
patch for emerge (emerge-sync.diff, 11.48 KB, patch)
2003-12-11 20:16 UTC, Marius Mauch (RETIRED)
auxiliary functions (aux.py, 344 bytes, text/plain)
2004-01-08 12:09 UTC, Marius Mauch (RETIRED)
rsync module (rsync.py, 5.13 KB, text/plain)
2004-01-08 12:10 UTC, Marius Mauch (RETIRED)
cvs module (cvs.py, 2.24 KB, text/plain)
2004-01-08 12:10 UTC, Marius Mauch (RETIRED)
snapshot module (snapshot.py, 2.11 KB, text/plain)
2004-01-08 12:11 UTC, Marius Mauch (RETIRED)
new sync module using separated files (__init__.py, 1.84 KB, text/plain)
2004-01-08 12:14 UTC, Marius Mauch (RETIRED)

Description Marius Mauch (RETIRED) gentoo-dev 2003-12-10 10:15:39 UTC
While working on several enhancements to `emerge sync` I realized that
it had to be refactored or it would become harder and harder to add new 
features/bugfixes.
This bug is for tracking the progress of the rewrite, other bugs about
`emerge sync` I was working on will depend on this one.
Comment 1 Marius Mauch (RETIRED) gentoo-dev 2003-12-10 10:28:00 UTC
*** Bug 35540 has been marked as a duplicate of this bug. ***
Comment 2 Marius Mauch (RETIRED) gentoo-dev 2003-12-11 19:57:28 UTC
Created attachment 22067 [details, diff]
patch for emerge

This patch makes emerge use the new sync module when you run `emerge --sync`.

Other modifications are support for overlay syncs and moving the cache update
to its own function.
Overlay sync works as follows:
You create a file /etc/portage/overlays that contains lines in the format

name  syncuri  overlay

where name is an alphanumeric identifier (excluding the special keyword all),
syncuri is a valid SYNC url (see comment on sync.py) and overlay is the
directory to use for that url. The directory does not need to be listed in the
PORTDIR_OVERLAY variable.
Then you can run `emerge --sync name` where name is an entry from the 
overlays file or the special keyword "all" which syncs all repositories,
including the default SYNC/PORTDIR. `emerge --sync` without an argument
will behave as currently, syncing SYNC/PORTDIR.
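
A rough sketch of how the overlays file and the `--sync` argument described
above could be handled. This is not the code from the attached patch: the
function names, the DEFAULT placeholder and the handling of blank lines and
'#' comments are assumptions made for illustration only.

```python
# Hypothetical stand-in for the default SYNC/PORTDIR pair.
DEFAULT = ("rsync://rsync.example.org/gentoo-portage", "/usr/portage")


def parse_overlays(text):
    """Parse 'name syncuri overlay' lines into {name: (syncuri, overlay)}."""
    overlays = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Assumption: blank lines and '#' comments are skipped.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 'name syncuri overlay'")
        name, syncuri, overlay = fields
        # The name must be alphanumeric and must not be the keyword 'all'.
        if not name.isalnum() or name == "all":
            raise ValueError(f"line {lineno}: invalid name {name!r}")
        overlays[name] = (syncuri, overlay)
    return overlays


def select_targets(overlays, arg=None):
    """Map the optional `emerge --sync` argument to (syncuri, directory) pairs."""
    if arg is None:
        return [DEFAULT]                      # plain `emerge --sync`
    if arg == "all":
        return [DEFAULT] + list(overlays.values())
    return [overlays[arg]]                    # KeyError for unknown names
```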
Comment 3 Marius Mauch (RETIRED) gentoo-dev 2003-12-11 20:03:13 UTC
Created attachment 22068 [details]
new sync module

This is the new sync module used by the patched emerge.
It features classes for cvs, rsync and snapshot syncs and a factory class
to create a Connection instance for a given syncuri by looking up the protocol
part in a table (http:// and ftp:// are used for snapshots) and creating an
instance of the correct subclass. All Connection classes provide a setup()
method that checks the creation parameters and a sync() method that does the
actual sync.
Most of the code for the RsyncConnection and CvsConnection classes was copied
from the current emerge code, the SnapshotConnection class is basically a
ported version of emerge-webrsync.
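
A minimal sketch of the design described above, written as an illustration and
not taken from the attached sync.py. It uses a plain factory function instead
of a factory class, and the sync() bodies are placeholders; only the class
names, setup(), sync() and the protocol table idea come from the description.

```python
class Connection:
    """Base class: every connection offers setup() and sync()."""

    def __init__(self, syncuri, target):
        self.syncuri = syncuri
        self.target = target

    def setup(self):
        # Check the creation parameters (here: only that both are non-empty).
        return bool(self.syncuri) and bool(self.target)

    def sync(self):
        raise NotImplementedError


class RsyncConnection(Connection):
    def sync(self):
        return f"rsync {self.syncuri} -> {self.target}"      # placeholder


class CvsConnection(Connection):
    def sync(self):
        return f"cvs {self.syncuri} -> {self.target}"        # placeholder


class SnapshotConnection(Connection):
    def sync(self):
        return f"snapshot {self.syncuri} -> {self.target}"   # placeholder


# Protocol table: http:// and ftp:// both map to snapshot syncs.
PROTOCOLS = {
    "rsync": RsyncConnection,
    "cvs": CvsConnection,
    "http": SnapshotConnection,
    "ftp": SnapshotConnection,
}


def create_connection(syncuri, target):
    """Look up the protocol part of syncuri and build the matching subclass."""
    protocol, sep, _rest = syncuri.partition("://")
    if not sep or protocol not in PROTOCOLS:
        raise ValueError(f"unsupported sync uri: {syncuri!r}")
    return PROTOCOLS[protocol](syncuri, target)
```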
Comment 4 Marius Mauch (RETIRED) gentoo-dev 2003-12-11 20:16:16 UTC
Created attachment 22069 [details, diff]
patch for emerge

missed a return value check
Comment 5 Marius Mauch (RETIRED) gentoo-dev 2003-12-11 20:23:13 UTC
I should say that this is alpha code and missing several checks and documentation.
Comment 6 Nicholas Jones (RETIRED) gentoo-dev 2003-12-24 14:16:46 UTC
*** Bug 28128 has been marked as a duplicate of this bug. ***
Comment 7 Marius Mauch (RETIRED) gentoo-dev 2004-01-08 12:09:25 UTC
Created attachment 23407 [details]
auxiliary functions

some functions not directly related to sync code
Comment 8 Marius Mauch (RETIRED) gentoo-dev 2004-01-08 12:10:20 UTC
Created attachment 23408 [details]
rsync module
Comment 9 Marius Mauch (RETIRED) gentoo-dev 2004-01-08 12:10:52 UTC
Created attachment 23409 [details]
cvs module
Comment 10 Marius Mauch (RETIRED) gentoo-dev 2004-01-08 12:11:47 UTC
Created attachment 23410 [details]
snapshot module

This is still lacking some support for 3rd-party snapshots for overlays
Comment 11 Marius Mauch (RETIRED) gentoo-dev 2004-01-08 12:14:27 UTC
Created attachment 23411 [details]
new sync module using separated files

This code scans /usr/lib/portage/pym/sync/*.py for available sync modules and
provides some primitive register/unregister function for protocol handlers.
(the emerge patch needs a trivial change: `connection` is now `Connection`)
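
For readers without the attachment, the two ideas above (scanning a directory
for module files, and a register/unregister table for protocol handlers) might
look roughly like this. All names here are hypothetical and the real
__init__.py may differ.

```python
import glob
import os

_handlers = {}  # protocol name -> handler class (or any callable)


def register(protocol, handler):
    """Make `handler` responsible for `protocol`, replacing any previous one."""
    _handlers[protocol] = handler


def unregister(protocol):
    """Remove the handler for `protocol`; unknown protocols are ignored."""
    _handlers.pop(protocol, None)


def lookup(protocol):
    """Return the registered handler or raise KeyError."""
    return _handlers[protocol]


def available_modules(directory):
    """List module names for every *.py file in `directory` except __init__."""
    names = []
    for path in sorted(glob.glob(os.path.join(directory, "*.py"))):
        name = os.path.splitext(os.path.basename(path))[0]
        if name != "__init__":
            names.append(name)
    return names
```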
Comment 12 Marius Mauch (RETIRED) gentoo-dev 2004-05-12 01:57:41 UTC
*** Bug 50785 has been marked as a duplicate of this bug. ***
Comment 13 Karl Trygve Kalleberg (RETIRED) gentoo-dev 2004-05-23 15:14:40 UTC
Just as a notice:

I've hacked together two alternative emerge-webrsync replacements, called
'synctool':

A) rsync-over-http-alike: 
In this setup, one tar.bz2 for each package in each category is kept
up-to-date in a repository (i.e., some directories) on a static web server.
A manifest file containing md5sums for each package is kept in the root
directory. An md5sum of the manifest is kept in a separate file in the root.

Upon sync, the client first calculates an md5sum for each of its local
packages (this can be, and is, easily cached). With proper caching, the sync
then amounts to one of the following:
1) if no files have changed locally since the last sync (ie, nobody touched
/usr/portage), and the last sync was done with synctool, the md5sum of the
manifest is downloaded from the server and compared against the md5sum of
the local manifest. if they differ, proceed to (2).
2) download the manifest from the server, and compare each local package's
md5sum (either cached, or calculated on the spot, then cached). for each
package that's different, get the server tar.bz2.
3) no local packages exist; download all packages referenced in the server
manifest.
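
The three cases can be sketched as one function. This is only an illustration
of the decision logic, not synctool itself: the manifests are modelled as
dicts mapping a package name to its md5sum, and fetch_manifest is a
hypothetical callable that downloads the server manifest.

```python
def packages_to_fetch(local_sums, local_manifest_md5,
                      server_manifest_md5, fetch_manifest):
    """Return the sorted list of packages whose tar.bz2 must be downloaded.

    local_sums          -- {package: md5sum} for the local tree (may be empty)
    local_manifest_md5  -- md5sum of the manifest from the last sync, or None
    server_manifest_md5 -- md5sum of the current server manifest
    fetch_manifest      -- callable returning the server's {package: md5sum}
    """
    # Case 1: nothing changed locally and the manifest checksum matches,
    # so the full manifest never needs to be downloaded.
    if local_sums and local_manifest_md5 == server_manifest_md5:
        return []
    # Cases 2 and 3: download the manifest and fetch every package whose
    # md5sum differs. With no local packages, everything differs.
    server_sums = fetch_manifest()
    return sorted(pkg for pkg, digest in server_sums.items()
                  if local_sums.get(pkg) != digest)
```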


B) incremental update over http:
A cron-job on the server keeps track of which files are created and removed
in PORTDIR. a tar.bz2 is created (called a 'daily delta') every day of all
the changes in the previous 24 hours, along with a complete manifest of 
all files that are supposed to be in the tree, with their md5sums.

When a client syncs against the server for the first time, it takes note of 
the time. Subsequent syncs will only need to get sufficient daily deltas to
bring the client up-to-date.

The server is free to collapse daily deltas into weekly deltas and monthly
deltas; typically, the server will keep daily deltas back one-two weeks, then
weekly deltas four-six weeks, then monthly deltas for two months. 

The client is able to pick the correct combination of past daily, weekly and
monthly deltas to bring itself back into sync.

The requirement is that none of the files in /usr/portage changes between
calls to synctool. If that happens, a fallback to method (A) is necessary.
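
One way the client-side choice of deltas could work is a greedy walk over the
time line. This is a sketch under assumptions, not synctool's actual logic:
each delta is modelled as a (start_day, end_day) pair of integers covering the
changes in that half-open interval, and it is assumed that applying a delta
which also covers some already-applied days is harmless.

```python
def pick_deltas(deltas, last_sync, today):
    """Choose deltas that bring a client from day `last_sync` up to `today`.

    deltas -- iterable of (start_day, end_day) pairs offered by the server
    Returns the chosen deltas in application order, or None if the server
    no longer offers a delta covering some day (fall back to method A).
    """
    chosen = []
    pos = last_sync
    while pos < today:
        # Deltas that cover the first day the client is still missing.
        candidates = [d for d in deltas if d[0] <= pos < d[1]]
        if not candidates:
            return None
        # Prefer the delta that reaches furthest (monthly over weekly
        # over daily), so as few files as possible are downloaded.
        best = max(candidates, key=lambda d: d[1])
        chosen.append(best)
        pos = best[1]
    return chosen
```

For example, with one weekly delta (0, 7) and daily deltas (7, 8), (8, 9) and
(9, 10), a client last synced on day 3 would take the weekly delta followed by
the three daily ones.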


In both case A and B, the server can be a stupid http server serving static
pages. This will allow any old P500 to serve thousands of clients at the 
same time; all the logic is in the client.

Furthermore, all traffic goes across port 80, so firewalls are practically
not a problem at all (and it can easily be proxied and cached with squid).

Is this something I should squeeze into gentoolkit (or an app-portage/synctool
package), or should I work with you on trying to integrate into portage proper?
Comment 14 Nguyen Thai Ngoc Duy (RETIRED) gentoo-dev 2004-06-16 17:50:24 UTC
Karl, I think you could write these as sync modules, like rsync.py and snapshot.py.
Could you post your tool or mail me please?
Comment 15 Nguyen Thai Ngoc Duy (RETIRED) gentoo-dev 2004-06-16 18:21:49 UTC
"import sync" should be moved inside the `if myaction == "sync"` block.
Comment 16 Nicholas Jones (RETIRED) gentoo-dev 2004-07-23 10:35:14 UTC
Stylistically, you should import everything in the global space.

If it has a major time impact, you are probably running code in
the global space and should correct that.
Comment 17 Brian Harring gentoo-dev 2005-02-15 21:51:26 UTC
Hmmm... this is already in cvs head.
Took a slightly different approach in writing it: mainly, it's less dynamic in determining which sync protocol maps to which class (you have to add them to an intermediate func), but it's implemented and in cvs for the next major release.
Comment 18 Brian Harring gentoo-dev 2005-09-08 09:20:08 UTC
*** Bug 105261 has been marked as a duplicate of this bug. ***
Comment 19 Marius Mauch (RETIRED) gentoo-dev 2006-02-16 16:28:43 UTC
*** Bug 110753 has been marked as a duplicate of this bug. ***
Comment 20 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-03-15 13:38:47 UTC
> Hmmm... this is already in cvs head
> ...
> but it's implemented/incvs for next major release.

Is this in Portage now? If so, this bug could be marked as resolved.
Comment 21 Zac Medico gentoo-dev 2013-03-15 14:32:27 UTC
(In reply to comment #20)
> > Hmmm... this is already in cvs head
> > ...
> > but it's implemented/incvs for next major release.
> 
> Is this in Portage now? If so, this bug could be marked as resolved.

It's in the old 2.1-experimental branch which was abandoned:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=tree;h=refs/heads/2.1-experimental;hb=refs/heads/2.1-experimental