Summary: | triple emerge sync speed---keep stamp | ||
---|---|---|---|
Product: | Portage Development | Reporter: | ivo welch <ivo.welch> |
Component: | Enhancement/Feature Requests | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED WONTFIX | ||
Severity: | enhancement | CC: | infra-bugs |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
ivo welch
2005-08-28 07:24:36 UTC
We already rely on timestamp files within the tree to do what you're suggesting. The difference is that your approach would require pushing timestamps into each category (for example) to chunk up the syncing, and it would be complicated by the fact that a $PORTDIR/dev-util change requires syncing both that directory and $PORTDIR/metadata/cache/dev-util.

An additional issue: the md5 of each chunk of the user's tree may not be accurate. The user may have modified an ebuild within that category (for example), which means you cannot trust the md5 and need to regenerate it on every run, which is what rsync does. The saving in what you're proposing is that checksum information isn't transmitted, lowering the 2.4 MB overhead of a full tree rsync.

Personally, I don't think this is the route to go. What you're after is effectively versioning the tree: knowing that it was at release x, and that to get to the current release z you need to pull the x->y and y->z deltas and apply them. This is what emerge-delta-webrsync does, the difference being that emerge-delta-webrsync doesn't assume the user's tree is unmodified; it relies on tarsync (or rsync in the worst case) to ensure the user's tree is a copy of the targeted snapshot.

So... dunno. Chunking up rsync'ing into (for example) per-category syncs has the added disadvantage of jacking up the number of connections per sync attempt. Currently it's 2. Say 50% of the categories have some form of change in them; with per-category + md5 check rsync'ing, you're looking at (140 categories currently) 1 + (2 * (140 * .5)) = 141 connections: 1 for md5 info, and 2 per changed category for $PORTDIR/$CATEGORY and $PORTDIR/metadata/cache/$CATEGORY. This also ignores any syncing required for other directories, such as eclass, profiles and metadata.
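To make the connection estimate above easy to re-run with other numbers, here is a minimal sketch of the same arithmetic. The function name and its parameters are hypothetical, chosen only for illustration; the 140 categories, the 50% change rate and the current figure of 2 connections are the ones quoted in the comment.

```python
def per_category_connections(categories: int, changed_fraction: float) -> int:
    """Estimate connections per sync under the per-category + md5 scheme."""
    # Number of categories assumed to contain at least one change.
    changed = round(categories * changed_fraction)
    md5_fetch = 1    # one connection to fetch the md5 information
    per_changed = 2  # the category directory plus its metadata counterpart
    return md5_fetch + per_changed * changed


CURRENT_CONNECTIONS = 2  # figure quoted above for a full-tree rsync

estimate = per_category_connections(140, 0.5)
print(estimate, estimate / CURRENT_CONNECTIONS)
```

With the quoted figures this gives 141 connections, roughly 70 times the current count, and that is before counting eclass, profiles or the rest of metadata.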
Offhand, I'd rather see an approach of emerge-delta-webrsync using uncompressed zip files, with portage running directly off the zip file. This has the added bonus of being easier to deal with for delta generation/reconstruction, and of being a more foolproof way of ensuring that the user doesn't screw around with the 'versioning': it's a bit harder to do without knowing the effects of the action compared to just modifying a file in the tree. Plus, generating a delta for a single file is easier and allows for greater optimization of the patch.

Meanwhile, cc'ing infra since Lance asked me for a bug of this sort, and I never quite got around to it ;)

So what's to happen with this bug then?

Per ferringb on IRC, there are a bunch of ways this can be done, and only a few that work. There are a lot of server-side issues to it as well, if I recall, so a good method must be carefully chosen. This, however, is not it.