Bug 631616 - portage snapshot is incompatible with tarsync (emerge-webrsync stopped working): encountered type 'V'
Status: RESOLVED FIXED
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Infrastructure
Reported: 2017-09-21 08:19 UTC by Bernd Feige
Modified: 2017-09-25 22:16 UTC (History)
Description Bernd Feige 2017-09-21 08:19:09 UTC
Beginning this week, the packaging of portage.tar.{xz,bz2} appears to have changed to use tar features that are not supported by tarsync's tar.c:

# emerge-delta-webrsync
Looking for available base versions for a delta
Checking digest ...
fetching patches
Fetching file snapshot-20170920-20170921.patch.bz2.md5sum ...
Fetching file snapshot-20170920-20170921.patch.bz2.md5sum ...
Fetching file snapshot-20170920-20170921.patch.bz2.md5sum ...
failed fetching snapshot-20170920-20170921.patch.bz2.md5sum
no patches found? up to date?
syncing with existing file
Syncing local tree ...
scanning tarball...
encountered type 'V', probably going to screw it up.
failed attempting to strip 1 dirs from entry 'portage-20170920.tar'
error reading /gentoo/distfiles//portage-20170920.tar.bz2
emerge-delta-webrsync: error: tarsync failed; tarball is corrupt? (/gentoo/distfiles//portage-20170920.tar.bz2)

I'm behind a firewall, so downloading the snapshot patches via http(s) is the most convenient choice for frequent updates.
It's also not a matter of a corrupt tar file - tar extracts the contents just fine and if I unmerge tarsync, emerge-delta-webrsync works (while taking longer).
So it appears that either tarsync should be masked as out-of-date, or the snapshot generation should be adapted to avoid the tar feature that tarsync does not support.
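
For readers unfamiliar with the failing entry: typeflag 'V' is the GNU tar volume header, which GNU tar writes as the first archive member when a label is given with -V/--label. Its name is the label itself and carries no directory component, which is consistent with the "failed attempting to strip 1 dirs" message above. The following is a minimal Python sketch of that situation, not tarsync's actual C code; the function names, the sample member path, and the `skip_labels` parameter are assumptions made for illustration.

```python
import io
import tarfile

GNU_VOLUME_HEADER = b"V"  # typeflag GNU tar uses for a -V/--label entry


def build_labelled_tar(label, files):
    """Return an in-memory tar whose first member mimics a GNU volume label."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        header = tarfile.TarInfo(name=label)
        header.type = GNU_VOLUME_HEADER  # zero-size header-only entry
        tf.addfile(header)
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def scan_stripped_names(tar_file, strip=1, skip_labels=True):
    """List member names with `strip` leading path components removed."""
    names = []
    with tarfile.open(fileobj=tar_file, mode="r") as tf:
        for member in tf:
            if skip_labels and member.type == GNU_VOLUME_HEADER:
                continue  # a label is metadata, not a file to sync
            parts = member.name.split("/")
            if len(parts) <= strip:
                raise ValueError(
                    f"cannot strip {strip} dirs from entry {member.name!r}"
                )
            names.append("/".join(parts[strip:]))
    return names
```

Called with `skip_labels=False`, the scan fails on the label entry in the same way the log shows; skipping 'V' entries lets the remaining members be processed normally.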
Comment 1 Alon Bar-Lev (RETIRED) gentoo-dev 2017-09-21 09:27:52 UTC
$ tarsync --strip-dir 1 portage-20170920.tar.bz2 /tmp/test1
scanning tarball...
encountered type 'V', probably going to screw it up.
failed attempting to strip 1 dirs from entry 'portage-20170920.tar'
error reading portage-20170920.tar.bz2
Comment 2 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2017-09-22 03:34:30 UTC
I just looked at the tarsync source, and I'm astounded that it's worked as long as it has.

Please try the 20170921 snapshot for now, see if it works for you. I have dropped -V for now.
Comment 3 Bernd Feige 2017-09-22 07:54:52 UTC
(In reply to Robin Johnson from comment #2)
> I just looked at the tarsync source, and I'm astounded that it's worked as
> long as it has.
> 
> Please try the 20170921 snapshot for now, see if it works for you. I have
> dropped -V for now.

Thanks, works again!

I agree that maintaining a separate tar unpacker is tedious...

I personally like the squashfs images -- maybe distributing daily deltas between them would be preferable, they can be mounted directly under /usr/portage or mounted temporarily and synced with rsync. I'd happily drop the tarsync option for that. Just my 2 cents though...

Best regards,
Bernd
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2017-09-22 18:45:04 UTC
(In reply to Bernd Feige from comment #3)
> I agree that maintaining a separate tar unpacker is tedious...
> 
> I personally like the squashfs images -- maybe distributing daily deltas
> between them would be preferable, they can be mounted directly under
> /usr/portage or mounted temporarily and synced with rsync. I'd happily drop
> the tarsync option for that. Just my 2 cents though...
> 
> Best regards,
> Bernd

The deltas between tarballs can be used without tarsync; tarsync just applies the contents of a tarball to a given directory, trying to save overwrites, and then deletes stuff that wasn't in the tarball.
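
The behaviour described above can be sketched roughly as follows. This is an illustrative Python approximation under stated assumptions, not tarsync's implementation: it handles regular files only, reads each member fully into memory, and the function name and return value are hypothetical.

```python
import os
import tarfile


def sync_tar_to_dir(tar_file, dest, strip=1):
    """Make `dest` mirror the regular files in `tar_file`.

    Returns (written, deleted) as sorted lists of relative paths.
    """
    written, deleted, seen = [], [], set()
    with tarfile.open(fileobj=tar_file, mode="r:*") as tf:
        for member in tf:
            if not member.isfile():
                continue  # sketch: directories, links and labels are ignored
            parts = member.name.split("/")[strip:]
            if not parts or ".." in parts:
                continue  # nothing left after stripping, or unsafe path
            rel = "/".join(parts)
            seen.add(rel)
            data = tf.extractfile(member).read()
            target = os.path.join(dest, *parts)
            if os.path.isfile(target):
                with open(target, "rb") as fh:
                    if fh.read() == data:
                        continue  # identical content: skip the overwrite
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(data)
            written.append(rel)
    # Second pass: remove files that were not present in the tarball.
    for root, _dirs, files in os.walk(dest):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, dest).replace(os.sep, "/")
            if rel not in seen:
                os.remove(full)
                deleted.append(rel)
    return sorted(written), sorted(deleted)
```

The point of the comparison step is that an unchanged file is never rewritten, so a daily sync touches only the files that actually differ.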

Deltas between squashfs images aren't very small; we used to generate them, and they were 5MB/day. A purpose-built diff system for squashfs would probably behave better, since it would know which files were added or removed.
Comment 5 Bernd Feige 2017-09-23 12:50:48 UTC
(In reply to Robin Johnson from comment #4)

> The deltas between tarballs can be used without tarsync; tarsync just
> applies the contents of a tarball to a given directory, trying to save
> overwrites, and then deletes stuff that wasn't in the tarball.

Yes, but without tarsync there's no efficient way to sync the contents. Unpacking the whole portage tree to a temp directory is quite inefficient. Squashfs images can be handled entirely with standard tools, and they are accessed and decompressed on the fly.
Maybe tarsync could be relatively easily changed to use dev-libs/libtar so that we don't have this rarely used code path for dealing with tar files in tarsync.

> Deltas between squashfs images aren't very small; we used to generate
> them, and they were 5MB/day. A purpose-built diff system for squashfs would
> probably behave better, realizing what files were added/removed.

Hmm, that is still a factor of 10 less than downloading the squashfs file each day (which is quite small now at 56M).

Since the original problem for this bug report is solved, should we close it now? Maybe add a wishlist item for tarsync?
Comment 6 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2017-09-25 22:16:16 UTC
(In reply to Bernd Feige from comment #5)
> (In reply to Robin Johnson from comment #4)
> 
> > The deltas between tarballs can be used without tarsync; tarsync just
> > applies the contents of a tarball to a given directory, trying to save
> > overwrites, and then deletes stuff that wasn't in the tarball.
> 
> Yes, just without tarsync there's no efficient way to sync the contents.
> Unpacking the whole portage tree to a temp directory is quite inefficient.
> Squashfs images are treatable completely with standard tools and are
> accessed and decompressed on the fly.
> Maybe tarsync could be relatively easily changed to use dev-libs/libtar so
> that we don't have this rarely used code path for dealing with tar files in
> tarsync.
> 
> > Deltas between squashfs images aren't very small; we used to generate
> > them, and they were 5MB/day. A purpose-built diff system for squashfs would
> > probably behave better, realizing what files were added/removed.
> 
> Hmm, that is still a factor of 10 less than downloading the squashfs file
> each day (which is quite small now at 56M).
> 
> Since the original problem for this bug report is solved, should we close it
> now? Maybe add a wishlist item for tarsync?
Bug #632018 Created.

I'm looking at other solutions as well; the most promising so far is daily git bundles of the with-generated-data repo. That makes the snapshot itself a little bigger than it is right now, but shrinks the daily delta download.
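
As a rough illustration of the git bundle idea, the sketch below only builds the command lines involved and does not run them. `git bundle create`, `git bundle verify`, and fetching from a bundle file are real git features; the branch name, tag naming scheme, and bundle file names are hypothetical and say nothing about how Gentoo Infrastructure would actually lay this out.

```python
def bundle_commands(branch, previous_tag, today_tag):
    """Return argv lists for creating a full and an incremental bundle.

    Assumes `previous_tag` marks a commit that clients already have.
    """
    full_name = f"{today_tag}-full.bundle"
    delta_name = f"{previous_tag}-{today_tag}.bundle"
    # Full snapshot: every commit reachable from the branch.
    full = ["git", "bundle", "create", full_name, branch]
    # Daily delta: only commits added since the previous snapshot.
    delta = ["git", "bundle", "create", delta_name, f"{previous_tag}..{branch}"]
    return full, delta


def client_commands(bundle_file, branch):
    """Return argv lists a client might run to apply a downloaded bundle."""
    return [
        ["git", "bundle", "verify", bundle_file],  # checks prerequisites exist
        ["git", "fetch", bundle_file, branch],
    ]
```

An incremental bundle records the commits it depends on, so `git bundle verify` fails cleanly when a client has missed a day, and the client can then fall back to the full bundle.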