Bug 409445 - timestamp constraint for existing metadata/cache format interferes with rsync in some cases involving eclass changes
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Documentation
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on: 139134 410505
Blocks:
Reported: 2012-03-23 13:50 UTC by Martin von Gagern
Modified: 2012-04-08 18:00 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge--info,6.37 KB, text/plain)
2012-03-23 13:50 UTC, Martin von Gagern
Packages with broken metadata on my system (atd1,14.36 KB, text/plain)
2012-03-24 09:42 UTC, Martin von Gagern

Description Martin von Gagern 2012-03-23 13:50:40 UTC
Created attachment 306423 [details]
emerge --info

Not sure whether "Gentoo Linux" is the correct component for this. My latest world update told me:

WARNING: One or more updates have been skipped due to a dependency conflict:

sys-devel/automake:1.11

  (sys-devel/automake-1.11.3::gentoo, ebuild scheduled for merge) conflicts with
    =sys-devel/automake-1.11.1* required by (sys-libs/zlib-1.2.6::gentoo, installed)

This zlib dependency is really strange. It appears to be some kind of inconsistency between the ebuild and its eclass on the one hand, and the metadata on the other hand:

$ cd /var/db/pkg/sys-libs/zlib-1.2.6/
$ cat DEPEND 
=sys-devel/automake-1.11.1* >=sys-devel/autoconf-2.68 sys-devel/libtool
$ bzcat environment.bz2 | grep DEPEND
declare -- AUTOTOOLS_AUTO_DEPEND="no"
declare -- AUTOTOOLS_DEPEND="|| ( >=sys-devel/automake-1.11.1  ) >=sys-devel/autoconf-2.68 sys-devel/libtool"
declare DEPEND="minizip? ( || ( >=sys-devel/automake-1.11.1  ) >=sys-devel/autoconf-2.68 sys-devel/libtool ) "
declare PDEPEND=""
declare -x RDEPEND="!<dev-libs/libxml2-2.7.7 "
$ portageq metadata / ebuild sys-libs/zlib-1.2.6 DEPEND
minizip? ( || ( =sys-devel/automake-1.11.1* ) >=sys-devel/autoconf-2.68 sys-devel/libtool )

So the ebuild states a different dependency than the metadata does, and the DEPEND file of the installed package apparently reflects the latter.

It might be worth mentioning that I'm running "emerge --regen" after each "emerge --sync". I'm not exactly sure why I added that to my list of commands, might be pretty old. I now read that rsync users (which I am) should not use that, but on the other hand that this might be the only way to build metadata for overlays (of which I'm using quite a few). In any case, I take it that "emerge --regen" won't touch /usr/portage/metadata/cache/sys-libs/zlib-1.2.6, but even that file has the =sys-devel/automake-1.11.1* dependency. Therefore I assume that this is something to do with the gentoo portage server mirrors.

My guess would be that the autotools eclass changed its dependency calculation, but the cache for the zlib package didn't get updated to match that change.
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2012-03-23 14:54:53 UTC
(In reply to comment #0)

> It might be worth mentioning that I'm running "emerge --regen" after each
> "emerge --sync".

Maybe you ought to remove that or replace it with --metadata. At the very least, the purpose of --regen here isn't clear.
Comment 2 Zac Medico gentoo-dev 2012-03-23 16:01:39 UTC
(In reply to comment #0)
> I take it that "emerge --regen" won't touch

Right, it won't.

> /usr/portage/metadata/cache/sys-libs/zlib-1.2.6, but even that file has the
> =sys-devel/automake-1.11.1* dependency. Therefore I assume that this is
> something to do with the gentoo portage server mirrors.

The metadata appears to be correct on rsync.gentoo.org:

$ rsync rsync://rsync.gentoo.org/gentoo-portage/metadata/cache/sys-libs/zlib-1.2.6 ./zlib-1.2.6
$ grep automake ./zlib-1.2.6
minizip? ( || ( >=sys-devel/automake-1.11.1 ) >=sys-devel/autoconf-2.68 sys-devel/libtool )

It also appears to be correct for rsync.de.gentoo.org.

You should remove /usr/portage/metadata/cache/sys-libs/zlib-1.2.6 and sync again.
Comment 3 Martin von Gagern 2012-03-23 21:01:08 UTC
(In reply to comment #2)
> You should remove /usr/portage/metadata/cache/sys-libs/zlib-1.2.6 and sync
> again.

OK, that did it. I wonder why rsync didn't detect the difference in file content, and retransfer the file without me having to delete it. I guess the length of the file has stayed the same, but the modification time should have been different.

On #gentoo-portage, dwfreed apparently could reproduce that metadata content. So others might be hit by the same rsync strangeness. Perhaps it would be a good idea to simply touch the metadata file, so that its timestamp changes on all the rsync mirrors, and all clients will make sure to get the correct version.
Comment 4 Zac Medico gentoo-dev 2012-03-23 21:17:42 UTC
There have been relevant changes to autotools.eclass lately:

http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/eclass/autotools.eclass?view=log#rev1.128

Changes like those should be handled automatically by egencache, avoiding bug 139134, by using the egencache --rsync option.

@infra: Can we verify that the egencache --rsync option is enabled when we generate cache for the master rsync mirror?
Comment 5 Zac Medico gentoo-dev 2012-03-23 22:30:38 UTC
Assuming that the egencache --rsync option is being used, it's still possible for a series of eclass changes to modify the metadata in such a way that the egencache --rsync option will not be effective. This sort of scenario is very unlikely, but it's quite possible that the recent series of changes to autotools.eclass did exactly this.

We could protect against this sort of problem by setting "cache-formats = md5-dict pms" in metadata/layout.conf, as described in the following commit:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=a058baf9ed238a1f260b6739ba7fc10c6472f6ee

That will cause both the old "pms" format and the new "md5-dict" format to be generated, and the md5-dict format is immune to issues like bug 139134, because it contains eclass checksums.
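
To check whether a given tree has the setting, something like the following sketch may help. It runs against a stand-in file created on the spot rather than a real metadata/layout.conf, and cache_formats is a hypothetical helper written for this example, not a Portage tool.

```shell
# Sketch: read the cache-formats setting from a layout.conf.
# The file below is a stand-in created for this example; on a real tree
# the file would be metadata/layout.conf inside the repository.
layout=$(mktemp)
trap 'rm -f "$layout"' EXIT
printf '%s\n' '# example layout.conf' 'cache-formats = md5-dict pms' > "$layout"

cache_formats() {
    # Hypothetical helper: print the value of the cache-formats key, if any.
    sed -n 's/^[[:space:]]*cache-formats[[:space:]]*=[[:space:]]*//p' "$1"
}

formats=$(cache_formats "$layout")
echo "cache-formats: $formats"

# Check whether md5-dict is one of the listed formats.
case " $formats " in
    *" md5-dict "*) has_md5=yes ;;
    *)              has_md5=no ;;
esac
echo "md5-dict enabled: $has_md5"
```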
Comment 6 Zac Medico gentoo-dev 2012-03-23 23:00:51 UTC
(In reply to comment #5)
> the md5-dict format is immune to issues like bug 139134,
> because it contains eclass checksums.

It's better that I explain why the "pms" cache format is vulnerable to bug 139134.  The issue is an interaction between the "pms" cache format and the rsync protocol: by default, rsync only compares file size and timestamp, and the "pms" cache format requires the cache entry timestamp to match that of the ebuild, which hasn't been modified since this series of eclass changes. So, we've got a cache entry that the user synced at one point with a certain size and timestamp, and now the content has changed while size and timestamp have remained constant.

Since the new md5-dict format doesn't need to screw around with the mtime of the cache entry, its mtime changes every time an eclass is modified, which guarantees that the rsync protocol will recognize that it has changed.
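
The failure mode can be reproduced locally without a mirror. The sketch below is not rsync itself: it assumes GNU coreutils, and quick_check is a hypothetical helper that mimics a size-plus-mtime comparison. The two dependency strings come from this bug and happen to have the same length, which is why the file size stayed the same.

```shell
# Sketch: why a size+mtime comparison misses a content change.
# Assumes GNU coreutils (stat -c, touch -d, touch -r). quick_check is a
# hypothetical helper that mimics the comparison; it is not rsync.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# "old" is what the client synced earlier, "new" is what the mirror has now.
# Both strings are 27 characters long, so the file sizes match.
printf '%s\n' '=sys-devel/automake-1.11.1*' > "$workdir/old"
printf '%s\n' '>=sys-devel/automake-1.11.1' > "$workdir/new"

# The pms format ties the cache entry mtime to the ebuild mtime, and the
# ebuild did not change, so both copies carry the same timestamp.
touch -d '2012-03-01 00:00:00 UTC' "$workdir/old"
touch -r "$workdir/old" "$workdir/new"

quick_check() {
    # Prints "unchanged" when size and mtime match, "changed" otherwise.
    if [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]; then
        echo unchanged
    else
        echo changed
    fi
}

content_check() {
    # Prints "unchanged" only when the bytes are identical.
    if cmp -s "$1" "$2"; then echo unchanged; else echo changed; fi
}

quick_result=$(quick_check "$workdir/old" "$workdir/new")
content_result=$(content_check "$workdir/old" "$workdir/new")
echo "size+mtime: $quick_result"
echo "content:    $content_result"
```

rsync's --checksum option forces a content comparison instead, at the cost of reading every file on both sides.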
Comment 7 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-03-23 23:24:50 UTC
egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
    --portdir=${STAGEDIR} \
    --update-use-local-desc \

^^ That's what infra does. ^^
Comment 8 Zac Medico gentoo-dev 2012-03-23 23:30:12 UTC
Okay, so it appears as though the recent series of modifications to autotools.eclass has triggered a very rare case which the egencache --rsync option can't solve, as discussed in comment #5.

If we want a simple workaround for this one time, we can just bump the mtime of zlib-1.2.6.ebuild.
Comment 9 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-03-24 00:38:12 UTC
(In reply to comment #8)
> Okay, so it appears as though the recent series of modifications to
> autotools.eclass has triggered a very rare case which the egencache --rsync
> option can't solve, as discussed in comment #5.
> 
> If we want a simple workaround for this one time, we can just bump the mtime
> of zlib-1.2.6.ebuild.

Would it help to touch the metadata file on masterrsync so all clients (everyone) will get a new cache file?
Comment 10 Zac Medico gentoo-dev 2012-03-24 00:41:05 UTC
You have to bump the mtime on the ebuild if you want it to stick, because egencache syncs cache entry timestamp with the ebuild timestamp every time it runs.
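
A rough local simulation of that behaviour, assuming GNU coreutils. The regen function is a stand-in for the timestamp copying described above and does not call egencache; all file names are placeholders created for the example.

```shell
# Sketch: why the mtime bump has to land on the ebuild, not the cache entry.
# Assumes GNU coreutils. regen stands in for the cache generator copying
# the ebuild mtime onto the cache entry; it does not run egencache.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
ebuild="$workdir/zlib-1.2.6.ebuild"
mirror_entry="$workdir/mirror-entry"
client_entry="$workdir/client-entry"
: > "$ebuild"
: > "$mirror_entry"
: > "$client_entry"

same_mtime() {
    # Prints "same" when both files have the same mtime, "differ" otherwise.
    if [ "$(stat -c %Y "$1")" = "$(stat -c %Y "$2")" ]; then
        echo same
    else
        echo differ
    fi
}

regen() {
    # Simulated cache run: the entry always inherits the ebuild mtime.
    touch -r "$ebuild" "$mirror_entry"
}

# Starting point: the client synced while ebuild and entry shared one mtime.
touch -d '2012-03-01 00:00:00 UTC' "$ebuild"
regen
touch -r "$mirror_entry" "$client_entry"

# Attempt 1: touch only the cache entry. The next run undoes it.
touch -d '2012-03-24 00:00:00 UTC' "$mirror_entry"
regen
after_cache_touch=$(same_mtime "$mirror_entry" "$client_entry")

# Attempt 2: bump the ebuild. The new mtime survives the next run.
touch -d '2012-03-24 01:24:17 UTC' "$ebuild"
regen
after_ebuild_bump=$(same_mtime "$mirror_entry" "$client_entry")

echo "after touching the cache entry: $after_cache_touch"
echo "after bumping the ebuild:       $after_ebuild_bump"
```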
Comment 11 Zac Medico gentoo-dev 2012-03-24 01:30:14 UTC
I've bumped the mtime of the ebuild by erasing the existing $Header and committing it. The new $Header is:

# $Header: /var/cvsroot/gentoo-x86/sys-libs/zlib/zlib-1.2.6.ebuild,v 1.2 2012/03/24 01:24:17 zmedico Exp $

That should fix this one. The same issue may affect other ebuilds that inherit autotools.eclass, but it will only affect users who synced during a certain window of time.
Comment 12 Martin von Gagern 2012-03-24 09:42:20 UTC
Created attachment 306511 [details]
Packages with broken metadata on my system

(In reply to comment #11)
> That should fix this one. The same issue may affect other ebuilds that
> inherit autotools.eclass

Yes, quite a few of them. The attached file is the result of a run of

grep -rlF '=sys-devel/automake-1.11.1*' /usr/portage/metadata/cache \
  | cut -d/ -f6- | sort

On my system this yielded 553 packages. I removed and resynced all the files matching that search, and got no single match afterwards. So I assume that the rsync mirrors have the correct data for all of these, but my own local data was broken.
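
The remove step can be scripted along these lines. This is a sketch run against a throwaway directory rather than /usr/portage/metadata/cache, it assumes GNU grep and xargs, and the app-misc/example-1.0 entry is invented for the demonstration.

```shell
# Sketch: delete every cache entry that still carries the stale dependency.
# Runs against a scratch directory; on a real system the directory would be
# /usr/portage/metadata/cache. Assumes GNU grep (-Z) and GNU xargs (-r).
cache=$(mktemp -d)
trap 'rm -rf "$cache"' EXIT
mkdir -p "$cache/sys-libs" "$cache/app-misc"
printf '%s\n' '=sys-devel/automake-1.11.1* sys-devel/libtool' \
    > "$cache/sys-libs/zlib-1.2.6"
printf '%s\n' '>=sys-devel/automake-1.11.1 sys-devel/libtool' \
    > "$cache/app-misc/example-1.0"

# -F treats the pattern as a fixed string, so the trailing "*" is literal.
# -Z and -0 keep file names NUL-separated so odd names cannot break xargs.
grep -rlFZ '=sys-devel/automake-1.11.1*' "$cache" | xargs -0 -r rm -f --

remaining=$(find "$cache" -type f | wc -l)
echo "entries left: $remaining"
```

A subsequent emerge --sync then fetches fresh copies of the deleted entries.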
Comment 13 Zac Medico gentoo-dev 2012-03-24 20:32:15 UTC
(In reply to comment #12)
> On my system this yielded 553 packages.

It's not really practical to bump all of these mtimes via commits to CVS, and that won't protect us from having this issue happen again. A reliable solution would be to enable the md5-dict format in layout.conf, as mentioned earlier.

> I removed and resynced all the files
> matching that search, and got no single match afterwards. So I assume that
> the rsync mirrors have the correct data for all of these, but my own local
> data was broken.

Yeah, the egencache --rsync option mostly ensures this, though it's not completely reliable. For 100% reliability, we could enable the md5-dict format.
Comment 14 Zac Medico gentoo-dev 2012-03-28 17:44:47 UTC
Re-assigning to infra-bugs, since egencache/portage already support the md5-dict format which solves this issue, so anything that remains to be done will have to be done by infra.
Comment 15 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-03-28 17:56:13 UTC
(In reply to comment #14)
> Re-assigning to infra-bugs, since egencache/portage already support the
> md5-dict format which solves this issue, so anything that remains to be done
> will have to be done by infra.

Huh? What will have to be done by infra? Comment #5 says that an addition to layout.conf is required.
Comment 16 Zac Medico gentoo-dev 2012-03-28 19:11:00 UTC
Oh, I forgot that I can just edit metadata/layout.conf in CVS. We also have to ensure that egencache is from >=sys-apps/portage-2.1.10.32 on the infra side.
Comment 17 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-03-28 20:06:53 UTC
(In reply to comment #16)
> Oh, I forgot that I can just edit metadata/layout.conf in CVS. We also have
> to ensure that egencache is from >=sys-apps/portage-2.1.10.32 on the infra
> side.

2.1.10.44 is installed on all infra at the time of this writing.
Comment 18 Zac Medico gentoo-dev 2012-03-28 20:17:37 UTC
Okay, so all that's left is to add "cache-formats = md5-dict pms" to metadata/layout.conf. This will result in egencache generating a metadata/md5-cache directory which will consume slightly more space than the existing metadata/cache directory (extra space due to inclusion of keys and md5 digests in the md5-dict format).

I'll add pms-bugs to CC, since we'll likely want to document the md5-dict format in PMS.
Comment 19 Ulrich Müller gentoo-dev 2012-03-28 21:35:06 UTC
What are the implications for performance? Timestamp and size are available from the file's directory entry, whereas calculating the md5sum requires reading the file plus some CPU time.
Comment 20 Zac Medico gentoo-dev 2012-03-28 21:44:45 UTC
We could add a timestamp-dict format, but if we rely on timestamps then it rules out distribution by protocols that don't preserve timestamps. Funtoo currently uses md5-dict, since they distribute via git rather than rsync:

https://github.com/funtoo/ports-2012/tree/funtoo.org/metadata/md5-cache
Comment 21 Brian Harring (RETIRED) gentoo-dev 2012-03-28 21:54:00 UTC
(In reply to comment #19)
> What are the implications for performance? Timestamp and size are available
> from the file's directory entry, whereas calculating the md5sum requires
> reading the file plus some cpu time.

The stats are available in the commit (1e8870bd); roughly, to do a full tree validation of the cache (aka, flex fully the perf difference between md5 and pms backends), at the time it was ~25k cpvs to validate; for portage at the time pms was ~7.7s, for md5 it was 8.8s.

It's important to keep in mind that's for validating the *entire* tree... so that ~14% slowdown isn't exactly likely to be encountered in full force (even the nastiest resolution is under 2k nodes, and the time isn't burned on validation as much as on everything else).
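
For reference, the ~14% figure follows directly from the two timings quoted above. A quick check with awk (the variable names are just labels for this example):

```shell
# Relative slowdown of md5-dict versus pms for a full-tree validation,
# using the timings quoted above (7.7s and 8.8s).
pms_seconds=7.7
md5_seconds=8.8

# LC_ALL=C keeps the decimal separator a dot regardless of locale.
slowdown=$(LC_ALL=C awk -v a="$pms_seconds" -v b="$md5_seconds" \
    'BEGIN { printf "%.1f", (b - a) / a * 100 }')
echo "md5-dict is about ${slowdown}% slower for a full-tree validation"
```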

For the tree itself, figure this will add another 25k inodes and ~20MB of content.  Not great, but it occurs unfortunately.  An rsync exclude of the old cache location would suffice imo and could be sent out via news w/ a plan to turn off the code cache (remove it) in 6-12 months (in which case people would just generate locally- slower, but still working).

(In reply to comment #20)
> We could add a timestamp-dict format, but if we rely on timestamps then it
> rules out distribution by protocols that don't preserve timestamps. Funtoo
> currently uses md5-dict, since they distribute via git rather than rsync:
> 
> https://github.com/funtoo/ports-2012/tree/funtoo.org/metadata/md5-cache

chrome-os uses it also (it was written for us specifically).  I've had zero issues w/ it since development/deployment of it.
Comment 22 Brian Harring (RETIRED) gentoo-dev 2012-03-28 22:19:09 UTC
Also, just a note; md5-dict addresses the timestamp issue, but also laces in the necessary validation information so that partial syncs are handled correctly, and the PM can properly (and easily) handle scenarios where the local configuration injects a different eclass stacking order.

Overall, it's a better format; definitely not the *best* format (some day I'll implement my notion of that), but it's supported/deployed, works, and is field tested by git users.

Pkgcore/paludis don't support it, although pkgcore will soon enough- I've just been busy with other things, next release will have the functionality.
Comment 23 David Leverton 2012-03-30 18:17:01 UTC
(In reply to comment #22)
> Pkgcore/paludis don't support it, although pkgcore will soon enough- I've
> just been busy with other things, next release will have the functionality.

I've committed support for reading this format (although not generating it, at least for now) to Paludis git master.
Comment 24 Brian Harring (RETIRED) gentoo-dev 2012-03-31 10:29:41 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > Pkgcore/paludis don't support it, although pkgcore will soon enough- I've
> > just been busy with other things, next release will have the functionality.
> 
> > I've committed support for reading this format (although not generating it,
> at least for now) to Paludis git master.

Sweet.  That leaves pkgcore which I do not want considered as a blocker for using the new cache format (namely have held off since I didn't yet need it; that said it'll land in the next week or two).
Comment 25 Zac Medico gentoo-dev 2012-03-31 18:01:01 UTC
Feedback seems positive, so I've added "cache-formats = md5-dict pms" to metadata/layout.conf in gentoo-x86 CVS.
Comment 26 Brian Harring (RETIRED) gentoo-dev 2012-03-31 22:45:35 UTC
(In reply to comment #25)
> Feedback seems positive, so I've added "cache-formats = md5-dict pms" to
> metadata/layout.conf in gentoo-x86 CVS.

We should probably be planning for removing the old cache at some point in the near future then; mainly to avoid the extra space the rsync mirrors now are stuck w/ (extra 20MB or so).
Comment 27 Eric F. GARIOUD 2012-04-02 10:48:24 UTC
1/ Thank you all for the work you achieve in order to make portage efficient and reliable even in "very unlikely scenarios"

2/ Could I respectfully request that, in the future, this kind of modification becomes the topic of a pre-emptive advice in the news list.
As a matter of fact, getting +30K files, that is to say +20% of the number of files of a typical /usr/portage tree, +50% of the number of files of my /usr/portage tree because I make an extensive use of the --exclude feature... after a simple and usual emerge --sync is at least surprising if not impact-free.
Comment 28 Weedy 2012-04-06 05:09:57 UTC
As much as I would like to thank you for all your hard work...

Why are we reinventing a VCS? Plenty of Gentoo spinoffs use git for the portage tree. We should think long and hard about this before we continue down this path.

# du -hs /usr/portage/
780M	/usr/portage/

I still remember when the tree was under 250mb.
Comment 29 Ciaran McCreesh 2012-04-06 13:40:56 UTC
The metadata cache has absolutely nothing to do with the use of Git or lack thereof.
Comment 30 Karl-Robert Ernst 2012-04-08 17:42:57 UTC
Since this was introduced, I noticed that the number of files transferred during each emerge --sync has increased almost tenfold (and so did the time it takes). And looking at the output, it seems to be mostly because of files in metadata/md5-cache/. At first I thought it was because it's new, but it seems it still hasn't settled down and continues to transfer large numbers of files each sync.

Is this working as designed, or a bug?
Comment 31 Zac Medico gentoo-dev 2012-04-08 18:00:27 UTC
(In reply to comment #30)
> Since this was introduced, I noticed that the amount of files transferred
> during each emerge --sync has increased almost tenfold (and so did the time
> it takes). And looking at the output, it seems mostly because of files in
> metadata/md5-cache/. At first I thought it was because it's new, but it seems
> it still didn't settle down and continues to transfer large amounts of files
> each sync.
> 
> Is this working as designed, or a bug?

Between March 31 and April 2, bug 410505 triggered unnecessary rsync transfer of the entire md5-cache directory for every sync. That's fixed now.

Also, since the md5-dict format contains an md5 digest for every eclass that's inherited by a particular ebuild, it has to be updated every time one of those eclasses changes. For example, when eutils.eclass is modified, it will cause a large number of files in the md5-cache directory to be updated because many of them reference eutils:

$ find /usr/portage/metadata/md5-cache -type f | xargs grep -l eutils | wc -l
23093
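
To illustrate why an eclass change invalidates every entry that references it, here is a simplified sketch. It assumes GNU coreutils md5sum, and the single-line entry with an _eclasses_ key followed by a tab-separated name and digest is an assumed, cut-down stand-in for a real md5-cache entry, which carries many more keys. entry_is_valid and eclass_md5 are hypothetical helpers; Portage's own validation is not shown here.

```shell
# Sketch: a cache entry that records an eclass digest goes stale as soon
# as the eclass content changes. All files live in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
eclass="$workdir/eutils.eclass"
entry="$workdir/entry"
printf 'eclass_revision=1\n' > "$eclass"

eclass_md5() {
    # Print only the hex digest of the given file.
    md5sum "$1" | cut -d' ' -f1
}

# Simplified entry: key, eclass name, then a tab and the recorded digest.
printf '_eclasses_=eutils\t%s\n' "$(eclass_md5 "$eclass")" > "$entry"

entry_is_valid() {
    # Succeeds when the recorded digest matches the eclass as it is now.
    recorded=$(grep '^_eclasses_=' "$1" | cut -f2)
    [ "$recorded" = "$(eclass_md5 "$2")" ]
}

if entry_is_valid "$entry" "$eclass"; then before=valid; else before=stale; fi

# Modify the eclass; the digest stored in the entry no longer matches.
printf 'eclass_revision=2\n' > "$eclass"
if entry_is_valid "$entry" "$eclass"; then after=valid; else after=stale; fi

echo "before eclass change: $before"
echo "after eclass change:  $after"
```

Regenerating the entry writes a new digest and a new mtime, which is what makes the change visible to rsync.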