Bug 541030

Summary: metadata transfer should be done after overlays update
Product: Portage Development
Component: Unclassified
Reporter: Andrew Savchenko <bircoph>
Assignee: Portage team <dev-portage>
Status: RESOLVED WORKSFORME
Severity: normal
Priority: Normal
CC: bircoph
Version: unspecified
Hardware: All
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---

Description Andrew Savchenko gentoo-dev

2015-02-22 11:28:19 UTC

Hello,

I use portage-2.2.17 with the sqlite cache enabled, so I have metadata-transfer in my FEATURES in order to run emerge --metadata after the Gentoo tree sync.

I also use overlays via layman's sync plugin, so I have repos.conf/layman.conf as follows:

[science]
priority = 50
location = /var/lib/layman/science
layman-type = git
sync-type = laymansync
sync-uri = git://git.overlays.gentoo.org/proj/sci.git
auto-sync = Yes

(the same for other overlays)

The problem is that when running emerge --sync (or emaint sync -a) portage does the following in order:
1. Updates main tree.
2. Runs emerge --metadata.
3. Updates overlays.

This way old overlay data is cached during the emerge --metadata run, which is wrong. Please run metadata sync only after all repositories have been updated.

Comment 1 Arfrever Frehtes Taifersar Arahesis 2015-02-22 11:58:10 UTC

Anyway, almost all repositories other than repository "gentoo" (including repository "science") provide no metadata cache (in ${repository_location}/metadata/md5-cache if 'cache-formats = md5-dict' is set).
You should run `emerge --regen` to locally generate metadata cache for these repositories.

Comment 2 Andrew Savchenko gentoo-dev

2015-02-22 12:58:55 UTC

This is not necessary: emerge --regen will just duplicate effort.

After running emerge --metadata I have the files /var/cache/edb/dep/var/lib/layman/*.sqlite updated for each overlay. sqlite3 confirms that the database structure and content of these files are similar to /var/cache/edb/dep/usr/portage.sqlite.

Comment 3 Arfrever Frehtes Taifersar Arahesis 2015-02-22 14:05:03 UTC

1. Delete /var/cache/edb/dep/*
2. Run: emerge --metadata
3. Check sizes of /var/cache/edb/dep/**/*.sqlite
4. Run: emerge --regen
5. Check sizes of /var/cache/edb/dep/**/*.sqlite
   They will be different than at step 3.
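
The size comparison in steps 3 and 5 can also be scripted. The following is a minimal sketch, not part of Portage; the helper names sqlite_cache_sizes and changed_sizes are made up for illustration, and the cache root is passed in by the caller.

```python
import os


def sqlite_cache_sizes(root):
    """Return {path: size_in_bytes} for every *.sqlite file below root."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".sqlite"):
                path = os.path.join(dirpath, name)
                sizes[path] = os.path.getsize(path)
    return sizes


def changed_sizes(before, after):
    """Return {path: (old, new)} for files whose size differs.

    A file present in only one snapshot is reported with None
    for the missing side.
    """
    changed = {}
    for path in sorted(set(before) | set(after)):
        old, new = before.get(path), after.get(path)
        if old != new:
            changed[path] = (old, new)
    return changed
```

Take one snapshot with sqlite_cache_sizes("/var/cache/edb/dep") after step 2 and another after step 4, then pass both to changed_sizes to see which databases changed.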

Comment 4 Brian Dolbec (RETIRED) gentoo-dev

2015-02-22 16:58:12 UTC

While we look at making the change needed.

You can add either a /etc/portage/postsync.d hook that runs once after all repos are updated, or a /etc/portage/repo.postsync.d hook that runs once for each repo synced. For repo.postsync.d, three items are passed in to the hook script: repo name, location, sync-uri.

Comment 5 Andrew Savchenko gentoo-dev

2015-02-22 16:59:20 UTC

Hmm, indeed, you're right: after deleting the old ones, the new ones were empty databases.
Now I wonder why I had those files filled in the first place.

Another interesting observation: portage.sqlite was 10% smaller after removal and regeneration. Probably sqlite3 $i 'reindex; vacuum' should be used once in a while...
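
The same maintenance can be done from Python's standard sqlite3 module. This is only a sketch of the idea: compact_sqlite is a made-up helper name, and it assumes nothing else (such as a running emerge) has the database open at the time.

```python
import os
import sqlite3


def compact_sqlite(path):
    """Run REINDEX and VACUUM on the database at path.

    Returns (size_before, size_after) in bytes.
    """
    size_before = os.path.getsize(path)
    # isolation_level=None selects autocommit mode; VACUUM cannot run
    # inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("REINDEX")
        conn.execute("VACUUM")
    finally:
        conn.close()
    return size_before, os.path.getsize(path)
```

Calling compact_sqlite on each *.sqlite file under /var/cache/edb/dep would correspond to the shell loop hinted at by $i above.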

Comment 6 Andrew Savchenko gentoo-dev

2015-02-22 17:03:30 UTC

(In reply to Brian Dolbec from comment #4)
> While we look at making the change needed.
> 
> You can add either a /etc/portage/postsync.d hook that runs once after all
> repos are updated or add a /etc/portage/repo.postsync.d hook that runs once
> for each repo synced.  For repo.postsync.d three items are passed in to
> the hook script, repo name, location, sync-uri.

In that case emerge --metadata will be run twice: after the Gentoo tree sync and after all overlays are updated. This is time consuming, especially on several old boxes of mine.

For now I have solved this by falling back to the eix-update utility (with a --regen hook) and by disabling auto-sync for layman overlays. This way I basically use a pre-2.2.16 portage configuration.

Comment 7 Zac Medico gentoo-dev

2015-02-22 18:26:37 UTC

(In reply to Andrew Savchenko from comment #0)
> 1. Updates main tree.
> 2. Runs emerge --metadata.

It's not the same thing as emerge --metadata. It only transfers metadata for the repo that was just synced.

> 3. Updates overlays.
> 
> This way old overlay data is being cached during emerge --metadata run,
> which is wrong. Please run metadata sync only after all repositories were
> updated.

No, it does the right thing, because it will transfer metadata for an overlay if it has a metadata/md5-cache directory. Since the overlays don't have metadata/md5-cache directories, it skips the metadata transfer.

(In reply to Brian Dolbec from comment #4)
> While we look at making the change needed.

As explained above, no change is needed. The action_metadata function does not do any extra work. It is able to operate on one repo at a time.

> You can add either a /etc/portage/postsync.d hook that runs once after all
> repos are updated or add a /etc/portage/repo.postsync.d hook that runs once
> for each repo synced.  For repo.postsync.d three items are passed in to
> the hook script, repo name, location, sync-uri.

This would be pointless, because action_metadata will already be called for the repo if the metadata/md5-cache directory exists.

Comment 8 Zac Medico gentoo-dev

2015-02-22 18:32:19 UTC

For reference, see the SyncManager._sync_callback method:

https://github.com/gentoo/portage/blob/v2.2.17/pym/portage/sync/controller.py#L309

Note that it calls action_metadata only if the metadata/md5-cache directory exists. Also note that it uses the porttrees=[self.repo.location] parameter so that the function only transfers metadata for the current repository.
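
The condition described here can be illustrated with a small stand-alone sketch. This is not the code from controller.py; has_md5_cache and repos_needing_transfer are hypothetical helper names that only mirror the directory check, one repository location at a time.

```python
import os


def has_md5_cache(repo_location):
    """True if the repository ships a metadata/md5-cache directory."""
    return os.path.isdir(os.path.join(repo_location, "metadata", "md5-cache"))


def repos_needing_transfer(repo_locations):
    """Keep only the locations for which a metadata transfer makes sense.

    Repositories without metadata/md5-cache are skipped, matching the
    behaviour described above for overlays.
    """
    return [loc for loc in repo_locations if has_md5_cache(loc)]
```

With the setup from this report, /usr/portage would pass the check, while an overlay such as /var/lib/layman/science would be skipped unless its metadata/md5-cache directory exists.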

Comment 9 Andrew Savchenko gentoo-dev

2015-02-23 01:34:02 UTC

I have now configured repo.postsync.d to run egencache for overlays, based on the example scripts. Everything seems to work fine now. Thank you for the explanations.
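
For readers who want something similar, here is a rough sketch of such a repo.postsync.d hook written in Python. It is not the example script shipped with Portage. It assumes only that the first argument is the repository name, as stated in comment 4; the order of the remaining two arguments should be checked against the shipped example before relying on them. The function names and the jobs default are made up for illustration.

```python
import subprocess
import sys


def build_egencache_command(repo_name, jobs=1):
    """Build the egencache command line for one repository."""
    return [
        "egencache",
        "--update",
        "--repo=" + repo_name,
        "--jobs=" + str(jobs),
    ]


def main(argv):
    """Hook body: regenerate the cache for every repository except gentoo.

    argv[1] is assumed to be the repository name. Returns the exit
    status that the hook should report.
    """
    if len(argv) < 2 or not argv[1]:
        # Called without a repository name: nothing to do.
        return 0
    repo_name = argv[1]
    if repo_name == "gentoo":
        # The gentoo repository already provides a metadata cache.
        return 0
    return subprocess.call(build_egencache_command(repo_name))
```

To use it as a hook, end the file with sys.exit(main(sys.argv)), add a Python shebang line, place it in /etc/portage/repo.postsync.d, and make it executable.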