557192 – rsync refetches all Manifest files and profiles/use.local.desc every time

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Infrastructure
Classification:	Unclassified
Component:	Other
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Infrastructure

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	549914

Reported:	2015-08-10 08:02 UTC by Ulrich Müller
Modified:	2015-09-22 01:35 UTC
CC List:	26 users

See Also:	557962
Package list:
Runtime testing required:	---

Description Ulrich Müller gentoo-dev

2015-08-10 08:02:22 UTC

<ulm> looks like rsync refetches all Manifest files all the time?
<ulm> even if they haven't changed
<prometheanfire> guessing that the manifests are either wiped or regenerated every time :|
<prometheanfire> ulm: make a bug?
<ulm> yeah, will do

Comment 1 Ulrich Müller gentoo-dev

2015-08-10 16:52:09 UTC

profiles/use.local.desc suffers from the same problem, BTW.

Comment 2 Mikle Kolyada (RETIRED) archtester

2015-08-10 20:18:24 UTC

(In reply to Ulrich Müller from comment #1)
> profiles/use.local.desc suffers from the same problem, BTW.

The whole tree suffers from this problem; emerge --sync pulls everything again and again.

Comment 3 Arnaud Launay 2015-08-12 06:32:57 UTC

Same here, multiple servers.

Comment 4 Matthew Thode ( prometheanfire ) archtester

2015-08-12 07:48:39 UTC

Adds load to the mirrors, but not sure if it's enough to worry about.

Add this to make.conf

PORTAGE_RSYNC_EXTRA_OPTS="--checksum"

Comment 5 Matthew Thode ( prometheanfire ) archtester

2015-08-12 07:51:45 UTC

This should either be made the default,
or
mirrors should be updated to use --checksum from masterdistfiles,
or
we should stop writing files that don't change. We are using repoman manifest in parallel right now to do the signing; if someone has a better method that doesn't rewrite the files...
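
As an illustration of the third option, here is a minimal Python sketch of "only write when the content actually changed". It is not taken from repoman or from the infrastructure scripts; the function name `write_if_changed` is hypothetical.

```python
import os
import tempfile


def write_if_changed(path, data):
    """Write data (bytes) to path only if it differs from what is on disk.

    Returns True if the file was (re)written, False if it was left alone.
    Leaving an unchanged file alone preserves its mtime.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == data:
                return False
    except FileNotFoundError:
        pass

    # Write to a temporary file in the same directory, then rename it
    # into place so readers never see a half-written file.
    # Note: mkstemp creates the file with mode 0600; a real script
    # would probably want to fix up the permissions afterwards.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

With a helper like this, regenerating a Manifest whose content comes out byte-identical would leave the existing file and its timestamp untouched, so rsync has nothing to transfer.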

Comment 6 Ulrich Müller gentoo-dev

2015-08-12 08:02:26 UTC

(In reply to Matthew Thode ( prometheanfire ) from comment #4)
> Adds load to the mirrors, but not sure if it's enough to worry about.
> 
> Add this to make.conf
> 
> PORTAGE_RSYNC_EXTRA_OPTS="--checksum"

That's not an acceptable solution. With --checksum, rsync on the client's side must read all files, instead of just their inode.
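
To make the cost difference concrete, here is a rough Python sketch of the two comparison strategies. It is an illustration only, not rsync's actual code: `quick_check` and `checksum_check` are hypothetical names, and SHA-256 merely stands in for whatever checksum rsync uses.

```python
import hashlib
import os


def quick_check(path_a, path_b):
    """Return True if the files look identical by size and mtime alone.

    Only metadata is consulted (one stat() call per file), which is
    roughly what rsync's default comparison does.
    """
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_check(path_a, path_b):
    """Return True if the files have identical contents.

    Every byte of both files must be read, which is what makes a
    --checksum style comparison expensive on a large tree.
    """
    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    return digest(path_a) == digest(path_b)
```

A file that was rewritten with identical content but a fresh mtime fails `quick_check` and passes `checksum_check`, which is the situation described in this bug.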


(In reply to Matthew Thode ( prometheanfire ) from comment #5)

> [...] or
> we should stop writing files that don't change,

^^ This, please.

> we are using repoman manifest in parallel right now to do the signing, if
> someone has a better method that doesn't rewrite the files...

Run repoman only if some file in the dir is newer than the (previous) Manifest?
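
A minimal sketch of that check in Python, assuming the Manifest lives directly in the package directory; `manifest_is_stale` is a hypothetical helper, not an existing repoman or portage function.

```python
import os


def manifest_is_stale(package_dir):
    """Return True if the Manifest is missing or older than any other
    file under package_dir (searched recursively)."""
    manifest = os.path.join(package_dir, "Manifest")
    try:
        manifest_mtime = os.stat(manifest).st_mtime
    except FileNotFoundError:
        return True

    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            path = os.path.join(root, name)
            if path == manifest:
                continue
            if os.stat(path).st_mtime > manifest_mtime:
                return True
    return False
```

One caveat: removing a file only changes the mtime of its directory, not of any remaining file, so a check like this would not notice a deleted ebuild on its own.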

Comment 7 Michał Górny archtester

2015-08-12 12:41:33 UTC

My original scripts actually used 'git diff' to process only the directories that were changed. I don't know why this was replaced by a far less efficient solution.
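
The original scripts are not shown in this bug, so the following Python sketch only illustrates the general idea under assumptions: `git diff --name-only` lists the changed files between two revisions, and the paths are then reduced to category/package directories. Both function names and the list of skipped top-level directories are hypothetical.

```python
import subprocess


def changed_package_dirs(changed_paths):
    """Reduce repository-relative file paths to category/package dirs.

    Paths with fewer than three components (e.g. top-level files) are
    skipped, as are the non-package top-level directories listed below.
    """
    skip_top_level = {"eclass", "profiles", "metadata", "licenses", "scripts"}
    dirs = set()
    for path in changed_paths:
        parts = path.split("/")
        if len(parts) < 3 or parts[0] in skip_top_level:
            continue
        dirs.add("/".join(parts[:2]))
    return sorted(dirs)


def git_changed_paths(repo_dir, old_rev, new_rev):
    """List files that differ between two revisions of a git repository."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Only the directories returned by `changed_package_dirs(git_changed_paths(...))` would then need their Manifest regenerated.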

Comment 8 Matthew Thode ( prometheanfire ) archtester

2015-08-12 14:41:45 UTC

probably because we were tired :P

Comment 9 Zac Medico gentoo-dev

2015-08-15 20:35:50 UTC

(In reply to Matthew Thode ( prometheanfire ) from comment #5)
> we should stop writing files that don't change, we are using repoman
> manifest in parallel right now to do the signing, if someone has a better
> method that doesn't rewrite the files...

Note that egencache --update-manifests was intended for this purpose. It does manifests in parallel with the --jobs and --load-average options.

Comment 10 Robin Johnson archtester

2015-08-16 16:22:30 UTC

zmedico:
--update-manifests seems to be extremely slow compared to the other variant we are using.

time find -maxdepth 1 -mindepth 1 -type d \
 | parallel --verbose -j4 'cd {} && repoman manifest'
hot: wallclock ~1m10s, cputime ~4m14s
cold: ~20min wallclock, 80min cpu

--update-manifests: did not finish in 60 minutes with --jobs=4.

egencache command was:
egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
    --portdir=${STAGEDIR} \
    --update-use-local-desc \
    --update-manifests --thin-manifests=n \
    --repo=gentoo \
    >> ${REGEN_LOG_DIR}/${REGEN_LOG_FILE} 2>&1

dwfreed and I started to write a better solution that was aware of how mtimes need to be propagated when creating thick Manifests, but I've had some work travel, so it wasn't finished.

(specifically, the mtime on the thick Manifest should be the greatest mtime of any file in a given package)
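
A minimal Python sketch of that rule, for illustration only (it is not the unfinished script mentioned above, and `propagate_mtime` is a hypothetical name). It ignores the Manifest's own previous mtime, which a real implementation may want to take into account.

```python
import os


def propagate_mtime(package_dir):
    """Set the Manifest's mtime to the newest mtime of the other files
    in package_dir (searched recursively).

    Returns the mtime applied in nanoseconds, or None if there is no
    Manifest or no other file to take a timestamp from.
    """
    manifest = os.path.join(package_dir, "Manifest")
    newest = None
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            path = os.path.join(root, name)
            if path == manifest:
                continue
            mtime_ns = os.stat(path).st_mtime_ns
            if newest is None or mtime_ns > newest:
                newest = mtime_ns
    if newest is None or not os.path.exists(manifest):
        return None
    os.utime(manifest, ns=(newest, newest))
    return newest
```

Because the timestamp is derived from the package's files rather than from the time of regeneration, an unchanged package yields the same Manifest mtime on every run.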

Comment 11 Zac Medico gentoo-dev

2015-08-16 18:14:18 UTC

(In reply to Robin Johnson from comment #10)
> zmedico:
> --update-manifests seems to be extremely slow compared to the other variant
> we are using.

Okay, I'll have to do some profiling to see what's wrong.

> (specifically, the mtime on the thick Manifest should be the greatest mtime
> of any file in a given package)

But eclass changes can cause the DIST entries in the Manifest to change, and non-ebuild files are irrelevant to the Manifest when using thin-manifests. So, I think it would make sense to use the max mtime of the ebuilds and the eclasses they inherit.

Comment 12 dwfreed 2015-08-16 18:37:10 UTC

(In reply to Zac Medico from comment #11)
> (In reply to Robin Johnson from comment #10)
> > zmedico:
> > --update-manifests seems to be extremely slow compared to the other variant
> > we are using.
> 
> Okay, I'll have to do some profiling to see what's wrong.
> 
> > (specifically, the mtime on the thick Manifest should be the greatest mtime
> > of any file in a given package)
> 
> But eclass changes can cause the DIST entries in the Manifest to change, and
> non-ebuild files are irrelevant to the Manifest when using thin-manifests.
> So, I think it would make sense to use the max mtime of the ebuilds and the
> eclasses they inherit.

If an eclass changes the resulting set of DIST entries, wouldn't you have to regenerate the thin manifest?  In which case, it'd propagate down to the thick manifest anyway.

Comment 13 Zac Medico gentoo-dev

2015-08-16 19:05:46 UTC

(In reply to dwfreed from comment #12)
> If an eclass changes the resulting set of DIST entries, wouldn't you have to
> regenerate the thin manifest?  In which case, it'd propagate down to the
> thick manifest anyway.

Yes, that makes sense. So, "the mtime on the thick Manifest should be the greatest mtime of any file in a given package" is correct if you include the thin Manifest mtime.

Comment 14 dwfreed 2015-08-16 19:32:27 UTC

(In reply to Zac Medico from comment #13)
> (In reply to dwfreed from comment #12)
> > If an eclass changes the resulting set of DIST entries, wouldn't you have to
> > regenerate the thin manifest?  In which case, it'd propagate down to the
> > thick manifest anyway.
> 
> Yes, that makes sense. So, "the mtime on the thick Manifest should be the
> greatest mtime of any file in a given package" is correct if you include the
> thin Manifest mtime.

Which I am :)

For the record, here's my script:

https://bitbucket.org/snippets/dwfreed/Roekq/thicken-manifestspy

Comment 15 Zac Medico gentoo-dev

2015-08-16 21:25:09 UTC

(In reply to dwfreed from comment #14)
> For the record, here's my script:
> 
> https://bitbucket.org/snippets/dwfreed/Roekq/thicken-manifestspy

The mtime code seems like it should work. I've filed bug 557962 to add similar behavior directly to portage's Manifest.write() method.

Comment 16 dwfreed 2015-08-16 23:36:14 UTC

I wonder if the --update-manifests code of egencache is running before the regular --update code.  In my initial implementation of my script, I was not running egencache --update on the repo first, and I noticed that FetchlistDict creation was taking forever; a simple strace showed me that the portage code was sourcing the ebuild to assemble the FetchlistDict.  The results of this sourcing would then be otherwise wasted, and so it would have to source the ebuild again for md5-cache generation.  If md5-cache generation is run first, this is used for FetchlistDict generation, and everything is much faster.  It doesn't explain why sourcing ebuilds for FetchlistDict generation is orders of magnitude slower than sourcing ebuilds for md5-cache generation, but might explain a large part of it.

Comment 17 Zac Medico gentoo-dev

2015-08-17 00:18:32 UTC

(In reply to dwfreed from comment #16)
> The results of this sourcing would then be otherwise wasted, and so it would
> have to source the ebuild again for md5-cache generation.  If md5-cache
> generation is run first, this is used for FetchlistDict generation, and
> everything is much faster.  It doesn't explain why sourcing ebuilds for
> FetchlistDict generation is orders of magnitude slower than sourcing ebuilds
> for md5-cache generation, but might explain a large part of it.

One thing that makes it slower is that it does not parallelize the cache generation in this case. The cache is generated on-demand in the main process, so the required time will be comparable to egencache --update --jobs=1.

It's just not optimized for this case, since the assumption is that you will call egencache --update either before or together with --update-manifests.
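
As a sketch of the "before" variant, the two steps could be driven like this from Python. Only flags that appear in this bug are used; the helper names are hypothetical, and a real invocation would need the additional options shown in comment 10.

```python
import subprocess


def egencache_commands(repo, jobs):
    """Return the two egencache invocations in the order they should run."""
    common = ["egencache", "--repo=" + repo, "--jobs=" + str(jobs)]
    return [
        common + ["--update"],            # parallel md5-cache generation
        common + ["--update-manifests"],  # can then reuse the warm cache
    ]


def run_all(repo, jobs):
    """Run both steps, stopping at the first failure."""
    for cmd in egencache_commands(repo, jobs):
        subprocess.run(cmd, check=True)
```

Running the cache update first means the Manifest step should not have to source ebuilds serially in the main process.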

Comment 18 Matthew Thode ( prometheanfire ) archtester

2015-08-17 06:31:02 UTC

possible fix deployed

-find ${STAGEDIR} -maxdepth 1 -mindepth 1 -type d | parallel -j4  'cd {} && repoman manifest' >>${REGEN_LOG_DIR}/${REGEN_LOG_FILE}
+
+# copy cached manifests to avoid regen
+for manifest in ${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/*/*/Manifest; do
+    manifest=${manifest#${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/}
+    # discard stderr, which is noisy for packages that were just removed
+    cp ${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/${manifest} ${STAGEDIR}/${manifest} 2>/dev/null
+done
+
+# look for ebuilds that have been touched and need manifests regenerated
+new_manifest_list=$(find ${BASE}/exports/gentoo-x86/ -type f -name Manifest -mmin -30)
+# regenerate those manifests
+for manifest in ${new_manifest_list}; do
+    package_dir=${manifest%/Manifest}
+    pushd ${package_dir}
+    repoman manifest
+    popd
+done
 
 # for egencache, set user/group or make sure the user is in the portage group
        #--update-changelogs \
@@ -109,6 +125,9 @@ egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
 rval=$?
 echo "END      REGEN                   $(date -u)" >> ${TIMESLOG}
 
+# copy so we don't have to regen all the time
+rsync -am --delete --include='*Manifest' --include='*/' --exclude='*' ${STAGEDIR} ${BASE}/gentoo-x86-manifest-scratch
+

Comment 19 dwfreed 2015-08-17 07:57:35 UTC

prometheanfire showed me the exact script that's being used to generate the rsync tree, and the following changes will get rid of the nasty hack he added and use my script instead:

2015-08-17 06:44:36 <dwfreed> remove line 100-115, 122, and 129; then run my script at line 129
2015-08-17 06:46:11 <dwfreed> ./thicken-manifests.py -j 4 ${STAGEDIR}

The first run should complete in under 20 minutes, and subsequent runs will complete in less than a minute.  The -j (--jobs) flag defaults to the number of CPUs on the host, so if you want to give it all the cores, you can just omit it.

Comment 20 Matthew Thode ( prometheanfire ) archtester

2015-08-17 08:17:27 UTC

my fix didn't work for the manifests anyway

Comment 21 Fabian Groffen gentoo-dev

2015-08-17 08:48:00 UTC

FWIW, in Prefix we use

http://hg.code.sf.net/p/gentooprefixtree/code/file/f225be4b4d76/scripts/rsync-generation/hashgen.c

which does it in ~1m and restores the mtime of the Manifest before it was thickened.
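
The mtime-restoring part can be sketched in a few lines of Python. This is not a translation of hashgen.c, only an illustration of the technique, and `rewrite_preserving_mtime` is a hypothetical name.

```python
import os


def rewrite_preserving_mtime(path, new_data):
    """Overwrite path with new_data (bytes), then put the original
    access and modification times back."""
    before = os.stat(path)
    with open(path, "wb") as f:
        f.write(new_data)
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

As long as thickening is deterministic, the same thin Manifest produces the same thick Manifest with the same size and mtime, so rsync's default comparison sees no change.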

Comment 22 Per Pomsel 2015-08-20 21:37:42 UTC

wtf? Revert the bad changes and stop bothering users. You see, it's not working (for days) - so what are you waiting for?

Comment 23 Zac Medico gentoo-dev

2015-08-20 22:01:11 UTC

(In reply to Per Pomsel from comment #22)
> wtf? Revert the bad changes and stop bothering users.

It's a by-product of the migration from CVS to git, and we can't really go back to CVS at this point. So, the only option is to fix the git -> rsync process.

Comment 24 cronolio 2015-08-20 22:22:47 UTC

Are Gentoo's resources so limited that you cannot keep both CVS and git for a while?

Comment 25 Zac Medico gentoo-dev

2015-08-20 22:45:28 UTC

(In reply to salikov.alexey from comment #24)
> Are Gentoo's resources so limited that you cannot keep both CVS and git for a while?

We've still got the old cvs repo but it's read-only now. Developers now push everything to git, and cvs is only retained for historical purposes.

Comment 26 Michał Górny archtester

2015-08-23 08:23:16 UTC

I've patched Portage on dipper with a patch from git which causes all Manifests to copy their mtime over from the newest Manifested file. Hopefully this will cause one more full Manifest copy, and then things will get back to normal. Please let me know if it works for you.

Comment 27 Konstantinos Smanis 2015-08-23 12:57:56 UTC

It seems to be working as advertised (one last full Manifest pull then things go back to normal).

Comment 28 Ulrich Müller gentoo-dev

2015-08-25 06:49:43 UTC

(In reply to Ulrich Müller from comment #1)
> profiles/use.local.desc suffers from the same problem, BTW.

This part isn't solved. use.local.desc is refetched even if it is unchanged.

Comment 29 Zac Medico gentoo-dev

2015-08-25 07:48:56 UTC

(In reply to Ulrich Müller from comment #28)
> (In reply to Ulrich Müller from comment #1)
> > profiles/use.local.desc suffers from the same problem, BTW.
> 
> This part isn't solved. use.local.desc is refetched even if it is unchanged.

There's a portage patch in this branch:

https://github.com/zmedico/portage/commits/bug_557192

Comment 30 Zac Medico gentoo-dev

2015-08-26 01:53:23 UTC

(In reply to Zac Medico from comment #29)
> (In reply to Ulrich Müller from comment #28)
> > (In reply to Ulrich Müller from comment #1)
> > > profiles/use.local.desc suffers from the same problem, BTW.
> > 
> > This part isn't solved. use.local.desc is refetched even if it is unchanged.
> 
> There's a portage patch in this branch:
> 
> https://github.com/zmedico/portage/commits/bug_557192

In the master branch now:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=041a81dd1b99d538620ea395d1cf1a36c47a7735

Comment 31 Brian Dolbec (RETIRED) gentoo-dev

2015-09-22 01:35:06 UTC

Released in portage-2.2.21