Bug 557192

Summary: rsync refetches all Manifest files and profiles/use.local.desc every time
Product: Gentoo Infrastructure Reporter: Ulrich Müller <ulm>
Component: Other Assignee: Gentoo Infrastructure <infra-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: alexander, asl, axiator, azamat.hackimov, bertrand, bircoph, bug, che, cloos, dwfreed, grobian, holger, hydrapolic, junghans, konstantinos.smanis, maksbotan, pacho, pesa, phantom4, rdalek1967, rhill, salikov.alexey, sebastien.picavet, stilriv, zlogene, zmedico
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
See Also: https://bugs.gentoo.org/show_bug.cgi?id=557962
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 549914    

Description Ulrich Müller gentoo-dev 2015-08-10 08:02:22 UTC
<ulm> looks like rsync refetches all Manifest files all the time?
<ulm> even if they haven't changed
<prometheanfire> guessing that the manifests are either wiped or regenerated every time :|
<prometheanfire> ulm: make a bug?
<ulm> yeah, will do
Comment 1 Ulrich Müller gentoo-dev 2015-08-10 16:52:09 UTC
profiles/use.local.desc suffers from the same problem, BTW.
Comment 2 Mikle Kolyada (RETIRED) archtester Gentoo Infrastructure gentoo-dev Security 2015-08-10 20:18:24 UTC
(In reply to Ulrich Müller from comment #1)
> profiles/use.local.desc suffers from the same problem, BTW.

The whole tree suffers from this problem; emerge --sync pulls everything again and again.
Comment 3 Arnaud Launay 2015-08-12 06:32:57 UTC
Same here, multiple servers.
Comment 4 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2015-08-12 07:48:39 UTC
Adds load to the mirrors, but not sure if it's enough to worry about.

Add this to make.conf

PORTAGE_RSYNC_EXTRA_OPTS="--checksum"
Comment 5 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2015-08-12 07:51:45 UTC
this should either be made default
or
mirrors should be updated to --checksum from masterdistfiles
or
we should stop writing files that don't change, we are using repoman manifest in parallel right now to do the signing, if someone has a better method that doesn't rewrite the files...
Comment 6 Ulrich Müller gentoo-dev 2015-08-12 08:02:26 UTC
(In reply to Matthew Thode ( prometheanfire ) from comment #4)
> Adds load to the mirrors, but not sure if it's enough to worry about.
> 
> Add this to make.conf
> 
> PORTAGE_RSYNC_EXTRA_OPTS="--checksum"

That's not an acceptable solution. With --checksum, rsync on the client side must read the contents of every file, instead of just looking at its inode.
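
A minimal sketch of the difference, for illustration only. By default rsync treats a file as unchanged when size and modification time match on both sides, which needs only a stat() call; with --checksum the file contents are compared, so every file has to be read. The function names are hypothetical and SHA-256 stands in for whatever digest rsync really uses; this is not rsync's code.

```python
import hashlib
import os


def quick_check_same(path_a, path_b):
    """Mimic the default test: compare size and mtime from stat() only."""
    a, b = os.stat(path_a), os.stat(path_b)
    # Whole seconds are compared here as a simplification.
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_same(path_a, path_b):
    """Mimic --checksum: both files must be read in full to be compared."""
    def digest(path):
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    return digest(path_a) == digest(path_b)
```

A regenerated Manifest with identical content but a fresh mtime fails the first test and is treated as changed, which matches the behaviour reported in this bug.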


(In reply to Matthew Thode ( prometheanfire ) from comment #5)

> [...] or
> we should stop writing files that don't change,

^^ This, please.

> we are using repoman manifest in parallel right now to do the signing, if
> someone has a better method that doesn't rewrite the files...

Run repoman only if some file in the dir is newer than the (previous) Manifest?
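
A rough sketch of that check, assuming the file mtimes in the package directory are meaningful (a fresh git checkout stamps files with the checkout time, so this may not hold for the staging tree). The function name is hypothetical.

```python
import os


def manifest_is_stale(package_dir):
    """Return True if any file in package_dir is newer than its Manifest."""
    manifest = os.path.join(package_dir, "Manifest")
    if not os.path.exists(manifest):
        return True  # no previous Manifest: it has to be generated
    manifest_mtime = os.stat(manifest).st_mtime
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            path = os.path.join(root, name)
            if path != manifest and os.stat(path).st_mtime > manifest_mtime:
                return True
    return False
```

Only directories for which this returns True would then need a repoman manifest run.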
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-08-12 12:41:33 UTC
My original scripts actually used 'git diff' to process only the directories that had changed. I don't know why this was replaced by a far less efficient solution.
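
The original scripts are not included in this report, so the following is only a guess at the general approach: list the files touched between two revisions with git diff --name-only and reduce them to category/package directories. The function names and the list of non-package top-level directories are assumptions made for illustration.

```python
import subprocess

# Assumed set of top-level directories that do not contain packages.
NON_PACKAGE_DIRS = ("profiles", "eclass", "metadata", "licenses", "scripts")


def changed_package_dirs(changed_paths):
    """Map repository-relative file paths to their category/package dirs."""
    packages = set()
    for path in changed_paths:
        parts = path.split("/")
        # Expect category/package/file...; skip anything outside packages.
        if len(parts) >= 3 and parts[0] not in NON_PACKAGE_DIRS:
            packages.add("/".join(parts[:2]))
    return sorted(packages)


def paths_changed_between(repo_dir, old_rev, new_rev):
    """List files touched between two revisions via git diff --name-only."""
    out = subprocess.check_output(
        ["git", "diff", "--name-only", old_rev, new_rev],
        cwd=repo_dir,
        text=True,
    )
    return out.splitlines()
```

Feeding the output of paths_changed_between() into changed_package_dirs() gives the directories whose Manifests would need regenerating.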
Comment 8 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2015-08-12 14:41:45 UTC
probably because we were tired :P
Comment 9 Zac Medico gentoo-dev 2015-08-15 20:35:50 UTC
(In reply to Matthew Thode ( prometheanfire ) from comment #5)
> we should stop writing files that don't change, we are using repoman
> manifest in parallel right now to do the signing, if someone has a better
> method that doesn't rewrite the files...

Note that egencache --update-manifests was intended for this purpose. It does manifests in parallel with the --jobs and --load-average options.
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2015-08-16 16:22:30 UTC
zmedico:
--update-manifests seems to be extremely slow compared to the other variant we are using.

time find -maxdepth 1 -mindepth 1 -type d \
 | parallel --verbose -j4 'cd {} && repoman manifest'
hot: wallclock ~1m10s, cputime ~4m14s
cold: ~20min wallclock, 80min cpu

--update-manifests: did not finish in 60 minutes with --jobs=4.

egencache command was:
egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
    --portdir=${STAGEDIR} \
    --update-use-local-desc \
    --update-manifests --thin-manifests=n \
    --repo=gentoo \
    >> ${REGEN_LOG_DIR}/${REGEN_LOG_FILE} 2>&1

dwfreed and I started to write a better solution, that was aware of how mtimes needed to be propagated in creating thick Manifests, but I've had some work travel, so it wasn't finished.

(specifically, the mtime on the thick Manifest should be the greatest mtime of any file in a given package)
Comment 11 Zac Medico gentoo-dev 2015-08-16 18:14:18 UTC
(In reply to Robin Johnson from comment #10)
> zmedico:
> --update-manifests seems to be extremely slow compared to the other variant
> we are using.

Okay, I'll have to do some profiling to see what's wrong.

> (specifically, the mtime on the thick Manifest should be the greatest mtime
> of any file in a given package)

But eclass changes can cause the DIST entries in the Manifest to change, and non-ebuild files are irrelevant to the Manifest when using thin-manifests. So, I think it would make sense to use the max mtime of the ebuilds and the eclasses they inherit.
Comment 12 dwfreed 2015-08-16 18:37:10 UTC
(In reply to Zac Medico from comment #11)
> (In reply to Robin Johnson from comment #10)
> > zmedico:
> > --update-manifests seems to be extremely slow compared to the other variant
> > we are using.
> 
> Okay, I'll have to do some profiling to see what's wrong.
> 
> > (specifically, the mtime on the thick Manifest should be the greatest mtime
> > of any file in a given package)
> 
> But eclass changes can cause the DIST entries in the Manifest to change, and
> non-ebuild files are irrelevant to the Manifest when using thin-manifests.
> So, I think it would make sense to use the max mtime of the ebuilds and the
> eclasses they inherit.

If an eclass changes the resulting set of DIST entries, wouldn't you have to regenerate the thin manifest?  In which case, it'd propagate down to the thick manifest anyway.
Comment 13 Zac Medico gentoo-dev 2015-08-16 19:05:46 UTC
(In reply to dwfreed from comment #12)
> If an eclass changes the resulting set of DIST entries, wouldn't you have to
> regenerate the thin manifest?  In which case, it'd propagate down to the
> thick manifest anyway.

Yes, that makes sense. So, "the mtime on the thick Manifest should be the greatest mtime of any file in a given package" is correct if you include the thin Manifest mtime.
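
A minimal sketch of that rule, not taken from the script or from Portage: compute the greatest mtime of every file in the package, thin Manifest included, before the thick Manifest is written, then stamp the result with it. The function names are hypothetical.

```python
import os


def newest_mtime(package_dir):
    """Greatest mtime (in ns) of any file in the package, Manifest included."""
    newest = 0
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            path = os.path.join(root, name)
            newest = max(newest, os.stat(path).st_mtime_ns)
    return newest


def write_thick_manifest(package_dir, thick_content):
    """Write the thick Manifest, then stamp it with the pre-write newest mtime."""
    stamp = newest_mtime(package_dir)  # taken before the Manifest is rewritten
    manifest = os.path.join(package_dir, "Manifest")
    with open(manifest, "w") as fh:
        fh.write(thick_content)
    os.utime(manifest, ns=(stamp, stamp))
    return stamp
```

Running this twice on an unchanged package yields the same mtime both times, so rsync's size and mtime comparison sees no change.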
Comment 14 dwfreed 2015-08-16 19:32:27 UTC
(In reply to Zac Medico from comment #13)
> (In reply to dwfreed from comment #12)
> > If an eclass changes the resulting set of DIST entries, wouldn't you have to
> > regenerate the thin manifest?  In which case, it'd propagate down to the
> > thick manifest anyway.
> 
> Yes, that makes sense. So, "the mtime on the thick Manifest should be the
> greatest mtime of any file in a given package" is correct if you include the
> thin Manifest mtime.

Which I am :)

For the record, here's my script:

https://bitbucket.org/snippets/dwfreed/Roekq/thicken-manifestspy
Comment 15 Zac Medico gentoo-dev 2015-08-16 21:25:09 UTC
(In reply to dwfreed from comment #14)
> For the record, here's my script:
> 
> https://bitbucket.org/snippets/dwfreed/Roekq/thicken-manifestspy

The mtime code seems like it should work. I've filed bug 557962 to add similar behavior directly to portage's Manifest.write() method.
Comment 16 dwfreed 2015-08-16 23:36:14 UTC
I wonder if the --update-manifests code of egencache is running before the regular --update code.  In my initial implementation of my script, I was not running egencache --update on the repo first, and I noticed that FetchlistDict creation was taking forever; a simple strace showed me that the portage code was sourcing the ebuild to assemble the FetchlistDict.  The results of this sourcing would then be otherwise wasted, and so it would have to source the ebuild again for md5-cache generation.  If md5-cache generation is run first, this is used for FetchlistDict generation, and everything is much faster.  It doesn't explain why sourcing ebuilds for FetchlistDict generation is orders of magnitude slower than sourcing ebuilds for md5-cache generation, but might explain a large part of it.
Comment 17 Zac Medico gentoo-dev 2015-08-17 00:18:32 UTC
(In reply to dwfreed from comment #16)
> The results of this sourcing would then be otherwise wasted, and so it would
> have to source the ebuild again for md5-cache generation.  If md5-cache
> generation is run first, this is used for FetchlistDict generation, and
> everything is much faster.  It doesn't explain why sourcing ebuilds for
> FetchlistDict generation is orders of magnitude slower than sourcing ebuilds
> for md5-cache generation, but might explain a large part of it.

One thing that makes it slower is that it does not parallelize the cache generation in this case. The cache is generated on-demand in the main process, so the required time will be comparable to egencache --update --jobs=1.

It's just not optimized for this case, since the assumption is that you will call egencache --update either before or together with --update-manifests.
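
A small sketch of the "before" ordering, using only options already mentioned in this bug (--update, --update-manifests, --repo, --jobs). The helper name is hypothetical, the other options from comment 10 are omitted for brevity, and whether splitting the run actually removes the slowdown is untested.

```python
def egencache_steps(repo="gentoo", jobs=4):
    """Return two egencache invocations in the order they would be run."""
    common = ["egencache", "--repo=%s" % repo, "--jobs=%d" % jobs]
    return [
        common + ["--update"],            # 1. fill the md5-cache in parallel
        common + ["--update-manifests"],  # 2. Manifest step can reuse the cache
    ]
```

Each list could be passed to subprocess.check_call() in sequence, stopping if the first step fails.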
Comment 18 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2015-08-17 06:31:02 UTC
possible fix deployed

-find ${STAGEDIR} -maxdepth 1 -mindepth 1 -type d | parallel -j4  'cd {} && repoman manifest' >>${REGEN_LOG_DIR}/${REGEN_LOG_FILE}
+
+# copy cached manifests to avoid regen
+for manifest in ${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/*/*/Manifest; do
+    manifest=${manifest#${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/}
+    # discard cp errors (stderr) for packages that were just removed
+    cp ${BASE}/gentoo-x86-manifest-scratch/${STAGEDIR}/${manifest} ${STAGEDIR}/${manifest} 2>/dev/null
+done
+
+# look for ebuilds that have been touched and need manifests regenerated
+new_manifest_list=$(find ${BASE}/exports/gentoo-x86/ -type f -name Manifest -mmin -30)
+# regenerate those manifests
+for manifest in ${new_manifest_list}; do
+    package_dir=${manifest%/Manifest}
+    pushd ${package_dir}
+    repoman manifest
+    popd
+done
 
 # for egencache, set user/group or make sure the user is in the portage group
        #--update-changelogs \
@@ -109,6 +125,9 @@ egencache --update --rsync --jobs=4 --tolerant --cache-dir=${BASE}/tmp/ \
 rval=$?
 echo "END      REGEN                   $(date -u)" >> ${TIMESLOG}
 
+# copy so we don't have to regen all the time
+rsync -am --delete --include='*Manifest' --include='*/' --exclude='*' ${STAGEDIR} ${BASE}/gentoo-x86-manifest-scratch
+
Comment 19 dwfreed 2015-08-17 07:57:35 UTC
prometheanfire showed me the exact script that's being used to generate the rsync tree, and the following changes will get rid of the nasty hack he added and use my script instead:

2015-08-17 06:44:36 <dwfreed> remove line 100-115, 122, and 129; then run my script at line 129
2015-08-17 06:46:11 <dwfreed> ./thicken-manifests.py -j 4 ${STAGEDIR}

The first run should complete in under 20 minutes, and subsequent runs will complete in less than a minute.  The -j (--jobs) flag defaults to the number of CPUs on the host, so if you want to give it all the cores, you can just omit it.
Comment 20 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2015-08-17 08:17:27 UTC
my fix didn't work for the manifests anyway
Comment 21 Fabian Groffen gentoo-dev 2015-08-17 08:48:00 UTC
FWIW, in Prefix we use

http://hg.code.sf.net/p/gentooprefixtree/code/file/f225be4b4d76/scripts/rsync-generation/hashgen.c

which does it in ~1m and restores the mtime of the Manifest before it was thickened.
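
hashgen.c is C and is not reproduced here; the general idea of restoring the mtime can be sketched in Python as follows. The function name and the transform callback are hypothetical.

```python
import os


def rewrite_keeping_mtime(path, transform):
    """Rewrite a file in place and restore the mtime it had beforehand."""
    before = os.stat(path)
    with open(path, "r") as fh:
        content = fh.read()
    with open(path, "w") as fh:
        fh.write(transform(content))
    # Put the original timestamps back so the rewrite is invisible to mtime checks.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Because the thin Manifest only changes when a developer commits to the package, its mtime is a stable stamp for the thickened file as long as that mtime itself is stable between runs.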
Comment 22 Per Pomsel 2015-08-20 21:37:42 UTC
wtf? Revert the bad changes and stop bothering users. You see, it's not working (for days) - so what are you waiting for?
Comment 23 Zac Medico gentoo-dev 2015-08-20 22:01:11 UTC
(In reply to Per Pomsel from comment #22)
> wtf? Revert the bad changes and stop bothering users.

It's a by-product of the migration from CVS to git, and we can't really go back to CVS at this point. So, the only option is to fix the git -> rsync process.
Comment 24 cronolio 2015-08-20 22:22:47 UTC
Are Gentoo's resources so limited that you cannot keep both cvs and git for a while?
Comment 25 Zac Medico gentoo-dev 2015-08-20 22:45:28 UTC
(In reply to salikov.alexey from comment #24)
> Are Gentoo's resources so limited that you cannot keep both cvs and git for a while?

We've still got the old cvs repo but it's read-only now. Developers now push everything to git, and cvs is only retained for historical purposes.
Comment 26 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-08-23 08:23:16 UTC
I've patched Portage on dipper with a patch from git that makes every Manifest take its mtime from the newest Manifested file. Hopefully this will cause one more full Manifest copy, and then things will get back to normal. Please let me know if it works for you.
Comment 27 Konstantinos Smanis 2015-08-23 12:57:56 UTC
It seems to be working as advertised (one last full Manifest pull then things go back to normal).
Comment 28 Ulrich Müller gentoo-dev 2015-08-25 06:49:43 UTC
(In reply to Ulrich Müller from comment #1)
> profiles/use.local.desc suffers from the same problem, BTW.

This part isn't solved. use.local.desc is refetched even if it is unchanged.
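
One generic way to avoid this, sketched here under the assumption that the previously generated file is still present where the new one is written: compare the new content with the old and skip the write when they match, so the mtime is left alone. This is an illustration with a hypothetical function name and not necessarily what Portage does.

```python
import os


def write_if_changed(path, new_content):
    """Write new_content (bytes) to path unless the file already holds it."""
    try:
        with open(path, "rb") as fh:
            if fh.read() == new_content:
                return False  # identical: leave the file and its mtime alone
    except FileNotFoundError:
        pass  # no previous file, so it has to be written
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        fh.write(new_content)
    os.replace(tmp, path)  # swap the new file into place
    return True
```

The return value tells the caller whether the file was actually rewritten.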
Comment 29 Zac Medico gentoo-dev 2015-08-25 07:48:56 UTC
(In reply to Ulrich Müller from comment #28)
> (In reply to Ulrich Müller from comment #1)
> > profiles/use.local.desc suffers from the same problem, BTW.
> 
> This part isn't solved. use.local.desc is refetched even if it is unchanged.

There's a portage patch in this branch:

https://github.com/zmedico/portage/commits/bug_557192
Comment 30 Zac Medico gentoo-dev 2015-08-26 01:53:23 UTC
(In reply to Zac Medico from comment #29)
> (In reply to Ulrich Müller from comment #28)
> > (In reply to Ulrich Müller from comment #1)
> > > profiles/use.local.desc suffers from the same problem, BTW.
> > 
> > This part isn't solved. use.local.desc is refetched even if it is unchanged.
> 
> There's a portage patch in this branch:
> 
> https://github.com/zmedico/portage/commits/bug_557192

In the master branch now:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=041a81dd1b99d538620ea395d1cf1a36c47a7735
Comment 31 Brian Dolbec (RETIRED) gentoo-dev 2015-09-22 01:35:06 UTC
Released in portage-2.2.21