Bug 688922 - dev-texlive/texlive-fontsextra spends a lot of time in src_unpack()
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: TeX project
 
Reported: 2019-06-29 10:01 UTC by Sergei Trofimovich (RETIRED)
Modified: 2023-06-16 01:14 UTC

Description Sergei Trofimovich (RETIRED) gentoo-dev 2019-06-29 10:01:51 UTC
Reproducer:
    $ time ebuild texlive-fontsextra-2019.ebuild clean prepare
    real    18m55,728s
    user    5m28,332s
    sys     13m48,050s

The problem comes from
  texlive-module.eclass:texlive-module_src_unpack()

  texlive-module_src_unpack() {
        unpack ${A}

        grep RELOC tlpkg/tlpobj/* | awk '{print $2}' | sed 's#^RELOC/##' > "${T}/reloclist"
        { for i in $(<"${T}/reloclist"); do  dirname $i; done; } | uniq > "${T}/dirlist"
        for i in $(<"${T}/dirlist"); do
                [ -d "${RELOC_TARGET}/${i}" ] || mkdir -p "${RELOC_TARGET}/${i}"
        done
        for i in $(<"${T}/reloclist"); do
                mv "${i}" "${RELOC_TARGET}"/$(dirname "${i}") || die "failed to relocate ${i} to ${RELOC_TARGET}/$(dirname ${i})"
        done
  }

Most of the time here is spent in calls to the external dirname and mv binaries, which are run for every entry. reloclist has 78,000 entries.
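
To illustrate why the per-entry external calls dominate, here is a minimal sketch that compares the external dirname with plain parameter expansion. The paths are synthetic stand-ins for reloclist, and the function names dirs_external and dirs_builtin are made up for this example; absolute timings will differ per machine.

```shell
#!/usr/bin/env bash
# Synthetic stand-in for reloclist (assumption: the real list is far longer).
paths=()
for ((n = 0; n < 500; n++)); do
    paths+=("fonts/type1/public/pkg${n}/font${n}.pfb")
done

# Variant 1: one fork+exec of the external dirname binary per entry.
dirs_external() {
    local p
    for p in "${paths[@]}"; do
        dirname "${p}"
    done
}

# Variant 2: parameter expansion only, no child process is created.
# Caveat: unlike dirname, "${p%/*}" returns the path unchanged when it
# contains no slash, so the two agree only for entries with a directory part.
dirs_builtin() {
    local p
    for p in "${paths[@]}"; do
        printf '%s\n' "${p%/*}"
    done
}

# The timings go to stderr; the directory lists themselves are discarded.
time dirs_external > /dev/null
time dirs_builtin > /dev/null
```

Both functions print the same directory list for these inputs; only the number of spawned processes differs (500 versus none).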

Replacing the external dirname with a shell function cuts the runtime in half:

--- a/eclass/texlive-module.eclass
+++ b/eclass/texlive-module.eclass
@@ -143,16 +143,21 @@ S="${WORKDIR}"
 
 RELOC_TARGET=texmf-dist
 
+# faster than external 'dirname' binary
+dn() {
+       echo "${1%/*}"
+}
+
 texlive-module_src_unpack() {
        unpack ${A}
 
        grep RELOC tlpkg/tlpobj/* | awk '{print $2}' | sed 's#^RELOC/##' > "${T}/reloclist"
-       { for i in $(<"${T}/reloclist"); do  dirname $i; done; } | uniq > "${T}/dirlist"
+       { for i in $(<"${T}/reloclist"); do  dn $i; done; } | uniq > "${T}/dirlist"
        for i in $(<"${T}/dirlist"); do
-               [ -d "${RELOC_TARGET}/${i}" ] || mkdir -p "${RELOC_TARGET}/${i}"
-       done
+               [ -d "${RELOC_TARGET}/${i}" ] || echo "${RELOC_TARGET}/${i}"
+       done | xargs mkdir -p
        for i in $(<"${T}/reloclist"); do
-               mv "${i}" "${RELOC_TARGET}"/$(dirname "${i}") || die "failed to relocate ${i} to ${RELOC_TARGET}/$(dirname ${i})"
+               mv "${i}" "${RELOC_TARGET}"/$(dn "${i}") || die "failed to relocate ${i} to ${RELOC_TARGET}/$(dn ${i})"
        done
 }

$ time ebuild texlive-fontsextra-2019.ebuild clean prepare
real    9m18,932s
user    2m57,735s
sys     6m32,770s

I suspect that making bulk 'mv' calls (one per directory) will bring it down to about one minute, which would be a reasonable time.
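
A rough sketch of the bulk-mv idea, not the change that was eventually committed: consecutive entries that share a directory are collected and moved with a single mv. The function name relocate_bulk and the demo file names are hypothetical, and `mv -t` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# relocate_bulk LIST TARGET
# Move every path listed in LIST (one per line, relative to the current
# directory) below TARGET, issuing one mv per run of same-directory entries.
relocate_bulk() {
    local list=$1 target=$2 path dir prev=""
    local -a batch=()

    while IFS= read -r path; do
        # Directory part without forking dirname; "." if there is no slash.
        if [[ ${path} == */* ]]; then
            dir=${path%/*}
        else
            dir=.
        fi
        # Directory changed: flush the pending batch with a single mv.
        if [[ ${dir} != "${prev}" && ${#batch[@]} -gt 0 ]]; then
            mkdir -p "${target}/${prev}" || return 1
            mv -t "${target}/${prev}" -- "${batch[@]}" || return 1
            batch=()
        fi
        prev=${dir}
        batch+=("${path}")
    done < "${list}"

    # Flush whatever is left after the last line.
    if [[ ${#batch[@]} -gt 0 ]]; then
        mkdir -p "${target}/${prev}" || return 1
        mv -t "${target}/${prev}" -- "${batch[@]}" || return 1
    fi
}

# Demo in a scratch directory with made-up file names.
work=$(mktemp -d)
cd "${work}" || exit 1
mkdir -p fonts/a fonts/b
touch fonts/a/x.pfb fonts/a/y.pfb fonts/b/z.pfb
printf '%s\n' fonts/a/x.pfb fonts/a/y.pfb fonts/b/z.pfb > reloclist
relocate_bulk reloclist texmf-dist   # two mv calls instead of three
```

The sketch relies on entries from the same directory being adjacent in the list; unsorted input still works but needs more mv calls. A very large directory could also exceed the argument length limit, which a real implementation would have to handle.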
Comment 1 Larry the Git Cow gentoo-dev 2020-04-14 13:16:57 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=97812dcc9ddb1e043d8a67216692b214d62099a7

commit 97812dcc9ddb1e043d8a67216692b214d62099a7
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2020-04-14 13:14:42 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2020-04-14 13:16:52 +0000

    texlive-module.eclass: Optimize src_unpack()
    
    This goes a bit further than slyfox's work.  On my machine, it reduces
    the post-unpack time from ~44m to ~13m.
    
    Bug: https://bugs.gentoo.org/688922
    Acked-by: Mikle Kolyada <zlogene@gentoo.org>
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

 eclass/texlive-module.eclass | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)
Comment 2 Sergei Trofimovich (RETIRED) gentoo-dev 2020-04-14 18:07:02 UTC
I think we can easily get to under a minute here: http://trofi.github.io/posts/215-perf-and-dwarf-and-fork.html
Comment 3 Sergei Trofimovich (RETIRED) gentoo-dev 2020-04-20 17:41:46 UTC
> Status: CONFIRMED → RESOLVED
> Resolution: --- → FIXED

10+ minutes is way above expected. I would not call it "FIXED".
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-06-15 21:29:16 UTC
commit 918b21dea9fc7e714c08dcedc24d0daf6b52ccdb
Author: Ulrich Müller <ulm@gentoo.org>
Date:   Fri Jun 2 09:44:21 2023 +0200

    texlive-module.eclass: Reduce number of executed external commands

    For texlive-latexextra-2021[doc], the number of "mv" commands is
    reduced from 12718 to 3130. Speedup is also by a factor of about 4,
    which saves another 4 seconds.

    Signed-off-by: Ulrich Müller <ulm@gentoo.org>

commit 6ee282f0645dcfccf1836b9cc7ae55556629eb8b
Author: Ulrich Müller <ulm@gentoo.org>
Date:   Fri Jun 2 01:09:59 2023 +0200

    texlive-module.eclass: Speed up SRC_URI calculation

    For texlive-latexextra-2021, SRC_URI calculation ran for 37 seconds
    here. Reduced it to 0.025 seconds (i.e. more than a factor 1000) by
    using bash arrays and parameter expansion instead of nested loops.

    Reported-by: Tim Harder <radhermit@gentoo.org>
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>
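
As an illustration of the technique named in that commit message (bash arrays and parameter expansion instead of loops), and not the actual eclass code: a whole array can be given a prefix and a suffix in two expansions, with no per-element loop. The module names, version and URL prefix below are made up.

```shell
#!/usr/bin/env bash
# Hypothetical inputs, for illustration only.
modules=(alpha beta gamma)
version=2021
prefix="https://example.org/texlive"

# "${array[@]/#/text}" prepends text to every element,
# "${array[@]/%/text}" appends text to every element.
uris=("${modules[@]/#/${prefix}/}")
uris=("${uris[@]/%/-${version}.tar.xz}")

printf '%s\n' "${uris[@]}"
```

Each expansion processes the full array inside the shell, so the cost no longer grows with repeated string concatenation in nested loops.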
Comment 5 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-06-16 01:14:51 UTC
(In reply to Sergei Trofimovich (RETIRED) from comment #2)
> I think we can easily get to under a minute here:
> http://trofi.github.io/posts/215-perf-and-dwarf-and-fork.html

I brought up the posix_spawn suggestion today at https://lists.gnu.org/archive/html/bug-bash/2023-06/msg00030.html.