Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 492938 - backtrack value causes linear performance hit, early bailout possible?
Summary: backtrack value causes linear performance hit, early bailout possible?
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Enhancement/Feature Requests (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-30 10:40 UTC by Joe Breuer
Modified: 2022-10-20 02:44 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Joe Breuer 2013-11-30 10:40:56 UTC
This is with stable sys-apps/portage-2.2.7.


I recently noticed that a typical 'emerge -bDNauvt @world' would take around 10 minutes on my machine (Core i7, used to take about one minute until recently).

Eventually I was frustrated enough to dig around in emerge using runsnakerun, and found that 75% of the time is spent in/below _add_pkg_dep_string, which got called ~30.000 times for 1706 packages (326 in world).

Further digging revealed that _resolve/_create_graph ran multiple times due to _backtrack_depgraph; and it *looks* to me that the code doesn't notice that the depgraph is not going to improve any more.

I *think* there's an early bailout possible in _backtrack_depgraph, by aborting when the current retry of depgraph does not yield any *new* backtrack info (i.e. the graph / backtrack info are unchanged in two consecutive tries).

Unfortunately, I don't know python (or the innards of portage) well enough to really attempt a fix myself; but perhaps that finding can be an easy target for someone who knows his way around the code.


The underlying cause for the not-improving backtrack attempts on my machine was the bibtexu vs. icu blocker; see bug 490362. I now managed to resolve this with the workaround documented there (manually rebuilding bibtexu).

In my situation, depgraph was run as many times as the configured backtrack value. This results in the runtime increasing linearly with the backtrack value. The default backtrack value of 10 exactly reflects the initial ten-fold perfomance decrease.
Comment 1 Sebastian Luther (few) 2013-11-30 13:27:34 UTC
I don't think that backtrack information didn't change. There's code that ensures that we don't build the dependency graph twice with the same backtrack info.

See bug 486580 comment 8 (replace python-exec with icu) for what happend. If you install portage-9999 your problem should be gone.
Comment 2 Sebastian Luther (few) 2013-12-02 20:56:21 UTC
Reopen if it's not gone with the fix from bug bug 486580.
Comment 3 Joe Breuer 2013-12-07 13:11:07 UTC
Since I've manually resolved the "unresolvable" icu/bibtexu blocker in the mean time, I cannot directly check the behaviour of portage-9999 in the original situation described above.

However, new updates on this system have some conflicts, and I ran a series of emerge -p with different backtrack values:


portage-2.2.7, python-3.2.5-r3:

backtrack  0: 0:54,92  3 conflicts 
backtrack  1: 2:05,47  4 conflicts
backtrack  2: 2:05,86  4 conflicts
backtrack  3: 2:40,00  4 conflicts
backtrack  4: 2:39,92  4 conflicts
backtrack  5: 2:40,37  0 conflicts
backtrack 10: 2:42,53  0 conflicts

portage-9999 (python-3.3.2-r2):

backtrack  0: 0:57,57  2 conflicts
backtrack  1: 2:18,09  4 conflicts
backtrack  2: 2:20,17  3 conflicts
backtrack  3: 2:19,20  0 conflicts
backtrack 10: 2:58,12  0 conflicts (1 outlier: 2:19,16)
backtrack 50: 2:59,16  0 conflicts

portage-2.2.7, python-3.3.2-r2:

backtrack  0: 0:57,09  3 conflicts
backtrack  1: 2:14,67  3 conflicts
backtrack  2: 2:12,50  4 conflicts
backtrack  3: 2:48,23  3 conflicts
backtrack  4: 2:43,84  4 conflicts
backtrack  5: 2:47,74  0 conflicts
backtrack 10: 2:50,01  0 conflicts
backtrack 50: 2:11,36  0 conflicts


[I probably could have saved myself the backtrack=1 times, since 1 and 2 appear to be folded to the same max_retries value.]

This yields a number of interesting (or not) observations:

a) portage-9999 stabilizes conflicts quicker when backtracking; which I take to confirm the fix of bug 486580. This *should* also help in my orginal situation, but I cannot directly verify this now.

b) With portage-9999 as well as portage-2.2.7 there is a variation of run time with increasing backtrack values beyond the point where no conflicts remain; also I get inconsistent results for high backtrack values. I cannot say what that signifies. Since there is a difference from 2:11 to 2:59 minutes I tried to analyze backtrack 10 - 50 further in portage-9999 and portage-2.2.7 with the newly-installed python-3.3.2-r2, but the more I look at it the more the results change between consecutive runs.

c) python-3.3.2-r2 appears a tad slower than python-3.2.5-r3 when running portage.

d) The conflicts detected with individual backtrack limits by portage-2.2.7 after python-3.3.2-r2 got installed with the portage-9999 tryout differ from those with portage-2.2.7 with python-3.2.5-r3. This could be due to either the depgraph having changed with python-3.3 being installed, or due to something else. (?)


I'll try to keep an eye on this, when I get another unresolvable conflict I'll run those comparisons again.


The "do not retry the depgraph when backtracking has not changed" logic mentioned by Sebastian Luther appears to be need_restart set to true from various _..._backtrack methods; i.e. if none of those do something useful there should, indeed, not be another depgraph retry.


portage-9999 explicitly mentions the packages causing rebuilds:

  (media-libs/libgphoto2-2.5.2-r1::gentoo, ebuild scheduled for merge) causes rebuilds for:
    (kde-base/kamera-4.10.5::gentoo, ebuild scheduled for merge)
    (media-gfx/sane-backends-1.0.23::gentoo, ebuild scheduled for merge)
    (media-gfx/digikam-3.2.0::gentoo, ebuild scheduled for merge)
    (app-emulation/wine-1.6::gentoo, ebuild scheduled for merge)


And here's the complete list of conflicts, annotated which are missing for individual tries:

** not with portage-3.3.2-r2 backtrack=0:
** not with portage-2.2.7 python-3.3 backtrack=1,backtrack=3
media-libs/flac:0

  (media-libs/flac-1.2.1-r3::gentoo, installed) pulled in by
    (no parents that aren't satisfied by other packages in this slot)

  (media-libs/flac-1.2.1-r3::gentoo, ebuild scheduled for merge) pulled in by
    >=media-libs/flac-1.2.1[abi_x86_32(-)?,abi_x86_64(-)?,abi_x86_x32(-)?,abi_mips_n32(-)?,abi_mips_n64(-)?,abi_mips_o32(-)?] required by (media-libs/audiofile-0.3.6-r1::gentoo, ebuild scheduled for merge)

** not with portage-2.2.7 python-3.2/python-3.3 backtrack=0:
** not with portage-3.3.2-r2 backtrack=0:
media-libs/libgphoto2:0  

  (media-libs/libgphoto2-2.5.2-r1::gentoo, ebuild scheduled for merge) pulled in by
    (no parents that aren't satisfied by other packages in this slot)

  (media-libs/libgphoto2-2.4.14-r1::gentoo, ebuild scheduled for merge) pulled in by
    media-libs/libgphoto2:0/0= required by (kde-base/kamera-4.10.5::gentoo, installed)
    media-libs/libgphoto2:0/0= required by (media-gfx/digikam-3.2.0::gentoo, installed)
    media-libs/libgphoto2:0/0= required by (media-gfx/sane-backends-1.0.23::gentoo, installed)

virtual/libffi:0

  (virtual/libffi-3.0.11::gentoo, installed) pulled in by
    (no parents that aren't satisfied by other packages in this slot)

  (virtual/libffi-3.0.11::gentoo, ebuild scheduled for merge) pulled in by
    virtual/libffi[abi_x86_32(-)?,abi_x86_64(-)?,abi_x86_x32(-)?,abi_mips_n32(-)?,abi_mips_n64(-)?,abi_mips_o32(-)?] required by (dev-libs/glib-2.36.4-r1::gentoo, ebuild scheduled for merge)

virtual/libiconv:0

  (virtual/libiconv-0::gentoo, installed) pulled in by
    (no parents that aren't satisfied by other packages in this slot)

  (virtual/libiconv-0::gentoo, ebuild scheduled for merge) pulled in by
    virtual/libiconv[abi_x86_32(-)?,abi_x86_64(-)?,abi_x86_x32(-)?,abi_mips_n32(-)?,abi_mips_n64(-)?,abi_mips_o32(-)?] required by (dev-libs/glib-2.36.4-r1::gentoo, ebuild scheduled for merge)
Comment 4 Sebastian Luther (few) 2013-12-07 17:56:51 UTC
That's an interesting analyses.

Some notes about backtracking:
It's not necessarily a linear sequence of steps. It's more like a tree with nodes having one or two children. For slot conflicts there are two children, for other things only one. The tree is explored depth first.

There is a limit both on the number of explored nodes (the --backtrack value) and the depth of the tree (currently max(1, (max_retries + 1) / 2), except that config changes don't increase depth).

When the last backtracking step did not solve all problems, then an additional last step is executed.

You should leave out different python versions. It's clear that they perform differently, but that's another issue.

If you aren't to scared to look at the portage code, I could give you some pointers how to get more information about the backtracking than those timings. More debug output from the backtracking would definitely be useful for debugging in general.
Comment 5 Joe Breuer 2013-12-08 12:59:51 UTC
"Good" news :-/

I've encountered another machine with the unresolved bibtexu/icu conflict; i.e. that exactly reflects the situation I was originally posting about.

I went on to run a series of backtrack tests with the unresolvable blocker present:

portage-2.2.7:

backtrack  0: 0:50,61  3 conflicts
backtrack  2: 1:55,50  5 conflicts
backtrack  3: 2:22,98  5 conflicts
backtrack  4: 2:20,41  5 conflicts
backtrack  5: 2:48,22  5 conflicts
backtrack  6: 2:50,38  5 conflicts
backtrack  7: 4:19,43  1 conflict
backtrack 10: 4:23,19  1 conflict (icu)
backtrack 20: 4:23,07  1 conflict (icu)


portage-9999:

backtrack  0: 0:51,29  3 conflicts
backtrack  2: 1:57,43  3 conflicts
backtrack  3: 2:28,77  3 conflicts
backtrack  4: 2:28,44  3 conflicts
backtrack  5: 2:28,87  0 conflicts
backtrack 10: 2:30,83  0 conflicts
backtrack 20: 2:31,32  0 conflicts


The conflicts are (without comparing down to the -rX level) the same as in the post above; with the additional icu conflict.


a) On this machine "M", the performance penalty per backtracking level is less severe than what I originally observed on the other machine "B"; but it's well visible in the first table. I'd assume that there are fewer backtracking candidates on this machine as opposed to the original problem.

Further details: M is slightly less performant than B; both are Core i7 with 8 and 12 GB RAM respectively. All numbers are cache-hot, i.e. no significant SSD accesses.
M has 1537 packages installed (282 in world); B (see above) 1706 / 326 packages.


b) portage-9999 definitely converges faster and with significantly lower run-time than portage-2.2.7 in this "conflict present" scenario; and it detects the reverse rebuild of bibtexu by itself (from backtrack=5 on), completely resolving the conflict. So the fix for bug 486580 really looks good!


c) portage-2.2.7 also levels off at some point (at backtrack=7 here); on B I originally did not check that far since - with the default backtrack of 10 - that same 'emerge -bDNpuvt @world' would take more than 10 minutes and conflict resolution had not converged yet.


d) The very small additional time linear with the backtrack limit (portage-9999: 2:28 -> 2:30 -> 2:31 with backtrack 5 -> 10 -> 20) matches my observation on B, but even if it isn't statistical noise it's so small compared to the total runtime that it's irrelevant.


Regarding python coding: I wouldn't feel qualified to dig around in the depgraph resolution engine's logic; but I think I'd manage adding tracing / selectively tracing certain situations.

I've seen that there's trace output available (writemsg_level activated by --debug) and tried to have a look at it, but couldn't find how/whether backtracking could be enabled seperately from general depgraph tracing output - the latter of which is completely overwhelming. Can't promise anything at the moment and I need to complete the updates on M for now; but when I feel the need I'll have another look and separate the backtracking tracing from the depgraph calculation tracing etc.

Any information on what kinds of additional tracing would be useful is very welcome.