In function DependencyGraph.get_leaf_nodes(), in case of circular dependency, the "# Create a list of tuples containing the number of parents paired with the corresponding node" loop takes quite some time during an "emerge -puD world". % time emerge -puD world These are the packages that I would merge, in order: Calculating world dependencies ...done! [ebuild U ] sys-devel/gettext-0.14.2 [0.14.1] [ebuild U ] app-arch/unzip-5.52 [5.50-r2] [ebuild U ] net-misc/fakeidentd-2.2-r1 [2.2] [ebuild U ] net-misc/netkit-telnetd-0.17-r5 [0.17-r4] <here, there is an ~5 seconds lag> [ebuild U ] app-arch/zip-2.31 [2.3-r4] [ebuild U ] dev-db/mysql-4.0.24-r1 [4.0.23-r2] [ebuild U ] sys-apps/iproute2-2.6.11.20050310 [2.6.10.20050124] [ebuild U ] sys-devel/libtool-1.5.14 [1.5.10-r5] [ebuild U ] media-video/realplayer-10.0.3 [10.0.2] [ebuild U ] sci-calculators/qalculate-0.7.2 [0.7.1] [ebuild U ] games-arcade/blobwars-1.03 [1.02] [ebuild U ] sys-devel/crossdev-0.9.1-r2 [0.9.1-r1] real 0m17.206s user 0m16.115s sys 0m0.644s Profiling gives me this: >>> p = pstats.Stats('/tmp/emerge-1.prof') >>> p.sort_stats('cumulative').print_stats(20) Sun Mar 13 21:36:13 2005 /tmp/emerge-1.prof 1521973 function calls (1482864 primitive calls) in 30.720 CPU seconds Ordered by: cumulative time List reduced from 429 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.010 0.010 30.720 30.720 /usr/lib/portage/pym/emerge.py:6(?) 18/1 0.000 0.000 30.720 30.720 <string>:1(?) 1 0.000 0.000 30.720 30.720 profile:0(import emerge) 1 0.000 0.000 22.000 22.000 /usr/lib/portage/pym/emerge.py:1333(xcreate) 6130/592 0.350 0.000 19.940 0.034 /usr/lib/portage/pym/emerge.py:891(create) 1174/459 0.250 0.000 16.040 0.035 /usr/lib/portage/pym/emerge.py:1143(select_dep) 1156 0.040 0.000 8.680 0.008 /usr/lib/portage/pym/config.py:452(setcpv) 10075/6370 0.290 0.000 7.120 0.001 /usr/lib/portage/pym/portage.py:3273(xmatch) 1 0.050 0.050 7.090 7.090 /usr/lib/portage/pym/emerge.py:2362(display_flat) 1059 0.500 0.000 6.940 0.007 /usr/lib/portage/pym/portage_dep.py:326(get_leaf_nodes) 126314 4.050 0.000 6.830 0.000 /usr/lib/portage/pym/portage_dep.py:574(match_from_list) 11763 0.180 0.000 6.470 0.001 /usr/lib/portage/pym/portage_dep.py:438(get_parent_nodes) 1156 0.000 0.000 6.360 0.006 /usr/lib/portage/pym/portage_dep.py:558(best_match_to_list) 1156 0.850 0.001 6.360 0.006 /usr/lib/portage/pym/portage_dep.py:547(match_to_list) 11789 6.290 0.001 6.290 0.001 /usr/lib/portage/pym/portage_dep.py:460(__traverse_nodes) 26405 0.320 0.000 3.820 0.000 /usr/lib/portage/pym/portage.py:1854(dep_expand) 1174 0.280 0.000 3.670 0.003 /usr/lib/portage/pym/portage.py:1913(dep_check) 26405 1.040 0.000 3.500 0.000 /usr/lib/portage/pym/portage.py:1766(cpv_expand) 5183 0.440 0.000 2.880 0.001 /usr/lib/portage/pym/portage.py:3254(cp_list) 2186 0.290 0.000 2.810 0.001 /usr/lib/portage/pym/portage.py:3377(gvisible) In case there are several nodes that have the whole graph as parent, there is no point in listing them all and then trying to find one which has itself as child, since they all have themself as child. So the following patch cut the iteration in such a case, as soon as such a node is found. That gives an ~5 seconds improvement here: % time emerge -puD world These are the packages that I would merge, in order: Calculating world dependencies ...done! [ebuild U ] sys-devel/gettext-0.14.2 [0.14.1] [ebuild U ] app-arch/unzip-5.52 [5.50-r2] [ebuild U ] net-misc/fakeidentd-2.2-r1 [2.2] [ebuild U ] net-misc/netkit-telnetd-0.17-r5 [0.17-r4] [ebuild U ] app-arch/zip-2.31 [2.3-r4] [ebuild U ] sys-devel/libtool-1.5.14 [1.5.10-r5] [ebuild U ] dev-db/mysql-4.0.24-r1 [4.0.23-r2] [ebuild U ] media-video/realplayer-10.0.3 [10.0.2] [ebuild U ] sci-calculators/qalculate-0.7.2 [0.7.1] [ebuild U ] sys-apps/iproute2-2.6.11.20050310 [2.6.10.20050124] [ebuild U ] games-arcade/blobwars-1.03 [1.02] [ebuild U ] sys-devel/crossdev-0.9.1-r2 [0.9.1-r1] real 0m11.950s user 0m10.871s sys 0m0.687s Note the package ordering is not exactly the same, but it should still be correct (but if i'm completly wrong in my understanding of that part of the code). Profiling after the patch gives: >>> p = pstats.Stats('/tmp/emerge-2.prof') >>> p.sort_stats('cumulative').print_stats(20) Sun Mar 13 21:40:02 2005 /tmp/emerge-2.prof 1515449 function calls (1476340 primitive calls) in 24.080 CPU seconds Ordered by: cumulative time List reduced from 429 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.010 0.010 24.080 24.080 /usr/lib/portage/pym/emerge.py:6(?) 18/1 0.000 0.000 24.080 24.080 <string>:1(?) 1 0.000 0.000 24.080 24.080 profile:0(import emerge) 1 0.000 0.000 21.000 21.000 /usr/lib/portage/pym/emerge.py:1333(xcreate) 6130/592 0.480 0.000 18.940 0.032 /usr/lib/portage/pym/emerge.py:891(create) 1174/459 0.190 0.000 15.100 0.033 /usr/lib/portage/pym/emerge.py:1143(select_dep) 1156 0.040 0.000 8.430 0.007 /usr/lib/portage/pym/config.py:452(setcpv) 10075/6370 0.220 0.000 6.780 0.001 /usr/lib/portage/pym/portage.py:3273(xmatch) 126314 3.720 0.000 6.560 0.000 /usr/lib/portage/pym/portage_dep.py:578(match_from_list) 1156 0.020 0.000 6.300 0.005 /usr/lib/portage/pym/portage_dep.py:562(best_match_to_list) 1156 0.900 0.001 6.280 0.005 /usr/lib/portage/pym/portage_dep.py:551(match_to_list) 26405 0.430 0.000 3.770 0.000 /usr/lib/portage/pym/portage.py:1854(dep_expand) 1174 0.240 0.000 3.420 0.003 /usr/lib/portage/pym/portage.py:1913(dep_check) 26405 0.860 0.000 3.340 0.000 /usr/lib/portage/pym/portage.py:1766(cpv_expand) 2186 0.210 0.000 2.740 0.001 /usr/lib/portage/pym/portage.py:3377(gvisible) 5183 0.450 0.000 2.620 0.001 /usr/lib/portage/pym/portage.py:3254(cp_list) 9785 0.590 0.000 2.500 0.000 /usr/lib/portage/pym/portage.py:3005(aux_get) 26917 0.680 0.000 2.290 0.000 /usr/lib/portage/pym/portage_file.py:55(listdir) 161 0.300 0.002 2.180 0.014 /usr/lib/portage/pym/config.py:494(regenerate) 158 0.000 0.000 2.150 0.014 /usr/lib/portage/pym/config.py:405(reset) Patch is against today's HEAD ($Id: portage_dep.py,v 1.26). Reproducible: Always Steps to Reproduce:
Created attachment 53361 [details, diff] portage_dep--get_leaf_nodes--shortcut.patch
Created attachment 53445 [details, diff] portage_dep--pop_leaf_node.patch Here are a few more hacks on portage_dep.py which seems to improve things a bit more. Handle with care tho, i've made the assumption that the loop breaking logic in get_leaf_nodes() was only an heuristic, which could be changed for something simpler and more efficient. But if it actually matters for correctness, that would be a different story...
Two other things i've tried that didn't improved anything: - using a couple of dicts instead of the couple of lists for the nodes children and parents in the graph (stacking an iterator instead of an index and length in the __traverse_nodes() code). - using, where ordering was obviously not important, a __traverse_node_dict() function similar to __traverse_node() but which returns a dict instead of list.
The logic is designed to minimize the number times that loop is entered. The end goal is parallel building of packages. Each time that loop is entered, whatever is returned cannot be built in parallel. Similarly, it also minimizes the number of cases where a package is built against an older version that is about to be upgraded. Parallel building is also why I chose get_leaf_nodes() rather than pop_leaf_node() or the like. Once everything is done, there will very rarely be any notable pause at all. The main cause of it is that --deep and --emptytree currently cause portage to ignore installed packages. Once installed packages are taken into account, the loop will almost never be entered and with a very small set when it is.
For sure the 2nd patch breaks the logic you're describing, and i agree it's a bad idea if the goal is parallel building. But the first one (the "shortcut" one) doesn't. All it does is to stop the search for the best node to return when there can't be better nodes anymore (ie. when we've found one which has the full graph in its parents set, meaning it is optimal from the parents count point of view, and is also its own child, meaning it would accepted anyway by the children circularity check that comes just after). Sure that's just a border case optimisation, but it happens that i can reproduce a ~1/3 CPU time saving on "emerge -puD world" here (no need to use --emptytree for that), so i think it's worth applying.
Comment on attachment 53445 [details, diff] portage_dep--pop_leaf_node.patch Obsoleted by Jason's explanation.
Okay. I missed the len(self.graph) part.