658940 – app-portage/genlop: improve current merge time prediction

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Portage Tools Team

URL:
Whiteboard:
Keywords:	PATCH

Depends on:
Blocks:

Reported:	2018-06-24 10:39 UTC by Kai Krakow
Modified:	2024-01-16 03:34 UTC
CC List:	4 users

See Also:	https://github.com/gentoo-perl/genlop/pull/12 922144
Package list:
Runtime testing required:	---

Attachments
genlop: Calculate more accurate merge time (0001-genlop-Calculate-more-accurate-merge-time.patch, 2.32 KB, patch) 2018-06-24 10:39 UTC, Kai Krakow
genlop: Calculate more accurate merge time (v2) (v2-0001-genlop-Calculate-more-accurate-merge-time.patch, 2.38 KB, patch) 2018-09-11 18:40 UTC, Kai Krakow
v3: Fix use of uninitialized values (v3-0001-genlop-Calculate-more-accurate-merge-time.patch, 2.38 KB, patch) 2019-07-06 09:56 UTC, Kai Krakow

Description Kai Krakow 2018-06-24 10:39:18 UTC

Created attachment 537024
genlop: Calculate more accurate merge time

This patch adds a better merge time prediction by utilizing statistical methods:

1. It considers only the last 10 merges
2. It slices off the worst and best merge times

Find attached a patch with a detailed description.
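The two steps above can be sketched as follows. This is a minimal Python illustration, not the attached Perl patch: the function name `predict_merge_time`, the `window` parameter, and the handling of short histories are assumptions made for the example.

```python
def predict_merge_time(durations, window=10):
    """Trimmed mean over the most recent merges.

    `durations` is a list of merge times in seconds, oldest first
    (an assumed input format, not genlop's internal one).
    """
    recent = list(durations)[-window:]
    if not recent:
        return None  # no history, so no prediction
    if len(recent) > 2:
        # Slice off one best (shortest) and one worst (longest) time.
        recent = sorted(recent)[1:-1]
    return sum(recent) / len(recent)
```

With a full window of 10 merges this averages the 8 values in the middle; with one or two merges it falls back to a plain average instead of trimming everything away.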

Comment 1 Kai Krakow 2018-06-24 11:10:07 UTC

Here is an example of the patch in action:

$ genlop -t qtwebengine|fgrep "merge time"
       merge time: 2 hours, 20 minutes and 29 seconds.
       merge time: 1 hour, 23 minutes and 33 seconds.
       merge time: 1 hour, 38 minutes and 59 seconds.
       merge time: 1 hour, 33 minutes and 14 seconds.
       merge time: 1 hour, 54 minutes and 7 seconds.
       merge time: 1 hour, 21 minutes and 49 seconds.
       merge time: 1 hour, 19 minutes and 26 seconds.
       merge time: 1 hour, 48 minutes and 36 seconds.
       merge time: 1 hour, 38 minutes and 37 seconds.
       merge time: 1 hour, 20 minutes and 20 seconds.
       merge time: 2 hours, 26 minutes and 38 seconds.
       merge time: 21 seconds.
       merge time: 1 hour, 24 minutes and 36 seconds.
       merge time: 1 hour, 15 minutes and 4 seconds.
       merge time: 1 hour, 35 minutes.
       merge time: 1 hour, 28 minutes and 50 seconds.
       merge time: 2 hours, 8 minutes and 32 seconds.
       merge time: 59 minutes and 24 seconds.
       merge time: 1 hour, 11 minutes and 33 seconds.
       merge time: 1 hour, 5 minutes and 52 seconds.
       merge time: 2 hours and 27 seconds.
       merge time: 2 hours, 17 minutes and 30 seconds.
       merge time: 2 hours, 44 minutes and 40 seconds.
       merge time: 2 hours, 54 minutes and 16 seconds.
       merge time: 3 hours, 20 minutes and 9 seconds.
       merge time: 2 hours, 21 minutes and 43 seconds.
       merge time: 1 hour, 58 minutes and 33 seconds.
       merge time: 1 hour, 57 minutes and 28 seconds.
       merge time: 2 hours, 28 minutes and 24 seconds.
       merge time: 2 hours, 1 minute and 49 seconds.
       merge time: 1 hour, 44 minutes and 25 seconds.
       merge time: 1 hour, 55 minutes and 9 seconds.

$ ./genlop -c; genlop -c

 Currently merging 17 out of 18

 * dev-qt/qtwebengine-5.9.6

       current merge time: 49 minutes and 41 seconds.
       ETA: 1 hour, 23 minutes and 46 seconds.

 Currently merging 17 out of 18

 * dev-qt/qtwebengine-5.9.6

       current merge time: 49 minutes and 42 seconds.
       ETA: 58 minutes and 24 seconds.


It yields a much better idea of how long the ebuild is going to build.
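To experiment with output like the listing above, the "merge time" lines first have to be turned into seconds. The following is a small Python sketch under the assumption that only hours, minutes and seconds appear, as in this sample; `parse_merge_time` is a hypothetical helper, not part of genlop.

```python
import re

# Seconds per unit; only the units visible in the sample output are handled.
_UNITS = {"hour": 3600, "minute": 60, "second": 1}

def parse_merge_time(line):
    """Convert e.g. 'merge time: 1 hour, 23 minutes and 33 seconds.' to seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+) (hour|minute|second)s?", line):
        total += int(amount) * _UNITS[unit]
    return total
```

For example, `parse_merge_time("merge time: 2 hours, 20 minutes and 29 seconds.")` gives 8429, and the 21-second outlier in the list parses to 21.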

Comment 2 Kai Krakow 2018-07-02 06:54:25 UTC

This patch currently shows warnings about uninitialized values if fewer than 10 merges are in the log for one package. I'm not sure how to easily fix this as I'm no Perl dev.

So, feel free to fix the patch. Meanwhile, I'll try to come up with some solution later.

Comment 3 Kai Krakow 2018-09-11 18:40:29 UTC

Created attachment 546668
genlop: Calculate more accurate merge time (v2)

This v2 patch fixes the Perl warnings present with the first patch.

Comment 4 Kai Krakow 2019-07-06 09:56:35 UTC

Created attachment 582016
v3: Fix use of uninitialized values

Comment 5 email200202 2020-02-15 02:39:07 UTC

I think the best prediction of the merge time is the last merge time. There is a systemic error when averaging the merge time of old versions of a package. Packages tend to get bigger and more complicated with time. In addition, compilation environment hardware and software can vary with time. This is a simple patch to use only the last merge time:

# cat /etc/portage/patches/app-portage/genlop/no-average-time.patch 
--- a/bin/genlop        2020-02-10 21:14:42.211175876 +1100
+++ b/bin/genlop        2020-02-10 21:27:52.347132262 +1100
@@ -744,9 +744,9 @@
                                if (m/^(.*?)\:  ::: completed .*?\) .*\/$ebuild_arg-[0-9].* to \//)
                                {
                                        $e_end = $1;
-                                       $e_count++;
+                                       $e_count = 1; 
                                        &gtime($e_end - $e_start);
-                                       $tm_secondi += ($e_end - $e_start);
+                                       $tm_secondi = ($e_end - $e_start); 
                                }
                        }
                }

Comment 6 Kai Krakow 2020-02-15 11:54:15 UTC

(In reply to email200202 from comment #5)
> I think the best prediction of the merge time is the last merge time. There
> is a systemic error when averaging the merge time of old versions of a
> package. Packages tend to get bigger and more complicated with time. In
> addition, compilation environment hardware and software can vary with time.

This is actually what my patch is about. Instead of using the complete history, it uses the last 10 merges only. Additionally, it discards the worst and best time from the results, leaving only 8 to average.

This is far better at detecting trends, and it discards one exceptionally good and one exceptionally bad measurement because those are usually useless.

> This is a simple patch to use only the last merge time:

I don't think this will work well, for the very reason you mention: the compilation environment. Using only the last merge time will not work well when you run portage in parallel mode: a package may be compiled on its own during one upgrade and, the next time, as part of a larger update with two other packages building in parallel.

I was thinking about calculating an average by simply giving older compile times less and less weight. But this puts too much emphasis on the last run, which may be a totally bad prediction for the reason outlined above. I looked at the history of multiple packages in my log, and compile time varies by a factor of 2 or more for many of them.
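The weighting idea that was considered and rejected here can be sketched like this. It is a Python illustration only; the `decay` factor and the function name are assumptions, and no such code is part of the patch.

```python
def weighted_merge_time(durations, decay=0.5):
    """Weighted mean where each older merge counts `decay` times
    as much as the next newer one. `durations` is oldest first."""
    if not durations:
        return None
    total = weight_sum = 0.0
    weight = 1.0
    for seconds in reversed(durations):  # walk from newest to oldest
        total += weight * seconds
        weight_sum += weight
        weight *= decay
    return total / weight_sum
```

With `decay=0.5` the newest merge always carries at least half of the total weight, which illustrates the concern: one unusual last run dominates the prediction.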

Thus I decided to go with a short averaging window and also discard the best and worst results. This predicts much more stable and accurate compile times. It still cannot predict the single and parallel cases very well, but the average error is much smaller.

My approach should adapt to a new environment within 4-5 package updates, which is fine for me: if I change the environment, I expect wrong merge time predictions for a while. But with parallel builds, yours is more or less random every time.

From a real-world view in production, my solution tends to over-estimate merge times, which I prefer for letting it run unattended. A last-merge-only prediction cannot do this; it randomly under- or over-estimates times.

Your solution may work better for non-parallel merges, though. Maybe it's worth combining both patches and using your solution when only one package is building at that time? But your solution can still vary too much. Maybe instead of using an 80% average like I did, look at the last 5 merges and select a result via the median?
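The median variant suggested in the last sentence could look roughly like this. It is a Python sketch with an assumed function name and an assumed oldest-first list of durations in seconds.

```python
from statistics import median

def median_merge_time(durations, window=5):
    """Median of the most recent `window` merge times (seconds, oldest first)."""
    recent = list(durations)[-window:]
    return median(recent) if recent else None
```

A single outlier, such as a 21-second merge among hour-long ones, cannot move the result on its own, because the median only depends on the middle of the sorted window.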

Comment 7 Kai Krakow 2020-02-15 12:09:02 UTC

About your concern with systemic error and packages getting more complex over time: This is true. Maybe compensate for that by discarding not only the worst and best time but the 2 worst and 1 best times? OTOH, CPUs and compilers become more efficient over time, which contradicts that idea. One of the best improvements is just throwing more RAM at the system and putting portage into a RAM disk: this improved merge times by a reasonable factor for me.
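An asymmetric trim like the one floated above can be expressed by making both cut-offs parameters. This Python sketch assumes that "worst" means the longest times and "best" the shortest; the names `drop_best` and `drop_worst` are hypothetical.

```python
def trimmed_mean(durations, window=10, drop_best=1, drop_worst=2):
    """Mean of the recent merges after dropping the `drop_best` shortest
    and the `drop_worst` longest times (seconds, oldest first)."""
    recent = sorted(list(durations)[-window:])
    if len(recent) > drop_best + drop_worst:
        recent = recent[drop_best:len(recent) - drop_worst]
    return sum(recent) / len(recent) if recent else None
```

Dropping more of the longest times lowers the estimate, and dropping more of the shortest raises it, so the two parameters let the bias be shifted in either direction.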

There's also another systemic measurement failure in genlop: For many small packages, the unpack and configure phases take more time than the build phase itself, but genlop only sees packages in the build phase. Still, it considers the whole merge time as a single unit. Compile phases can be multi-process and are affected by the environment the most, while the unpack and configure phases aren't. This can cause a big error margin for many smaller packages in a dynamic environment.

Comment 8 email200202 2020-02-16 03:42:50 UTC

We suggest the same idea: averaging the last N merge times. Your N = 10 and my N = 1.

I have only 8 CPU threads and they are all allocated to the compiler. I don't have to worry about parallel jobs in portage.

To get accurate time prediction with parallel portage, we need to know the sequence of the build. This information is only available to the "emerge" command. Without it, we can only guess the min and max merge time.

Actually, I would like to have the merge time function integrated into the "emerge" command. It will show the list of packages to be built, their data download, and merge time.

Comment 9 korte 2022-02-21 17:39:07 UTC

Some ideas about the merge times. Any volunteers with too much free time? ;)

* the more similar the versions are, the more relevant the times should be
* if the compile times are from the last few days, they should be more relevant than those from 3 months ago

I often run "watch genlop -cn". During that time genlop could collect even more data, for example by using ps to analyze the emerge command line and the concurrent threads using RAM, CPU and IO, the load in general, the FEATURES and USE flags per package, and hardware data such as CPU, RAM, and whether the machine was running on battery at that time.

* How about some self-learning AI? ;)
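The second bullet above, recency measured in days rather than in number of merges, could be sketched as a half-life weighting. This is a Python illustration under assumptions: the `(end_timestamp, seconds)` record format, the `half_life_days` value and the function name are all made up for the example.

```python
def age_weighted_merge_time(merges, now, half_life_days=30.0):
    """Weighted mean of merge times where a merge's weight halves
    every `half_life_days`.

    `merges` is a list of (end_timestamp, seconds) tuples and `now`
    is a Unix timestamp in seconds.
    """
    total = weight_sum = 0.0
    for ended_at, seconds in merges:
        age_days = max(0.0, (now - ended_at) / 86400)
        weight = 0.5 ** (age_days / half_life_days)
        total += weight * seconds
        weight_sum += weight
    return total / weight_sum if weight_sum else None
```

A merge from today counts fully, one from 30 days ago counts half, and one from 3 months ago counts roughly an eighth with the default half-life.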

Comment 10 Kai Krakow 2022-07-25 10:59:32 UTC

I've put my patch here:
https://github.com/kakra/genlop/tree/feature/merge-times

> * the more similar the versions are, the more relevant the times should be
> * if the compile times are from the last few days, they should be more relevant than those from 3 months ago

For a future implementation, I like the idea of weighting recent merges more heavily than old ones.


> Actually, I would like to have the merge time function integrated into the "emerge" command.

I also prefer the idea of bringing merge time calculation directly into emerge. It should first get some infrastructure for logging such information and exporting the current job phases via a socket. Trying to derive this from the generic portage log and comparing it with the process table exposes some bugs; for example, genlop cannot reliably handle different slots building at the same time.


For my current patch, I can confirm that it adapts to a new system rather quickly, but it's rather bad at predicting single package updates if you usually do big parallel updates. I've switched from an i7-3770K to an i7-12700K and the merge time predictions were quite accurate within three package updates, even though I now build with more parallel processes and jobs.

Comment 11 Joachim Herb 2023-02-02 23:50:00 UTC

When using binary packages, and even sharing these packages between different computers, a merge means either compiling the package from source if no binary package is available, or just unpacking and installing a binary package. This means merge times can differ by orders of magnitude.

Would it be possible to consider the merge type (really building a package or just installing it from a binary package) for the estimation of the merge time? This would require storing the build type in the historic data and evaluating only the old merges of the matching type for the prediction of the new merge.

Example:

 genlop -t gcc | grep 'merge time'
       merge time: 48 minutes and 19 seconds.
       merge time: 52 minutes and 22 seconds.
       merge time: 1 hour, 20 minutes and 49 seconds.
       merge time: 41 minutes and 4 seconds.
       merge time: 27 seconds.
       merge time: 54 minutes and 12 seconds.
       merge time: 1 hour and 28 seconds.
       merge time: 58 minutes and 22 seconds.
       merge time: 16 seconds.
       merge time: 15 minutes and 7 seconds.
       merge time: 15 seconds.
       merge time: 17 seconds.
...
       merge time: 36 seconds.
       merge time: 36 seconds.
       merge time: 2 hours, 36 minutes and 38 seconds.
       merge time: 22 seconds.
       merge time: 24 seconds.
       merge time: 25 seconds.
       merge time: 25 seconds.
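Filtering the history by merge type, as proposed in this comment, might look like the following Python sketch. The `(seconds, was_binary)` record format is an assumption, since the log data used by genlop in this thread carries no such flag, and the choice of the median as estimator is arbitrary.

```python
from statistics import median

def estimate_for_type(history, is_binary, window=10):
    """Estimate a merge time from past merges of the same type only.

    `history` is a list of (seconds, was_binary) tuples, oldest first.
    """
    same_type = [secs for secs, was_binary in history if was_binary == is_binary]
    recent = same_type[-window:]
    return median(recent) if recent else None
```

Usage with a few values from the listing above, where the binary/source labels are guessed from the durations rather than taken from any log:

```python
history = [(2899, False), (27, True), (3252, False), (16, True)]
source_estimate = estimate_for_type(history, is_binary=False)
binary_estimate = estimate_for_type(history, is_binary=True)
```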

Comment 12 Kai Krakow 2023-07-18 19:03:14 UTC

Closing in favor of emlop.

Comment 13 Sam James (archtester) 2023-07-18 19:05:46 UTC

Well, it's still in tree, even if you're not using it.

But I'm sorry nobody saw your patch either. I'm happy to look at it now if it still applies.

Comment 14 Kai Krakow 2023-07-18 19:44:57 UTC

It applies, it's in my /etc/portage/patches directory.

The developer of emlop found that an approach using median works even better, and emlop is currently in the process of entering the portage tree:

https://github.com/vincentdephily/emlop/issues/26

Portage could probably support prediction better by logging each phase in more detail, including how long it took and whether the package needed compiling at all or was just installed from a binary package.

If you want to merge my patch into genlop, it may need some small fixes, e.g. there's a chance that it may access invalid list indexes and Perl will complain loudly about it.

Comment 15 Larry the Git Cow (gentoo-dev) 2023-07-18 22:09:27 UTC

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=784e0cddc47130412c37569deb120e542767f4ca

commit 784e0cddc47130412c37569deb120e542767f4ca
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-07-18 22:06:20 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-07-18 22:09:16 +0000

    app-portage/genlop: add 0.30.11
    
    Closes: https://bugs.gentoo.org/283628
    Closes: https://bugs.gentoo.org/447436
    Closes: https://bugs.gentoo.org/540050
    Closes: https://bugs.gentoo.org/658940
    Closes: https://bugs.gentoo.org/677890
    Closes: https://bugs.gentoo.org/697504
    Signed-off-by: Sam James <sam@gentoo.org>

 app-portage/genlop/Manifest              |  1 +
 app-portage/genlop/genlop-0.30.11.ebuild | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

Comment 16 Sam James (archtester) 2023-07-18 22:15:05 UTC

Thank you.