Bug 905933 - sys-apps/portage-3.0.47 controls at the wrong load average
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
Blocks: 184128
Reported: 2023-05-08 11:08 UTC by peter@prh.myzen.co.uk
Modified: 2023-05-26 05:36 UTC
CC List: 4 users


Attachments
emerge --info =sys-apps/portage-3.0.47 (emerge.info,7.54 KB, text/plain)
2023-05-08 11:10 UTC, peter@prh.myzen.co.uk

Description peter@prh.myzen.co.uk 2023-05-08 11:08:59 UTC
I ran 'emerge -e @world' with EMERGE_DEFAULT_OPTS="--jobs=10 --load-average=40 ...". It took 350m46s.

Then I ran the same -e with --load-average=40, but no --jobs and no -j specified. That took 351m21s. The load average was controlled at about 72, not 40. I watched it for some time, and even though all three load averages were at 72-75, portage kept on starting more packages.

The machine has 24 threads and 64GB RAM, so how was the 72 figure arrived at?


Reproducible: Always
Comment 1 peter@prh.myzen.co.uk 2023-05-08 11:10:23 UTC
Created attachment 861305
emerge --info =sys-apps/portage-3.0.47
Comment 2 Enne Eziarc 2023-05-10 18:19:33 UTC
Your --info output says that you're setting --jobs in make.conf; in that case merely removing it from the command line variable assignment won't make it go away.

A load average of 72 looks expected in either case: you're starting up to 10 emerge processes and ending up with 6 CPU-bound ones running, each multiplied by the MAKEOPTS="-j12" you have configured (because you didn't bother to specify a load limit there). With that configuration, were you expecting something else to happen?
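
The arithmetic behind the figure of 72 can be sketched in shell. The function name worst_case_procs is hypothetical; the inputs 6 and 12 are the values quoted in this comment, and the result is only an upper bound on concurrent compiler processes, assuming no load limit in MAKEOPTS.

```shell
# Hypothetical helper: upper bound on concurrent compiler processes when
# MAKEOPTS="-jN" carries no load limit of its own.
worst_case_procs() {
    emerge_jobs=$1   # CPU-bound emerge jobs actually running
    make_jobs=$2     # N from MAKEOPTS="-jN"
    echo $(( emerge_jobs * make_jobs ))
}

# 6 running emerge jobs, each allowed 12 make jobs
worst_case_procs 6 12
```

With all 10 permitted emerge jobs CPU-bound at once, the same bound would rise to 120.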
Comment 3 peter@prh.myzen.co.uk 2023-05-11 08:29:12 UTC
Why is --load-average=40 being ignored?
Comment 4 Enne Eziarc 2023-05-13 17:15:09 UTC
(In reply to peter@prh.myzen.co.uk from comment #3)
> Why is --load-average=40 being ignored?

It's not. emerge is only running six compilation jobs out of a possible ten by the time the system passes load 40, and does not start any more at that point. But by then it's too late, because you did not set a load limit in MAKEOPTS.
Comment 5 peter@prh.myzen.co.uk 2023-05-13 23:51:21 UTC
I'm sorry, but that is just not true. I sat and watched it as it started more packages while the load was at 72, 75, 75. As I said at the beginning, the load was already above the limit when it started yet another job. Numerous times, and the load average stayed at those figures for many minutes.
Comment 6 Enne Eziarc 2023-05-14 18:56:24 UTC
(In reply to peter@prh.myzen.co.uk from comment #5)
> I'm sorry, but that is just not true. I sat and watched it as it started
> more packages while the load was at 72, 75, 75. As I said at the beginning,
> the load was already above the limit when it started yet another job.
> Numerous times, and the load average stayed at those figures for many
> minutes.

Yes, because load average is an *average* - it lags behind by 60 seconds. You're starting n×12 compiler processes and then refusing to understand why your system gets swamped before emerge can react.
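
The lag can be illustrated with a rough simulation. This is a sketch, assuming the Linux-style 1-minute load average: an exponentially damped average sampled every 5 seconds with a 60-second time constant. The function name simulate_loadavg is hypothetical, and the model ignores the kernel's fixed-point rounding.

```shell
# Hypothetical helper: reported 1-minute load average after a constant
# number of runnable processes has been present for a given time,
# starting from an idle system.
simulate_loadavg() {
    runnable=$1   # constant number of runnable processes
    seconds=$2    # how long they have been running
    awk -v n="$runnable" -v t="$seconds" 'BEGIN {
        decay = exp(-5 / 60)      # 5 s sampling, 60 s time constant
        load = 0
        for (s = 5; s <= t; s += 5)
            load = load * decay + n * (1 - decay)
        printf "%.1f\n", load
    }'
}

# 72 runnable processes, observed 60 seconds after they start
simulate_loadavg 72 60
```

Under these assumptions the reported figure is still well below 72 a full minute after the processes start, which is why a limit checked only when emerge launches a package reacts late.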

This is exactly why everything from the handbook to the make.conf(5) and emerge(1) manpages says you *should use MAKEOPTS* to limit load. GNU Make is doing exactly as you've told it to, as many times as you've told it to; it does not know what EMERGE_DEFAULT_OPTS is. What are you expecting portage to do here, kill -9 random process trees?
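
A make.conf fragment along these lines would put a load limit on both levels. This is a sketch only: -l is GNU Make's short form of --load-average, and the value 24 is an illustrative assumption matching the reporter's thread count, not a figure recommended anywhere in this bug.

```shell
# /etc/portage/make.conf (fragment)

# Each make invocation may run up to 12 jobs, but starts no new ones
# while the load average is at or above 24 (illustrative value).
MAKEOPTS="-j12 -l24"

# emerge may build up to 10 packages in parallel, but starts no new
# package while the load average is above the limit.
EMERGE_DEFAULT_OPTS="--jobs=10 --load-average=40"
```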
Comment 7 peter@prh.myzen.co.uk 2023-05-14 21:54:56 UTC
No, it should just refrain from starting new packages while the load average is above what I've set. I sat for at least half an hour watching while it kept the average between 72 and 75, starting packages to keep the load steady. There was no change in the load averages reported by emerge in all that time.

You're refusing to understand where the problem lies. It is not with make but with portage. What is --load-average for, if it can't restrict the launching of emerge jobs to limit the load, over what must be approaching an hour? Man make.conf says specifically that its purpose is to control the load, but it doesn't do so.

I don't know - why not just remove --load-average altogether, as it seems to do nothing?
Comment 8 Zac Medico gentoo-dev 2023-05-15 22:21:12 UTC
(In reply to peter@prh.myzen.co.uk from comment #5)
> I'm sorry, but that is just not true. I sat and watched it as it started
> more packages while the load was at 72, 75, 75. As I said at the beginning,
> the load was already above the limit when it started yet another job.
> Numerous times, and the load average stayed at those figures for many
> minutes.

The current expectation is for emerge to always have at least one job running, so it's normal for it to start one job if there are no other jobs running, even if the load average exceeds the --load-average setting. I believe this is consistent with the make --load-average documentation, which says:

> no new jobs (commands) should be started if there are others jobs running

I think "if there are no others jobs running" implies that it is always allowed to start at least one job regardless of the load average.
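
The rule as described in this comment can be sketched as a small shell function. This is an illustration of the stated behaviour, not Portage's actual scheduler code; the name may_start_job and its arguments are hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) if a new job may start.
#   $1 = number of jobs currently running
#   $2 = current load average
#   $3 = configured --load-average limit
may_start_job() {
    running=$1
    load=$2
    limit=$3
    # With nothing running, one job is always allowed, whatever the load.
    if [ "$running" -eq 0 ]; then
        return 0
    fi
    # Otherwise a new job may start only while the load is below the limit.
    awk -v l="$load" -v m="$limit" 'BEGIN { exit !(l + 0 < m + 0) }'
}

if may_start_job 0 72.5 40; then echo "idle scheduler: start one job"; fi
if ! may_start_job 6 72.5 40; then echo "6 jobs running, load 72.5: wait"; fi
```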
Comment 9 peter@prh.myzen.co.uk 2023-05-23 13:30:20 UTC
Zac said: "I think "if there are no others jobs running" implies that it is always allowed to start at least one job regardless of the load average."

Somebody doesn't understand plain English. Is that you or me?

"If there are no other jobs running" says, perfectly clearly, that it is NOT allowed to start any jobs if any others are running.

Man 5 make.conf has this:

"In order to avoid excess load, the  --load-average option is recommended."

That says to me, perfectly clearly again, that --load-average is intended to limit the load to avoid excess. That is exactly what it is not doing.

The fix is either (1) correct portage so that it does what the man page says it should, or (2) get rid of --load-average altogether and be ready for complaints that the load can no longer be controlled in any useful fashion.

Why is this so hard to understand?
Comment 10 Zac Medico gentoo-dev 2023-05-23 14:38:14 UTC
(In reply to peter@prh.myzen.co.uk from comment #9)
> Zac said: "I think "if there are no others jobs running" implies that it is
> always allowed to start at least one job regardless of the load average."
> 
> Somebody doesn't understand plain English. Is that you or me?
> 
> "If there are no other jobs running" says, perfectly clearly, that it is NOT
> allowed to start any jobs if any others are running.

It sounds to me like we're in agreement here. If there are no other jobs running, then emerge starts one job, regardless of the load average.

> Man 5 make.conf has this:
> 
> "In order to avoid excess load, the  --load-average option is recommended."
> 
> That says to me, perfectly clearly again, that --load-average is intended to
> limit the load to avoid excess. That is exactly what it is not doing.
> 
> The fix is either (1) correct portage so that it does what the man page says
> it should, or (2) get rid of --load-average altogether and be ready for
> complaints that the load can no longer be controlled in any useful fashion.
> 
> Why is this so hard to understand?

Maybe there's some confusion over the definition of a job. Currently, emerge only counts jobs up to the end of the src_install phase. After that, it no longer considers the package to be part of the job count, and it is free to start a single job regardless of the load average.
Comment 11 peter@prh.myzen.co.uk 2023-05-23 14:57:06 UTC
(In reply to Zac Medico from comment #10)

> Maybe there's some confusion over the definition of a job. Currently, emerge
> only counts jobs up to the end of the src_install phase. After that, it no
> longer considers the package to be part of the job count, and it is free to
> start a single job regardless of the load average.

I'm sure there's plenty of confusion, much of it caused by using the one term "job" for different things*, but that's not the issue here. The issue is the system load average being shown as 72, 75, 75 and portage starting new jobs. Repeatedly. If the load has averaged 75 for 15 minutes, there's no case for supposing portage was running short of work and needed to keep starting more packages. In fact I watched it for half an hour, during which time those numbers hardly changed, and that means the load had averaged much more than my limit of 48 for at least 45 minutes.

*  It would be useful to rewrite some of the documents to use different words for (1) a package compilation started by portage and (2) a make invocation launched during such a compilation.
Comment 12 Mike Gilbert gentoo-dev 2023-05-23 15:14:29 UTC
I tried to reproduce this locally, with smaller limits due to smaller hardware.

I have MAKEOPTS="-j6" in make.conf.

I ran the following:

emerge -ev1 --jobs=12 --load-average=8 @world

I observed the load average get as high as 16.

However, Portage stopped starting new jobs whenever the load average was above 8, as expected.
Comment 13 peter@prh.myzen.co.uk 2023-05-26 05:36:58 UTC
(In reply to Mike Gilbert from comment #12)

> Portage stopped starting new jobs whenever the load average was
> above 8, as expected.

I wonder whether I've tripped over one of those unexplained mysteries - things going bump in the night. I cannot reproduce the problem either.

Looks like time to close this bug.