Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 476104

Summary: load-average reschedules immediately after 'fg' SIGCONT
Product: Portage Development Reporter: Tomáš "tpruzina" Pružina (amd64 [ex]AT) <pruzinat>
Component: Core - Interface (emerge)Assignee: Portage team <dev-portage>
Status: RESOLVED FIXED    
Severity: normal Keywords: InVCS
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on: 438650    
Bug Blocks: 472632    
Attachments: Untested 'concept' patch

Description Tomáš "tpruzina" Pružina (amd64 [ex]AT) 2013-07-07 19:49:59 UTC
Emerge reschedules too quickly after returning from SIGTSTP <Ctrl-Z> via fg, causes it to recognize 'idle load' and runs max '--jobs'.

Reproducible: Always

Steps to Reproduce:
1. emerge --load-average=1 --jobs 5 -1 firefox qtcore qtgui pypy gentoo-sources vanilla-sources libreoffice
2. <C-Z>
3. Wait for load to drop 
4. fg
Actual Results:  
After fg, max number of jobs is being run, because load is low.
After few seconds it skyrockets, note the nasty and demanding compilation packages I chose in example.

Expected Results:  
Register SIGCONT, don't schedule new jobs for a few seconds. Reschedule after load from running compilation kicks in.
Comment 1 Zac Medico gentoo-dev 2013-07-07 20:03:43 UTC
(In reply to Tomáš "tpruzina" Pružina (amd64 AT) from comment #0)
> 1. emerge --load-average=1 --jobs 5 -1 firefox qtcore qtgui pypy
> gentoo-sources vanilla-sources libreoffice

Your example seems somewhat contrived. Having such a large difference between your --load-average and --jobs values doesn't make much sense in practice, does it?

Anyway, I suppose that we could add a separate option for dampening of the scheduler. Without such an option, we may not be able to avoid over-dampening in some cases (see bug #438650).
Comment 2 Tomáš "tpruzina" Pružina (amd64 [ex]AT) 2013-07-07 21:44:25 UTC
(In reply to Zac Medico from comment #1):
It indeed doesn't, I was making sure that example is clear enough.
I ran into this when recompiling stable chroot with ~glibc,
when I suddenly needed to convert a video, paused process .. and stuff hit me.

I considered fixing this myself via dirty hack, something like:
1. signal handler registers SIGCONT
2. set self.no_reschedule = true
3. sleep
4. set self.no_reschedule = false

And ofc, I would make schedule instances wait till no_reschedule == true.

Haven't got to do this yet, should be matter of few lines of code I suppose.
Also, I kinda don't like to mess with other peoples code unless I absolutely have to.
Comment 3 Zac Medico gentoo-dev 2013-07-07 21:54:27 UTC
Yeah, using a SIGCONT handler like that make sense. The Scheduler class already handles SIGINT and SIGTERM, so I suppose it can handle SIGCONT as well. The signal handler can save a timestamp in a variable, and we can have the scheduling code check for that and expire it at an appropriate time.
Comment 4 Tomáš "tpruzina" Pružina (amd64 [ex]AT) 2013-07-08 00:01:02 UTC
Created attachment 352820 [details, diff]
Untested 'concept' patch

Created 'concept patch', not really sure this thingy even works since I have not tested it at all.
time.clock() is used instead of time.time to avoid potential problems if system time is changed during emerge.

Perhaps even other functions should be using time.clock instead of time.time, unless time is used to output time to user.
Comment 5 Tomáš "tpruzina" Pružina (amd64 [ex]AT) 2013-07-08 00:43:58 UTC
Oh, patch lacks registering the handler, mea culpa, it's getting late...
Comment 6 Zac Medico gentoo-dev 2013-07-08 03:06:04 UTC
Its not good to sleep in an even-loop driven program like this, because it will starve the event loop. What we really want to do is add some logic to the Scheduler._job_delay() method. Instead of sleeping, is uses self._event_loop.timeout_add() to delay scheduling until a later time.

Also, it looks like time.clock() is deprecated in python3.3. We can simply use time.time() like _job_delay() does. If the current time is less than the previously recorded time (due to system clock adjustement), then it simply returns False (so it doesn't try to dampen scheduling in this case). This type of behavior should suffice in place of time.clock().
Comment 8 Zac Medico gentoo-dev 2013-07-09 21:52:53 UTC
This is fixed in 2.1.12.13 and 2.2.0_alpha188.