Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 394487

Summary: >=sys-kernel/gentoo-sources-3.0.7 - Wrong/impossible cputimes for kernel processes.
Product: Gentoo Linux Reporter: Gunnar Eggen <geggen54>
Component: [OLD] Core systemAssignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED FIXED    
Severity: normal CC: bes.internal, bobbykent32, david+gentoo.org, dogshu, roberto.passuello, ross, th-gen
Priority: Normal    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
URL: https://bugzilla.kernel.org/show_bug.cgi?id=47341
Whiteboard: linux-3.6.11
Package list:
Runtime testing required: ---
Attachments: git bisect log
Kernel config used
emerge --info

Description Gunnar Eggen 2011-12-12 21:52:35 UTC
The cputimes/cpuload reported from top/ps etc. report extreme loads on kernel processes migration/x.   The problem isn't permanent as it takes a while to set off, but once it does it never stops.  

I am unable to locate what's triggering it, but I did a bisect and found the commit at fault: 
commit d670ec13178d0fd8680e6742a2bc6e04f28f87d8 - posix-cpu-timers: Cure SMP wobbles.

The load reported doesn't seem real, as there's no sign that the system is busy or unresponsive, but  it seems like cpu frequency-scaling (ondemand) is kicking in and never scales the frequency down again.


Reproducible: Always

Steps to Reproduce:
1. Boot normally with kernel >= 3.0.7
2. Wait ~1 hour (unable to figure out what sets it off. Idle vs full load doesn't matter)

Actual Results:  
Cpuload/cputime on some migration/x kernel processes show a constant load between 20-100%.  Cpu frequency scaling (ondemand) seems to be affected as it never scales down the cores/threads affected. 

Expected Results:  
Report normal load on migration processes. Allow ondemand to scale down.
Comment 1 Gunnar Eggen 2011-12-12 21:54:20 UTC
Created attachment 295605 [details]
git bisect log
Comment 2 Gunnar Eggen 2011-12-12 21:55:26 UTC
Created attachment 295607 [details]
Kernel config used
Comment 3 Gunnar Eggen 2011-12-12 21:56:41 UTC
Created attachment 295609 [details]
emerge --info
Comment 4 Gunnar Eggen 2011-12-14 22:26:36 UTC
I see a discussion regarding this is referenced in the bug now. https://lkml.org/lkml/2011/10/12/382

Is the proposed patch accepted?  I tested 3.2_rc5, and the problem persists but I'm unable to determine if the patch is included or not.

I find it strange that this problem isn't more widespread.  I'm experiencing this on two different machines (AMD64), but not on the third which is X86.   Tried different configs including default ones with bare minimum of drivers, but still no change.
Comment 5 Mike Pagano gentoo-dev 2012-03-14 14:50:46 UTC
I see some changes around this area. Can you test with gentoo-sources-3.2.11 ?
Comment 6 Gunnar Eggen 2012-03-17 15:40:05 UTC
Unfortunately I get the same results using 3.6.11.
Comment 7 r01 2012-05-15 07:57:53 UTC
(In reply to comment #6)
> Unfortunately I get the same results using 3.6.11.

As of today, latest from kernel.org is 3.3.6; how did you get 3.6.??


I'm running from 3.2-rcX to now 3.4-rc7; the [migration/0] is using 75% pcpu:


[root@archtop ~]# ps fuww -e |awk 'NR <=2 || /migration/'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    May10   0:00 [kthreadd]
root         6 76.1  0.0      0     0 ?        S    May10 4649:06  \_ [migration/0]
root       982  0.0  0.0      0     0 ?        S    May14   0:00  \_ [migration/1]
root       985  0.3  0.0      0     0 ?        S    May14   0:39  \_ [migration/2]
root       988  0.0  0.0      0     0 ?        S    May14   0:00  \_ [migration/3]
Comment 8 r01 2012-05-15 08:00:27 UTC
And:

# ps ax -o pid,comm,pcpu | grep "migration"
    6 migration/0     76.1
  982 migration/1      0.0
  985 migration/2      0.3
  988 migration/3      0.0
Comment 9 Gunnar Eggen 2012-05-16 15:01:43 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Unfortunately I get the same results using 3.6.11.
> 
> As of today, latest from kernel.org is 3.3.6; how did you get 3.6.??
> 

I was running 3.2.11 at the time.  My bad.

Recent versions show no improvement on the issue, but I've turned off tickless system (NO_HZ = n), and that "fixed" it.

Still hoping for some real fix eventually though...
Comment 10 swanpoint 2012-07-23 14:56:05 UTC
I tried the NO_NZ=n route and still see unreal time's on the migration kernel threads..

# uname -r -v -m
3.2.12-gentoo #1 SMP Tue Jul 17 09:41:18 EDT 2012 x86_64

# uptime
 10:53:28 up 4 days, 23:04,  1 user,  load average: 0.14, 0.23, 0.24

# ps uax|grep -e US[E]R -e migra[t]ion
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         6  0.0  0.0      0     0 ?        S    Jul18   0:00 [migration/0]
root         7 44.2  0.0      0     0 ?        S    Jul18 3160:23 [migration/1]
root        11  0.0  0.0      0     0 ?        S    Jul18   0:00 [migration/2]
root        14 67.0  0.0      0     0 ?        S    Jul18 4789:14 [migration/3]
root        17  0.0  0.0      0     0 ?        S    Jul18   0:00 [migration/4]
root        20  0.0  0.0      0     0 ?        S    Jul18   0:00 [migration/5]
root        23  0.0  0.0      0     0 ?        S    Jul18   0:00 [migration/6]
root        26  0.0  0.0      0     0 ?        S    Jul18   0:00 [migration/7]

# zcat /proc/config.gz |grep NO_HZ
# CONFIG_NO_HZ is not set
Comment 11 BobbyK 2012-08-11 17:05:22 UTC
Seems the issue is present in 3.4.4 also:

# uname -r -v -m ; uptime ; ps uax|grep -e US[E]R -e migra[t]ion ; zcat /proc/config.gz |grep NO_HZ

3.4.4-gentoo #7 SMP Sat Jul 14 15:15:33 EDT 2012 x86_64

 13:02:23 up 16 days, 22:22,  9 users,  load average: 3.60, 3.95, 4.47

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         6 64.1  0.0      0     0 ?        S    Jul25 15636:28 [migration/0]
root         7 89.8  0.0      0     0 ?        S    Jul25 21911:32 [migration/1]
root        11 95.5  0.0      0     0 ?        S    Jul25 23304:44 [migration/2]
root        14 90.3  0.0      0     0 ?        S    Jul25 22021:08 [migration/3]
root        17 81.6  0.0      0     0 ?        S    Jul25 19900:55 [migration/4]
root        20 95.9  0.0      0     0 ?        S    Jul25 23387:25 [migration/5]
root        23 99.6  0.0      0     0 ?        S    Jul25 24305:11 [migration/6]
root        26 82.2  0.0      0     0 ?        S    Jul25 20052:41 [migration/7]
root        29 51.9  0.0      0     0 ?        S    Jul25 12666:22 [migration/8]
root        32 91.8  0.0      0     0 ?        S    Jul25 22387:41 [migration/9]
root        35 99.3  0.0      0     0 ?        S    Jul25 24229:26 [migration/10]
root        38 94.8  0.0      0     0 ?        S    Jul25 23136:40 [migration/11]

# CONFIG_RCU_FAST_NO_HZ is not set
CONFIG_NO_HZ=y

Perhaps I should disable tickless to see if it helps ...
Comment 12 ross 2012-08-12 18:22:29 UTC
While I was staring at 'top -d5' (and sorted by total time T) the migration/15 process on the very top showed exactly 3200% CPU usage for at least one cycle and then it was back to zero.  This was suspicious since I have never before witnessed any CPU usage by migration processes during 5-second intervals, but more and more migration processes keep collecting at the top of the listing based on total time.  Also, I have 32 processors and a load average above 20, but the load is completely explainable from normal activities (not the migration processes).  I have several processes running at 100%, but they are no longer at the top of the list.  Hope this helps.

ross@wilson:~$ uname -a
Linux wilson 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux

Also, I'm running Debian Wheezy, but didn't see this bug in their tracking, so I attached myself here.
Comment 13 ross 2012-08-12 18:31:04 UTC
(In reply to comment #12)

top doesn't show CPU usage, but ps does:

root@wilson:~# uname -r -v -m ; uptime ; ps uax|grep -e US[E]R -e migra[t]ion
3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64
 13:27:23 up 2 days,  5:53, 33 users,  load average: 22.09, 21.61, 21.74
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         6 89.7  0.0      0     0 ?        S    Aug10 2903:49 [migration/0]
root         8 79.5  0.0      0     0 ?        S    Aug10 2573:33 [migration/1]
root        13  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/2]
root        17  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/3]
root        21  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/4]
root        25  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/5]
root        29  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/6]
root        33  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/7]
root        37  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/8]
root        41 79.0  0.0      0     0 ?        S    Aug10 2554:48 [migration/9]
root        45  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/10]
root        49 55.4  0.0      0     0 ?        S    Aug10 1792:50 [migration/11]
root        53  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/12]
root        57 66.5  0.0      0     0 ?        S    Aug10 2152:52 [migration/13]
root        61 74.3  0.0      0     0 ?        S    Aug10 2405:08 [migration/14]
root        65 98.6  0.0      0     0 ?        S    Aug10 3189:35 [migration/15]
root        69 89.7  0.0      0     0 ?        S    Aug10 2903:31 [migration/16]
root        73  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/17]
root        78  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/18]
root        89 97.5  0.0      0     0 ?        S    Aug10 3153:28 [migration/19]
root        93 84.3  0.0      0     0 ?        S    Aug10 2727:58 [migration/20]
root        97  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/21]
root       101  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/22]
root       105  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/23]
root       109 94.3  0.0      0     0 ?        S    Aug10 3050:03 [migration/24]
root       113  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/25]
root       117 78.7  0.0      0     0 ?        S    Aug10 2545:33 [migration/26]
root       121 76.2  0.0      0     0 ?        S    Aug10 2465:08 [migration/27]
root       129  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/28]
root       139  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/29]
root       143  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/30]
root       147  0.0  0.0      0     0 ?        S    Aug10   0:00 [migration/31]
Comment 14 BobbyK 2012-08-14 21:52:16 UTC
Disabling tickless seems to have done the trick, which is nice :-)

# uname -r -v -m ; uptime ; ps uax|grep -e US[E]R -e migra[t]ion ; zcat /proc/config.gz |grep NO_HZ

3.4.4-gentoo #8 SMP Sat Aug 11 14:19:27 EDT 2012 x86_64

 17:49:42 up 3 days,  3:13,  9 users,  load average: 4.69, 4.80, 4.97

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         6  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/0]
root         7  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/1]
root        11  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/2]
root        14  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/3]
root        17  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/4]
root        20  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/5]
root        23  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/6]
root        26  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/7]
root        29  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/8]
root        32  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/9]
root        35  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/10]
root        38  0.0  0.0      0     0 ?        S    Aug11   0:00 [migration/11]

# CONFIG_NO_HZ is not set
Comment 15 BobbyK 2012-08-26 15:41:48 UTC
Blast, they're back with 3.4.9:

# uname -r -v -m ; uptime ; ps uax|grep -e US[E]R -e migra[t]ion ; zcat /proc/config.gz |grep NO_HZ

3.4.9-gentoo #1 SMP Fri Aug 24 08:38:41 EDT 2012 x86_64

 11:40:06 up 2 days,  2:37,  9 users,  load average: 3.71, 3.87, 4.12

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         6  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/0]
root         7 50.5  0.0      0     0 ?        S    Aug24 1534:49 [migration/1]
root        11  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/2]
root        14  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/3]
root        17  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/4]
root        20  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/5]
root        23  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/6]
root        26  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/7]
root        29 65.4  0.0      0     0 ?        S    Aug24 1986:42 [migration/8]
root        32 77.2  0.0      0     0 ?        S    Aug24 2347:25 [migration/9]
root        35  0.0  0.0      0     0 ?        S    Aug24   0:00 [migration/10]
root        38 87.0  0.0      0     0 ?        S    Aug24 2643:40 [migration/11]

# CONFIG_NO_HZ is not set
Comment 16 Mike Pagano gentoo-dev 2012-09-07 21:32:10 UTC
Can you please bring this issue upstream at http://bugzilla.kernel.org and post the url to the back back here.
Comment 17 Gunnar Eggen 2012-09-10 19:56:47 UTC
I just reported it here: https://bugzilla.kernel.org/show_bug.cgi?id=47341
Comment 18 Mike Pagano gentoo-dev 2012-10-10 12:47:44 UTC
ok, thanks. Watching the upstream bug, and we'll attempt to backport any fix identified.
Comment 19 BobbyK 2013-01-06 16:16:44 UTC
Nothing much seems to have happened in the upstream, though 3.6.11 appears to be cured:

# uname -r -v -m ; uptime ; ps uax|grep -e US[E]R -e migra[t]ion ; zcat /proc/config.gz |grep NO_HZ

3.6.11-gentoo #1 SMP Mon Dec 24 08:29:21 EST 2012 x86_64

 10:33:54 up 13 days, 38 min,  9 users,  load average: 5.26, 4.81, 4.26

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         8  0.0  0.0      0     0 ?        S     2012   3:03 [migration/0]
root         9  0.0  0.0      0     0 ?        S     2012   1:32 [migration/1]
root        13  0.0  0.0      0     0 ?        S     2012   1:30 [migration/2]
root        17  0.0  0.0      0     0 ?        S     2012   2:26 [migration/3]
root        21  0.0  0.0      0     0 ?        S     2012   2:08 [migration/4]
root        25  0.0  0.0      0     0 ?        S     2012   1:44 [migration/5]
root        29  0.0  0.0      0     0 ?        S     2012   1:09 [migration/6]
root        33  0.0  0.0      0     0 ?        S     2012   1:03 [migration/7]
root        37  0.0  0.0      0     0 ?        S     2012   1:09 [migration/8]
root        41  0.0  0.0      0     0 ?        S     2012   1:06 [migration/9]
root        45  0.0  0.0      0     0 ?        S     2012   0:54 [migration/10]
root        49  0.0  0.0      0     0 ?        S     2012   1:08 [migration/11]

# CONFIG_NO_HZ is not set

I may try re-enabling tickless ...
Comment 20 Gunnar Eggen 2013-01-12 01:22:16 UTC
You're right. >= 3.6.11 has fixed the issue, and I've tried both with and without CONFIG_NO_HZ.