Bug 538970 - hugepaged blocking userspace (symptom: sys-devel/gcc-4.8.3 hangs during build)
Summary: hugepaged blocking userspace (symptom: sys-devel/gcc-4.8.3 hangs during build)
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-05 20:17 UTC by Stuart Longland
Modified: 2015-02-10 19:38 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log.xz (build.log.xz,55.15 KB, application/x-xz)
2015-02-05 20:19 UTC, Stuart Longland
emerge --info (emerge-info.txt,5.17 KB, text/plain)
2015-02-05 20:20 UTC, Stuart Longland

Description Stuart Longland 2015-02-05 20:17:03 UTC
I've resurrected an old Pentium 4 laptop which had a slightly old (untouched in a year) installation of Gentoo on it.  The install is new enough that Portage didn't break when I did `emerge --sync`, but old enough to present a few challenges.

One is gcc-4.5.3, which is now too old to build many packages (including udev, which is now old enough to block many others).

I believe the machine to be electrically sound, having successfully compiled kernel 3.18 and updated binutils to 2.24.  However, yesterday I tried updating gcc: I typed `emerge sys-devel/gcc`, early in the afternoon.

Before I went home I checked on it and made a note of where the build was by typing a bash comment into the console.  Since `ebuild` doesn't take user input, the comment would remain in the stdin buffer: it would initially disappear past the scrollback buffer as more output was generated, then wind up being fed to the shell once the build completed.  I came in this morning and found my bash comment still on screen, with the build still supposedly "compiling".

The CPU is idle and cc1plus is seemingly hung.  I'm trying to build gcc-4.7 in the meantime.

Reproducible: Always

Steps to Reproduce:
1. emerge gcc

Actual Results:  
Build gets to a point then hangs.

Expected Results:  
gcc compiles, allowing me to select it as the default gcc instance.

Machine is just running console at the moment.  I've disabled the `xdm` boot service.
Comment 1 Stuart Longland 2015-02-05 20:19:54 UTC
Created attachment 395634 [details]
build.log.xz

Build log.  Note that the sudden build failure at the end is from when I came across the machine dawdling rather than compiling as alleged.

I've compressed it with XZ since it otherwise exceeds the 1MB limit.
Comment 2 Stuart Longland 2015-02-05 20:20:13 UTC
Created attachment 395636 [details]
emerge --info
Comment 3 Jeroen Roovers (RETIRED) gentoo-dev 2015-02-06 09:20:59 UTC
There is no telling now what it was doing when you stopped it. It might as well have been trying to allocate more RAM, which may or may not be visible. You might want to try again, perhaps without -pipe in your C*FLAGS so it uses less memory, and then analyse the situation instead of killing the build.
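
A minimal sketch of the -pipe suggestion, for anyone following along. The flag values here are hypothetical (the reporter's real ones are in the attached emerge --info), and `strip_pipe` is just an illustrative helper name, not part of Portage. -pipe makes gcc pass data between compilation stages through pipes rather than temporary files, so dropping it may ease memory pressure somewhat.

```shell
#!/bin/sh
# strip_pipe: print the given flag string with any standalone "-pipe" token removed.
# Hypothetical helper for illustration only.
strip_pipe() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep -vx -- '-pipe' | paste -sd ' ' -
}

# Hypothetical flags, NOT taken from the reporter's emerge --info.
CFLAGS="-O2 -march=pentium4 -pipe"
strip_pipe "$CFLAGS"
```

The resulting string could then be set as CFLAGS/CXXFLAGS in /etc/portage/make.conf, or passed in the environment for a single emerge run.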
Comment 4 Stuart Longland 2015-02-10 01:47:22 UTC
Okay, after further digging: xgcc (as built by the gcc build system) was not the cause but rather a victim.  SysRq pointed the way.

At first it seemed to be just gcc doing this which is why I suspected it, but then I noticed other things hanging, like the `xz` step of the kernel when building a bzImage with XZ compression.

The following is from the kernel log:

Feb 10 10:39:47 [kernel] [62427.230362] SysRq : Show Blocked State
Feb 10 10:39:47 [kernel] [62427.230420]   task                PC stack   pid father
Feb 10 10:39:47 [kernel] [62427.230430] khugepaged      D 99dd3c07     0    19      2 0x00000000
Feb 10 10:39:47 [kernel] [62427.230438]  f685f0d0 00000046 0000000b 99dd3c07 000038c5 f685f0d0 f6887fec f68392bc
Feb 10 10:39:47 [kernel] [62427.230447]  f6887c6c c1434e2c 026a0f08 f68392bc 0000ee08 f68392a4 f68392bc f6887c6c
Feb 10 10:39:47 [kernel] [62427.230455]  00d4160c 0000007b f688007b c1000000 00000000 ffffffc4 00000282 c16b2a80
Feb 10 10:39:47 [kernel] [62427.230463] Call Trace:
Feb 10 10:39:47 [kernel] [62427.230478]  [<c1434e2c>] ? common_interrupt+0x2c/0x34
Feb 10 10:39:47 [kernel] [62427.230488]  [<c1434016>] ? schedule_timeout+0xc7/0xe5
Feb 10 10:39:47 [kernel] [62427.230497]  [<c1048817>] ? timer_cpu_notify+0x112/0x112
Feb 10 10:39:47 [kernel] [62427.230503]  [<c1432ed8>] ? io_schedule_timeout+0x5f/0x9c
Feb 10 10:39:47 [kernel] [62427.230511]  [<c108659f>] ? congestion_wait+0x52/0x7a
Feb 10 10:39:47 [kernel] [62427.230519]  [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230528]  [<c1080e8f>] ? shrink_inactive_list+0x79/0x2ea
Feb 10 10:39:47 [kernel] [62427.230534]  [<c1081439>] ? shrink_zone+0x339/0x50f
Feb 10 10:39:47 [kernel] [62427.230542]  [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230547]  [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230553]  [<c1082178>] ? try_to_free_pages+0x1db/0x1fe
Feb 10 10:39:47 [kernel] [62427.230559]  [<c142f3bb>] ? __alloc_pages_direct_compact+0x40/0x137
Feb 10 10:39:47 [kernel] [62427.230566]  [<c107bc62>] ? __alloc_pages_nodemask+0x4b5/0x6ec
Feb 10 10:39:47 [kernel] [62427.230574]  [<c10a0a47>] ? khugepaged+0x87/0xa7c
Feb 10 10:39:47 [kernel] [62427.230581]  [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230587]  [<c10a09c0>] ? maybe_pmd_mkwrite+0xd/0xd
Feb 10 10:39:47 [kernel] [62427.230593]  [<c1032b0a>] ? kthread+0xa0/0xa5
Feb 10 10:39:47 [kernel] [62427.230599]  [<c1434740>] ? ret_from_kernel_thread+0x20/0x30
Feb 10 10:39:47 [kernel] [62427.230605]  [<c1032a6a>] ? kthread_freezable_should_stop+0x3a/0x3a
Feb 10 10:39:47 [kernel] [62427.230661] emerge          D f3446b80     0 12607   4899 0x00000000
Feb 10 10:39:47 [kernel] [62427.230667]  cf927540 00200086 0000000b f3446b80 000038c6 cf927540 f23f3fec f3cb16ac
Feb 10 10:39:47 [kernel] [62427.230675]  f23f3c44 c1434e2c 026936d8 f3cb16ac 0000ee08 f3cb1694 f3cb16ac f23f3c44
Feb 10 10:39:47 [kernel] [62427.230682]  00ece686 ffff007b 0000007b 00000000 00000000 ffffffc4 00200282 c16b2a80
Feb 10 10:39:47 [kernel] [62427.230690] Call Trace:
Feb 10 10:39:47 [kernel] [62427.230697]  [<c1434e2c>] ? common_interrupt+0x2c/0x34
Feb 10 10:39:47 [kernel] [62427.230703]  [<c1434016>] ? schedule_timeout+0xc7/0xe5
Feb 10 10:39:47 [kernel] [62427.230709]  [<c1048817>] ? timer_cpu_notify+0x112/0x112
Feb 10 10:39:47 [kernel] [62427.230715]  [<c1432ed8>] ? io_schedule_timeout+0x5f/0x9c
Feb 10 10:39:47 [kernel] [62427.230721]  [<c108659f>] ? congestion_wait+0x52/0x7a
Feb 10 10:39:47 [kernel] [62427.230728]  [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230734]  [<c1080e8f>] ? shrink_inactive_list+0x79/0x2ea
Feb 10 10:39:47 [kernel] [62427.230740]  [<c1081439>] ? shrink_zone+0x339/0x50f
Feb 10 10:39:47 [kernel] [62427.230748]  [<c1088a09>] ? compaction_suitable+0x27/0x77
Feb 10 10:39:47 [kernel] [62427.230753]  [<c1088a09>] ? compaction_suitable+0x27/0x77
Feb 10 10:39:47 [kernel] [62427.230760]  [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230766]  [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230772]  [<c1082178>] ? try_to_free_pages+0x1db/0x1fe
Feb 10 10:39:47 [kernel] [62427.230778]  [<c142f3bb>] ? __alloc_pages_direct_compact+0x40/0x137
Feb 10 10:39:47 [kernel] [62427.230784]  [<c107bc62>] ? __alloc_pages_nodemask+0x4b5/0x6ec
Feb 10 10:39:47 [kernel] [62427.230791]  [<c10a24b6>] ? do_huge_pmd_wp_page+0x10c/0x461
Feb 10 10:39:47 [kernel] [62427.230797]  [<c108c930>] ? do_wp_page.isra.90+0x451/0x4b0
Feb 10 10:39:47 [kernel] [62427.230802]  [<c108dc58>] ? handle_mm_fault+0xfc/0x6f8
Feb 10 10:39:47 [kernel] [62427.230810]  [<c10355aa>] ? check_preempt_curr+0x1f/0x59
Feb 10 10:39:47 [kernel] [62427.230817]  [<c101e5e9>] ? __do_page_fault+0x429/0x467
Feb 10 10:39:47 [kernel] [62427.230822]  [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230828]  [<c1434fb8>] ? error_code+0x58/0x60
Feb 10 10:39:47 [kernel] [62427.230833]  [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230840]  [<c119786d>] ? __put_user_4+0x19/0x24
Feb 10 10:39:47 [kernel] [62427.230846]  [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230851]  [<c1434fb8>] ? error_code+0x58/0x60
Feb 10 10:39:47 [kernel] [62427.230856]  [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230859] xz              D 76190b24     0 15088      1 0x00000004
Feb 10 10:39:47 [kernel] [62427.230864]  cf925630 00000086 00000046 76190b24 000038c6 cf925630 c1e89fec 02684fee
Feb 10 10:39:47 [kernel] [62427.230872]  00000064 c15cee08 c1e89c64 c1434e2c 02684fee 00000002 0000ee08 00000064
Feb 10 10:39:47 [kernel] [62427.230888] Call Trace:
Feb 10 10:39:47 [kernel] [62427.230918]  [<c108659f>] ? congestion_wait+0x52/0x7a
Feb 10 10:39:47 [kernel] [62427.230923]  [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230930]  [<c1080e8f>] ? shrink_inactive_list+0x79/0x2ea
Feb 10 10:39:47 [kernel] [62427.230936]  [<c1081439>] ? shrink_zone+0x339/0x50f
Feb 10 10:39:47 [kernel] [62427.230943]  [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230949]  [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230955]  [<c1082178>] ? try_to_free_pages+0x1db/0x1fe
Feb 10 10:39:47 [kernel] [62427.230961]  [<c142f3bb>] ? __alloc_pages_direct_compact+0x40/0x137
Feb 10 10:39:47 [kernel] [62427.230967]  [<c107bc62>] ? __alloc_pages_nodemask+0x4b5/0x6ec
Feb 10 10:39:47 [kernel] [62427.230974]  [<c10a200a>] ? do_huge_pmd_anonymous_page+0x168/0x270
Feb 10 10:39:47 [kernel] [62427.230980]  [<c108dc11>] ? handle_mm_fault+0xb5/0x6f8
Feb 10 10:39:47 [kernel] [62427.230986]  [<c101e5e9>] ? __do_page_fault+0x429/0x467
Feb 10 10:39:47 [kernel] [62427.230991]  [<c1044d8e>] ? handle_irq_event_percpu+0xc3/0xd3
Feb 10 10:39:47 [kernel] [62427.230997]  [<c1046577>] ? unmask_irq+0x11/0x1a
Feb 10 10:39:47 [kernel] [62427.231002]  [<c1046740>] ? handle_level_irq+0x82/0x84
Feb 10 10:39:47 [kernel] [62427.231008]  [<c1002956>] ? do_IRQ+0x6f/0x7f
Feb 10 10:39:47 [kernel] [62427.231013]  [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.231018]  [<c1434fb8>] ? error_code+0x58/0x60
Feb 10 10:39:47 [kernel] [62427.231024]  [<c101e627>] ? __do_page_fault+0x467/0x467
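
For anyone wanting to collect a similar dump, here is a rough sketch. The "Show Blocked State" output above corresponds to the SysRq `w` key. `filter_blocked` and `list_blocked_tasks` are hypothetical helper names, and the sketch assumes a procps-style `ps`.

```shell
#!/bin/sh
# filter_blocked: read "STATE PID COMMAND" lines on stdin and print the
# PID and command of tasks in uninterruptible sleep (state D).
filter_blocked() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# list_blocked_tasks: show tasks currently in state D, if any.
list_blocked_tasks() {
    ps -e -o state= -o pid= -o comm= | filter_blocked
}

list_blocked_tasks

# To get kernel stack traces for those tasks, run as root:
#   echo w > /proc/sysrq-trigger
#   dmesg | tail -n 200
```

Tasks that stay in state D across repeated runs while the CPU is idle are the ones worth looking up in the SysRq output.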

I'm investigating a possible culprit in the kernel configuration: CONFIG_TRANSPARENT_HUGEPAGE

I hadn't myself set this, but it may have been set during an update of the kernel.  Prior to me dusting this machine off it ran kernel 3.7.10, and I simply moved .config out of the way, did a `make mrproper` then fetched and checked out a newer branch of the "stable" kernel tree, moved .config back and did a `make olddefconfig`.
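
One way to see which options got switched on by such a config upgrade is to compare the old and new files. This is only a sketch: `new_options` is a hypothetical helper, and it assumes a copy of the previous configuration was kept somewhere (the file names in the usage comment are made up).

```shell
#!/bin/sh
# new_options OLD NEW: print CONFIG_ symbols that are set to y or m in NEW
# but do not appear with the same value in OLD.
new_options() {
    no_old=$(mktemp) || return 1
    no_new=$(mktemp) || return 1
    grep -E '^CONFIG_[A-Za-z0-9_]+=[ym]$' "$1" | sort > "$no_old"
    grep -E '^CONFIG_[A-Za-z0-9_]+=[ym]$' "$2" | sort > "$no_new"
    # comm -13 keeps only lines unique to the second (new) file.
    comm -13 "$no_old" "$no_new"
    rm -f "$no_old" "$no_new"
}

# Usage (hypothetical paths):
#   new_options /root/config-3.7.10 /usr/src/linux/.config | grep HUGEPAGE
```

The kernel tree also ships a scripts/diffconfig tool for the same purpose.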

I'll continue to post what I find in case anyone else reports a similar issue and goes hunting for the cause.  It may be worth investigating and perhaps suggesting in the handbook that people check some of these settings to ensure nasties like the one above don't occur.
Comment 5 Stuart Longland 2015-02-10 19:38:09 UTC
Okay, so ½ a day later, the machine's still chugging along building packages and NOT stalled, so I think I found the culprit.

If anyone finds processes (particularly cc1plus, etc.) stalling on an older machine with a modest amount of RAM (in my case, a P4 with 1GB), CONFIG_TRANSPARENT_HUGEPAGE is one flag in the kernel configuration they might want to check, as it seems to be enabled by default.
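
A quick way to check the running kernel, sketched under the assumption that sysfs is mounted at /sys. `thp_mode` is a hypothetical helper name; the sysfs file lists the available modes with the active one in square brackets.

```shell
#!/bin/sh
# thp_mode: read the contents of the sysfs "enabled" file on stdin,
# e.g. "[always] madvise never", and print the bracketed (active) mode.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "transparent hugepages: $(thp_mode < "$thp_file")"
else
    echo "transparent hugepages: not built into this kernel"
fi

# Build-time setting (path assumes the usual Gentoo symlink):
#   grep CONFIG_TRANSPARENT_HUGEPAGE /usr/src/linux/.config
# Switch it off at run time, as root, without rebuilding:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
```

Booting with transparent_hugepage=never on the kernel command line is another option for testing whether the stalls go away before rebuilding the kernel.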