Summary: | hugepaged blocking userspace (symptom: sys-devel/gcc-4.8.3 hangs during build) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Stuart Longland <stuartl> |
Component: | [OLD] Core system | Assignee: | Gentoo Linux bug wranglers <bug-wranglers> |
Status: | RESOLVED INVALID | ||
Severity: | normal | ||
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
build.log.xz
emerge --info |
Description
Stuart Longland
2015-02-05 20:17:03 UTC
Created attachment 395634 [details]
build.log.xz
Build log. Note the sudden build failure is when I came across the machine dawdling rather than compiling as alleged.
I've compressed it with XZ since it otherwise exceeds the 1MB limit.
Created attachment 395636 [details]
emerge --info
There is no telling now what it was doing when you stopped it. It might as well have been trying to allocate more RAM, which may or may not be visible. You might want to try again, perhaps without -pipe in your C*FLAGS so it uses less memory, and then analyse the situation instead of killing the build.

Okay, further digging: xgcc (as built by the gcc build system) was not the cause but rather a victim. sysrq pointed the way. At first it seemed to be just gcc doing this, which is why I suspected it, but then I noticed other things hanging, like the `xz` step of the kernel when building a bzImage with XZ compression. The following is from the kernel log:

Feb 10 10:39:47 [kernel] [62427.230362] SysRq : Show Blocked State
Feb 10 10:39:47 [kernel] [62427.230420] task PC stack pid father
Feb 10 10:39:47 [kernel] [62427.230430] khugepaged D 99dd3c07 0 19 2 0x00000000
Feb 10 10:39:47 [kernel] [62427.230438] f685f0d0 00000046 0000000b 99dd3c07 000038c5 f685f0d0 f6887fec f68392bc
Feb 10 10:39:47 [kernel] [62427.230447] f6887c6c c1434e2c 026a0f08 f68392bc 0000ee08 f68392a4 f68392bc f6887c6c
Feb 10 10:39:47 [kernel] [62427.230455] 00d4160c 0000007b f688007b c1000000 00000000 ffffffc4 00000282 c16b2a80
Feb 10 10:39:47 [kernel] [62427.230463] Call Trace:
Feb 10 10:39:47 [kernel] [62427.230478] [<c1434e2c>] ? common_interrupt+0x2c/0x34
Feb 10 10:39:47 [kernel] [62427.230488] [<c1434016>] ? schedule_timeout+0xc7/0xe5
Feb 10 10:39:47 [kernel] [62427.230497] [<c1048817>] ? timer_cpu_notify+0x112/0x112
Feb 10 10:39:47 [kernel] [62427.230503] [<c1432ed8>] ? io_schedule_timeout+0x5f/0x9c
Feb 10 10:39:47 [kernel] [62427.230511] [<c108659f>] ? congestion_wait+0x52/0x7a
Feb 10 10:39:47 [kernel] [62427.230519] [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230528] [<c1080e8f>] ? shrink_inactive_list+0x79/0x2ea
Feb 10 10:39:47 [kernel] [62427.230534] [<c1081439>] ? shrink_zone+0x339/0x50f
Feb 10 10:39:47 [kernel] [62427.230542] [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230547] [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230553] [<c1082178>] ? try_to_free_pages+0x1db/0x1fe
Feb 10 10:39:47 [kernel] [62427.230559] [<c142f3bb>] ? __alloc_pages_direct_compact+0x40/0x137
Feb 10 10:39:47 [kernel] [62427.230566] [<c107bc62>] ? __alloc_pages_nodemask+0x4b5/0x6ec
Feb 10 10:39:47 [kernel] [62427.230574] [<c10a0a47>] ? khugepaged+0x87/0xa7c
Feb 10 10:39:47 [kernel] [62427.230581] [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230587] [<c10a09c0>] ? maybe_pmd_mkwrite+0xd/0xd
Feb 10 10:39:47 [kernel] [62427.230593] [<c1032b0a>] ? kthread+0xa0/0xa5
Feb 10 10:39:47 [kernel] [62427.230599] [<c1434740>] ? ret_from_kernel_thread+0x20/0x30
Feb 10 10:39:47 [kernel] [62427.230605] [<c1032a6a>] ? kthread_freezable_should_stop+0x3a/0x3a
Feb 10 10:39:47 [kernel] [62427.230661] emerge D f3446b80 0 12607 4899 0x00000000
Feb 10 10:39:47 [kernel] [62427.230667] cf927540 00200086 0000000b f3446b80 000038c6 cf927540 f23f3fec f3cb16ac
Feb 10 10:39:47 [kernel] [62427.230675] f23f3c44 c1434e2c 026936d8 f3cb16ac 0000ee08 f3cb1694 f3cb16ac f23f3c44
Feb 10 10:39:47 [kernel] [62427.230682] 00ece686 ffff007b 0000007b 00000000 00000000 ffffffc4 00200282 c16b2a80
Feb 10 10:39:47 [kernel] [62427.230690] Call Trace:
Feb 10 10:39:47 [kernel] [62427.230697] [<c1434e2c>] ? common_interrupt+0x2c/0x34
Feb 10 10:39:47 [kernel] [62427.230703] [<c1434016>] ? schedule_timeout+0xc7/0xe5
Feb 10 10:39:47 [kernel] [62427.230709] [<c1048817>] ? timer_cpu_notify+0x112/0x112
Feb 10 10:39:47 [kernel] [62427.230715] [<c1432ed8>] ? io_schedule_timeout+0x5f/0x9c
Feb 10 10:39:47 [kernel] [62427.230721] [<c108659f>] ? congestion_wait+0x52/0x7a
Feb 10 10:39:47 [kernel] [62427.230728] [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230734] [<c1080e8f>] ? shrink_inactive_list+0x79/0x2ea
Feb 10 10:39:47 [kernel] [62427.230740] [<c1081439>] ? shrink_zone+0x339/0x50f
Feb 10 10:39:47 [kernel] [62427.230748] [<c1088a09>] ? compaction_suitable+0x27/0x77
Feb 10 10:39:47 [kernel] [62427.230753] [<c1088a09>] ? compaction_suitable+0x27/0x77
Feb 10 10:39:47 [kernel] [62427.230760] [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230766] [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230772] [<c1082178>] ? try_to_free_pages+0x1db/0x1fe
Feb 10 10:39:47 [kernel] [62427.230778] [<c142f3bb>] ? __alloc_pages_direct_compact+0x40/0x137
Feb 10 10:39:47 [kernel] [62427.230784] [<c107bc62>] ? __alloc_pages_nodemask+0x4b5/0x6ec
Feb 10 10:39:47 [kernel] [62427.230791] [<c10a24b6>] ? do_huge_pmd_wp_page+0x10c/0x461
Feb 10 10:39:47 [kernel] [62427.230797] [<c108c930>] ? do_wp_page.isra.90+0x451/0x4b0
Feb 10 10:39:47 [kernel] [62427.230802] [<c108dc58>] ? handle_mm_fault+0xfc/0x6f8
Feb 10 10:39:47 [kernel] [62427.230810] [<c10355aa>] ? check_preempt_curr+0x1f/0x59
Feb 10 10:39:47 [kernel] [62427.230817] [<c101e5e9>] ? __do_page_fault+0x429/0x467
Feb 10 10:39:47 [kernel] [62427.230822] [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230828] [<c1434fb8>] ? error_code+0x58/0x60
Feb 10 10:39:47 [kernel] [62427.230833] [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230840] [<c119786d>] ? __put_user_4+0x19/0x24
Feb 10 10:39:47 [kernel] [62427.230846] [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230851] [<c1434fb8>] ? error_code+0x58/0x60
Feb 10 10:39:47 [kernel] [62427.230856] [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.230859] xz D 76190b24 0 15088 1 0x00000004
Feb 10 10:39:47 [kernel] [62427.230864] cf925630 00000086 00000046 76190b24 000038c6 cf925630 c1e89fec 02684fee
Feb 10 10:39:47 [kernel] [62427.230872] 00000064 c15cee08 c1e89c64 c1434e2c 02684fee 00000002 0000ee08 00000064
Feb 10 10:39:47 [kernel] [62427.230888] Call Trace:
Feb 10 10:39:47 [kernel] [62427.230918] [<c108659f>] ? congestion_wait+0x52/0x7a
Feb 10 10:39:47 [kernel] [62427.230923] [<c103b053>] ? __wake_up_sync+0x9/0x9
Feb 10 10:39:47 [kernel] [62427.230930] [<c1080e8f>] ? shrink_inactive_list+0x79/0x2ea
Feb 10 10:39:47 [kernel] [62427.230936] [<c1081439>] ? shrink_zone+0x339/0x50f
Feb 10 10:39:47 [kernel] [62427.230943] [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230949] [<c1081e03>] ? do_try_to_free_pages+0x1ba/0x2e5
Feb 10 10:39:47 [kernel] [62427.230955] [<c1082178>] ? try_to_free_pages+0x1db/0x1fe
Feb 10 10:39:47 [kernel] [62427.230961] [<c142f3bb>] ? __alloc_pages_direct_compact+0x40/0x137
Feb 10 10:39:47 [kernel] [62427.230967] [<c107bc62>] ? __alloc_pages_nodemask+0x4b5/0x6ec
Feb 10 10:39:47 [kernel] [62427.230974] [<c10a200a>] ? do_huge_pmd_anonymous_page+0x168/0x270
Feb 10 10:39:47 [kernel] [62427.230980] [<c108dc11>] ? handle_mm_fault+0xb5/0x6f8
Feb 10 10:39:47 [kernel] [62427.230986] [<c101e5e9>] ? __do_page_fault+0x429/0x467
Feb 10 10:39:47 [kernel] [62427.230991] [<c1044d8e>] ? handle_irq_event_percpu+0xc3/0xd3
Feb 10 10:39:47 [kernel] [62427.230997] [<c1046577>] ? unmask_irq+0x11/0x1a
Feb 10 10:39:47 [kernel] [62427.231002] [<c1046740>] ? handle_level_irq+0x82/0x84
Feb 10 10:39:47 [kernel] [62427.231008] [<c1002956>] ? do_IRQ+0x6f/0x7f
Feb 10 10:39:47 [kernel] [62427.231013] [<c101e627>] ? __do_page_fault+0x467/0x467
Feb 10 10:39:47 [kernel] [62427.231018] [<c1434fb8>] ? error_code+0x58/0x60
Feb 10 10:39:47 [kernel] [62427.231024] [<c101e627>] ? __do_page_fault+0x467/0x467

I'm investigating a possible culprit in the kernel configuration: CONFIG_TRANSPARENT_HUGEPAGE

I hadn't set this myself, but it may have been set during an update of the kernel. Before I dusted this machine off it ran kernel 3.7.10; I simply moved .config out of the way, did a `make mrproper`, fetched and checked out a newer branch of the "stable" kernel tree, moved .config back and did a `make olddefconfig`. I'll continue to post what I find in case anyone else reports a similar issue and goes hunting for the cause. It may be worth investigating, and perhaps suggesting in the handbook, that people check some of these settings so that nasties like the one above don't occur.

Okay, so ½ a day later, the machine's still chugging along building packages and NOT stalled, so I think I found the culprit. If anyone finds processes (particularly cc1plus, etc.) stalling on an older machine with a modest amount of RAM (in my case, a P4 with 1GB), this is one flag in the kernel configuration they might want to check, as it seems to be enabled by default. |
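For anyone wanting to check the flag mentioned above on their own machine, here is a minimal Python sketch. It is not from the original report: the helper names `active_thp_mode` and `thp_config_state` are made up for illustration, and it assumes the usual sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`, which lists the available modes with the active one in square brackets, plus the standard `.config` line formats.

```python
import re
from pathlib import Path

# Usual sysfs location of the runtime THP mode (absent if THP is not built in).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from e.g. '[always] madvise never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def thp_config_state(config_text):
    """Return 'y', 'n', or None for CONFIG_TRANSPARENT_HUGEPAGE in .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Exact matches only, so CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is ignored.
        if line == "CONFIG_TRANSPARENT_HUGEPAGE=y":
            return "y"
        if line == "# CONFIG_TRANSPARENT_HUGEPAGE is not set":
            return "n"
    return None


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("runtime THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("no THP sysfs entry; kernel was probably built without it")
```

To check a build configuration rather than the running kernel, pass the contents of the kernel tree's `.config` to `thp_config_state`; a result of `None` means the option does not appear in that file at all.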