After updating to go-1.21.0 I noticed that go ebuilds with dependency tarballs take a very long time to emerge. The symptom is that during "go mod verify" the go process sits at ~120% CPU while, in parallel, a bash process takes ~30%. Both noodle around for several minutes. They do eventually finish, but this should not take more than half a second.

Reproducible: Always

Steps to Reproduce:
1. emerge go-1.21.0
2. emerge -v1 --quiet-build=n app-containers/docker
3. ..zzz...

Actual Results:
Very slow "go mod verify" phase.

I tried to peek inside the bash and go processes and both seem very busy signalling each other without much progress. The go process appears to spend its time doing sha256 block checksumming, yet takes forever to get through it.
Additional info:
- trying to build node_exporter is a better example
- for some reason it only seems to happen on my AMD Zen2 laptop, not on my SandyBridge Intel boxes, where "go mod verify" takes <3 seconds
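For anyone who wants to confirm the same symptom, the per-thread CPU usage and the signal traffic between go and bash can be watched with standard tools. The invocations below are only a suggestion of mine, and the PIDs are placeholders, not commands taken from the original report:

# Per-thread CPU usage of the stuck go process (-H shows threads)
top -H -p <go-pid>

# Syscall/signal summary of the parent bash; interrupt with Ctrl-C after a while
strace -c -f -p <bash-pid>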
Downgrading to 1.20 immediately fixes the problem, to the point where "go" does not even show up in top; the "go mod verify" call is basically instantaneous. On this machine the TSC is broken, so the kernel falls back to the HPET clocksource. I somehow suspect that a scheduler change leads to this stall/livelock, because as soon as I start sysdig to trace the go/bash processes, they start to make progress.
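As a side note, which clocksource the kernel is actually using can be checked via sysfs; this is just the standard kernel interface, not something taken from the original report:

# Clocksource currently in use (hpet here, since the TSC was deemed unusable)
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# Clocksources the kernel considers usable
cat /sys/devices/system/clocksource/clocksource0/available_clocksource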
As suspected, setting GOMAXPROCS=1 fixes the problem; all values >1 lead to go/bash stepping on each other's toes, probably with missed signal handling.

diff --git a/eclass/go-module.eclass b/eclass/go-module.eclass
index 6c58d7f26..a006da290 100644
--- a/eclass/go-module.eclass
+++ b/eclass/go-module.eclass
@@ -383,7 +383,7 @@ go-module_src_unpack() {
             cd "${S}"
             local nf
             [[ -n ${NONFATAL_VERIFY} ]] && nf=nonfatal
-            ${nf} ego mod verify
+            GOMAXPROCS=1 ${nf} ego mod verify
         fi
     fi
 }
For anybody also affected by this: instead of editing the eclass, passing GOMAXPROCS=1 to emerge has the same helpful effect. Compile times do not seem to suffer much, since multiple compiler jobs still run in parallel; IMHO an acceptable workaround for now.
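To spell the workaround out (both variants are ordinary Portage usage rather than anything specific to this bug; the package atom is just an example):

# One-off, for a single build
GOMAXPROCS=1 emerge -v1 app-containers/docker

# Or add this line to /etc/portage/make.conf to apply it to every build
GOMAXPROCS="1"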
Holger, could you report this one upstream and link it here? I don't think it's worth one of us relaying it given we can't reproduce it yet and they're likely to want more info. Thanks!
(In reply to Sam James from comment #5) > Holger, could you report this one upstream and link it here? I don't think > it's worth one of us relaying it given we can't reproduce it yet and they're > likely to want more info. Thanks! Fair enough, but before I do so I'll have to run a few more experiments to see where this gremlin really hides.
After trying basically everything under the sun I found that FEATURES='-sandbox -usersandbox' seems to fix the problem right away, reliably and independently of GOMAXPROCS or the number of online cores. This would explain why I cannot reproduce the problem outside of ebuild/emerge, i.e. manually in a go package tree, and why everything else works just fine at runtime. With SANDBOX_DEBUG=yes and the sandbox enabled I see no violations (everything green), but it hangs on random reads of files in the GOMODCACHE, which exist (in the $WORKDIR/go-mod/cache directories) and are available, i.e. there is no need to download anything. It seems I/O just stops.
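For anyone who wants to disable the sandbox only for the affected packages rather than globally, Portage's per-package environment files can carry the change; the file name and the package atoms below are just examples:

# /etc/portage/env/no-sandbox.conf
FEATURES="-sandbox -usersandbox"

# /etc/portage/package.env
app-containers/docker no-sandbox.conf
dev-lang/go no-sandbox.conf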
This behaviour also extends to building go itself (e.g. a rebuild of 1.21.0 with itself), where various go-based tasks ("dist", "compile") would show the same symptom, so it is not limited to "mod verify". I guess all I/O is affected.
This sounds like some issue with sandbox's ptrace code, or a related kernel bug. Given the highly specific reproduction steps that seem to require hardware I do not own, I really have no way to reproduce and debug this.
I guess I'll have to resign myself to setting GOMAXPROCS=1 in make.conf and move on then. Other than for building a few packages I don't use go and don't want to waste any more time on this either. Thanks anyway.
Created attachment 878944 [details]
strace from bash

This is a size-limited strace snapshot of a non-progressing bash, the parent of the started go command with GOMAXPROCS > 1.
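For reference, a snapshot like this one can be captured along the following lines; the exact flags used for the attachment are not recorded here, so treat them as an assumption:

# Attach to the stuck bash, follow its children, timestamp each call,
# and detach after 30 seconds to keep the file size down
timeout 30 strace -f -tt -o bash-stuck.strace -p <bash-pid>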
Managed to reproduce this outside of the shell/python/emerge lasagne! \o/

Here in a prepared node_exporter source tree with GOMAXPROCS unset:

holger> while (true) ; do time LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null" && sleep 1 ; done
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.02s system 167% cpu 0.016 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.02s system 152% cpu 0.016 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.71s user 8.38s system 150% cpu 6.043 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.07s system 156% cpu 0.051 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.01s system 145% cpu 0.014 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.02s user 0.00s system 153% cpu 0.016 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  6.10s user 74.70s system 150% cpu 53.695 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.01s system 140% cpu 0.014 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.01s system 165% cpu 0.016 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.02s user 0.00s system 173% cpu 0.014 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  5.61s user 62.72s system 150% cpu 45.318 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.01s system 153% cpu 0.015 total
LD_PRELOAD=/usr/lib64/libsandbox.so sh -c "go mod verify > /dev/null"  0.01s user 0.01s system 122% cpu 0.016 total

The outliers (8.38, 74.7, 62.72 s) are the cases where things are stuck for no reason.
It may be fixed in recent versions of dev-lang/go. I tried today on a zen4 machine and it worked fine with sandbox and usersandbox.