Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 712136

Summary: sys-apps/coreutils-8.32-r1: emerge or ebuild(1) hangs on pkg_postinst due to mkfifo hanging on call to open(..., O_RDONLY)
Product: Gentoo Linux Reporter: Guilherme Amadio <amadio>
Component: Current packagesAssignee: Gentoo's Team for Core System packages <base-system>
Status: CONFIRMED ---    
Severity: normal CC: dev-portage, luke-jr+gentoobugs, sam, zeekec
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: emerge --info
build log

Description Guilherme Amadio gentoo-dev 2020-03-11 13:44:53 UTC
Created attachment 617966 [details]
emerge --info

I have a prefix where portage hangs on pkg_postinst(). Happens with a few of the latest versions as of recently. Could be due to an old kernel, as this is with kernel 2.6.32-754.27.1.el6.x86_64 from Scientific Linux 6. The stack trace is shown below using Python 3.8, but I see the same with Python 3.6:

#0  0x00007f7710c96a73 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f77105bfa12 in select_epoll_poll_impl (self=0x7f770f989e10, maxevents=<optimized out>, timeout_obj=<optimized out>)
    at /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/src/debug/dev-lang/python-3.8.2/Python-3.8.2/Modules/selectmodule.c:1558
#2  select_epoll_poll (self=0x7f770f989e10, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>)
    at /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/src/debug/dev-lang/python-3.8.2/Python-3.8.2/Modules/clinic/selectmodule.c.h:859
#3  0x00007f7710e107ec in method_vectorcall_FASTCALL_KEYWORDS (func=0x7f77105e8130, args=0x560804cb1ca8, nargsf=<optimized out>, kwnames=0x0)
    at Objects/descrobject.c:371
#4  0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#5  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#6  _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#7  0x00007f7710ecb4e7 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f77105f1660, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, 
    argcount=2, kwnames=0x0, kwargs=0x560804e8d130, kwcount=0, kwstep=1, defs=0x7f77105f3178, defcount=1, kwdefs=0x0, closure=0x0, name=0x7f77105d36f0, 
    qualname=0x7f77105f0620) at Python/ceval.c:4298
#8  0x00007f7710e06839 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:435
#9  0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#10 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#12 0x00007f7710dda883 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#13 0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#14 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#15 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#16 0x00007f7710dda883 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#17 0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#18 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#20 0x00007f7710dda883 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:283
#21 0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#22 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#23 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#24 0x00007f7710dda883 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:283
#25 0x00007f7710e09efb in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=0x7f770c0f7f38, callable=0x7f770fce7ee0)
    at ./Include/cpython/abstract.h:127
#26 method_vectorcall (method=<optimized out>, args=0x7f770c0f7f40, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/classobject.c:60
#27 0x00007f7710de21c2 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#28 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#29 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3469
#30 0x00007f7710dda883 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#31 0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#32 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#33 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#34 0x00007f7710ecb4e7 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f770f650a80, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, 
    argcount=1, kwnames=0x0, kwargs=0x560805027cb8, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f770f64d930, 
    qualname=0x7f770f64fc10) at Python/ceval.c:4298
#35 0x00007f7710e06839 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:435
#36 0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#37 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#38 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#39 0x00007f7710ecb4e7 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f770f6500e0, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, 
    argcount=1, kwnames=0x0, kwargs=0x560804786930, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f771074a5f0, 
    qualname=0x7f770f64dab0) at Python/ceval.c:4298
#40 0x00007f7710e06839 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:435
#41 0x00007f7710ddcef3 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#42 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#43 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#44 0x00007f7710ecb4e7 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f770f8ce190, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, 
    argcount=1, kwnames=0x7f770f8c3238, kwargs=0x5608047d4640, kwcount=1, kwstep=1, defs=0x7f770f6b3dd8, defcount=6, kwdefs=0x0, closure=0x0, 
    name=0x7f770f8cf170, qualname=0x7f770f8cf170) at Python/ceval.c:4298
#45 0x00007f7710e06839 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:435
#46 0x00007f7710ddc867 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#47 call_function (kwnames=0x7f770f8c3220, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#48 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3515
#49 0x00007f7710ecb4e7 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f770f956710, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, 
    argcount=1, kwnames=0x0, kwargs=0x560804448de8, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f770fcf20f0, 
    qualname=0x7f770fcf20f0) at Python/ceval.c:4298
#50 0x00007f7710e06839 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:435
#51 0x00007f7710ddcc6a in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ./Include/cpython/abstract.h:123
#52 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#53 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3500
#54 0x00007f7710dda883 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#55 0x00007f7710ddcc6a in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
--Type <RET> for more, q to quit, c to continue without paging--
    at ./Include/cpython/abstract.h:123
#56 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56080426ca80) at Python/ceval.c:4987
#57 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3500
#58 0x00007f7710ecb4e7 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f77107850e0, globals=globals@entry=0x7f77108c3c40, locals=locals@entry=0x7f77108c3c40, 
    args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, 
    closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4298
#59 0x00007f7710ecb7cd in PyEval_EvalCodeEx (_co=_co@entry=0x7f77107850e0, globals=globals@entry=0x7f77108c3c40, locals=locals@entry=0x7f77108c3c40, 
    args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4327
#60 0x00007f7710ecb7ff in PyEval_EvalCode (co=co@entry=0x7f77107850e0, globals=globals@entry=0x7f77108c3c40, locals=locals@entry=0x7f77108c3c40)
    at Python/ceval.c:718
#61 0x00007f7710f015fa in run_eval_code_obj (co=co@entry=0x7f77107850e0, globals=globals@entry=0x7f77108c3c40, locals=locals@entry=0x7f77108c3c40)
    at Python/pythonrun.c:1125
#62 0x00007f7710f016b5 in run_mod (mod=mod@entry=0x5608043078d0, filename=filename@entry=0x7f7710775ed0, globals=globals@entry=0x7f77108c3c40, 
    locals=locals@entry=0x7f77108c3c40, flags=flags@entry=0x7ffed8708298, arena=arena@entry=0x7f77108bccf0) at Python/pythonrun.c:1147
#63 0x00007f7710f040d2 in PyRun_FileExFlags (fp=fp@entry=0x5608042a3880, 
    filename_str=filename_str@entry=0x7f7710776150 "/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge", 
    start=start@entry=257, globals=globals@entry=0x7f77108c3c40, locals=locals@entry=0x7f77108c3c40, closeit=closeit@entry=1, flags=0x7ffed8708298)
    at Python/pythonrun.c:1063
#64 0x00007f7710f04240 in PyRun_SimpleFileExFlags (fp=fp@entry=0x5608042a3880, filename=<optimized out>, closeit=closeit@entry=1, 
    flags=flags@entry=0x7ffed8708298) at Python/pythonrun.c:428
#65 0x00007f7710f04721 in PyRun_AnyFileExFlags (fp=fp@entry=0x5608042a3880, filename=<optimized out>, closeit=closeit@entry=1, 
    flags=flags@entry=0x7ffed8708298) at Python/pythonrun.c:86
#66 0x00007f7710f1edfd in pymain_run_file (cf=0x7ffed8708298, config=0x56080426ab00) at Modules/main.c:381
#67 pymain_run_python (exitcode=0x7ffed8708290) at Modules/main.c:565
#68 Py_RunMain () at Modules/main.c:644
#69 0x00007f7710f1f179 in pymain_main (args=args@entry=0x7ffed8708380) at Modules/main.c:674
#70 0x00007f7710f1f24a in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:698
#71 0x00007f7710bb7156 in __libc_start_main (main=0x560802aa4050 <main>, argc=5, argv=0x7ffed87084a8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffed8708498) at ../csu/libc-start.c:295
#72 0x0000560802aa408a in _start () at ../sysdeps/x86_64/start.S:120


The call to epoll_wait is from this loop in Python:

    do {
        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        nfds = epoll_wait(self->epfd, evs, maxevents, (int)ms); // this line
        Py_END_ALLOW_THREADS

        if (errno != EINTR)
            break;

        /* poll() was interrupted by a signal */
        if (PyErr_CheckSignals())
            goto error;

        if (timeout >= 0) {
            timeout = deadline - _PyTime_GetMonotonicClock();
            if (timeout < 0) {
                nfds = 0;
                break;
            }
            ms = _PyTime_AsMilliseconds(timeout, _PyTime_ROUND_CEILING);
            /* retry epoll_wait() with the recomputed timeout */
        }
    } while(1);

Please let me know how else I can test/debug this. It would be nice to be able to fix it.
Comment 1 Zac Medico gentoo-dev 2020-03-11 17:21:14 UTC
If it's anything like bug 711322, then the --load-average option might cause it to un-hang itself (it has a 30 second time that will wake it up). In cases like this it's also possible to send the main emerge process a SIGUSR1 signal in order to get a pdb prompt, then move up the stack to the scope of the Scheduler class and call self._schedule() to un-hang it.
Comment 2 Guilherme Amadio gentoo-dev 2020-03-12 11:43:43 UTC
I am already using --load-average, so the hang happens with or without it.
Here is a stack trace from pdb in any case:

>>> Completed installing dev-db/sqlite-3.31.1 into /tmp/portage/dev-db/sqlite-3.31.1/image

 * Final size of build directory: 28960 KiB (28.2 MiB)
 * Final size of installed tree:   2340 KiB ( 2.2 MiB)

--Return--
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(53)<module>()
-> retval = emerge_main()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/main.py(1309)emerge_main()
-> return run_action(emerge_config)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(3374)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(564)action_build()
-> retval = mergetask.merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1020)merge()
-> rval = self._merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1414)_merge()
-> self._main_loop()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1390)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/portage/util/_eventloop/asyncio_event_loop.py(127)_run_until_complete()
-> return self._loop.run_until_complete(future)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(603)run_until_complete()
-> self.run_forever()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(570)run_forever()
-> self._run_once()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(1823)_run_once()
-> event_list = self._selector.select(timeout)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/selectors.py(468)select()
-> fd_event_list = self._selector.poll(timeout, max_ev)
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()

for portage-3.2.84, the stack trace is
--Return--
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) 
(Pdb) bt
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(53)<module>()
-> retval = emerge_main()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/main.py(1309)emerge_main()
-> return run_action(emerge_config)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(3358)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(563)action_build()
-> retval = mergetask.merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1020)merge()
-> rval = self._merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1414)_merge()
-> self._main_loop()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1390)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/portage/util/_eventloop/asyncio_event_loop.py(140)_run_until_complete()
-> return self._loop.run_until_complete(future)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(603)run_until_complete()
-> self.run_forever()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(570)run_forever()
-> self._run_once()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(1823)_run_once()
-> event_list = self._selector.select(timeout)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/selectors.py(468)select()
-> fd_event_list = self._selector.poll(timeout, max_ev)
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) 

I didn't manage to unhang portage-2.3.93 by calling self._schedule() either. Even calling just ebuild portage-2.3.93.ebuild install already causes the hang for me. All versions of portage currently in the tree show this behavior, so I wonder if the machine I'm running this on is not the problem...
Comment 3 Guilherme Amadio gentoo-dev 2020-03-12 11:44:02 UTC
I am already using --load-average, so the hang happens with or without it.
Here is a stack trace from pdb in any case:

>>> Completed installing dev-db/sqlite-3.31.1 into /tmp/portage/dev-db/sqlite-3.31.1/image

 * Final size of build directory: 28960 KiB (28.2 MiB)
 * Final size of installed tree:   2340 KiB ( 2.2 MiB)

--Return--
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(53)<module>()
-> retval = emerge_main()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/main.py(1309)emerge_main()
-> return run_action(emerge_config)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(3374)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(564)action_build()
-> retval = mergetask.merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1020)merge()
-> rval = self._merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1414)_merge()
-> self._main_loop()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1390)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/portage/util/_eventloop/asyncio_event_loop.py(127)_run_until_complete()
-> return self._loop.run_until_complete(future)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(603)run_until_complete()
-> self.run_forever()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(570)run_forever()
-> self._run_once()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(1823)_run_once()
-> event_list = self._selector.select(timeout)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/selectors.py(468)select()
-> fd_event_list = self._selector.poll(timeout, max_ev)
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()

for portage-3.2.84, the stack trace is
--Return--
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) 
(Pdb) bt
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(53)<module>()
-> retval = emerge_main()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/main.py(1309)emerge_main()
-> return run_action(emerge_config)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(3358)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/actions.py(563)action_build()
-> retval = mergetask.merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1020)merge()
-> rval = self._merge()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1414)_merge()
-> self._main_loop()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/_emerge/Scheduler.py(1390)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/site-packages/portage/util/_eventloop/asyncio_event_loop.py(140)_run_until_complete()
-> return self._loop.run_until_complete(future)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(603)run_until_complete()
-> self.run_forever()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(570)run_forever()
-> self._run_once()
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/asyncio/base_events.py(1823)_run_once()
-> event_list = self._selector.select(timeout)
  /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python3.8/selectors.py(468)select()
-> fd_event_list = self._selector.poll(timeout, max_ev)
> /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/lib/python-exec/python3.8/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) 

I didn't manage to unhang portage-2.3.93 by calling self._schedule() either. Even calling just ebuild portage-2.3.93.ebuild install already causes the hang for me. All versions of portage currently in the tree show this behavior, so I wonder if the machine I'm running this on is not the problem...
Comment 4 Zac Medico gentoo-dev 2020-03-12 17:22:06 UTC
Typically this type of hang indicates that the stdout/stderr file descriptors of the ebuild process have not reached EOF, which can be due to some process running in the background which has inherited these file descriptors.

Does this issue affect all ebuilds? Do you have anything in a bashrc like ${EPREFIX}/etc/portage/bashrc or in ${EPREFIX}/etc/portage/env/* that might leave a process running in the background? Can you see any processes running in the background, using tools like ps? Typically I would use a command like `ps axf` to view the process tree, but it's possible that the process would be a child of pid 1 if it has daemonized itself.

Over the years we have come up with 3 mechanisms to combat this kind issue:

1) The sys-apps/portage ebuilds have an ipc USE flag which enables an ipc call for the ebuild to indicate when a given phase has completed. The ipc USE flag is sometimes disabled for prefix installations, so you might try enabling it if possible.

2) FEATURES=cgroup (I wouldn't recommend this one since it's not really maintained and could have various issues like for example it is incompatible with systemd)

3) FEATURES=pid-sandbox (requires root privileges)
Comment 5 Guilherme Amadio gentoo-dev 2020-03-17 11:09:01 UTC
This is what I have in FEATURES:

gentoo ~ $ portageq envvar FEATURES
assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles force-prefix ipc-sandbox merge-sync multilib-strict network-sandbox parallel-fetch pid-sandbox preserve-libs protect-owned sfperms strict unknown-features-warn unmerge-logs unmerge-orphans unprivileged

And how portage is configured:
gentoo ~ $ emerge -pv1 --nodeps portage

These are the packages that would be merged, in order:

[ebuild     U  ] sys-apps/portage-2.3.94::gentoo [2.3.84-r1::gentoo] USE="(ipc) -apidoc% -build -doc -gentoo-dev (-native-extensions) -rsync-verify (-selinux) -xattr (-epydoc%)" PYTHON_TARGETS="python3_6 python3_7 python3_8 -pypy3 -python2_7" 0 KiB

I rebooted the machine, and no suspicious process is running. At the point ebuild gets stuck, I see this process tree:

 2720 pts/0    Ss     0:00  |       \_ -bash
 2767 pts/0    S      0:00  |           \_ /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/bin/bash /home/jenkins/bin/gentoo
 2768 pts/0    S      0:00  |               \_ /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/bin/bash -l
 2839 pts/0    S+     0:01  |                   \_ /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/bin/python3.6 -b /cvmfs/sft.cern.ch/lcg/contrib/gento
 7577 pts/0    S+     0:00  |                       \_ /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/bin/bash /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/
 9317 pts/0    S+     0:00  |                           \_ /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/bin/bash /cvmfs/sft.cern.ch/lcg/contrib/gentoo/li
 9332 pts/0    S+     0:00  |                               \_ mkfifo -m 600 /tmp/portage/sys-apps/portage-2.3.94/temp/multijob.CuM3SW

So, it seems that mkfifo might be involved. This is prefix, so I cannot try things that need root privileges. If I ctrl-C then to ebuild ... install again, then it thinks installation finished. After that, I can use ebuild ... qmerge to finish installing the package. I still don't know how this started to happen. The issue affects all ebuilds.
Comment 6 Guilherme Amadio gentoo-dev 2020-03-17 11:22:51 UTC
Killing the mkfifo points to portage getting stuck here:
https://gitweb.gentoo.org/proj/portage.git/tree/bin/helper-functions.sh#n25
Comment 7 Guilherme Amadio gentoo-dev 2020-03-17 13:12:00 UTC
So, turns out that it's mkfifo that hangs on a call to open(..., O_RDONLY):

$ strace mkfifo -m 600 /tmp/myfifo
execve("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/bin/mkfifo", ["mkfifo", "-m", "600", "/tmp/myfifo"], [/* 45 vars */]) = 0
brk(0)                                  = 0x55b786134000
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f213b2ec000
access("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=554196, ...}) = 0
mmap(NULL, 554196, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f213b264000
close(3)                                = 0
open("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\"\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=16585416, ...}) = 0
mmap(NULL, 1804472, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f213b0ab000
mprotect(0x7f213b0cb000, 1634304, PROT_NONE) = 0
mmap(0x7f213b0cb000, 1355776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7f213b0cb000
mmap(0x7f213b216000, 274432, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16b000) = 0x7f213b216000
mmap(0x7f213b25a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ae000) = 0x7f213b25a000
mmap(0x7f213b260000, 14520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f213b260000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f213b0a8000
arch_prctl(ARCH_SET_FS, 0x7f213b0a8700) = 0
mprotect(0x7f213b25a000, 16384, PROT_READ) = 0
mprotect(0x55b784f03000, 4096, PROT_READ) = 0
mprotect(0x7f213b314000, 4096, PROT_READ) = 0
munmap(0x7f213b264000, 554196)          = 0
brk(0)                                  = 0x55b786134000
brk(0x55b786155000)                     = 0x55b786155000
umask(0)                                = 022
umask(022)                              = 0
mknod("/tmp/myfifo", S_IFIFO|0600)      = 0
open("/tmp/myfifo", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|0x200000

The mkfifo from the host does not hang (this is a prefix system). After downgrading coreutils to 8.31-r1, mkfifo stopped hanging, and strace now shows:

$ strace mkfifo -m 600 /tmp/myfifo
execve("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/usr/bin/mkfifo", ["mkfifo", "-m", "600", "/tmp/myfifo"], [/* 45 vars */]) = 0
brk(0)                                  = 0x562b1ccbb000
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb562343000
access("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=554196, ...}) = 0
mmap(NULL, 554196, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb5622bb000
close(3)                                = 0
open("/cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\"\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=16585416, ...}) = 0
mmap(NULL, 1804472, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fb562102000
mprotect(0x7fb562122000, 1634304, PROT_NONE) = 0
mmap(0x7fb562122000, 1355776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7fb562122000
mmap(0x7fb56226d000, 274432, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16b000) = 0x7fb56226d000
mmap(0x7fb5622b1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ae000) = 0x7fb5622b1000
mmap(0x7fb5622b7000, 14520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fb5622b7000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb5620ff000
arch_prctl(ARCH_SET_FS, 0x7fb5620ff700) = 0
mprotect(0x7fb5622b1000, 16384, PROT_READ) = 0
mprotect(0x562b1c98f000, 4096, PROT_READ) = 0
mprotect(0x7fb56236b000, 4096, PROT_READ) = 0
munmap(0x7fb5622bb000, 554196)          = 0
brk(0)                                  = 0x562b1ccbb000
brk(0x562b1ccdc000)                     = 0x562b1ccdc000
umask(0)                                = 022
umask(022)                              = 0
mknod("/tmp/myfifo", S_IFIFO|0600)      = 0
chmod("/tmp/myfifo", 0600)              = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Portage now works fine after the downgrade.
Comment 8 Sheng Yang 2020-06-25 00:32:54 UTC
In case anyone had the same problem, here is how I managed to fix it (on a gentoo-prefix).

1. Make a backup of your ${EPREFIX}/bin/mkfifo
2. Make a symbolic link of the mkfifo file in the host (typically at /usr/bin/mkfifo or /bin/mkfifo) to ${EPREFIX}/bin/mkfifo
3. Downgrade coreutils with emerge -v1 '=sys-apps/coreutils-8.30*'
4. Re-install coreutils with emerge sys-apps/coreutils

I am not sure if step 3 is necessary.
Comment 9 Luke-Jr 2020-11-04 22:44:46 UTC
Getting this in a non-prefix install.
Comment 10 Luke-Jr 2020-11-04 23:21:29 UTC
...and rebuilding coreutils fixed it for me too. *shrug*
Comment 11 Luke-Jr 2020-11-04 23:55:30 UTC
(In reply to Luke-Jr from comment #10)
> ...and rebuilding coreutils fixed it for me too. *shrug*

Sigh, spoke too soon. Only reproduces with the -m option

Looks like 8.31 is using native lchmod, while 8.32 is using a (broken) substitute for some reason.
Comment 12 Mike Gilbert gentoo-dev 2020-11-05 16:33:29 UTC
(In reply to Luke-Jr from comment #11)
> Looks like 8.31 is using native lchmod, while 8.32 is using a (broken) substitute for some reason.

Do you see this in the configure output?

checking for lchmod... yes

If not, it is probably using an implementation from gnulib as a fallback.
Comment 13 Luke-Jr 2020-11-05 21:58:20 UTC
(In reply to Mike Gilbert from comment #12)
> (In reply to Luke-Jr from comment #11)
> > Looks like 8.31 is using native lchmod, while 8.32 is using a (broken) substitute for some reason.
> 
> Do you see this in the configure output?
> 
> checking for lchmod... yes
> 
> If not, it is probably using an implementation from gnulib as a fallback.

No, both 8.31 and 8.32 have "no" here.
It does look like lchmod is coming from gnulib.

In 8.31, it appears that it is #define'd to normal chmod...
Comment 14 Luke-Jr 2020-11-05 21:58:43 UTC
Created attachment 670109 [details]
build log
Comment 15 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-04-12 06:48:23 UTC
If you're still hitting this, please include your config.log (and ideally a fresh build.log & emerge --info), and share anything interesting about your machine.