Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 337465 - >portage-2.2_rc67: EbuildIpcDaemon problem on FreeBSD.
Summary: >portage-2.2_rc67: EbuildIpcDaemon problem on FreeBSD.
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All FreeBSD
: High major (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS, REGRESSION
Depends on:
Blocks: 335925
  Show dependency tree
 
Reported: 2010-09-15 10:36 UTC by Naohiro Aota
Modified: 2014-10-08 00:14 UTC (History)
3 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
debug/test output from portage (portage-debug.txt,143.77 KB, text/plain)
2010-09-18 14:05 UTC, Naohiro Aota
Details
portage backtrace (portage.txt,5.08 KB, text/plain)
2010-09-21 17:20 UTC, Naohiro Aota
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Naohiro Aota gentoo-dev 2010-09-15 10:36:27 UTC
portage git head cannot merge some packages. It seems ebuild merge process not begun.

FYI: I've git bisect'ed the source repo, and get "commit 93cded47"
cause this situation.



Reproducible: Always

Steps to Reproduce:
1.install portage git head.
2.ebuild --debug /usr/portage/sys-apps/portage/portage-2.1.9.6.ebuild clean merge


Actual Results:  
ebuild merge not finish with the following outputs.

> <snip>
> sourcepath     	on
> xpg_echo       	off ]]
> + [[  	
>  != \ \	\
>  ]]
> + return 0
> + for x in '{pre_,,post_}${phase_name}'
> + ebuild_phase post_pkg_preinst
> + declare -F post_pkg_preinst
> + set +x
> ebuild-ipc timed out during select after 15 seconds, retrying...
> ebuild-ipc timed out during select after 30 seconds, retrying...
> ebuild-ipc timed out during select after 45 seconds, retrying...
> ebuild-ipc timed out during select after 60 seconds, retrying...
> ebuild-ipc timed out during select after 75 seconds, retrying...
> ebuild-ipc timed out during select after 90 seconds, retrying...
> ebuild-ipc timed out during select after 105 seconds, retrying...

"ps auxw|grep ebuild-ipc" return this:
root  50529  0.0  0.8  9504  7684   1  S+    7:10PM   0:00.29 /usr/bin/python2.6 /usr/lib/portage/bin/ebuild-ipc.py exit 0



Portage HEAD (default/bsd/fbsd/x86/8.0, gcc-4.4.4, freebsd-lib-8.0-r0, 8.0-RELEASE i386)
=================================================================
System uname: FreeBSD-8.0-RELEASE-i386-32bit-ELF
Timestamp of tree: Wed, 15 Sep 2010 08:00:02 +0000
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.9-r1
dev-lang/python:     2.5.4-r4, 2.6.5-r3, 3.1.2-r4
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.6.1-r1
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.13, 2.65-r1
sys-devel/automake:  1.9.6-r2, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       3.4.6-r2, 4.3.5, 4.4.4-r1
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
sys-devel/make:      3.81-r2
virtual/os-headers:  8.0 (sys-freebsd/freebsd-lib)
ACCEPT_KEYWORDS="x86-fbsd ~x86-fbsd"
ACCEPT_LICENSE="* -@EULA sun-bcla-java-vm skype-eula"
CBUILD="i686-gentoo-freebsd8.0"
CFLAGS="-O2 -pipe -march=prescott --param l1-cache-size=16 --param l1-cache-line-size=64 -mtune=prescott"
CHOST="i686-gentoo-freebsd8.0"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe -march=prescott --param l1-cache-size=16 --param l1-cache-line-size=64 -mtune=prescott"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests chflags distlocks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="ja_JP.UTF-8"
LC_ALL="ja_JP.UTF-8"
LDFLAGS=""
LINGUAS="ja"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/usr/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/gentoo-bsd /var/lib/layman/sunrise /var/lib/layman/haskell /home/naota/src/portage-overlay /usr/local/portage/pc_local"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl acpi berkdb bzip2 cjk cli cracklib crypt cxx dri emacs fortran gdbm gif gtk gzip-el iconv ipv6 java5 java6 jpeg jpeg2k mmx modules ncurses nls nptlonly openal opengl oss pam pcre perl png python readline reflection session source spell spl sse sse2 ssl svg tcpd toolkit-scroll-bars truetype unicode usb x86-fbsd xft xorg zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="FreeBSD" INPUT_DEVICES="keyboard mouse" KERNEL="FreeBSD" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="ja" RUBY_TARGETS="ruby18" USERLAND="BSD" VIDEO_CARDS="intel" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, MAKEOPTS, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Zac Medico gentoo-dev 2010-09-15 11:03:33 UTC
I've disabled the EbuildIpcDaemon on FreeBSD as a temporary workaround:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0b68523b7f5332da377291ab89771af7ff3a61a2
Comment 2 Zac Medico gentoo-dev 2010-09-15 16:43:41 UTC
(In reply to comment #1)
> I've disabled the EbuildIpcDaemon on FreeBSD as a temporary workaround:
> 
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0b68523b7f5332da377291ab89771af7ff3a61a2

This is in portage-2.1.9.7 and 2.2_rc83. I'll leave this bug open until we have a real fix.

Please try installing portage with FEATURES=test to see if the tests succeed or fail. Some of them test EbuildIpcDaemon.
Comment 4 Naohiro Aota gentoo-dev 2010-09-18 14:05:39 UTC
Created attachment 247853 [details]
debug/test output from portage

Thank you for your comments.

I've updated the portage repo and check if the code work or not.

# rm -r /usr/lib/portage/*; cp -r pym bin /usr/lib/portage/
# FEATURES=test emerge portage

But, emerge failed to fetch package. (maybe this is different bug. occured since I've update to >rc67)

>  * portage-2.2_rc68.tar.bz2 RMD160 SHA1 SHA256 size ;-) ...      [ ok ]
>  * portage-2.2_rc83.patch.bz2 RMD160 SHA1 SHA256 size ;-) ...    [ ok ]
>  * checking ebuild checksums ;-) ...                             [ ok ]
>  * checking auxfile checksums ;-) ...                            [ ok ]
>  * checking miscfile checksums ;-) ...                           [ ok ]
>  * Fetch failed for 'sys-apps/portage-2.2_rc83', Log file:
>  *  '/var/tmp/portage/sys-apps/portage-2.2_rc83/temp/build.log'
> 
> >>> Failed to emerge sys-apps/portage-2.2_rc83, Log file:

So, I've run the tests by hand.

# ebuild --debug /usr/portage/sys-apps/portage/portage-2.2_rc83.ebuild clean test merge | tee /tmp/portage-debug.txt

attached the result txt file.
Comment 5 Zac Medico gentoo-dev 2010-09-18 14:19:45 UTC
(In reply to comment #4)
> Created an attachment (id=247853) [details]
> debug/test output from portage
>
>   File "/var/tmp/portage/sys-apps/portage-2.2_rc83/work/portage->2.2_rc68/pym/portage/dbapi/porttree.py", line 593, in aux_get
>     raise KeyError(mycpv)
> KeyError: u'dev-libs/A-1'

I got identical errors to this when I tried to run the tests on the GhostBSD-1.5-x86 livedvd in qemu. For some reason the returncode from the bash process is not right. Since you seem to be getting the same result, I'll investigate it some more.
Comment 7 Naohiro Aota gentoo-dev 2010-09-19 21:40:28 UTC
FYI, with the following patch, git head passed the Python 2.6 test on FreeBSD. (but I don't think this is core problem...)

diff --git a/pym/portage/package/ebuild/config.py b/pym/portage/package/ebuild/config.py
index aa8d3c0..3b48601 100644
--- a/pym/portage/package/ebuild/config.py
+++ b/pym/portage/package/ebuild/config.py
@@ -416,6 +416,8 @@ class config(object):
                        if self.mygcfg is None:
                                self.mygcfg = {}
 
+                       self.mygcfg={u'FEATURES':u'sandbox'}
+
                        for k, v in self._default_globals.items():
                                self.mygcfg.setdefault(k, v)
Comment 8 Zac Medico gentoo-dev 2010-09-19 22:01:59 UTC
(In reply to comment #7)
> +                       self.mygcfg={u'FEATURES':u'sandbox'}
> +

So sandbox makes the tests pass? That's odd. I still need to figure out why it's failing without sandbox.

Anyway, if you apply the patch from comment #3, and revert the patch from comment #1, does EbuildIpcDaemon operate correctly on FreeBSD for normal emerge operations? Comment #6 says it works for Darwin, so can we confirm that it works for FreeBSD too?
Comment 9 Naohiro Aota gentoo-dev 2010-09-19 22:32:38 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > +                       self.mygcfg={u'FEATURES':u'sandbox'}
> > +
> 
> So sandbox makes the tests pass? That's odd. I still need to figure out why
> it's failing without sandbox.

yes, really strange. If there is something I can help, please let me know :)

> Anyway, if you apply the patch from comment #3, and revert the patch from
> comment #1, does EbuildIpcDaemon operate correctly on FreeBSD for normal emerge
> operations? Comment #6 says it works for Darwin, so can we confirm that it
> works for FreeBSD too?

Yeah, confirmed the git head reverting the patch from comment #1 merged sys-apps/portage well.
Comment 10 Zac Medico gentoo-dev 2010-09-19 22:45:56 UTC
(In reply to comment #9)
> > So sandbox makes the tests pass? That's odd. I still need to figure out why
> > it's failing without sandbox.
> 
> yes, really strange. If there is something I can help, please let me know :)

I still want to troubleshoot it some more in qemu since I was able to reproduce the problem there.

> Yeah, confirmed the git head reverting the patch from comment #1 merged
> sys-apps/portage well.

Ok, I've enabled EbuildIpcDaemon for Darwin and FreeBSD in git:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=6085103bdd7623d941e3c6ae5905e66658c49fd7
Comment 11 Zac Medico gentoo-dev 2010-09-19 22:46:50 UTC
(In reply to comment #6)
> Same problem on Darwin.  After this commit, I can undo this commit again:
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=2f7fbc1e0511a55a4bbe9c92b2cd105cdb1660dd

Thanks for feedback. I guess I don't need you on CC anymore.
Comment 12 Zac Medico gentoo-dev 2010-09-20 02:36:34 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Created an attachment (id=247853) [details] [details]
> > debug/test output from portage
> >
> >   File "/var/tmp/portage/sys-apps/portage-2.2_rc83/work/portage->2.2_rc68/pym/portage/dbapi/porttree.py", line 593, in aux_get
> >     raise KeyError(mycpv)
> > KeyError: u'dev-libs/A-1'
> 
> I got identical errors to this when I tried to run the tests on the
> GhostBSD-1.5-x86 livedvd in qemu. For some reason the returncode from the bash
> process is not right. Since you seem to be getting the same result, I'll
> investigate it some more.

I've found that os.waitpid(self.pid, os.WNOHANG) on line 84 of pym/_emerge/SubProcess.py is returning (0, 673975160) when the documentation says that it should return (0, 0) in this case:

  http://docs.python.org/library/os.html#os.WNOHANG

If I substitute 0 in place of os.WNOHANG in the waipid parameters, the test succeeds. I'm not yet sure how to handle this discrepancy. It might be a python bug since the return value doesn't match the documentation.
Comment 13 Zac Medico gentoo-dev 2010-09-20 03:14:44 UTC
(In reply to comment #12)
> I'm not yet sure how to handle this discrepancy. It might be a python
> bug since the return value doesn't match the documentation.

I've changed to only check the first of the tuple that waitpid returns, since this seems to consistently be 0 on all platforms in the case that we are checking for:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=146dea1276fcecb641ee57f080a4d8f2ccce1396

After this change, all of the tests passed on the GhostBSD-1.5-x86 livedvd.
Comment 14 Zac Medico gentoo-dev 2010-09-20 04:28:12 UTC
This is fixed in 2.1.9.9 and 2.2_rc85.
Comment 15 Fabian Groffen gentoo-dev 2010-09-20 20:40:40 UTC
I just had to disable it again on Darwin, where it got stuck after/during collision-protect check.  Interestingly, the CPU maxed out, while it did only for a short period with ipc-daemon disabled and then just merged the package (portage in this case).
Comment 16 Fabian Groffen gentoo-dev 2010-09-20 20:47:41 UTC
Seems that the timeout during read correlates to the amount of files being checked for collisions.  Probably this is harder to reproduce on faster machines.
Comment 17 Zac Medico gentoo-dev 2010-09-20 23:25:53 UTC
(In reply to comment #16)
> Seems that the timeout during read correlates to the amount of files being
> checked for collisions.  Probably this is harder to reproduce on faster
> machines.

Does it keep retrying and timeout during every read attemp? If so, it seems like the reply got lost somehow. Maybe this will help:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=8ddff22917b89f4377eedb8065a7ee73e0561517
Comment 18 Zac Medico gentoo-dev 2010-09-21 01:08:57 UTC
(In reply to comment #17)
> Does it keep retrying and timeout during every read attemp? If so, it seems
> like the reply got lost somehow. Maybe this will help:
> 
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=8ddff22917b89f4377eedb8065a7ee73e0561517

Now I've rewritten the timeout handling, hopefully eliminating all possible race conditions:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=32cef15206e24c81da75a87626c8ae5192800a03
Comment 19 Fabian Groffen gentoo-dev 2010-09-21 06:38:14 UTC
(In reply to comment #17)
> Does it keep retrying and timeout during every read attemp? If so, it seems
> like the reply got lost somehow. Maybe this will help:

yes, I've waited for 90 seconds, which is way more than it takes to complete checking 242 files without the daemon.

Unfortunately after your most recent commits, still
ebuild-ipc timed out during read after 30 seconds, retrying...

CPU usage remains very high (100%), which is very suspicious, although I can't see a spin-lock somewhere.
Comment 20 Zac Medico gentoo-dev 2010-09-21 07:47:26 UTC
(In reply to comment #19)
> CPU usage remains very high (100%), which is very suspicious, although I can't
> see a spin-lock somewhere.

Is it the ebuild-ipc process or the emerge process? Whichever process it is, you get send it SIGUSR1 to get the debugger, and use the 'bt' command to get a backtrace.
Comment 21 Naohiro Aota gentoo-dev 2010-09-21 16:13:08 UTC
Same here on FreeBSD.

Most recent git head stucks again. Python is using 100% CPU.
Comment 22 Zac Medico gentoo-dev 2010-09-21 16:35:28 UTC
I've tried the tests on FreeBSD in qemu and they succeeded. Do the tests succeed for either of you? When it's consuming 100% cpu, please use kill to send a SIGUSR1 signal to the process, and when it brings up the debugger prompt use the 'bt' command to print a traceback.
Comment 23 Fabian Groffen gentoo-dev 2010-09-21 16:49:48 UTC
 7059  p5  R+     0:59.09 /Library/Gentoo/usr/bin/python2.6 /Library/Gentoo/usr/bin/emerge -1av portage
 9245  p5  S+     0:00.23 /Library/Gentoo/bin/bash /Library/Gentoo/usr/lib/portage/bin/ebuild.sh preinst
 9264  p5  S+     0:00.04 /Library/Gentoo/bin/bash /Library/Gentoo/usr/lib/portage/bin/ebuild.sh preinst
 9279  p5  S+     0:00.42 /Library/Gentoo/usr/bin/python2.6 /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py has_version / <=sys-apps/portage-2.2.00.13346
 9283  p5  S+     0:00.00 /Library/Gentoo/usr/bin/python2.6 /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py has_version / <=sys-apps/portage-2.2.00.13346
 9288  p6  R+     0:00.01 ps axwww
11343  p6  Ss     0:03.85 -/Library/Gentoo/bin/tcsh

it's 7059 that's making the CPU go 100%.

unfortunately the only interaction I can get with pdb is for one process:

ebuild-ipc timed out during read after 135 seconds, retrying...
--Return--
> /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(17)debug_signal()->None
-> pdb.set_trace()
(Pdb) ebuild-ipc timed out during read after 150 seconds, retrying...
bt
ebuild-ipc timed out during read after 165 seconds, retrying...
--Return--
> /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(17)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(220)<module>()
-> sys.exit(ebuild_ipc_main(sys.argv[1:]))
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(217)ebuild_ipc_main()
-> return ebuild_ipc.communicate(args)
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(63)communicate()
-> return self._communicate(args)
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(210)_communicate()
-> retval = self._receive_reply()
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(136)_receive_reply()
-> input_file = open(self.ipc_out_fifo, 'rb', 0)
> /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(17)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(220)<module>()
-> sys.exit(ebuild_ipc_main(sys.argv[1:]))
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(217)ebuild_ipc_main()
-> return ebuild_ipc.communicate(args)
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(63)communicate()
-> return self._communicate(args)
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(210)_communicate()
-> retval = self._receive_reply()
  /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(136)_receive_reply()
-> input_file = open(self.ipc_out_fifo, 'rb', 0)
> /Library/Gentoo/usr/lib/portage/bin/ebuild-ipc.py(17)debug_signal()->None
-> pdb.set_trace()
(Pdb) ebuild-ipc timed out during read after 180 seconds, retrying...
ebuild-ipc timed out during read after 195 seconds, retrying...
ebuild-ipc timed out during read after 210 seconds, retrying...
ebuild-ipc timed out during read after 225 seconds, retrying...
--Return--
> /Library/Gentoo/usr/bin/emerge(28)debug_signal()->None
-> pdb.set_trace()
(Pdb) 
Comment 24 Naohiro Aota gentoo-dev 2010-09-21 17:20:41 UTC
Created attachment 248282 [details]
portage backtrace

(In reply to comment #22)
> I've tried the tests on FreeBSD in qemu and they succeeded. Do the tests
> succeed for either of you? 

Yes, all the tests succeed.

> When it's consuming 100% cpu, please use kill to
> send a SIGUSR1 signal to the process, and when it brings up the debugger prompt
> use the 'bt' command to print a traceback.

attached the output.
Comment 25 Zac Medico gentoo-dev 2010-09-21 18:17:14 UTC
(In reply to comment #23)
>  7059  p5  R+     0:59.09 /Library/Gentoo/usr/bin/python2.6
> /Library/Gentoo/usr/bin/emerge -1av portage
> 
> it's 7059 that's making the CPU go 100%.
> 
> unfortunately the only interaction I can get with pdb is for one process:

Hmm, pdb is not much help here.

(In reply to comment #24)
> Created an attachment (id=248282) [details]
> portage backtrace
> 
> (In reply to comment #22)
> > I've tried the tests on FreeBSD in qemu and they succeeded. Do the tests
> > succeed for either of you? 
> 
> Yes, all the tests succeed.

Hmm, I guess we need a new test case.

> > When it's consuming 100% cpu, please use kill to
> > send a SIGUSR1 signal to the process, and when it brings up the debugger prompt
> > use the 'bt' command to print a traceback.
> 
> attached the output.

The traceback shows it in a blocking poll all, so if that process is spinning the cpu at 100% then it seems like maybe there's an endless stream of poll events that's triggering it. The events are probably being delivered to EbuildIpcDaemon._input_handler() and if we add a breakpoint or some print statements in there then we can examine them and try to determine their meaning.
Comment 26 Zac Medico gentoo-dev 2010-09-22 02:38:42 UTC
I've disabled EbuildIpcDaemon on Darwin and FreeBSD for now, so you'll need to revert this patch when testing:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0906fee12ffe0ebd07e8952316f03927962af29b

Here's something to try, which reverts to the previous code that we were using for reads in v2.1.9.9 and v2.2_rc85:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=2be98bc1847c6616fbcc248721c75167fac48651
Comment 27 Zac Medico gentoo-dev 2010-09-22 02:51:50 UTC
Re-opening until this is fixed...
Comment 28 Zac Medico gentoo-dev 2010-09-22 03:46:13 UTC
(In reply to comment #26)
These changes are in v2.1.9.10 and v2.2_rc86.
Comment 29 Fabian Groffen gentoo-dev 2010-09-22 17:27:25 UTC
unfortunately no luck, it still times out after read.
Comment 30 Javier Villavicencio (RETIRED) gentoo-dev 2010-09-23 06:35:53 UTC
Been debugging this one for the last 4 hours, I think I've found the source of it:

This is (a guided) trace of the "hung" portage process that eats 100% cpu while ebuild-ipc.py spits "timed out during read ...", going across the event registered by the EbuildIpcDaemon class:


(Pdb) n
> /usr/lib/portage/pym/_emerge/PollScheduler.py(232)_schedule_wait()
-> handler(f, event)
(Pdb) pp handler
<bound method EbuildIpcDaemon._input_handler of <_emerge.EbuildIpcDaemon.EbuildIpcDaemon object at 0x2923e2ac>>
(Pdb) step
--Call--
> /usr/lib/portage/pym/_emerge/EbuildIpcDaemon.py(29)_input_handler()
-> def _input_handler(self, fd, event):
(Pdb) step
> /usr/lib/portage/pym/_emerge/EbuildIpcDaemon.py(31)_input_handler()
-> buf = self._read_buf(self._files.pipe_in, event)
(Pdb) step
[...] I reach up to this point:
(Pdb) step
> /usr/lib/portage/pym/_emerge/AbstractPollTask.py(32)_read_buf()
-> buf = array.array('B')
(Pdb) step
> /usr/lib/portage/pym/_emerge/AbstractPollTask.py(33)_read_buf()
-> try:
(Pdb) step
> /usr/lib/portage/pym/_emerge/AbstractPollTask.py(34)_read_buf()
-> buf.fromfile(f, self._bufsize)
(Pdb) step
EOFError: 'not enough items in file'
> /usr/lib/portage/pym/_emerge/AbstractPollTask.py(34)_read_buf()
-> buf.fromfile(f, self._bufsize)

The EOF should be fine given that _bufsize is 4096 and the fifo only has "exit 0" pickled. But then:


(Pdb) pp buf.tostring()
''

It's empty, and if I just go and read a few bytes off "f":
(Pdb) os.read(f,15)
"(lp0\nS'exit'\np1"

The stuff shows up.

Seems to happen 100% of the time after a call to has_version during post_install(), not during has_version but precisely as the last ebuild-ipc call made by ebuild.sh (my testcase is sys-libs/timezone-data).

I haven't figured out yet why it's behaving like this in this particular case, but it's time for me to sleep. Hopefully this would help further debugging this f'ed up weirdness.
Comment 31 Javier Villavicencio (RETIRED) gentoo-dev 2010-09-23 07:03:05 UTC
Another oddity:

--- AbstractPollTask.py 2010-09-22 20:23:35.600592321 -0300
+++ AbstractPollTask.py 2010-09-23 03:56:08.255598306 -0300
@@ -2,6 +2,7 @@
 # Distributed under the terms of the GNU General Public License v2

 import array
+import os

 from _emerge.AsynchronousTask import AsynchronousTask
 from _emerge.PollConstants import PollConstants
@@ -33,6 +34,11 @@
                        try:
                                buf.fromfile(f, self._bufsize)
                        except (EOFError, IOError):
+                               if not buf:
+                                       try:
+                                               buf.fromstring(os.read(f.fileno(), self._bufsize))
+                                       except (EOFError, IOError):
+                                               pass
                                pass
                return buf


Doing just f.read(self._bufsize) at that place has the same effect as buf.fromfile(f, self._bufsize), yet os.read(f.fileno(), self._bufsize) (as in the above diff) finally reads the stuck message.
Comment 32 Zac Medico gentoo-dev 2010-09-23 07:15:27 UTC
(In reply to comment #31)
> Doing just f.read(self._bufsize) at that place has the same effect as
> buf.fromfile(f, self._bufsize), yet os.read(f.fileno(), self._bufsize) (as in
> the above diff) finally reads the stuck message.

Great, thanks for troubleshooting this. The following patch should be equivalent to yours:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0d7eb266a9d7909591817e4fd899c1c2ab07b53c

I'll wait until we've had copious testing of this new approach, on both FreeBSD and Darwin, before I enable EbuildIpcDaemon for either one of them again.
Comment 33 Fabian Groffen gentoo-dev 2010-09-23 17:23:06 UTC
Portage is able to emerge itself (it previously wasn't) with ipc-daemon enabled.

I'll leave it enabled locally to see if I run into any problems.
Comment 34 Javier Villavicencio (RETIRED) gentoo-dev 2010-09-23 17:47:52 UTC
One hour after I went to sleep, an eletric peak of 245 volts killed my test
machine PSU. Will be able to properly test tomorrow, if the only thing dead was the PSU. T_T
Comment 35 Javier Villavicencio (RETIRED) gentoo-dev 2010-09-24 01:15:09 UTC
After a new PSU and a new ethernet nic:

  File "/usr/lib/portage/pym/_emerge/SubProcess.py", line 81, in _wait
    self.scheduler.schedule(self._reg_id)
  File "/usr/lib/portage/pym/_emerge/PollScheduler.py", line 232, in _schedule_wait
    handler(f, event)
  File "/usr/lib/portage/pym/_emerge/EbuildIpcDaemon.py", line 39, in _input_handler
    data = os.read(fd, self._bufsize)
OSError: [Errno 35] Resource temporarily unavailable

While emerging openrc. Perhaps catching that particular exception should suffice.
Comment 36 Zac Medico gentoo-dev 2010-09-24 02:45:06 UTC
(In reply to comment #35)
>   File "/usr/lib/portage/pym/_emerge/EbuildIpcDaemon.py", line 39, in
> _input_handler
>     data = os.read(fd, self._bufsize)
> OSError: [Errno 35] Resource temporarily unavailable

Hopefully this solves it:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=7e5b81da12dd7bd59f6620840dc0d824e3f4d69a
Comment 37 Javier Villavicencio (RETIRED) gentoo-dev 2010-09-24 19:41:26 UTC
(In reply to comment #36)
> Hopefully this solves it:
> 
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=7e5b81da12dd7bd59f6620840dc0d824e3f4d69a
> 

Looks good, survived my testcases and finished a @system rebuild.
Comment 38 Zac Medico gentoo-dev 2010-09-25 08:20:50 UTC
(In reply to comment #37)
> Looks good, survived my testcases and finished a @system rebuild.

Excellent, thanks for testing. I've re-enabled EbuildIpcDaemon for both Darwin and FreeBSD in git now:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=181e1613d9cd43563519a9439de99abe2a51d31c
Comment 39 Gianluigi Tiesi 2010-09-25 14:33:14 UTC
I've got timeout on uclibc, had to disable ipc daemon to upgrade portage,
then some emerge went ok. After a while rehappened

my env is an uclibc chroot over a 64bit debian lenny
I've also noticed some time ago, that the problem was intermittent and
after exit and chroot again it fixed
Comment 40 Zac Medico gentoo-dev 2010-09-25 14:38:11 UTC
(In reply to comment #39)
Please test with portage-9999 to see if it's fixed.
Comment 41 Gianluigi Tiesi 2010-09-25 17:28:29 UTC
lol I need git :) 17 packages :)
Comment 42 Gianluigi Tiesi 2010-09-25 17:44:25 UTC
hmm sorry I cannot right now,
git 'needs' iconv, added a note to a closed bug since there is not iconv dep on uclibc.
iconv needs ghc, ghc fails to build on uclibc :(
Comment 43 Benda Xu gentoo-dev 2010-09-25 17:55:11 UTC
Hi, I have met the similar problem on AIX-5.2 running gentoo-prefix

Please disable that fancy IPC feature for AIX.

Or what patch should I apply to eliminate this? (sorry i did not quite follow
the above discussions)

$ ebuild perl-5.12.1-r1.ebuild install

Traceback (most recent call last):
  File "/home/k/benda/local/usr/bin/ebuild", line 287, in <module>
    vartree=portage.db[portage.root]['vartree'])
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 779, in doebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1153, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1153, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1153, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1153, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1153, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1153, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 1167, in spawnebuild
    fd_pipes=fd_pipes, returnpid=returnpid)
  File
"/home/k/benda/local/usr/lib/portage/pym/portage/package/ebuild/doebuild.py",
line 112, in _spawn_phase
    ebuild_phase.wait()
  File "/home/k/benda/local/usr/lib/portage/pym/_emerge/AsynchronousTask.py",
line 40, in wait
    self._wait()
  File "/home/k/benda/local/usr/lib/portage/pym/_emerge/CompositeTask.py", line
56, in _wait
    task.wait()
  File "/home/k/benda/local/usr/lib/portage/pym/_emerge/AsynchronousTask.py",
line 40, in wait
    self._wait()
  File "/home/k/benda/local/usr/lib/portage/pym/_emerge/SubProcess.py", line
81, in _wait
    self.scheduler.schedule(self._reg_id)
  File "/home/k/benda/local/usr/lib/portage/pym/_emerge/PollScheduler.py", line
232, in _schedule_wait
    handler(f, event)
  File "/home/k/benda/local/usr/lib/portage/pym/_emerge/EbuildIpcDaemon.py",
line 39, in _input_handler
    data = os.read(fd, self._bufsize)
OSError: [Errno 11] Resource temporarily unavailable



(In reply to comment #38)
> (In reply to comment #37)
> > Looks good, survived my testcases and finished a @system rebuild.
Comment 44 Zac Medico gentoo-dev 2010-09-25 19:12:40 UTC
(In reply to comment #43)
>   File "/home/k/benda/local/usr/lib/portage/pym/_emerge/EbuildIpcDaemon.py",
> line 39, in _input_handler
>     data = os.read(fd, self._bufsize)
> OSError: [Errno 11] Resource temporarily unavailable

The patch posted in comment #36 is supposed to handle that.

Comment 45 Fabian Groffen gentoo-dev 2010-09-25 21:02:40 UTC
(In reply to comment #43)
> Hi, I have met the similar problem on AIX-5.2 running gentoo-prefix
> 
> Please disable that fancy IPC feature for AIX.
> 
> Or what patch should I apply to eliminate this? (sorry i did not quite follow
> the above discussions)

Please sync in half an hour from now.  It should give you a news item describing how to fix portage, and a new portage version that has ipc disabled by default (USE=-ipc).
Comment 46 Zac Medico gentoo-dev 2010-09-26 07:28:09 UTC
This is fixed in (In reply to comment #38)
> (In reply to comment #37)
> > Looks good, survived my testcases and finished a @system rebuild.
> 
> Excellent, thanks for testing. I've re-enabled EbuildIpcDaemon for both Darwin
> and FreeBSD in git now:
> 
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=181e1613d9cd43563519a9439de99abe2a51d31c

This is in 2.1.9.11 and 2.2_rc87.

If there are remaining issues with kernels other than FreeBSD then please file separate bugs.