
Bug 336644

Summary: _input_handler pickle_str = self._files.pipe_in.read() IOError: [Errno 11] Resource temporarily unavailable
Product: Portage Development
Reporter: Richard <shiningarcanine>
Component: Core
Assignee: Portage team <dev-portage>
Status: RESOLVED FIXED
Severity: normal
CC: jer
Priority: High
Keywords: InVCS, REGRESSION
Version: unspecified
Hardware: All
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:
Bug Blocks: 335925
Attachments: The complete terminal output.

Description Richard 2010-09-09 20:27:05 UTC
Compiling in a tmpfs can result in build failures. I have experienced this in Linux kernel 2.6.33 through 2.6.35. It might be an upstream issue.

Reproducible: Sometimes

Steps to Reproduce:
1. Build many packages at once in a tmpfs.
2. It might fail. It might succeed.
Actual Results:  
>>> Jobs: 7 of 120 complete, 12 running             Load avg: 14.0, 18.1, 12.6
Traceback (most recent call last):
  File "/usr/bin/emerge", line 43, in <module>
    retval = emerge_main()
  File "/usr/lib64/portage/pym/_emerge/main.py", line 1683, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/lib64/portage/pym/_emerge/actions.py", line 436, in action_build
    retval = mergetask.merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1081, in merge
    rval = self._merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1397, in _merge
    self._main_loop()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1539, in _main_loop
    self._poll_loop()
  File "/usr/lib64/portage/pym/_emerge/PollScheduler.py", line 138, in _poll_loop
    handler(f, event)
  File "/usr/lib64/portage/pym/_emerge/EbuildIpcDaemon.py", line 77, in _input_handler
    reply_hook()
  File "/usr/lib64/portage/pym/_emerge/AbstractEbuildProcess.py", line 143, in _exit_command_callback
    self.scheduler.schedule(self._reg_id, timeout=self._exit_timeout)
  File "/usr/lib64/portage/pym/_emerge/PollScheduler.py", line 232, in _schedule_wait
    handler(f, event)
  File "/usr/lib64/portage/pym/_emerge/EbuildIpcDaemon.py", line 48, in _input_handler
    pickle_str = self._files.pipe_in.read()
IOError: [Errno 11] Resource temporarily unavailable

Expected Results:
The builds should have completed without emerge aborting with an IOError.
Comment 1 Richard 2010-09-09 20:41:53 UTC
Created attachment 246627
The complete terminal output.

I have 8GB of RAM, 8GB of swap and an 8GB tmpfs. Theoretically, everything should be fine, but occasionally things fail. This used to happen to me shortly after I switched from Windows 7 to Gentoo Linux. I had changed the RAM from 4x1GB modules to 2x2GB modules in the process, and these issues occurred quite frequently. Since upgrading to 4x2GB, which was my original intention, this is the first time it has happened. I only observe it when compiling in portage under a tmpfs, and it has never happened when I did not have a tmpfs mounted. My system uses an SSD, so I need to do compilations in a tmpfs because of the write cycle limit.

According to df, my tmpfs is only using 1604336 1K blocks out of the 8388608 1K blocks available, so it definitely did not fill up. Memory usage is only 2GB according to System Monitor and the swap usage is currently zero, so there is nothing to suggest that memory starvation occurred. The kernel log also shows no sign that the OOM killer was activated. I cannot reproduce any issues with the tmpfs when filling it manually, so I suspect that this could be an issue in portage's IO handling.
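
For reference, the df figures above work out to roughly a fifth of the tmpfs in use. The following is only a small sketch of that arithmetic; the helper names are made up for illustration, and `mount_usage_percent` assumes a Unix system where `os.statvfs` is available. The mount point to check is not stated in this report, so it is left as a parameter.

```python
import os


def usage_percent(used_blocks, total_blocks):
    """Return the share of a filesystem in use, as a percentage."""
    return 100.0 * used_blocks / total_blocks


def mount_usage_percent(path):
    """Same figure for a live mount point, via os.statvfs (Unix only)."""
    st = os.statvfs(path)
    used = st.f_blocks - st.f_bfree
    # Some pseudo-filesystems report zero blocks; avoid dividing by zero.
    return usage_percent(used, st.f_blocks) if st.f_blocks else 0.0


# Figures quoted from df in this comment (1K blocks).
print(round(usage_percent(1604336, 8388608), 1))  # prints 19.1
```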

Here is emerge --info:
Portage 2.1.9.2 (default/linux/amd64/10.0/desktop/kde, gcc-4.4.4, glibc-2.12.1-r1, 2.6.35.4 x86_64)
=================================================================
System uname: Linux-2.6.35.4-x86_64-Intel-R-_Core-TM-2_Quad_CPU_Q9550_@_2.83GHz-with-gentoo-2.0.1
Timestamp of tree: Thu, 09 Sep 2010 17:45:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.11
dev-lang/python:     2.6.5-r3, 3.1.2-r4
dev-util/ccache:     2.4-r8
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.6.3
sys-apps/sandbox:    2.3-r1
sys-devel/autoconf:  2.13, 2.67
sys-devel/automake:  1.8.5-r4, 1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.3.5, 4.4.4-r1
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
sys-devel/make:      3.81-r2
virtual/os-headers:  2.6.35 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=core2 -mtune=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/portage /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=core2 -mtune=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests buildpkg ccache distlocks fixlafiles fixpackages multilib-strict news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox"
FFLAGS="-march=core2 -mtune=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -O2 -pipe"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://gentoo.mirrors.tds.net/gentoo http://gentoo.netnitco.net"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--sort-common"
LINGUAS="en"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/java-overlay /var/lib/layman/vmware /var/lib/layman/sunrise /usr/local/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="X a52 aac acpi alsa amd64 berkdb branding bzip2 cairo cdr cli consolekit cracklib crypt cups cxx dbus dri dts dvd dvdr emboss encode exif fam ffmpeg firefox flac fontconfig fortran gdbm gif gpm hal iconv ipv6 java jpeg kde lcms ldap libnotify lzma mad mikmod mmx mng modules mp3 mp4 mpeg mudflap multilib ncurses nls nptl nptlonly nsplugin ogg opengl openmp pam pango pcre pdf perl png ppds pppd python qt3support qt4 readline reflection sdl session spell sse sse2 ssl ssse3 startup-notification svg sysfs tcpd theora tiff truetype unicode usb vorbis x264 xcb xcomposite xml xorg xulrunner xv xvid xvmc zlib zsh-completion" ALSA_CARDS="snd-ctxfi" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CAMERAS="*" ELIBC="glibc" INPUT_DEVICES="evdev synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 2 Zac Medico gentoo-dev 2010-09-10 01:48:38 UTC
The error is similar to the one in bug #264435. We may be able to re-use some of the related code to solve this bug.
Comment 3 Richard 2010-09-10 02:35:09 UTC
(In reply to comment #2)
> The error is similar to the one in bug #264435. We may be able to re-use some
> of the related code to solve this bug.
> 

Unfortunately, this bug is very difficult to reproduce. I only encounter it when compiling under a tmpfs and only when doing massive compilations. Compilation of very large things like KDE and Open Office is the only thing that seems to trigger it and it always occurs when the tmpfs has plenty of space. It also never seems to occur the same way, although I have not restarted compilations from scratch to be able to verify that. I usually just emerge --resume.
Comment 4 Zac Medico gentoo-dev 2010-09-10 05:11:39 UTC
The non-blocking read error handling code from the SpawnProcess class is known to work well, so I've fixed the error handling in AbstractPollTask and ebuild-ipc.py to work in the same way:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=34cd7af0911547d0d58b76f0309e744f6184da78
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=26434ee1b77dbff8f1904dd12204a93e87c8b6d3
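
For readers unfamiliar with the failure mode: Errno 11 is EAGAIN, which a read on a non-blocking descriptor reports when no data is available yet. It is not a fatal condition, and the usual handling is to treat it as "try again after the next poll event". The sketch below illustrates that general pattern only. It is not the code from the commits above, `read_nonblocking` is a hypothetical helper name, and it is written for Python 3 (3.5 or later for `os.set_blocking`), where IOError is an alias of OSError.

```python
import errno
import os


def read_nonblocking(fd, bufsize=4096):
    """Read from a non-blocking file descriptor.

    Returns the bytes read (b"" at end of file), or None when the
    descriptor has no data ready yet (EAGAIN). Other errors propagate.
    """
    try:
        return os.read(fd, bufsize)
    except OSError as e:
        if e.errno == errno.EAGAIN:
            # Nothing to read right now; the caller should wait for
            # the next poll event instead of treating this as fatal.
            return None
        raise


if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)

    print(read_nonblocking(r))  # no data written yet
    os.write(w, b"hello")
    print(read_nonblocking(r))  # data is available
    os.close(w)
    print(read_nonblocking(r))  # writer closed: end of file
    os.close(r)
```

A caller inside a poll loop would simply return to polling when the helper yields None, rather than letting the exception unwind the scheduler as in the traceback above.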
Comment 5 Jeroen Roovers (RETIRED) gentoo-dev 2010-09-10 05:21:54 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > The error is similar to the one in bug #264435. We may be able to re-use some
> > of the related code to solve this bug.
> > 
> 
> Unfortunately, this bug is very difficult to reproduce. I only encounter it
> when compiling under a tmpfs and only when doing massive compilations.
> Compilation of very large things like KDE and Open Office is the only thing
> that seems to trigger it and it always occurs when the tmpfs has plenty of
> space.

So it's actually rather easy to reproduce. You compile openoffice in your tmpfs.
Comment 6 Richard 2010-09-10 05:35:31 UTC
(In reply to comment #4)
> The non-blocking read error handling code from the SpawnProcess class is known
> to work well, so I've fixed the error handling in AbstractPollTask and
> ebuild-ipc.py to work in the same way:
> 
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=34cd7af0911547d0d58b76f0309e744f6184da78
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=26434ee1b77dbff8f1904dd12204a93e87c8b6d3
> 

Do you think that this might fix the IOError I have encountered, or do you think that this will make it produce more descriptive error messages when an issue does occur?

I would like to have some idea of which is the case so I have some idea of what to expect before I try this out on my systems.

(In reply to comment #5)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > The error is similar to the one in bug #264435. We may be able to re-use some
> > > of the related code to solve this bug.
> > > 
> > 
> > Unfortunately, this bug is very difficult to reproduce. I only encounter it
> > when compiling under a tmpfs and only when doing massive compilations.
> > Compilation of very large things like KDE and Open Office is the only thing
> > that seems to trigger it and it always occurs when the tmpfs has plenty of
> > space.
> 
> So it's actually rather easy to reproduce. You compile openoffice in your
> tmpfs.
> 

It does not happen every time. If it did, I would not have OpenOffice installed right now. If this is a portage regression that appeared after I last installed OpenOffice, then OpenOffice compilations may be broken on my systems now. I will try recompiling OpenOffice, but I do not think that it is guaranteed to trigger this issue.
Comment 7 Zac Medico gentoo-dev 2010-09-10 05:40:10 UTC
(In reply to comment #6)
> Do you think that this might fix the IOError I have encountered, or do you
> think that this will make it produce more descriptive error messages when an
> issue does occur?

It might fix it, but we can't know without testing.

> I would like to have some idea of which is the case so I have some idea of what
> to expect before I try this out on my systems.

You can use the portage-9999 ebuild to test it.
Comment 8 Zac Medico gentoo-dev 2010-09-10 20:30:22 UTC
Hopefully this is fixed in 2.1.9.3 and 2.2_rc79. Please re-open if the problem persists.