Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 223417 - portage-2.1.5.2 and 2.1.5.5: Random emerge failures in the merging phase (KeyError)
Summary: portage-2.1.5.2 and 2.1.5.5: Random emerge failures in the merging phase (Key...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system (show other bugs)
Hardware: AMD64 Linux
: High normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS, REGRESSION
Depends on:
Blocks: 216231
  Show dependency tree
 
Reported: 2008-05-24 10:08 UTC by Francisco J. Vazquez
Modified: 2008-06-18 00:13 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
use floating point mtime for finer grained validation in cachedir() (float_mtime.patch,420 bytes, patch)
2008-05-24 18:56 UTC, Zac Medico
Details | Diff
disable the vardbapi.cpv_all() cache (disable_cpv_all_cache.patch,441 bytes, patch)
2008-06-16 00:35 UTC, Zac Medico
Details | Diff
actively invalidate cache when adding/removing packages (don't rely on mtime) (invalid_cache.patch,2.98 KB, patch)
2008-06-16 13:36 UTC, Zac Medico
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Francisco J. Vazquez 2008-05-24 10:08:31 UTC
Installing a new gentoo system from stage3 I got 3 or 4 errors when emerging -e system. They always happen in the merging phase. An example:


>>> Install tcp-wrappers-7.6-r8 into /var/tmp/portage/sys-apps/tcp-wrappers-7.6-r8/image/ category sys-apps
>>> Completed installing tcp-wrappers-7.6-r8 into /var/tmp/portage/sys-apps/tcp-wrappers-7.6-r8/image/

ecompressdir: bzip2 -9 /usr/share/man
strip: x86_64-pc-linux-gnu-strip --strip-unneeded -R .comment
   usr/sbin/try-from
   usr/sbin/safe_finger
   usr/sbin/tcpdchk
   usr/sbin/tcpd
   usr/sbin/tcpdmatch
   lib64/libwrap.so.0.7.6
   usr/lib64/libwrap.a
Traceback (most recent call last):
  File "/usr/bin/emerge", line 8996, in <module>
    retval = emerge_main()
  File "/usr/bin/emerge", line 8990, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/bin/emerge", line 8408, in action_build
    retval = mergetask.merge(pkglist, favorites, mtimedb)
  File "/usr/bin/emerge", line 5748, in merge
    return self._merge(mylist, favorites, mtimedb)
  File "/usr/bin/emerge", line 6059, in _merge
    blockers=self._find_blockers(pkg))
  File "/usr/lib64/portage/pym/portage.py", line 5346, in merge
    mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 10055, in merge
    mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 10063, in _merge
    cleanup=cleanup, mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 9366, in treewalk
    blockers = self._blockers()
  File "/usr/bin/emerge", line 5724, in get_blockers
    return self._find_blockers_with_lock(new_pkg, acquire_lock=0)
  File "/usr/bin/emerge", line 5734, in _find_blockers_with_lock
    acquire_lock=acquire_lock):
  File "/usr/bin/emerge", line 1676, in findInstalledBlockers
    acquire_lock=acquire_lock)
  File "/usr/bin/emerge", line 1172, in __init__
    metadata = dict(izip(mykeys, real_dbapi.aux_get(cpv, mykeys)))
  File "/usr/lib64/portage/pym/portage.py", line 6897, in aux_get
    raise KeyError(mycpv)
KeyError: 'sys-devel/autoconf-wrapper-4-r3'


The thing is, autoconf-wrapper-4-r3 is NOT installed, it had been previously updated to autoconf-wrapper-5 in the same emerge command. I don't know why portage looks for it when merging tcp-wrappers, but that seems to cause the error. I wasn't able to reproduce it with python-2.1.4.4

Reproducible: Sometimes

Steps to Reproduce:
1.untar stage3 2008 beta2, chroot.
2.emerge portage-2.1.5.2
3.emerge -1 autoconf-wrapper tcp-wrappers

If it does not fail the first time:

4.while emerge -1 '<autoconf-wrapper-5' && emerge -1 autoconf-wrapper tcp-wrappers; do sleep 1; done




quad / # emerge --info
Portage 2.1.5.2 (default/linux/amd64/2008.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24.5 x86_64)
=================================================================
System uname: 2.6.24.5 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Timestamp of tree: Sat, 24 May 2008 08:03:01 +0000
app-shells/bash:     3.2_p17-r1
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.61-r1
sys-devel/automake:  1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks parallel-fetch sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="en_US.UTF-8"
LDFLAGS=""
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa amd64 berkdb bzip2 cairo cli cracklib crypt curl dbus dri dvdr encode fbcon ffmpeg flac fortran gdbm gif glade gnome gnome-keyring gpm gstreamer gtk hal iconv ipv6 isdnlog joystick jpeg lirc mad maildir midi mmx mp3 mudflap multilib nautilus ncurses nls nptl nptlonly nsplugin ogg opengl openmp pam pcre pdf perl png pppd python rar readline reflection session spl srt sse sse2 ssl subversion svg tcpd tiff unicode v4l v4l2 vorbis x264 xorg xulrunner xv xvid zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIRC_DEVICES="devinput" USERLAND="GNU" VIDEO_CARDS="i810 vesa fglrx"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Francisco J. Vazquez 2008-05-24 10:12:11 UTC
(In reply to comment #0)

> I wasn't able to reproduce it with python-2.1.4.4

Sorry, I meant portage-2.1.4.4
Comment 2 Zac Medico gentoo-dev 2008-05-24 18:56:16 UTC
Created attachment 154177 [details, diff]
use floating point mtime for finer grained validation in cachedir()

It seems that it was using a stale directory list. This patch fixes it to use floating point mtime to increase granularity of staleness checks.
Comment 3 Zac Medico gentoo-dev 2008-06-01 10:40:45 UTC
This is fixed in 2.1.5.3.
Comment 4 Francisco J. Vazquez 2008-06-14 20:25:11 UTC
No, it's not fixed in 2.1.5.5. Today, while emerging -e system (merging sys-apps/coreutils):

Traceback (most recent call last):
  File "/usr/bin/emerge", line 9273, in <module>
    retval = emerge_main()
  File "/usr/bin/emerge", line 9267, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/bin/emerge", line 8641, in action_build
    retval = mergetask.merge(mymergelist, favorites, mtimedb)
  File "/usr/bin/emerge", line 5939, in merge
    return self._merge(mylist, favorites, mtimedb)
  File "/usr/bin/emerge", line 6247, in _merge
    blockers=self._find_blockers(pkg))
  File "/usr/lib64/portage/pym/portage.py", line 5524, in merge
    mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 10476, in merge
    mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 10484, in _merge
    cleanup=cleanup, mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 9929, in treewalk
    owners = self.vartree.dbapi._owners.get_owners(collisions)
  File "/usr/lib64/portage/pym/portage.py", line 7441, in get_owners
    for owner, f in self.iter_owners(path_iter):
  File "/usr/lib64/portage/pym/portage.py", line 7459, in iter_owners
    owners_cache = self._populate()
  File "/usr/lib64/portage/pym/portage.py", line 7416, in _populate
    hash_value = hash_pkg(cpv)
  File "/usr/lib64/portage/pym/portage.py", line 7384, in _hash_pkg
    cpv, ["COUNTER", "_mtime_"])
  File "/usr/lib64/portage/pym/portage.py", line 7155, in aux_get
    raise KeyError(mycpv)
KeyError: 'sys-apps/texinfo-4.8-r5'

------------------------------------------

quad ~ # emerge -a texinfo

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ] sys-apps/texinfo-4.12

------------------------------------------

quad ~ # tail newroot/var/log/emerge.log
1213473677:  === (83 of 116) Compiling/Merging
(sys-apps/texinfo-4.12::/usr/portage/sys-apps/texinfo/texinfo-4.12.ebuild)
1213473693:  === (83 of 116) Post-Build Cleaning
(sys-apps/texinfo-4.12::/usr/portage/sys-apps/texinfo/texinfo-4.12.ebuild)
1213473693:  ::: completed emerge (83 of 116) sys-apps/texinfo-4.12 to /
1213473693:  >>> emerge (84 of 116) sys-apps/coreutils-6.12 to /
1213473693:  === (84 of 116) Cleaning
(sys-apps/coreutils-6.12::/usr/portage/sys-apps/coreutils/coreutils-6.12.ebuild)
1213473693:  === (84 of 116) Compiling/Merging
(sys-apps/coreutils-6.12::/usr/portage/sys-apps/coreutils/coreutils-6.12.ebuild)
1213473770:  *** terminating.

-------------------------------------------

So, texinfo had been upgraded from 4.8 to 4.12 just before coreutils, but portage was looking for the old texinfo directory.
Comment 5 Zac Medico gentoo-dev 2008-06-15 00:36:11 UTC
It seems quite odd that the floating point mtime staleness checks are insufficient. Is there anything special about the filesystem that /var/db/pkg is stored on? Are you using any special mount options or kernel patches?

What's also suspicious is that that with all the other people testing portage-2.1.5.x, you're the only person who has reported this issue (maybe 2 because there's 1 person on the CC list).
Comment 6 Francisco J. Vazquez 2008-06-15 10:11:35 UTC
(In reply to comment #5)
> It seems quite odd that the floating point mtime staleness checks are
> insufficient. Is there anything special about the filesystem that /var/db/pkg
> is stored on? Are you using any special mount options or kernel patches?
> 
Nope, just plain old ext3 with noatime. And I don't think it's a hardware problem because I've been running the same machine for almost a year.

Anyway, to be fair, I've been trying to reproduce the error like last time and I couldn't. I left an emerging loop for some time and it never failed. But since the error was exactly the same I assumed the problem was still there.

I'll try with another emerge -e system from stage 3 in other harddisk.
Comment 7 Francisco J. Vazquez 2008-06-15 10:17:41 UTC
Oh, and no kernel patches either. I use vanilla-sources-2.6.24.7
Comment 8 Francisco J. Vazquez 2008-06-15 13:20:00 UTC
Ok, there must definitely be something wrong with portage-2.1.5.*. Other partition in another hard disk:


>>> Completed installing libpcre-7.7 into /var/tmp/portage/dev-libs/libpcre-7.7/image/

ecompressdir: bzip2 -9 /usr/share/man
strip: x86_64-pc-linux-gnu-strip --strip-unneeded -R .comment
   usr/lib64/libpcre.so.0.0.1
   usr/lib64/libpcrecpp.so.0.0.0
   usr/lib64/libpcreposix.so.0.0.0
   usr/bin/pcretest
   usr/bin/pcregrep
   usr/lib64/libpcrecpp.a
   usr/lib64/libpcre.a
   usr/lib64/libpcreposix.a
removing executable bit: usr/lib64/libpcrecpp.la
removing executable bit: usr/lib64/libpcre.la
removing executable bit: usr/lib64/libpcreposix.la
Traceback (most recent call last):
  File "/usr/bin/emerge", line 9273, in <module>
    retval = emerge_main()
  File "/usr/bin/emerge", line 9267, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/bin/emerge", line 8641, in action_build
    retval = mergetask.merge(mymergelist, favorites, mtimedb)
  File "/usr/bin/emerge", line 5939, in merge
    return self._merge(mylist, favorites, mtimedb)
  File "/usr/bin/emerge", line 6247, in _merge
    blockers=self._find_blockers(pkg))
  File "/usr/lib64/portage/pym/portage.py", line 5524, in merge
    mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 10476, in merge
    mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 10484, in _merge
    cleanup=cleanup, mydbapi=mydbapi, prev_mtimes=prev_mtimes)
  File "/usr/lib64/portage/pym/portage.py", line 9798, in treewalk
    blockers = self._blockers()
  File "/usr/bin/emerge", line 5915, in get_blockers
    return self._find_blockers_with_lock(new_pkg, acquire_lock=0)
  File "/usr/bin/emerge", line 5925, in _find_blockers_with_lock
    acquire_lock=acquire_lock):
  File "/usr/bin/emerge", line 1782, in findInstalledBlockers
    FakeVartree(self._root_config, acquire_lock=acquire_lock)
  File "/usr/bin/emerge", line 1205, in __init__
    real_dbapi.flush_cache()
  File "/usr/lib64/portage/pym/portage.py", line 7060, in flush_cache
    self._owners.populate() # index any unindexed contents
  File "/usr/lib64/portage/pym/portage.py", line 7397, in populate
    self._populate()
  File "/usr/lib64/portage/pym/portage.py", line 7416, in _populate
    hash_value = hash_pkg(cpv)
  File "/usr/lib64/portage/pym/portage.py", line 7384, in _hash_pkg
    cpv, ["COUNTER", "_mtime_"])
  File "/usr/lib64/portage/pym/portage.py", line 7155, in aux_get
    raise KeyError(mycpv)
KeyError: 'app-arch/cpio-2.9-r1'

-----------------------------------------------

/var/log/emerge.log:

1213535054:  ::: completed emerge (18 of 116) app-arch/bzip2-1.0.5 to /
1213535054:  >>> emerge (19 of 116) app-arch/cpio-2.9-r2 to /
1213535054:  === (19 of 116) Cleaning (app-arch/cpio-2.9-r2::/usr/portage/app-arch/cpio/cpio-2.9-r2.$
1213535055:  === (19 of 116) Compiling/Merging (app-arch/cpio-2.9-r2::/usr/portage/app-arch/cpio/cpi$
1213535075:  === (19 of 116) Post-Build Cleaning (app-arch/cpio-2.9-r2::/usr/portage/app-arch/cpio/c$
1213535075:  ::: completed emerge (19 of 116) app-arch/cpio-2.9-r2 to /
1213535075:  >>> emerge (20 of 116) sys-kernel/linux-headers-2.6.25-r4 to /
1213535075:  === (20 of 116) Cleaning (sys-kernel/linux-headers-2.6.25-r4::/usr/portage/sys-kernel/l$
1213535075:  === (20 of 116) Compiling/Merging (sys-kernel/linux-headers-2.6.25-r4::/usr/portage/sys$
1213535083:  === (20 of 116) Post-Build Cleaning (sys-kernel/linux-headers-2.6.25-r4::/usr/portage/s$
1213535083:  ::: completed emerge (20 of 116) sys-kernel/linux-headers-2.6.25-r4 to /
1213535083:  >>> emerge (21 of 116) dev-libs/mpfr-2.3.1 to /
1213535083:  === (21 of 116) Cleaning (dev-libs/mpfr-2.3.1::/usr/portage/dev-libs/mpfr/mpfr-2.3.1.eb$
1213535083:  === (21 of 116) Compiling/Merging (dev-libs/mpfr-2.3.1::/usr/portage/dev-libs/mpfr/mpfr$
1213535114:  === (21 of 116) Post-Build Cleaning (dev-libs/mpfr-2.3.1::/usr/portage/dev-libs/mpfr/mp$
1213535114:  ::: completed emerge (21 of 116) dev-libs/mpfr-2.3.1 to /
1213535114:  >>> emerge (22 of 116) dev-libs/libpcre-7.7 to /
1213535114:  === (22 of 116) Cleaning (dev-libs/libpcre-7.7::/usr/portage/dev-libs/libpcre/libpcre-7$
1213535114:  === (22 of 116) Compiling/Merging (dev-libs/libpcre-7.7::/usr/portage/dev-libs/libpcre/$
1213535136:  *** terminating.


This never happened with portage-2.1.4.*. I'll stick with it while I can (or maybe I'll try my luck with 2.2_pre).
Comment 9 Zac Medico gentoo-dev 2008-06-16 00:35:05 UTC
Created attachment 157019 [details, diff]
disable the vardbapi.cpv_all() cache

This patch should solve the problem for you. However, I'm still not sure what's special about you that makes this case trigger.
Comment 10 Zac Medico gentoo-dev 2008-06-16 01:04:43 UTC
There aren't enough vardbapi.cpv_all() calls for disabling the cache to cause a noticeable performance decrease. However, I'm reluctant to disable it by default since I don't see how you are able to trigger this problem so often. Normally, the mtime of a directory changes when it's contents are modified. For some reason directory mtime checks appear to be unreliable on the systems where you've triggered the KeyError.
Comment 11 Zac Medico gentoo-dev 2008-06-16 01:44:49 UTC
Perhaps the problem is that the version of python that you have has int mtimes instead of float mtimes. Please post the output of this command:

python -c 'import os; print os.stat_float_times()'

If that turn out to be the difference then I suppose we should simply disable vardbapi.cpv_all() caching when float times aren't available.
Comment 12 Zac Medico gentoo-dev 2008-06-16 02:14:22 UTC
(In reply to comment #11)
> If that turn out to be the difference then I suppose we should simply disable
> vardbapi.cpv_all() caching when float times aren't available.

Actually, I think the best fix for this will be to add some cache maintenance in the merge/unmerge code. We already do cache maintenance on the vardbapi.match() cache (inside dblink.treewalk() and dblink.delete()), so we can add similar maintenance for the vardbapi.cpv_all() cache.
Comment 13 Zac Medico gentoo-dev 2008-06-16 09:02:25 UTC
As a temporary workaround, I've increased the python dependency to 2.5 since that's needed for float mtime support. After we remove the dependence on float mtimes, we can decrease the dependency back to the previous 2.4 value again.
Comment 14 Francisco J. Vazquez 2008-06-16 12:59:31 UTC
(In reply to comment #11)
> Perhaps the problem is that the version of python that you have has int mtimes
> instead of float mtimes. Please post the output of this command:
> 
> python -c 'import os; print os.stat_float_times()'

It returns True; the first things I do after a chroot into the stage3 are:

env-update
source /etc/profile
emerge --metadata
emerge -1 portage
etc-update
emerge gentoolkit ufed
emerge -1 python
python-updater
emerge -C '<python-2.5'
(update toolchain)
emerge -eva system

So I had python 2.5 installed before the emerge -e system.

I also think it's strange that nobody else is reporting this problem. I'm quite confident in my computer stability, after 8 months crunching numbers without errors. And if it was a hardware problem (memory/cpu/mb) I suppose I'd also get other types of errors. Maybe people don't report it because it's a random error and emerge --resume "solves" it. 

Anyway, now I have access to another quad core system similar to mine. I'll leave it for a day in an emerge loop and see if it fails. If it doesn't, I'll have to assume that my computer has a weird problem that only manifests with this error.
Comment 15 Zac Medico gentoo-dev 2008-06-16 13:36:43 UTC
Created attachment 157083 [details, diff]
actively invalidate cache when adding/removing packages (don't rely on mtime)

Hopefully this will solve it. Please test.
Comment 16 Francisco J. Vazquez 2008-06-16 22:57:45 UTC
(In reply to comment #15)
> Created an attachment (id=157083) [edit]
> actively invalidate cache when adding/removing packages (don't rely on mtime)
> 
> Hopefully this will solve it. Please test.
> 
Yeah, it works ok now. At least I completed an emerge -e system from a stage3 without errors, something I couldn't do before. Thanks :)
Comment 17 Zac Medico gentoo-dev 2008-06-18 00:13:37 UTC
Thank you for testing. It's released in 2.1.5.6.