Summary: | portage-2.1.5.2 and 2.1.5.5: Random emerge failures in the merging phase (KeyError) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Francisco J. Vazquez <dv> |
Component: | [OLD] Core system | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | hans |
Priority: | High | Keywords: | InVCS, REGRESSION |
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 216231 | ||
Attachments: |
use floating point mtime for finer grained validation in cachedir()
disable the vardbapi.cpv_all() cache actively invalidate cache when adding/removing packages (don't rely on mtime) |
Description
Francisco J. Vazquez
2008-05-24 10:08:31 UTC
(In reply to comment #0) > I wasn't able to reproduce it with python-2.1.4.4 Sorry, I meant portage-2.1.4.4 Created attachment 154177 [details, diff]
use floating point mtime for finer grained validation in cachedir()
It seems that it was using a stale directory list. This patch fixes it to use floating point mtime to increase granularity of staleness checks.
This is fixed in 2.1.5.3. No, it's not fixed in 2.1.5.5. Today, while emerging -e system (merging sys-apps/coreutils): Traceback (most recent call last): File "/usr/bin/emerge", line 9273, in <module> retval = emerge_main() File "/usr/bin/emerge", line 9267, in emerge_main myopts, myaction, myfiles, spinner) File "/usr/bin/emerge", line 8641, in action_build retval = mergetask.merge(mymergelist, favorites, mtimedb) File "/usr/bin/emerge", line 5939, in merge return self._merge(mylist, favorites, mtimedb) File "/usr/bin/emerge", line 6247, in _merge blockers=self._find_blockers(pkg)) File "/usr/lib64/portage/pym/portage.py", line 5524, in merge mydbapi=mydbapi, prev_mtimes=prev_mtimes) File "/usr/lib64/portage/pym/portage.py", line 10476, in merge mydbapi=mydbapi, prev_mtimes=prev_mtimes) File "/usr/lib64/portage/pym/portage.py", line 10484, in _merge cleanup=cleanup, mydbapi=mydbapi, prev_mtimes=prev_mtimes) File "/usr/lib64/portage/pym/portage.py", line 9929, in treewalk owners = self.vartree.dbapi._owners.get_owners(collisions) File "/usr/lib64/portage/pym/portage.py", line 7441, in get_owners for owner, f in self.iter_owners(path_iter): File "/usr/lib64/portage/pym/portage.py", line 7459, in iter_owners owners_cache = self._populate() File "/usr/lib64/portage/pym/portage.py", line 7416, in _populate hash_value = hash_pkg(cpv) File "/usr/lib64/portage/pym/portage.py", line 7384, in _hash_pkg cpv, ["COUNTER", "_mtime_"]) File "/usr/lib64/portage/pym/portage.py", line 7155, in aux_get raise KeyError(mycpv) KeyError: 'sys-apps/texinfo-4.8-r5' ------------------------------------------ quad ~ # emerge -a texinfo These are the packages that would be merged, in order: Calculating dependencies... done! [ebuild R ] sys-apps/texinfo-4.12 ------------------------------------------ quad ~ # tail newroot/var/log/emerge.log 1213473677: === (83 of 116) Compiling/Merging (sys-apps/texinfo-4.12::/usr/portage/sys-apps/texinfo/texinfo-4.12.ebuild) 1213473693: === (83 of 116) Post-Build Cleaning (sys-apps/texinfo-4.12::/usr/portage/sys-apps/texinfo/texinfo-4.12.ebuild) 1213473693: ::: completed emerge (83 of 116) sys-apps/texinfo-4.12 to / 1213473693: >>> emerge (84 of 116) sys-apps/coreutils-6.12 to / 1213473693: === (84 of 116) Cleaning (sys-apps/coreutils-6.12::/usr/portage/sys-apps/coreutils/coreutils-6.12.ebuild) 1213473693: === (84 of 116) Compiling/Merging (sys-apps/coreutils-6.12::/usr/portage/sys-apps/coreutils/coreutils-6.12.ebuild) 1213473770: *** terminating. ------------------------------------------- So, texinfo had been upgraded from 4.8 to 4.12 just before coreutils, but portage was looking for the old texinfo directory. It seems quite odd that the floating point mtime staleness checks are insufficient. Is there anything special about the filesystem that /var/db/pkg is stored on? Are you using any special mount options or kernel patches? What's also suspicious is that that with all the other people testing portage-2.1.5.x, you're the only person who has reported this issue (maybe 2 because there's 1 person on the CC list). (In reply to comment #5) > It seems quite odd that the floating point mtime staleness checks are > insufficient. Is there anything special about the filesystem that /var/db/pkg > is stored on? Are you using any special mount options or kernel patches? > Nope, just plain old ext3 with noatime. And I don't think it's a hardware problem because I've been running the same machine for almost a year. Anyway, to be fair, I've been trying to reproduce the error like last time and I couldn't. I left an emerging loop for some time and it never failed. But since the error was exactly the same I assumed the problem was still there. I'll try with another emerge -e system from stage 3 in other harddisk. Oh, and no kernel patches either. I use vanilla-sources-2.6.24.7 Ok, there must definitely be something wrong with portage-2.1.5.*. Other partition in another hard disk:
>>> Completed installing libpcre-7.7 into /var/tmp/portage/dev-libs/libpcre-7.7/image/
ecompressdir: bzip2 -9 /usr/share/man
strip: x86_64-pc-linux-gnu-strip --strip-unneeded -R .comment
usr/lib64/libpcre.so.0.0.1
usr/lib64/libpcrecpp.so.0.0.0
usr/lib64/libpcreposix.so.0.0.0
usr/bin/pcretest
usr/bin/pcregrep
usr/lib64/libpcrecpp.a
usr/lib64/libpcre.a
usr/lib64/libpcreposix.a
removing executable bit: usr/lib64/libpcrecpp.la
removing executable bit: usr/lib64/libpcre.la
removing executable bit: usr/lib64/libpcreposix.la
Traceback (most recent call last):
File "/usr/bin/emerge", line 9273, in <module>
retval = emerge_main()
File "/usr/bin/emerge", line 9267, in emerge_main
myopts, myaction, myfiles, spinner)
File "/usr/bin/emerge", line 8641, in action_build
retval = mergetask.merge(mymergelist, favorites, mtimedb)
File "/usr/bin/emerge", line 5939, in merge
return self._merge(mylist, favorites, mtimedb)
File "/usr/bin/emerge", line 6247, in _merge
blockers=self._find_blockers(pkg))
File "/usr/lib64/portage/pym/portage.py", line 5524, in merge
mydbapi=mydbapi, prev_mtimes=prev_mtimes)
File "/usr/lib64/portage/pym/portage.py", line 10476, in merge
mydbapi=mydbapi, prev_mtimes=prev_mtimes)
File "/usr/lib64/portage/pym/portage.py", line 10484, in _merge
cleanup=cleanup, mydbapi=mydbapi, prev_mtimes=prev_mtimes)
File "/usr/lib64/portage/pym/portage.py", line 9798, in treewalk
blockers = self._blockers()
File "/usr/bin/emerge", line 5915, in get_blockers
return self._find_blockers_with_lock(new_pkg, acquire_lock=0)
File "/usr/bin/emerge", line 5925, in _find_blockers_with_lock
acquire_lock=acquire_lock):
File "/usr/bin/emerge", line 1782, in findInstalledBlockers
FakeVartree(self._root_config, acquire_lock=acquire_lock)
File "/usr/bin/emerge", line 1205, in __init__
real_dbapi.flush_cache()
File "/usr/lib64/portage/pym/portage.py", line 7060, in flush_cache
self._owners.populate() # index any unindexed contents
File "/usr/lib64/portage/pym/portage.py", line 7397, in populate
self._populate()
File "/usr/lib64/portage/pym/portage.py", line 7416, in _populate
hash_value = hash_pkg(cpv)
File "/usr/lib64/portage/pym/portage.py", line 7384, in _hash_pkg
cpv, ["COUNTER", "_mtime_"])
File "/usr/lib64/portage/pym/portage.py", line 7155, in aux_get
raise KeyError(mycpv)
KeyError: 'app-arch/cpio-2.9-r1'
-----------------------------------------------
/var/log/emerge.log:
1213535054: ::: completed emerge (18 of 116) app-arch/bzip2-1.0.5 to /
1213535054: >>> emerge (19 of 116) app-arch/cpio-2.9-r2 to /
1213535054: === (19 of 116) Cleaning (app-arch/cpio-2.9-r2::/usr/portage/app-arch/cpio/cpio-2.9-r2.$
1213535055: === (19 of 116) Compiling/Merging (app-arch/cpio-2.9-r2::/usr/portage/app-arch/cpio/cpi$
1213535075: === (19 of 116) Post-Build Cleaning (app-arch/cpio-2.9-r2::/usr/portage/app-arch/cpio/c$
1213535075: ::: completed emerge (19 of 116) app-arch/cpio-2.9-r2 to /
1213535075: >>> emerge (20 of 116) sys-kernel/linux-headers-2.6.25-r4 to /
1213535075: === (20 of 116) Cleaning (sys-kernel/linux-headers-2.6.25-r4::/usr/portage/sys-kernel/l$
1213535075: === (20 of 116) Compiling/Merging (sys-kernel/linux-headers-2.6.25-r4::/usr/portage/sys$
1213535083: === (20 of 116) Post-Build Cleaning (sys-kernel/linux-headers-2.6.25-r4::/usr/portage/s$
1213535083: ::: completed emerge (20 of 116) sys-kernel/linux-headers-2.6.25-r4 to /
1213535083: >>> emerge (21 of 116) dev-libs/mpfr-2.3.1 to /
1213535083: === (21 of 116) Cleaning (dev-libs/mpfr-2.3.1::/usr/portage/dev-libs/mpfr/mpfr-2.3.1.eb$
1213535083: === (21 of 116) Compiling/Merging (dev-libs/mpfr-2.3.1::/usr/portage/dev-libs/mpfr/mpfr$
1213535114: === (21 of 116) Post-Build Cleaning (dev-libs/mpfr-2.3.1::/usr/portage/dev-libs/mpfr/mp$
1213535114: ::: completed emerge (21 of 116) dev-libs/mpfr-2.3.1 to /
1213535114: >>> emerge (22 of 116) dev-libs/libpcre-7.7 to /
1213535114: === (22 of 116) Cleaning (dev-libs/libpcre-7.7::/usr/portage/dev-libs/libpcre/libpcre-7$
1213535114: === (22 of 116) Compiling/Merging (dev-libs/libpcre-7.7::/usr/portage/dev-libs/libpcre/$
1213535136: *** terminating.
This never happened with portage-2.1.4.*. I'll stick with it while I can (or maybe I'll try my luck with 2.2_pre).
Created attachment 157019 [details, diff]
disable the vardbapi.cpv_all() cache
This patch should solve the problem for you. However, I'm still not sure what's special about you that makes this case trigger.
There aren't enough vardbapi.cpv_all() calls for disabling the cache to cause a noticeable performance decrease. However, I'm reluctant to disable it by default since I don't see how you are able to trigger this problem so often. Normally, the mtime of a directory changes when it's contents are modified. For some reason directory mtime checks appear to be unreliable on the systems where you've triggered the KeyError. Perhaps the problem is that the version of python that you have has int mtimes instead of float mtimes. Please post the output of this command: python -c 'import os; print os.stat_float_times()' If that turn out to be the difference then I suppose we should simply disable vardbapi.cpv_all() caching when float times aren't available. (In reply to comment #11) > If that turn out to be the difference then I suppose we should simply disable > vardbapi.cpv_all() caching when float times aren't available. Actually, I think the best fix for this will be to add some cache maintenance in the merge/unmerge code. We already do cache maintenance on the vardbapi.match() cache (inside dblink.treewalk() and dblink.delete()), so we can add similar maintenance for the vardbapi.cpv_all() cache. As a temporary workaround, I've increased the python dependency to 2.5 since that's needed for float mtime support. After we remove the dependence on float mtimes, we can decrease the dependency back to the previous 2.4 value again. (In reply to comment #11) > Perhaps the problem is that the version of python that you have has int mtimes > instead of float mtimes. Please post the output of this command: > > python -c 'import os; print os.stat_float_times()' It returns True; the first things I do after a chroot into the stage3 are: env-update source /etc/profile emerge --metadata emerge -1 portage etc-update emerge gentoolkit ufed emerge -1 python python-updater emerge -C '<python-2.5' (update toolchain) emerge -eva system So I had python 2.5 installed before the emerge -e system. I also think it's strange that nobody else is reporting this problem. I'm quite confident in my computer stability, after 8 months crunching numbers without errors. And if it was a hardware problem (memory/cpu/mb) I suppose I'd also get other types of errors. Maybe people don't report it because it's a random error and emerge --resume "solves" it. Anyway, now I have access to another quad core system similar to mine. I'll leave it for a day in an emerge loop and see if it fails. If it doesn't, I'll have to assume that my computer has a weird problem that only manifests with this error. Created attachment 157083 [details, diff]
actively invalidate cache when adding/removing packages (don't rely on mtime)
Hopefully this will solve it. Please test.
(In reply to comment #15) > Created an attachment (id=157083) [edit] > actively invalidate cache when adding/removing packages (don't rely on mtime) > > Hopefully this will solve it. Please test. > Yeah, it works ok now. At least I completed an emerge -e system from a stage3 without errors, something I couldn't do before. Thanks :) Thank you for testing. It's released in 2.1.5.6. |