big_daddy layman # eupdatedb
 * indexing: 14815 ebuilds to go
 * Missing digest for '/usr/local/portage/app-portage/some-package/some-package-0.6.0.ebuild'
8293 ebuilds to go
Traceback (most recent call last):
  File "/usr/bin/eupdatedb", line 5, in <module>
    main()
  File "/usr/lib64/python2.7/site-packages/esearch/update.py", line 252, in main
    success = updatedb(config)
  File "/usr/lib64/python2.7/site-packages/esearch/update.py", line 208, in updatedb
    str(description),
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 125: ordinal not in range(128)

It appears that unicode is getting into ebuild descriptions now. It looks like we need to convert the db to full unicode, which, judging from layman's unicode db, may be a pain to code. I wasn't able to get layman's code to work with both py-2 and py-3 at the same time. It may need to be coded such that we run 2to3 on the code for py-3 installs.

Reproducible: Always
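For anyone following along, here is a minimal sketch of what the traceback is reporting. Under Python 2, calling str() on a unicode object implicitly encodes it as ASCII, so the explicit encode below stands in for that step. The description string and the to_ascii helper are made up for illustration; they are not taken from esearch or from any real ebuild.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical description containing U+2019 (right single quotation mark),
# the character named in the traceback.
description = u"A tool that doesn\u2019t stay within ASCII"


def to_ascii(text):
    """Encode text as ASCII, as Python 2's str() does implicitly for unicode."""
    return text.encode("ascii")


try:
    to_ascii(description)
    failed = False
except UnicodeEncodeError as err:
    # err.start is the index of the first character that could not be encoded.
    failed = True
    print("encode failed at position", err.start, ":", err.reason)
```

A description made only of ASCII characters passes through the same helper without error, which would explain why the main tree alone does not trigger the crash.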
I don't have the problem on any of my machines, which indicates the ebuild is coming from an overlay. Which overlays do you have installed?
big_daddy layman # layman -l
 * mgorny      [Git ] (git://git.overlays.gentoo.org/dev/mgorny.git )
 * multimedia  [Git ] (git://gitorious.org/gentoo-multimedia/gentoo-multimedia.git )
 * mva         [Git ] (git://github.com/msva/mva-overlay )
 * science     [Git ] (git://git.overlays.gentoo.org/proj/sci.git )
 * sunrise     [Git ] (git://git.overlays.gentoo.org/proj/sunrise-reviewed.git )
 * xfce-dev    [Git ] (git://git.overlays.gentoo.org/proj/xfce.git )
big_daddy layman #

I'll add some debug try/except pairs to the code to try to trap them. That should make things work a little better and give us more info about where the unicode is coming from. As a temporary workaround, we may be able to substitute a character for the offending one in the string and report it to stderr.

What about adding logging to esearch? It might be good to have things like this recorded for bug submission.
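A rough sketch of the try/except plus substitution idea, with the report going to stderr through the standard logging module. The function name, the logger name, and the package name in the usage note are all hypothetical; this is not the code that went into esearch.

```python
import logging
import sys

# Send warnings to stderr, as suggested above.
logging.basicConfig(stream=sys.stderr, level=logging.WARNING)
log = logging.getLogger("esearch.update")  # hypothetical logger name


def ascii_or_substitute(package, description):
    """Return description unchanged if it is pure ASCII.

    Otherwise log the offending characters and return a copy in which
    every non-ASCII character is replaced by '?'.
    """
    try:
        description.encode("ascii")
        return description
    except UnicodeEncodeError as err:
        log.warning(
            "non-ASCII description in %s at position %d: %r",
            package, err.start, description[err.start:err.end],
        )
        return description.encode("ascii", "replace").decode("ascii")
```

For example, ascii_or_substitute("app-misc/demo", u"doesn\u2019t") logs one warning and returns "doesn?t". The substitution loses information, so it only makes sense as a stopgap until the db itself is stored as unicode.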
It should work fine if we just write esearchdb.py with UTF-8 encoding and put a line like "# -*- coding: UTF8 -*-" at the top. Instead of using str(), use _unicode() like portage typically does:

    if sys.hexversion >= 0x3000000:
        _unicode = str
    else:
        _unicode = unicode

And open the unicode file like this:

    dbfile = io.open(dbfd, mode="w", encoding="utf_8")
    dbfile.write(_unicode("# -*- coding: UTF8 -*-\n"))

Just use _unicode() instead of str() to wrap any strings that you write to dbfile, and it should work fine because the strings that come from portage are all unicode.
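Putting those pieces together, here is a self-contained sketch of a write and read-back round trip that should run unchanged on Python 2.7 and Python 3. The tab-separated layout, the write_db and read_db helpers, and the sample entry are assumptions made for the example; the real esearchdb.py is a Python module and is laid out differently.

```python
import io
import os
import sys
import tempfile

if sys.hexversion >= 0x3000000:
    _unicode = str
else:
    _unicode = unicode  # noqa: F821 (name exists on Python 2 only)


def write_db(path, entries):
    """Write (name, description) pairs as UTF-8 text, one pair per line."""
    with io.open(path, mode="w", encoding="utf_8") as dbfile:
        dbfile.write(_unicode("# -*- coding: UTF8 -*-\n"))
        for name, description in entries:
            # Wrap everything in _unicode() rather than str().
            line = _unicode("%s\t%s\n") % (_unicode(name), _unicode(description))
            dbfile.write(line)


def read_db(path):
    """Read the pairs back, skipping the coding comment line."""
    with io.open(path, mode="r", encoding="utf_8") as dbfile:
        lines = [line.rstrip("\n") for line in dbfile if not line.startswith("#")]
    return [tuple(line.split("\t", 1)) for line in lines]


# Round trip through a temporary file with a hypothetical entry.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
entries = [(u"app-misc/demo", u"doesn\u2019t fit in ASCII")]
write_db(path, entries)
loaded = read_db(path)
os.remove(path)
```

Because io.open does the encoding and decoding explicitly, the interpreter's default codec never gets involved, which is what makes the file readable from either Python version.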
With Zac's help, it now saves the db in unicode in a way that is compatible with both py2 and py3. No matter which Python creates the db, it will load correctly in either one.

commit: https://github.com/fuzzyray/esearch/commit/2be2aa2f0f66c6e68acd0ea4b5b49e55305836f2
Released in esearch-1.3