Bug 573386 - sys-apps/portage-2.2.27: emerge -a crashes: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte
Summary: sys-apps/portage-2.2.27: emerge -a crashes: UnicodeDecodeError: 'utf-8' codec...
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core - Interface (emerge)
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: portage-2.3.0
Reported: 2016-01-30 11:31 UTC by Andrew Savchenko
Modified: 2016-03-14 03:11 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge.info,10.36 KB, text/plain)
2016-01-30 11:33 UTC, Andrew Savchenko

Description Andrew Savchenko gentoo-dev 2016-01-30 11:31:11 UTC
Hi,

emerge -a crashes when non-English text is typed at the confirmation prompt, then deleted and replaced with a valid answer, e.g.:

# emerge -av openrc
...
Would you like to merge these packages? [Yes/No] y
Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.4/emerge", line 50, in <module>
    retval = emerge_main()
  File "/usr/lib64/python3.4/site-packages/_emerge/main.py", line 1185, in emerge_main
    return run_action(emerge_config)
  File "/usr/lib64/python3.4/site-packages/_emerge/actions.py", line 3236, in run_action
    emerge_config.args, spinner)
  File "/usr/lib64/python3.4/site-packages/_emerge/actions.py", line 400, in action_build
    uq.query(prompt, enter_invalid) == "No":
  File "/usr/lib64/python3.4/site-packages/_emerge/UserQuery.py", line 57, in query
    for i in range(len(responses))])+"] ")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte

At the prompt I first typed a Russian letter by mistake (I had forgotten to switch my keyboard to the English layout), pressed backspace once so that the wrong letter disappeared, and then typed "y".

I suspect that backspace erased only one byte of the two-byte character encoding, which caused the crash. If I press backspace twice, the crash doesn't happen.

Please catch invalid UTF-8 input and simply ask for the response again instead of crashing :)
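
The partial-erase hypothesis above can be illustrated with a short Python sketch. It is only a model, not a trace of what the terminal actually sent: it assumes the mistyped letter was "ф" (the description does not say which letter it was) and that a single stray byte of it stayed in the input before "y". Under those assumptions the decode failure carries the same message as the traceback.

```python
# Illustration only: U+0444 ("ф") is an assumed example letter. The report
# does not say which character was typed or which byte was left behind.
letter = "\u0444"
encoded = letter.encode("utf-8")
print(encoded)  # the letter occupies two bytes in UTF-8

# Assume only one of the two bytes was erased and "y" was typed afterwards,
# so a lone continuation byte precedes the "y".
leftover = encoded[1:] + b"y"

try:
    leftover.decode("utf-8")
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(message)
```

Erasing both bytes leaves just b"y", which decodes cleanly. That matches the observation that pressing backspace twice avoids the crash.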
Comment 1 Andrew Savchenko gentoo-dev 2016-01-30 11:32:22 UTC
$ eix -e portage

sys-apps/portage
     Available versions:  2.2.8-r2 2.2.20.1{tbz2} 2.2.24 2.2.26{tbz2} (~)2.2.27{tbz2} **9999{tbz2} {build doc epydoc +ipc pypy2_0 python2 python3 selinux xattr LINGUAS="ru" PYTHON_TARGETS="pypy pypy2_0 python2_6 python2_7 python3_2 python3_3 python3_4 python3_5"}
     Installed versions:  2.2.27{tbz2}(08:32:35 PM 01/22/2016)(doc ipc -build -epydoc -selinux -xattr LINGUAS="ru" PYTHON_TARGETS="python2_7 python3_4 -pypy -python3_3 -python3_5")
Comment 2 Andrew Savchenko gentoo-dev 2016-01-30 11:33:04 UTC
Created attachment 424240 [details]
emerge --info
Comment 3 Alexander Berntsen (RETIRED) gentoo-dev 2016-02-01 14:13:11 UTC
I can't recreate this in HEAD with Norwegian letters, Hiragana, Katakana, or Kanji.

1. Could you try this in HEAD and see if the issue persists?
2. If so, could you try to recreate it with e.g. a Kanji?
3. If you're unable to recreate it with Kanji, could you provide more info? Like how you switch layouts, which specific Cyrillic letter you entered, and other useful info?
Comment 4 Zac Medico gentoo-dev 2016-02-01 17:21:52 UTC
(In reply to Alexander Berntsen from comment #3)
> I can't recreate this in HEAD with Norwegian letters, Hiragana, Katakana, or
> Kanji.

In any case, nothing prevents the user from entering characters that fail to decode. Under Python 3, the input function (which returns a unicode string) will raise UnicodeDecodeError. Under Python 2, raw_input returns bytes, so it won't raise UnicodeDecodeError itself, but comparing its result with values from the responses variable can raise UnicodeDecodeError if that variable contains unicode strings (which it might, because unicode_literals is enabled in the calling module).
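
For the Python 3 case, a minimal sketch of a prompt loop that survives undecodable input could look like the following. It is not the Portage code: the function name `ask` and its `read` and `write` parameters are hypothetical and exist only so the loop can be exercised without a terminal, and the matching rule (case-insensitive prefix match) is an assumption made for the example.

```python
def ask(prompt, responses, read=input, write=print):
    """Return the first entry of `responses` that the answer is a prefix of.

    `read` and `write` are illustrative hooks, not Portage API; they default
    to the built-in input() and print().
    """
    while True:
        try:
            answer = read(prompt)
        except UnicodeDecodeError:
            # input() could not decode the bytes typed by the user:
            # report it and prompt again instead of letting the error escape.
            write("Sorry, response not understood.")
            continue
        for choice in responses:
            # Accept any non-empty, case-insensitive prefix, e.g. "y" for "Yes".
            if answer and choice.upper().startswith(answer.upper()):
                return choice
        write("Sorry, response %r not understood." % answer)
```

A call such as `ask("Would you like to merge these packages? [Yes/No] ", ["Yes", "No"])` would then keep prompting until it gets a usable answer, which is the behaviour requested in the description.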
Comment 5 Zac Medico gentoo-dev 2016-02-01 17:58:03 UTC
There's a patch in the following branch:

https://github.com/zmedico/portage/tree/bug_573386

I've posted it for review here:

https://archives.gentoo.org/gentoo-portage-dev/message/fcb0fb00154ce53f67c7bafb482130b7
Comment 6 Andrew Savchenko gentoo-dev 2016-02-01 18:39:21 UTC
Hi,

(In reply to Alexander Berntsen from comment #3)
> I can't recreate this in HEAD with Norwegian letters, Hiragana, Katakana, or
> Kanji.
>
> 1. Could you try this in HEAD and see if the issue persists?

Yes, it does.

> 2. If so, could you try to recreate it with e.g. a Kanji?

I can recreate this with kanji (e.g. "幸").

> 3. If you're unable to recreate it with Kanji, could you provide more info?
> Like how you switch layouts, which specific Cyrillic letter you entered, and
> other useful info?

I use xterm, which probably matters. Any Russian letter is sufficient to reproduce it (e.g. "н"). I switch layouts using Xorg's grp:lwin_toggle XkbOptions switch.
Comment 7 Andrew Savchenko gentoo-dev 2016-02-01 18:39:52 UTC
(In reply to Zac Medico from comment #5)
> There's a patch in the following branch:
> 
> https://github.com/zmedico/portage/tree/bug_573386

Thanks, I'll try it in a while.
Comment 8 Andrew Savchenko gentoo-dev 2016-02-01 21:51:49 UTC
(In reply to Zac Medico from comment #5)
> There's a patch in the following branch:
> 
> https://github.com/zmedico/portage/tree/bug_573386

This patch fixes my issue, thanks.

One small suggestion: replace the "None" string with something more informative for the end user, like "Wrong unicode string".
Comment 9 Zac Medico gentoo-dev 2016-02-02 01:36:18 UTC
(In reply to Andrew Savchenko from comment #8)
> (In reply to Zac Medico from comment #5)
> > There's a patch in the following branch:
> > 
> > https://github.com/zmedico/portage/tree/bug_573386
> 
> This patch fixes my issue, thanks.
> 
> One little suggestion: replace "None" string by something more informational
> for an end-user like "Wrong unicode string".

I've updated it to display the string returned by str(errors='replace').

This is in the master branch:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=1929e2bdab686004697c3d0fd999a563db43349c
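
As a rough illustration of what str(errors='replace') yields, the sketch below decodes an undecodable byte sequence leniently. It is not the committed code: the input bytes and the wording of the message are assumptions made for the example. The `object` attribute of UnicodeDecodeError holds the bytes that failed to decode.

```python
raw = b"\x84y"  # assumed input: the stray byte from the traceback, then "y"

try:
    raw.decode("utf-8")
    shown = None
except UnicodeDecodeError as err:
    # Decode the offending bytes again, substituting U+FFFD (the Unicode
    # replacement character) for anything that is not valid UTF-8.
    shown = str(err.object, errors="replace")

print("Sorry, response '%s' not understood." % shown)
```

The result is always printable, so the user sees roughly what was received rather than a bare "None".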
Comment 10 Zac Medico gentoo-dev 2016-02-02 01:41:57 UTC
(In reply to Zac Medico from comment #9)
> (In reply to Andrew Savchenko from comment #8)
> This is in the master branch:
> 
> https://gitweb.gentoo.org/proj/portage.git/commit/?id=1929e2bdab686004697c3d0fd999a563db43349c

I messed up making a last-moment change to that commit, and this fixes it:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=f292c3519082ad0932ad3a83a19e547c53b80824
Comment 11 Zac Medico gentoo-dev 2016-03-14 03:11:35 UTC
Fixed in 2.2.28.