Bug 800149 - app-portage/mirrorselect-2.2.6-r2: Unhandled UnicodeDecodeError on byte 0xf6
Summary: app-portage/mirrorselect-2.2.6-r2: Unhandled UnicodeDecodeError on byte 0xf6
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Portage Tools Team
Reported: 2021-07-03 09:34 UTC by Stefan Huber
Modified: 2021-07-05 15:55 UTC (History)
Description Stefan Huber 2021-07-03 09:34:08 UTC
When I call `mirrorselect -s3 -b10 -D`, I get a UnicodeDecodeError because the byte 0xf6 cannot be decoded as utf-8. IMHO, even if such an error occurs on one mirror server, mirrorselect should (i) not abort with an unhandled exception and (ii) probably just ignore that one mirror server.

* Downloading mirrorselect-test files from each mirror... [71 of 196]Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.9/mirrorselect", line 61, in <module>
    MirrorSelect().main(sys.argv)
  File "/usr/lib/python3.9/site-packages/mirrorselect/main.py", line 380, in main
    urls = self.select_urls(hosts, options)
  File "/usr/lib/python3.9/site-packages/mirrorselect/main.py", line 329, in select_urls
    selector = Deep(hosts, options, self.output)
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 245, in __init__
    self.deeptest()
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 272, in deeptest
    mytime, ignore = self.deeptime(host, maxtime)
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 356, in deeptime
    f, test_url, early_out = self._test_connection(test_url, url_parts,
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 444, in _test_connection
    f = url_open(r)
  File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/usr/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 1563, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 1584, in connect_ftp
    return ftpwrapper(user, passwd, host, port, dirs, timeout,
  File "/usr/lib/python3.9/urllib/request.py", line 2405, in __init__
    self.init()
  File "/usr/lib/python3.9/urllib/request.py", line 2414, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python3.9/ftplib.py", line 162, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python3.9/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
  File "/usr/lib/python3.9/ftplib.py", line 234, in getmultiline
    nextline = self.getline()
  File "/usr/lib/python3.9/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 14: invalid start byte


Reproducible: Always
Comment 1 Zac Medico gentoo-dev 2021-07-03 17:03:29 UTC
I wonder what the root cause of the error is though. Was the mirror server in a bad state or what?
Comment 2 Stefan Huber 2021-07-03 17:45:57 UTC
I patched mirrorselect to output the affected mirror:
ftp://141.76.119.131/pub/mirrors/gentoo/distfiles/mirrorselect-test

The thing is that 0xf6 should be an "ö" which is part of the welcome message on this ftp server.

curl -vv ftp://141.76.119.131/pub/mirrors/gentoo/distfiles/mirrorselect-test
*   Trying 141.76.119.131:21...
* Connected to 141.76.119.131 (141.76.119.131) port 21 (#0)
< 220-Welcome to ftp.wh2.tu-dresden.de
< 220-
< 220-
< 220-**** Vergrerung des Mirror-Space von 1TB auf 4TB abgeschlossen ****
[…]

It is the "Vergr[ö][ß]erung des Mirror-Space[…]" Message. I am going to contact the operators once I learn how :)
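A minimal sketch of why this byte trips the decoder, assuming the server sent the greeting in Latin-1 (ISO-8859-1), which is a guess based on 0xf6 being "ö" in that encoding. The byte string below is an illustrative fragment, not the server's full greeting:

```python
# Hypothetical fragment of the greeting, assuming Latin-1 encoding:
# 0xf6 is "ö" and 0xdf is "ß" in Latin-1.
raw = b"Vergr\xf6\xdferung"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xf6 can never start a valid UTF-8 sequence, so decoding stops here.
    reason = exc.reason
    offset = exc.start

# Latin-1 maps every byte to a code point, so this cannot fail.
latin1_text = raw.decode("latin-1")

print(f"utf-8 failed at offset {offset}: {reason}")
print(f"latin-1 gives: {latin1_text}")
```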


Still, mirrorselect should not die because of that.
Comment 3 Zac Medico gentoo-dev 2021-07-03 19:04:53 UTC
If the mirror is in fact in an operational state, then I would prefer that mirrorselect crash rather than incorrectly behave as though the mirror is not available.
Comment 4 Stefan Huber 2021-07-03 19:20:43 UTC
Some more information to track down the problem:

In mirrorselect/selectors.py the FTP connection is made via urllib.request.urlopen(). urllib's default FTP handler creates its ftplib.FTP object with the constructor's default arguments in ftpwrapper.init() in urllib/request.py. The FTP class in ftplib.py uses 'utf-8' as its default encoding. This eventually causes an exception in FTP.getline() when reading the greeting message.
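The failure can be reproduced without the real mirror. The following is a sketch under stated assumptions: a throwaway local socket stands in for the FTP server, the greeting text is invented, and the `encoding` keyword of `ftplib.FTP` requires Python 3.9 or later. The helper names `serve_greetings` and `read_welcome` are hypothetical.

```python
import ftplib
import socket
import threading

# Invented greeting, encoded as Latin-1: 0xf6 is "ö", 0xdf is "ß".
GREETING = b"220 Vergr\xf6\xdferung abgeschlossen\r\n"


def serve_greetings(server_sock, count):
    """Accept `count` connections and send each one the greeting."""
    for _ in range(count):
        conn, _addr = server_sock.accept()
        with conn:
            conn.sendall(GREETING)


def read_welcome(port, encoding):
    """Connect with ftplib and return the decoded welcome message."""
    ftp = ftplib.FTP(encoding=encoding)
    try:
        return ftp.connect("127.0.0.1", port, timeout=5)
    finally:
        ftp.close()


# Bind to an ephemeral port so the sketch does not need privileges.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
thread = threading.Thread(target=serve_greetings, args=(server, 2), daemon=True)
thread.start()

# With the default encoding the greeting cannot be decoded.
try:
    read_welcome(port, "utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# With Latin-1 the same bytes decode without error.
latin1_welcome = read_welcome(port, "latin-1")

thread.join()
server.close()
print("utf-8:", utf8_error)
print("latin-1:", latin1_welcome)
```

Note that urllib.request.urlopen() does not expose this encoding choice, so the sketch talks to ftplib directly.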

At first glance, fixing this seems hard. First of all, FTP greeting messages should be (7-bit) ASCII according to RFC 959, so this is technically not a bug in mirrorselect. Any attempt to fix it anyway would require making assumptions about the encoding (of non-7-bit-ASCII characters). And even if one assumed an encoding other than utf-8, one would need to provide another FTP handler to urllib, which seems disproportionate.

Still, mirrorselect should not die if any FTP server uses non-ASCII characters in the greeting text. Instead, IMHO, the exception should be caught, a warning should be printed, and the server should be ignored.
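A rough sketch of that behavior, not mirrorselect's actual code: `time_hosts` and `probe` are hypothetical names, with `probe` standing in for whatever per-host timing routine is used, and the host URLs are placeholders.

```python
import sys


def time_hosts(hosts, probe, warn=lambda msg: print(msg, file=sys.stderr)):
    """Probe each host; warn about and skip hosts whose reply cannot be decoded.

    `probe` is a hypothetical callable that returns the elapsed time in
    seconds for one host, or raises if the host misbehaves.
    """
    results = {}
    skipped = []
    for host in hosts:
        try:
            results[host] = probe(host)
        except UnicodeDecodeError as exc:
            # Report the offending host instead of aborting the whole run.
            warn(f"skipping {host}: undecodable server reply ({exc.reason})")
            skipped.append(host)
    return results, skipped


def fake_probe(host):
    """Stand-in probe: one host answers with a byte that is not valid UTF-8."""
    if host == "ftp://bad.example/":
        b"\xf6".decode("utf-8")  # raises UnicodeDecodeError
    return 0.5


results, skipped = time_hosts(
    ["ftp://good.example/", "ftp://bad.example/"], fake_probe
)
print(results, skipped)
```

Catching only UnicodeDecodeError keeps other failures visible; whether a skipped host should also change the exit status is a separate design choice.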
Comment 5 Stefan Huber 2021-07-03 19:23:27 UTC
(In reply to Zac Medico from comment #3)
> If the mirror is in fact in an operational state, then I would prefer that
> mirrorselect crash rather than incorrectly behave as though the mirror is
> not available.

How can a crash due to an unhandled exception ever be preferred? At best it shall catch the exception and return with an error status code, but it shall never crash.

But in a distributed system it is a bad idea for a single node to be able to misbehave in a way that stops the entire system (here: mirrorselect) from working.
Comment 6 Zac Medico gentoo-dev 2021-07-03 19:42:44 UTC
(In reply to Stefan Huber from comment #5)
> (In reply to Zac Medico from comment #3)
> > If the mirror is in fact in an operational state, then I would prefer that
> > mirrorselect crash rather than incorrectly behave as though the mirror is
> > not available.
> 
> How can a crash due to an unhandled exception ever be preferred? At best it
> shall catch the exception and return with an error status code, but it shall
> never crash.
> 
> But in a distributed system it is a bad idea for a single node to be able
> to misbehave in a way that stops the entire system (here: mirrorselect)
> from working.

But if the mirror is in fact operational then what you are suggesting equates to leaving an unfixed bug in mirrorselect, logging the error, and ignoring the mirror.
Comment 7 Stefan Huber 2021-07-03 19:50:13 UTC
(In reply to Zac Medico from comment #6)

> But if the mirror is in fact operational then what you are suggesting
> equates to leaving an unfixed bug in mirrorselect, logging the error, and
> ignoring the mirror.

What I suggest is to keep the system as operable as possible while reporting the erroneous node. (By the way the erroneous node isn't reported at the moment because mirrorselect dies.)

If you have hundreds of HTTP servers that deliver a web service then you would not let the service fail when a single HTTP server misbehaves. You would go on with the rest until you fixed the one.
Comment 8 Stefan Huber 2021-07-03 19:57:35 UTC
(In reply to Zac Medico from comment #6)

> But if the mirror is in fact operational then what you are suggesting
> equates to leaving an unfixed bug in mirrorselect, logging the error, and
> ignoring the mirror.

I should also add that, from mirrorselect's point of view, the mirror is not operational, and the root cause is an illegal use of characters in the greeting message.

The status quo means that mirrorselect does not work at all. My suggestion is to make mirrorselect work again even if an FTP server is non-operational for mirrorselect.
Comment 9 Stefan Huber 2021-07-05 15:55:14 UTC
By the way, the operators of the AG DSN server fixed the greeting text. So this bug, as I still consider it :), is no longer triggered.