Summary: | app-portage/mirrorselect-2.4.0 -- UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 - net-analyzer/netselect output changes | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Gary E. Miller <gem> |
Component: | Current packages | Assignee: | Portage Tools Team <tools-portage> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | gem, infra-bugs, netmon, robbat2 |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.debian.org/136849 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Gary E. Miller
2024-01-13 19:56:55 UTC
Seems to fail frequently. Here is -d9:

````
pi4 ~ # mirrorselect -s 5 -S -R 'North America' -d 9
main(); config_path = /etc/portage/make.conf
get_filesystem_mirrors(): config_path = /etc/portage/make.conf
get_filesystem_mirrors(): mirrorlist = ['https://mirror.reenigne.net/gentoo/', '\\', 'https://172.83.105.10/gentoo/', '\\', 'https://mirror.clarkson.edu/gentoo/', '\\', 'https://mirrors.mit.edu/gentoo-distfiles/', '\\', 'https://128.153.145.19/gentoo/']
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): fsmirrors = []
using url: https://api.gentoo.org/mirrors/distfiles.xml
 * Using url: https://api.gentoo.org/mirrors/distfiles.xml
 * Limiting test to "region=North America" hosts.
 * Limiting test to https hosts.
getlist(): fetching https://api.gentoo.org/mirrors/distfiles.xml
 * Downloading a list of mirrors...
Enabled ssl certificate verification: True, for: https://api.gentoo.org/mirrors/distfiles.xml
Connector.connect_url(); headers = {'Accept-Charset': 'utf-8', 'User-Agent': 'Mirrorselect-2.4.0'}
Connector.connect_url(); connecting to opener
Connector.connect_url() HEADERS = {'Date': 'Sat, 13 Jan 2024 20:02:00 GMT', 'Content-Type': 'text/xml', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Last-Modified': 'Sat, 06 Jan 2024 06:55:02 GMT', 'ETag': 'W/"6598f946-969b"', 'Expires': 'Sat, 06 Jan 2024 08:40:44 GMT', 'Cache-Control': 'max-age=3600', 'Access-Control-Allow-Origin': '*', 'X-77-NZT': 'EgwB1GYuBwHXSQMAAAwBj/QzEwH3mwMAAA', 'X-77-NZT-Ray': '74b3202c04a3896d38eca265d72fac33', 'X-Accel-Expires': '@1705178879', 'X-Accel-Date': '1705175279', 'X-77-Cache': 'HIT', 'X-77-Age': '1764', 'Content-Encoding': 'gzip', 'Server': 'CDN77-Turbo', 'X-Cache-LB': 'HIT', 'X-Age-LB': '841', 'X-77-POP': 'seattleUSWA'}
Connector.connect_url() Status_code = 200
New content downloaded for: https://api.gentoo.org/mirrors/distfiles.xml
Got 251 mirrors.
Extractor(): fetched mirrors, 7 hosts after filtering
 * Using netselect to choose the top 5 mirrors...
netselect(): running "netselect -s5 https://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ https://mirror.reenigne.net/gentoo/ https://gentoo.osuosl.org/ https://mirrors.mit.edu/gentoo-distfiles/ https://mirrors.rit.edu/gentoo/ https://mirror.clarkson.edu/gentoo/ https://mirror.servaxnet.com/gentoo/"
Done.
netselect(): returning [b'https://mirror.reenigne.net/gentoo/', b'https://172.83.105.10/gentoo/\x04', b'https://mirror.clarkson.edu/gentoo/', b'https://mirrors.mit.edu/gentoo-distfiles/', b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'] and {b'172': b'https://mirror.reenigne.net/gentoo/', b'207': b'https://172.83.105.10/gentoo/\x04', b'261': b'https://mirror.clarkson.edu/gentoo/', b'282': b'https://mirrors.mit.edu/gentoo-distfiles/', b'312': b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'}
Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.10/mirrorselect", line 55, in <module>
    MirrorSelect().main(sys.argv)
  File "/usr/lib/python3.10/site-packages/mirrorselect/main.py", line 469, in main
    self.change_config(
  File "/usr/lib/python3.10/site-packages/mirrorselect/main.py", line 107, in change_config
    hosts[i] = hosts[i].decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 30: invalid start byte
pi4 ~ #
````

One problem appears to be this URL:

b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'
https://128.153.145.19/gentoo//xf6v/xf8/xfa/xf6v/x19

https with an IPv4 address?? Bad certificate?? 404?? Why is this in the mirrorlist at all??

Another mirror with a bad cert: https://172.83.105.10/gentoo/

Oddly, mirrorselect was happy with that one and put it in my GENTOO_MIRRORS. Should I file another bug for mirrorselect allowing bad certs?

Another bad UTF-8:

b'https://128.153.145.19/gentoo/\xf7v'

This is at least netselect, and possibly also mirrorselect's parsing, doing something weird.

The underlying mirror data doesn't contain *ANY* IPs:

$ curl https://api.gentoo.org/mirrors/distfiles.xml -sq |grep '<uri'
(read the output, I'm not going to repeat it here)

netselect in your command output is the key.
Given a list of URLs, it should return those same URLs - it should NOT be returning the underlying IPs.

However, I reproduced this part:

```
$ netselect -s5 -t 1 https://mirror.reenigne.net/gentoo/ https://gentoo.osuosl.org/ https://mirror.clarkson.edu/gentoo/
  192 https://172.83.105.10/gentoo/
  197 https://mirror.reenigne.net/gentoo/
  220 https://128.153.145.19/gentoo/
```

172.83.105.10 is mirror.reenigne.net

128.153.145.19 is mirror.clarkson.edu

What's not clear is if this is an intentional change in the behavior of netselect, or a bug introduced at some point in the past.

That leads us to the extra output on the end: \xf6v\xf8\xfa\xf6v\x19

I couldn't reproduce this if I called:

```
PYTHONPATH=. ./bin/mirrorselect -s 5 -S -R 'North America' -d 9 -o
...
 * Using netselect to choose the top 5 mirrors...
netselect(): running "netselect -s5 https://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ https://mirror.reenigne.net/gentoo/ https://gentoo.osuosl.org/ https://mirrors.mit.edu/gentoo-distfiles/ https://mirrors.rit.edu/gentoo/ https://mirror.clarkson.edu/gentoo/ https://mirror.servaxnet.com/gentoo/"
Done.
netselect(): returning [b'https://mirror.reenigne.net/gentoo/', b'https://172.83.105.10/gentoo/', b'https://mirrors.mit.edu/gentoo-distfiles/', b'https://128.153.145.19/gentoo/', b'https://mirror.clarkson.edu/gentoo/'] and {b'134': b'https://mirror.reenigne.net/gentoo/', b'146': b'https://172.83.105.10/gentoo/', b'255': b'https://mirrors.mit.edu/gentoo-distfiles/', b'367': b'https://128.153.145.19/gentoo/', b'381': b'https://mirror.clarkson.edu/gentoo/'}
GENTOO_MIRRORS="https://mirror.reenigne.net/gentoo/ \
    https://172.83.105.10/gentoo/ \
    https://mirrors.mit.edu/gentoo-distfiles/ \
    https://128.153.145.19/gentoo/ \
    https://mirror.clarkson.edu/gentoo/"
```

So on that front I don't know, but suspect it's also netselect being weird. netselect itself hasn't changed at the base upstream in *14* years.
There are a few patches, but I'm wondering if it makes some bad assumptions about libc behavior that are no longer true.

Bad news: I reproduced the weird unicode, but it's definitely a SOMETIMES bug, pointing to weirdness in netselect:

netselect(): running "netselect -s50 mirror.leaseweb.com:_044ce454 mirror.kumi.systems:_e98ecbd1 ftp.belnet.be:_24a832b8 mirror.telepoint.bg:_85363425 mirrors.daticum.com:_d9a76195 mirror.init7.net:_c7e45805 mirror.dkm.cz:_860f5c01 mirror.it4i.cz:_46687747 mirrors.dotsrc.org:_d341c0ab mirrors.ircam.fr:_ba12e285 mirrors.soeasyto.com:_70d3fe86 linux.rz.ruhr-uni-bochum.de:_7807a76b ftp.fau.de:_d20af173 ftp.agdsn.de:_7294e1d9 ftp-stud.hs-esslingen.de:_4a06b7ae mirror.eu.oneandone.net:_cdcf10b5 mirror.netcologne.de:_47784534 ftp.halifax.rwth-aachen.de:_17c7163c ftp.gwdg.de:_36afd488 ftp.tu-ilmenau.de:_9eeb5e2b ftp.uni-hannover.de:_cbc0e1cb packages.hs-regensburg.de:_d07183a1 ftp.uni-stuttgart.de:_c023fdd3 ftp.spline.inf.fu-berlin.de:_102a1354 mirror.netzwerge.de:_7c4e6a46 mirror.dogado.de:_d4171a12 quantum-mirror.hu:_f8ec96db gentoo.jss.hu:_727a28db ftp.heanet.ie:_6150fe6c gentoo.mirror.garr.it:_1e633d93 ftp.snt.utwente.nl:_a75c4d1b mirrors.evoluso.com:_4e1313a7 ftp.rnl.tecnico.ulisboa.pt:_cf66a5ba mirrors.ptisp.pt:_2277fcdc mirror1.sox.rs:_3184caa2 ftp.lysator.liu.se:_e7682a56 mirrors.tnonline.net:_6a96f98c mirror.wheel.sk:_08a87aaf repo.ifca.es:_b983eeb5 ftp.linux.org.tr:_6132f956 mirror.bytemark.co.uk:_739d0c3f mirrors.gethosted.online:_e7c68df1 www.mirrorservice.org:_48d9cf82"

Raw output b' 101 mirror.leaseweb.com:_044ce454\n 306 mirrors.ircam.fr:_ba12e285\n 322 129.102.1.37:_ba12e285\n 336 193.190.198.27:_24a832b8\n 340 137.226.34.46:_17c7163c\n 359 ftp.fau.de:_d20af173\n 379 mirror.bytemark.co.uk:_739d0c3f\n 386 131.188.12.211:_d20af173\n 391 mirrors.ptisp.pt:_2277fcdc\n 393 mirrors.dotsrc.org:_d341c0ab\n 396 [2001:41c8:20:5e6::150]:_739d0c3f\n 403 ftp-stud.hs-esslingen.de:_4a06b7ae\n 405 mirror1.sox.rs:_3184caa2\n 408 212.110.163.13:_739d0c3f\n 423 130.225.254.116:_d341c0ab\n 450 [2001:6b0:17:f0a0::fd]:_e7682a56*\x01\x04\xf9\n 452 129.143.116.10:_4a06b7ae@\x87\x80a\x13\x7f\n 452 ftp.lysator.liu.se:_e7682a56\n 456 80.68.83.150:_739d0c3f\n 463 141.30.235.39:_7294e1d9\n 470 130.236.254.253:_e7682a56\n 473 gentoo.jss.hu:_727a28db\n 473 130.185.80.122:_2277fcdc\n 481 130.236.254.251:_e7682a56\n 581 ftp.agdsn.de:_7294e1d9\n 618 88.218.137.65:_3184caa2\n 652 194.8.197.22:_47784534\n 1120 155.4.110.241:_6a96f98c\n 1464 mirror.netcologne.de:_47784534\n 1716 mirrors.evoluso.com:_4e1313a7\n'

Good news: the underlying host/IP problem has a draft fix at: https://gitweb.gentoo.org/proj/mirrorselect.git/commit/?h=robbat2/netselect-tags

It doesn't have the UTF-8 output fixed, so sometimes it will work, and other times it will fail with UnicodeDecodeError.
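
The tagged output above suggests one way to avoid both problems when parsing. The following is a minimal Python sketch, not the code from the draft fix: it assumes every candidate is handed to netselect as `host:_<8 hex digits>`, as in the `-s50` command above, and that a `tag_to_url` mapping was built when the tags were generated. The function name `parse_tagged_output`, the mapping, and the example.org URLs are all hypothetical. Only the score and the tag are read from each line, so resolved IPs and trailing garbage bytes never reach a `.decode("utf-8")` call. Lower scores are assumed to be better, matching the ascending order of the raw output.

```python
import re

# One netselect result line: score, host (name, IPv4 or [IPv6]), then the
# 8-hex-digit tag. Anything after the tag (e.g. stray bytes) is ignored.
LINE_RE = re.compile(rb"^\s*(\d+)\s+\S+?:_([0-9a-f]{8})")


def parse_tagged_output(raw: bytes, tag_to_url: dict[str, str], limit: int) -> list[str]:
    """Return up to `limit` original URLs, best (lowest) score first.

    Hypothetical helper: `tag_to_url` maps each tag to the URL it was
    generated for; this mapping is an assumption, not mirrorselect's API.
    """
    best: dict[str, int] = {}
    for line in raw.split(b"\n"):
        match = LINE_RE.match(line)
        if match is None:
            continue  # blank or unparseable line
        score = int(match.group(1))
        tag = match.group(2).decode("ascii")  # safe: only [0-9a-f] matched
        if tag not in tag_to_url:
            continue  # tag we never sent; ignore it
        # A mirror can appear once per resolved address; keep its best score.
        if tag not in best or score < best[tag]:
            best[tag] = score
    ranked = sorted(best, key=lambda t: best[t])
    return [tag_to_url[t] for t in ranked[:limit]]


# A few lines taken from the raw output above, including one with garbage.
sample_raw = (
    b"  101 mirror.leaseweb.com:_044ce454\n"
    b"  379 mirror.bytemark.co.uk:_739d0c3f\n"
    b"  396 [2001:41c8:20:5e6::150]:_739d0c3f\n"
    b"  450 [2001:6b0:17:f0a0::fd]:_e7682a56*\x01\x04\xf9\n"
    b"  452 ftp.lysator.liu.se:_e7682a56\n"
)
# Placeholder URLs; the real ones would come from distfiles.xml.
sample_tags = {
    "044ce454": "https://a.example.org/gentoo/",
    "739d0c3f": "https://b.example.org/gentoo/",
    "e7682a56": "https://c.example.org/gentoo/",
}
print(parse_tagged_output(sample_raw, sample_tags, 2))
```

Because the result is looked up from the mapping rather than copied from netselect's output, the URLs written to GENTOO_MIRRORS would always be ones that came from the mirror list, whatever netselect appends to a line.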