Bug 922058 - app-portage/mirrorselect-2.4.0 -- UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 - net-analyzer/netselect output changes
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Portage Tools Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-13 19:56 UTC by Gary E. Miller
Modified: 2024-01-15 06:14 UTC

See Also:
Package list:
Runtime testing required: ---


Description Gary E. Miller 2024-01-13 19:56:55 UTC
mirrorselect crashes, sometimes:

````
pi4 ~ # mirrorselect -s 5 -S  -R 'North America'
* Using url: https://api.gentoo.org/mirrors/distfiles.xml
* Limiting test to "region=North America" hosts. 
* Limiting test to https hosts. 
* Downloading a list of mirrors...
 Got 251 mirrors.
* Using netselect to choose the top 5 mirrors...Done.
Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.10/mirrorselect", line 55, in <module>
    MirrorSelect().main(sys.argv)
  File "/usr/lib/python3.10/site-packages/mirrorselect/main.py", line 469, in main
    self.change_config(
  File "/usr/lib/python3.10/site-packages/mirrorselect/main.py", line 107, in change_config
    hosts[i] = hosts[i].decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 29: invalid continuation byte
````

Reproducible: Sometimes

Steps to Reproduce:
1. mirrorselect -s 5 -S  -R 'North America'



Note this is a Raspberry Pi 3B+, whose hostname is pi4,
because it is the 4th Pi I bought.

````
pi4 ~ # emerge --info  app-portage/mirrorselect
Portage 3.0.61 (python 3.10.13-final-0, default/linux/arm/17.0, gcc-13, glibc-2.38-r9, 5.10.11-v7 armv7l)
=================================================================
                         System Settings
=================================================================
System uname: Linux-5.10.11-v7-armv7l-ARMv7_Processor_rev_4_-v7l-with-glibc2.38
KiB Mem:      995688 total,     66108 free
KiB Swap:    2097148 total,   1991932 free
Timestamp of repository gentoo: Sat, 13 Jan 2024 19:05:12 +0000
sh bash 5.1_p16-r6
ld GNU ld (Gentoo 2.40 p7) 2.40.0
app-misc/pax-utils:        1.3.7::gentoo
app-shells/bash:           5.1_p16-r6::gentoo
dev-build/make:            4.4.1-r1::gentoo
dev-lang/perl:             5.38.2-r1::gentoo
dev-lang/python:           3.10.13::gentoo, 3.11.7::gentoo, 3.12.1::gentoo
dev-lang/rust-bin:         1.71.1::gentoo
dev-util/cmake:            3.27.9::gentoo
dev-util/meson:            1.3.0-r2::gentoo
sys-apps/baselayout:       2.14-r1::gentoo
sys-apps/openrc:           0.48::gentoo
sys-apps/sandbox:          2.38::gentoo
sys-devel/autoconf:        2.71-r6::gentoo
sys-devel/automake:        1.16.5-r1::gentoo
sys-devel/binutils:        2.40-r9::gentoo
sys-devel/binutils-config: 5.5::gentoo
sys-devel/gcc:             12.3.1_p20230526::gentoo, 13.2.1_p20230826::gentoo
sys-devel/gcc-config:      2.11::gentoo
sys-devel/libtool:         2.4.7-r1::gentoo
sys-kernel/linux-headers:  6.1::gentoo (virtual/os-headers)
sys-libs/glibc:            2.38-r9::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.namerica.gentoo.org/gentoo-portage
    priority: -1000
    volatile: False
    sync-rsync-extra-opts: --exclude ChangeLog* --delete-excluded
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-max-age: 3

ACCEPT_KEYWORDS="arm"
ACCEPT_LICENSE="*"
CBUILD="armv7a-unknown-linux-gnueabihf"
CFLAGS="-march=armv8-a+crc -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -O2 -pipe"
CHOST="armv7a-unknown-linux-gnueabihf"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=armv8-a+crc -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -O2 -pipe"
DISTDIR="/var/cache/distfiles"
EMERGE_DEFAULT_OPTS="--keep-going --with-bdeps=y --backtrack=20"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2"
GENTOO_MIRRORS="http://gentoo.gossamerhost.com rsync://gentoo.gossamerhost.com/gentoo-distfiles/ http://gentoo.mirrors.tds.net/gentoo ftp://gentoo.netnitco.net/pub/mirrors/gentoo/source/ rsync://mirror.leaseweb.com/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
LINGUAS="en en_US"
MAKEOPTS="-j2 -l3"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--exclude ChangeLog* --delete-excluded"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
PYTHONPATH="/usr/local/lib/python3.10/site-packages/"
SHELL="/bin/bash"
USE="acl adns aio apm arm bash-completion blake2 bzip2 caps cli cpudetection crypt dane dri fontconfig fortran gdbm gold gzip harfbuzz hddtemp http2 iconv ipv6 jpeg kmod ldns lm_sensors lz4 lzma ncurses neon nfs nfsv4 nfsv41 nginx nls openmp pam pcre png readline seccomp smp split-usr sqlite ssh ssl test-rust threads tools truetype tty-helpers udev unicode urandom usb vim-syntax wifi xattr zlib zstd" ADA_TARGET="gnat_2021" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 ntrip navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en en-US" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" NGINX_MODULES_HTTP="access auth_basic autoindex browser charset empty_gif fastcgi geo gzip limit_conn limit_req map memcached proxy referer rewrite scgi spdy split_clients ssi upstream_hash upstream_ip_hash upstream_keepalive upstream_least_conn upstream_zone userid uwsgi" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-1" POSTGRES_TARGETS="postgres15" PYTHON_SINGLE_TARGET="python3_10" PYTHON_TARGETS="python3_10 python3_9" RUBY_TARGETS="ruby31" VIDEO_CARDS="exynos fbdev omap dummy" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS

=================================================================
                        Package Settings
=================================================================

app-portage/mirrorselect-2.4.0::gentoo was built with the following:
USE="ipv6 -test" PYTHON_TARGETS="python3_10 -python3_11"
FEATURES="usersync pkgdir-index-trusted merge-sync ipc-sandbox multilib-strict binpkg-logs xattr pid-sandbox ebuild-locks unknown-features-warn binpkg-dostrip buildpkg-live unmerge-orphans userfetch assume-digests parallel-fetch unmerge-logs fixlafiles distlocks news preserve-libs qa-unresolved-soname-deps usersandbox sandbox binpkg-docompress sfperms config-protect-if-modified network-sandbox userpriv protect-owned"
````
Comment 1 Gary E. Miller 2024-01-13 20:07:51 UTC
Seems to fail frequently. Here is the output with -d 9:

````
pi4 ~ # mirrorselect -s 5 -S  -R 'North America' -d 9
main(); config_path = /etc/portage/make.conf
get_filesystem_mirrors(): config_path = /etc/portage/make.conf
get_filesystem_mirrors(): mirrorlist = ['https://mirror.reenigne.net/gentoo/', '\\', 'https://172.83.105.10/gentoo/', '\\', 'https://mirror.clarkson.edu/gentoo/', '\\', 'https://mirrors.mit.edu/gentoo-distfiles/', '\\', 'https://128.153.145.19/gentoo/']
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): ignoring non-accessible mirror = \
get_filesystem_mirrors(): fsmirrors = []
using url: https://api.gentoo.org/mirrors/distfiles.xml
* Using url: https://api.gentoo.org/mirrors/distfiles.xml
* Limiting test to "region=North America" hosts. 
* Limiting test to https hosts. 
getlist(): fetching https://api.gentoo.org/mirrors/distfiles.xml
* Downloading a list of mirrors...
Enabled ssl certificate verification: True, for: https://api.gentoo.org/mirrors/distfiles.xml
Connector.connect_url(); headers = {'Accept-Charset': 'utf-8', 'User-Agent': 'Mirrorselect-2.4.0'}
Connector.connect_url(); connecting to opener
Connector.connect_url() HEADERS = {'Date': 'Sat, 13 Jan 2024 20:02:00 GMT', 'Content-Type': 'text/xml', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Last-Modified': 'Sat, 06 Jan 2024 06:55:02 GMT', 'ETag': 'W/"6598f946-969b"', 'Expires': 'Sat, 06 Jan 2024 08:40:44 GMT', 'Cache-Control': 'max-age=3600', 'Access-Control-Allow-Origin': '*', 'X-77-NZT': 'EgwB1GYuBwHXSQMAAAwBj/QzEwH3mwMAAA', 'X-77-NZT-Ray': '74b3202c04a3896d38eca265d72fac33', 'X-Accel-Expires': '@1705178879', 'X-Accel-Date': '1705175279', 'X-77-Cache': 'HIT', 'X-77-Age': '1764', 'Content-Encoding': 'gzip', 'Server': 'CDN77-Turbo', 'X-Cache-LB': 'HIT', 'X-Age-LB': '841', 'X-77-POP': 'seattleUSWA'}
Connector.connect_url() Status_code = 200
New content downloaded for: https://api.gentoo.org/mirrors/distfiles.xml
 Got 251 mirrors.
Extractor(): fetched mirrors, 7 hosts after filtering
* Using netselect to choose the top 5 mirrors...
netselect(): running "netselect -s5 https://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ https://mirror.reenigne.net/gentoo/ https://gentoo.osuosl.org/ https://mirrors.mit.edu/gentoo-distfiles/ https://mirrors.rit.edu/gentoo/ https://mirror.clarkson.edu/gentoo/ https://mirror.servaxnet.com/gentoo/"
Done.

netselect(): returning [b'https://mirror.reenigne.net/gentoo/', b'https://172.83.105.10/gentoo/\x04', b'https://mirror.clarkson.edu/gentoo/', b'https://mirrors.mit.edu/gentoo-distfiles/', b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'] and {b'172': b'https://mirror.reenigne.net/gentoo/', b'207': b'https://172.83.105.10/gentoo/\x04', b'261': b'https://mirror.clarkson.edu/gentoo/', b'282': b'https://mirrors.mit.edu/gentoo-distfiles/', b'312': b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'}
Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.10/mirrorselect", line 55, in <module>
    MirrorSelect().main(sys.argv)
  File "/usr/lib/python3.10/site-packages/mirrorselect/main.py", line 469, in main
    self.change_config(
  File "/usr/lib/python3.10/site-packages/mirrorselect/main.py", line 107, in change_config
    hosts[i] = hosts[i].decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 30: invalid start byte
pi4 ~ # 

````

One problem appears to be this URL:

 b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'
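
The failure can be reproduced outside mirrorselect with the bytes from the debug output above. The sketch below is only an illustration: `strip_trailing_garbage` is a hypothetical helper, not part of mirrorselect, and it assumes that a mirror URL consists solely of printable ASCII, so everything from the first byte outside that range onward is treated as junk.

```python
import re

# Bytes copied from the "netselect(): returning" debug line above.
raw = b'https://128.153.145.19/gentoo/\xf6v\xf8\xfa\xf6v\x19'

try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    failure_position = err.start  # offset of the first undecodable byte


def strip_trailing_garbage(host: bytes) -> str:
    """Hypothetical helper: keep only the leading run of printable ASCII."""
    match = re.match(rb'[\x21-\x7e]*', host)
    return match.group(0).decode("ascii")


cleaned = strip_trailing_garbage(raw)
print(decode_failed, failure_position, cleaned)
```

This heuristic is not complete: junk that happens to begin with a printable character would survive it, so it hides the symptom rather than fixing whatever produces the stray bytes.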
Comment 2 Gary E. Miller 2024-01-13 20:10:00 UTC
https://128.153.145.19/gentoo//xf6v/xf8/xfa/xf6v/x19

https with an IPv4 address??
Bad certificate??
404??

Why is this in the mirrorlist at all??
Comment 3 Gary E. Miller 2024-01-13 20:31:15 UTC
Another mirror with a bad cert:

https://172.83.105.10/gentoo/

Oddly, mirrorselect was happy with that one and put it in my
GENTOO_MIRRORS.  Should I file another bug for mirrorselect
allowing bad certs?

Another bad UTF-8:

 b'https://128.153.145.19/gentoo/\xf7v'
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2024-01-14 06:07:18 UTC
This is at least netselect, and possibly also mirrorselect's parsing, doing something weird.

The underlying mirror data doesn't contain *ANY* IPs
$ curl https://api.gentoo.org/mirrors/distfiles.xml  -sq |grep '<uri'
(read the output, I'm not going to repeat it here)

netselect in your command output is the key.
Given a list of URLs, it should return those same URLs - it should NOT be returning the underlying IPs.

However, I reproduced this part:
```
$ netselect -s5 -t 1 https://mirror.reenigne.net/gentoo/ https://gentoo.osuosl.org/  https://mirror.clarkson.edu/gentoo/
  192 https://172.83.105.10/gentoo/ 
  197 https://mirror.reenigne.net/gentoo/
  220 https://128.153.145.19/gentoo/
```

172.83.105.10 is mirror.reenigne.net
128.153.145.19 is mirror.clarkson.edu

What's not clear is if this is an intentional change in the behavior of netselect, or a bug introduced at some point in the past.
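
A caller could at least detect the substitution by checking whether a returned URL has an IP literal as its host. The following is a minimal sketch using only the Python standard library; `host_is_ip_literal` is a hypothetical name and nothing here is taken from mirrorselect itself.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_literal(url: str) -> bool:
    """Return True if the URL's host is an IPv4/IPv6 address, not a name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


# URLs as printed by the netselect run above.
returned = [
    "https://172.83.105.10/gentoo/",
    "https://mirror.reenigne.net/gentoo/",
    "https://128.153.145.19/gentoo/",
]
unexpected = [url for url in returned if host_is_ip_literal(url)]
print(unexpected)
```

Since the mirror list contains no IPs, any URL flagged this way must have come from netselect's name resolution rather than from the input.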


That leads us to the extra output on the end:
\xf6v\xf8\xfa\xf6v\x19\

I couldn't reproduce this when I called:

```
PYTHONPATH=. ./bin/mirrorselect  -s 5 -S  -R 'North America' -d 9   -o
...
* Using netselect to choose the top 5 mirrors...
netselect(): running "netselect -s5 https://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ https://mirror.reenigne.net/gentoo/ https://gentoo.osuosl.org/ https://mirrors.mit.edu/gentoo-distfiles/ https://mirrors.rit.edu/gentoo/ https://mirror.clarkson.edu/gentoo/ https://mirror.servaxnet.com/gentoo/"

Done.

netselect(): returning [b'https://mirror.reenigne.net/gentoo/', b'https://172.83.105.10/gentoo/', b'https://mirrors.mit.edu/gentoo-distfiles/', b'https://128.153.145.19/gentoo/', b'https://mirror.clarkson.edu/gentoo/'] and {b'134': b'https://mirror.reenigne.net/gentoo/', b'146': b'https://172.83.105.10/gentoo/', b'255': b'https://mirrors.mit.edu/gentoo-distfiles/', b'367': b'https://128.153.145.19/gentoo/', b'381': b'https://mirror.clarkson.edu/gentoo/'}

GENTOO_MIRRORS="https://mirror.reenigne.net/gentoo/ \
    https://172.83.105.10/gentoo/ \
    https://mirrors.mit.edu/gentoo-distfiles/ \
    https://128.153.145.19/gentoo/ \
    https://mirror.clarkson.edu/gentoo/"
```

So on that front I don't know, but suspect it's also netselect being weird.

netselect itself hasn't changed at the base upstream in *14* years. There are a few patches, but I'm wondering if it makes some bad assumptions about libc behavior that are no longer true.
Comment 5 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2024-01-15 06:09:51 UTC
Bad news:
I reproduced the weird unicode, but it's definitely a SOMETIMES bug, pointing to weirdness in netselect:

netselect(): running "netselect -s50 mirror.leaseweb.com:_044ce454 mirror.kumi.systems:_e98ecbd1 ftp.belnet.be:_24a832b8 mirror.telepoint.bg:_85363425 mirrors.daticum.com:_d9a76195 mirror.init7.net:_c7e45805 mirror.dkm.cz:_860f5c01 mirror.it4i.cz:_46687747 mirrors.dotsrc.org:_d341c0ab mirrors.ircam.fr:_ba12e285 mirrors.soeasyto.com:_70d3fe86 linux.rz.ruhr-uni-bochum.de:_7807a76b ftp.fau.de:_d20af173 ftp.agdsn.de:_7294e1d9 ftp-stud.hs-esslingen.de:_4a06b7ae mirror.eu.oneandone.net:_cdcf10b5 mirror.netcologne.de:_47784534 ftp.halifax.rwth-aachen.de:_17c7163c ftp.gwdg.de:_36afd488 ftp.tu-ilmenau.de:_9eeb5e2b ftp.uni-hannover.de:_cbc0e1cb packages.hs-regensburg.de:_d07183a1 ftp.uni-stuttgart.de:_c023fdd3 ftp.spline.inf.fu-berlin.de:_102a1354 mirror.netzwerge.de:_7c4e6a46 mirror.dogado.de:_d4171a12 quantum-mirror.hu:_f8ec96db gentoo.jss.hu:_727a28db ftp.heanet.ie:_6150fe6c gentoo.mirror.garr.it:_1e633d93 ftp.snt.utwente.nl:_a75c4d1b mirrors.evoluso.com:_4e1313a7 ftp.rnl.tecnico.ulisboa.pt:_cf66a5ba mirrors.ptisp.pt:_2277fcdc mirror1.sox.rs:_3184caa2 ftp.lysator.liu.se:_e7682a56 mirrors.tnonline.net:_6a96f98c mirror.wheel.sk:_08a87aaf repo.ifca.es:_b983eeb5 ftp.linux.org.tr:_6132f956 mirror.bytemark.co.uk:_739d0c3f mirrors.gethosted.online:_e7c68df1 www.mirrorservice.org:_48d9cf82"

Raw output b'  101 mirror.leaseweb.com:_044ce454\n  306 mirrors.ircam.fr:_ba12e285\n  322 129.102.1.37:_ba12e285\n  336 193.190.198.27:_24a832b8\n  340 137.226.34.46:_17c7163c\n  359 ftp.fau.de:_d20af173\n  379 mirror.bytemark.co.uk:_739d0c3f\n  386 131.188.12.211:_d20af173\n  391 mirrors.ptisp.pt:_2277fcdc\n  393 mirrors.dotsrc.org:_d341c0ab\n  396 [2001:41c8:20:5e6::150]:_739d0c3f\n  403 ftp-stud.hs-esslingen.de:_4a06b7ae\n  405 mirror1.sox.rs:_3184caa2\n  408 212.110.163.13:_739d0c3f\n  423 130.225.254.116:_d341c0ab\n  450 [2001:6b0:17:f0a0::fd]:_e7682a56*\x01\x04\xf9\n  452 129.143.116.10:_4a06b7ae@\x87\x80a\x13\x7f\n  452 ftp.lysator.liu.se:_e7682a56\n  456 80.68.83.150:_739d0c3f\n  463 141.30.235.39:_7294e1d9\n  470 130.236.254.253:_e7682a56\n  473 gentoo.jss.hu:_727a28db\n  473 130.185.80.122:_2277fcdc\n  481 130.236.254.251:_e7682a56\n  581 ftp.agdsn.de:_7294e1d9\n  618 88.218.137.65:_3184caa2\n  652 194.8.197.22:_47784534\n 1120 155.4.110.241:_6a96f98c\n 1464 mirror.netcologne.de:_47784534\n 1716 mirrors.evoluso.com:_4e1313a7\n'
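
In the raw output, the `_xxxxxxxx` tag after the colon survives even when netselect replaces the hostname with an IP, so it can be used to map a result back to its original URL. The sketch below shows one way that mapping could work. It is not the code from the draft fix: `parse_tagged_output` and `tag_to_url` are hypothetical names, the URL paths are placeholders, and the tag is assumed to be `_` followed by 8 hex digits as seen above. Keeping only the lowest score per tag is a design choice of this sketch.

```python
import re

# One line of netselect output: score, host (name, IPv4 or [IPv6]), then tag.
# Anything after the 8 hex digits of the tag is ignored, which drops the
# stray bytes seen in the raw output.
LINE_RE = re.compile(rb'^\s*(\d+)\s+(\S+?):(_[0-9a-f]{8})')


def parse_tagged_output(raw: bytes, tag_to_url: dict) -> list:
    """Return (score, url) pairs, best score first, one entry per tag."""
    best = {}
    for line in raw.split(b'\n'):
        match = LINE_RE.match(line)
        if match is None:
            continue
        score = int(match.group(1))
        tag = match.group(3).decode("ascii")
        if tag in tag_to_url and (tag not in best or score < best[tag]):
            best[tag] = score
    return sorted((score, tag_to_url[tag]) for tag, score in best.items())


# Three lines taken from the raw output above, including the corrupted ones.
sample = (
    b'  450 [2001:6b0:17:f0a0::fd]:_e7682a56*\x01\x04\xf9\n'
    b'  452 129.143.116.10:_4a06b7ae@\x87\x80a\x13\x7f\n'
    b'  452 ftp.lysator.liu.se:_e7682a56\n'
)
# Placeholder URLs: the hostnames come from the log, the paths are made up.
tags = {
    "_e7682a56": "https://ftp.lysator.liu.se/placeholder/",
    "_4a06b7ae": "https://ftp-stud.hs-esslingen.de/placeholder/",
}
result = parse_tagged_output(sample, tags)
print(result)
```

Because the tag is matched as pure ASCII and everything after it is discarded, the trailing junk never reaches a `decode("utf-8")` call in this sketch. It would still fail to help if the stray bytes landed inside the score, host, or tag fields.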
Comment 6 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2024-01-15 06:14:13 UTC
Good news:
the underlying host/ip problem has a draft fix at:
https://gitweb.gentoo.org/proj/mirrorselect.git/commit/?h=robbat2/netselect-tags

It doesn't have the UTF-8 output fixed, so sometimes it will work, and other times it will fail with UnicodeDecodeError.