In bug 204683 a new format for mirrors3.xml was introduced, with this change to mirrors.dtd: https://gitweb.gentoo.org/archive/proj/gentoo.git/commit/xml/htdocs/dtd/\ mirrors.dtd?id=1895d4e9a65a04de7cfc3ac5cab5b1c2a5a993fd In a nutshell, the countryname attribute was dropped from the mirrorgroup element, and a new countries element with a list of country names added instead. However, distfiles.xml and rsync.xml in api.git follow the previous syntax, leading to an XML validation error with app-emacs/nxml-gentoo-syntax (which uses mirrors.dtd as input). Also, mirrorselect-2.2.6 still uses the old syntax with the countryname attribute and doesn't know about the countries element (IIUC). I am not sure if I understand all the history of this correctly, but presumably mirrors.xml should be updated to reflect the syntax that is currently used by the files in api.git?
Created attachment 763161 [details, diff] Patch for data/dtd.git
From #gentoo-dev discussion: <@ulm> dol-sen: robbat2: looks like bug 204683 was either never implemented, or the syntax was reverted later when switching from mirrors3.xml to distfiles.xml in the api.git repo? <+willikins> ulm: https://bugs.gentoo.org/204683 "New mirrors.xml format"; Mirrors, Feature Request; RESO, FIXE; robbat2:mirror-admin <@ulm> I still see countryname attributes used everywhere, while according to mirrors.dtd they no longer exist <@ulm> i.e. this change: https://gitweb.gentoo.org/archive/proj/gentoo.git/commit/xml/htdocs/dtd/mirrors.dtd?id=1895d4e9a65a04de7cfc3ac5cab5b1c2a5a993fd <@ulm> (I know, that was in 2008 ...) [...] <@robbat2> ulm: it was definetly implemented before: you can see it here: https://gitweb.gentoo.org/proj/www-redesign.git/tree/xml/htdocs/main/en/mirrors3.xml <@robbat2> looks like a3li re-added them when the actual www change happened <@robbat2> https://gitweb.gentoo.org/data/api.git/commit/files/mirrors/distfiles.xml?id=99d8572e563a5fbcc8033e45802fbade86b2ed10 <@robbat2> i think we need to check the www site code to see if they are actually used, and decide then <@robbat2> yep, it's used: https://gitweb.gentoo.org/sites/www.git/tree/_plugins/mirrors.rb#n18 <@robbat2> please make a new bug re-implement the bug 204683 changes on the new site & data So, the countryname attributes are currently used both in infra tooling (mirrors.rb) and in mirrorselect (mirrorparser3.py), and the way forward is to reimplement the 2008 change.
I've fixed the www site code to not use countryname and removed it from the XML files, since it was originally fixed before the site migrated to Jekyll.
Reopening, because the files still don't validate: $ xmllint --noout --dtdvalid ~/git/dtd/mirrors.dtd distfiles.xml distfiles.xml:4: element mirrors: validity error : Element mirrors content does not follow the DTD, expecting (mirrorgroup* , countries), got (mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup ) distfiles.xml:238: element mirrorgroup: validity error : IDREF attribute country references an unknown ID "GR" [...34 similar lines omitted...] Document distfiles.xml does not validate against /local/home/ulm/git/dtd/mirrors.dtd Presumably, a <countries> element is missing: https://gitweb.gentoo.org/proj/www-redesign.git/tree/xml/htdocs/main/en/mirrors3.xml#n825
I think the countries should be dropped from those files, can you please adjust the DTD to say countries* for now?
That can't really be done, because the country attribute is an IDREF: <!ATTLIST mirrorgroup region CDATA #REQUIRED country IDREF #REQUIRED> I am going to drop countries from the DTD altogether, and change the country attribute (back) to CDATA.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/data/dtd.git/commit/?id=bb6e63ff76ea7066f8340df51a64b3ae09a72385 commit bb6e63ff76ea7066f8340df51a64b3ae09a72385 Author: Ulrich Müller <ulm@gentoo.org> AuthorDate: 2022-01-22 10:28:38 +0000 Commit: Ulrich Müller <ulm@gentoo.org> CommitDate: 2022-01-24 06:43:12 +0000 mirrors.dtd: Update to reflect syntax of mirror files in api.git This partially reverts the following commit (originally in CVS, now in archive/proj/gentoo.git): commit d4bee3ba7e32a7e41639e06f31791b7ed66637a6 Author: Xavier Neys <neysx@gentoo.org> Date: Sun Jan 13 16:16:21 2008 +0000 #204683 a few more tweaks to the new mirrors DTD & XML Bug: https://bugs.gentoo.org/831807 Signed-off-by: Ulrich Müller <ulm@gentoo.org> mirrors.dtd | 30 ++++-------------------------- 1 file changed, 4 insertions(+), 26 deletions(-)
This seems to have broken mirrorselect since it is still using countryname [1]: mirrorselect -i -o * Using url: https://api.gentoo.org/mirrors/distfiles.xml * Downloading a list of mirrors... Got 195 mirrors. Traceback (most recent call last): File "/usr/lib/python-exec/python3.9/mirrorselect", line 61, in <module> MirrorSelect().main(sys.argv) File "/usr/lib/python3.9/site-packages/mirrorselect/main.py", line 380, in main urls = self.select_urls(hosts, options) File "/usr/lib/python3.9/site-packages/mirrorselect/main.py", line 327, in select_urls selector = Interactive(hosts, options, self.output) File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 521, in __init__ self.interactive(hosts, options) File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 546, in interactive for (url, args) in sorted(hosts, key = lambda x: File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 547, in <lambda> (x[1]['country'].lower(), x[1]['name'].lower()) ): AttributeError: 'NoneType' object has no attribute 'lower' [1] https://gitweb.gentoo.org/proj/mirrorselect.git/tree/mirrorselect/mirrorparser3.py#n61
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/data/api.git/commit/?id=531993f328d484de6f2ce04fe15ac2f7058074b3 commit 531993f328d484de6f2ce04fe15ac2f7058074b3 Author: Ulrich Müller <ulm@gentoo.org> AuthorDate: 2022-01-24 19:23:28 +0000 Commit: Ulrich Müller <ulm@gentoo.org> CommitDate: 2022-01-24 19:23:28 +0000 Revert "mirrors: drop country-name" Apparently this change has broken mirrorselect. Reverting for now, until the change from 2008 will be reimplemented in mirrorselect. This reverts commit ce38f0772a8dbc67dbeefb014327c22e4e8d8f77. Bug: https://bugs.gentoo.org/831807#c8 Signed-off-by: Ulrich Müller <ulm@gentoo.org> files/mirrors/distfiles.xml | 76 ++++++++++++++++++++++----------------------- files/mirrors/rsync.xml | 36 ++++++++++----------- 2 files changed, 56 insertions(+), 56 deletions(-)
(In reply to Ben Kohler from comment #8) > This seems to have broken mirrorselect since it is still using countryname I have reverted the change in distfiles.xml and rsync.xml for now. @tools-portage: This looks like a bug in mirrorselect. It shouldn't rely on the countryname attribute which was always optional: https://gitweb.gentoo.org/archive/proj/gentoo.git/commit/xml/htdocs/dtd/mirror.dtd?id=186c6edecfe9be7d1840eac3a57c61623170226c
FWIW when this was crashing, simply editing mirrorparser3.py from "country": mirrorgroup.get("countryname"), to "country": mirrorgroup.get("country"), is enough to get it to run ok.
Peter, could I tempt you into taking a look at fixing this on the mirrorselect side?