Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 831807

Summary: distfiles.xml and rsync.xml (in api.git) do not follow the syntax specified by mirrors.dtd (in dtd.git)
Product: Mirrors Reporter: Ulrich Müller <ulm>
Component: Feature RequestAssignee: Mirror Admins <mirror-admin>
Status: CONFIRMED ---    
Severity: normal CC: bkohler, plevine457, sam, tools-portage
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
See Also: https://bugs.gentoo.org/show_bug.cgi?id=204683
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Patch for data/dtd.git

Description Ulrich Müller gentoo-dev 2022-01-22 10:26:10 UTC
In bug 204683 a new format for mirrors3.xml was introduced, with this change to mirrors.dtd:
https://gitweb.gentoo.org/archive/proj/gentoo.git/commit/xml/htdocs/dtd/\
mirrors.dtd?id=1895d4e9a65a04de7cfc3ac5cab5b1c2a5a993fd

In a nutshell, the countryname attribute was dropped from the mirrorgroup element, and a new countries element with a list of country names added instead.

However, distfiles.xml and rsync.xml in api.git follow the previous syntax, leading to an XML validation error with app-emacs/nxml-gentoo-syntax (which uses mirrors.dtd as input).

Also, mirrorselect-2.2.6 still uses the old syntax with the countryname attribute and doesn't know about the countries element (IIUC).

I am not sure if I understand all the history of this correctly, but presumably mirrors.xml should be updated to reflect the syntax that is currently used by the files in api.git?
Comment 1 Ulrich Müller gentoo-dev 2022-01-22 10:30:37 UTC
Created attachment 763161 [details, diff]
Patch for data/dtd.git
Comment 2 Ulrich Müller gentoo-dev 2022-01-22 15:13:00 UTC
From #gentoo-dev discussion:

<@ulm> dol-sen: robbat2: looks like bug 204683 was either never implemented, or the syntax was reverted later when switching from mirrors3.xml to distfiles.xml in the api.git repo?
<+willikins> ulm: https://bugs.gentoo.org/204683 "New mirrors.xml format"; Mirrors, Feature Request; RESO, FIXE; robbat2:mirror-admin                                                        
<@ulm> I still see countryname attributes used everywhere, while according to mirrors.dtd they no longer exist
<@ulm> i.e. this change: https://gitweb.gentoo.org/archive/proj/gentoo.git/commit/xml/htdocs/dtd/mirrors.dtd?id=1895d4e9a65a04de7cfc3ac5cab5b1c2a5a993fd
<@ulm> (I know, that was in 2008 ...)
[...]
<@robbat2> ulm: it was definetly implemented before: you can see it here: https://gitweb.gentoo.org/proj/www-redesign.git/tree/xml/htdocs/main/en/mirrors3.xml
<@robbat2> looks like a3li re-added them when the actual www change happened
<@robbat2> https://gitweb.gentoo.org/data/api.git/commit/files/mirrors/distfiles.xml?id=99d8572e563a5fbcc8033e45802fbade86b2ed10
<@robbat2> i think we need to check the www site code to see if they are actually used, and decide then
<@robbat2> yep, it's used: https://gitweb.gentoo.org/sites/www.git/tree/_plugins/mirrors.rb#n18
<@robbat2> please make a new bug re-implement the bug 204683 changes on the new site & data

So, the countryname attributes are currently used both in infra tooling (mirrors.rb) and in mirrorselect (mirrorparser3.py), and the way forward is to reimplement the 2008 change.
Comment 3 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2022-01-23 06:03:53 UTC
I've fixed the www site code to not use countryname and removed it from the XML files, since it was originally fixed before the site migrated to Jekyll.
Comment 4 Ulrich Müller gentoo-dev 2022-01-23 06:31:44 UTC
Reopening, because the files still don't validate:

$ xmllint --noout --dtdvalid ~/git/dtd/mirrors.dtd distfiles.xml
distfiles.xml:4: element mirrors: validity error : Element mirrors content does not follow the DTD, expecting (mirrorgroup* , countries), got (mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup mirrorgroup )
distfiles.xml:238: element mirrorgroup: validity error : IDREF attribute country references an unknown ID "GR"
[...34 similar lines omitted...]
Document distfiles.xml does not validate against /local/home/ulm/git/dtd/mirrors.dtd

Presumably, a <countries> element is missing:
https://gitweb.gentoo.org/proj/www-redesign.git/tree/xml/htdocs/main/en/mirrors3.xml#n825
Comment 5 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2022-01-23 22:20:03 UTC
I think the countries should be dropped from those files, can you please adjust the DTD to say countries* for now?
Comment 6 Ulrich Müller gentoo-dev 2022-01-24 06:46:09 UTC
That can't really be done, because the country attribute is an IDREF:

<!ATTLIST mirrorgroup region CDATA #REQUIRED
                      country IDREF #REQUIRED>

I am going to drop countries from the DTD altogether, and change the country attribute (back) to CDATA.
Comment 7 Larry the Git Cow gentoo-dev 2022-01-24 06:47:29 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/data/dtd.git/commit/?id=bb6e63ff76ea7066f8340df51a64b3ae09a72385

commit bb6e63ff76ea7066f8340df51a64b3ae09a72385
Author:     Ulrich Müller <ulm@gentoo.org>
AuthorDate: 2022-01-22 10:28:38 +0000
Commit:     Ulrich Müller <ulm@gentoo.org>
CommitDate: 2022-01-24 06:43:12 +0000

    mirrors.dtd: Update to reflect syntax of mirror files in api.git
    
    This partially reverts the following commit (originally in CVS, now in
    archive/proj/gentoo.git):
    
    commit d4bee3ba7e32a7e41639e06f31791b7ed66637a6
    Author: Xavier Neys <neysx@gentoo.org>
    Date:   Sun Jan 13 16:16:21 2008 +0000
    
        #204683 a few more tweaks to the new mirrors DTD & XML
    
    Bug: https://bugs.gentoo.org/831807
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>

 mirrors.dtd | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)
Comment 8 Ben Kohler gentoo-dev 2022-01-24 17:20:09 UTC
This seems to have broken mirrorselect since it is still using countryname [1]:

mirrorselect -i -o
* Using url: https://api.gentoo.org/mirrors/distfiles.xml
* Downloading a list of mirrors...
 Got 195 mirrors.
Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.9/mirrorselect", line 61, in <module>
    MirrorSelect().main(sys.argv)
  File "/usr/lib/python3.9/site-packages/mirrorselect/main.py", line 380, in main
    urls = self.select_urls(hosts, options)
  File "/usr/lib/python3.9/site-packages/mirrorselect/main.py", line 327, in select_urls
    selector = Interactive(hosts, options, self.output)
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 521, in __init__
    self.interactive(hosts, options)
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 546, in interactive
    for (url, args) in sorted(hosts, key = lambda x:
  File "/usr/lib/python3.9/site-packages/mirrorselect/selectors.py", line 547, in <lambda>
    (x[1]['country'].lower(), x[1]['name'].lower()) ):
AttributeError: 'NoneType' object has no attribute 'lower'



[1] https://gitweb.gentoo.org/proj/mirrorselect.git/tree/mirrorselect/mirrorparser3.py#n61
Comment 9 Larry the Git Cow gentoo-dev 2022-01-24 19:25:41 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/data/api.git/commit/?id=531993f328d484de6f2ce04fe15ac2f7058074b3

commit 531993f328d484de6f2ce04fe15ac2f7058074b3
Author:     Ulrich Müller <ulm@gentoo.org>
AuthorDate: 2022-01-24 19:23:28 +0000
Commit:     Ulrich Müller <ulm@gentoo.org>
CommitDate: 2022-01-24 19:23:28 +0000

    Revert "mirrors: drop country-name"
    
    Apparently this change has broken mirrorselect. Reverting for now,
    until the change from 2008 will be reimplemented in mirrorselect.
    
    This reverts commit ce38f0772a8dbc67dbeefb014327c22e4e8d8f77.
    
    Bug: https://bugs.gentoo.org/831807#c8
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>

 files/mirrors/distfiles.xml | 76 ++++++++++++++++++++++-----------------------
 files/mirrors/rsync.xml     | 36 ++++++++++-----------
 2 files changed, 56 insertions(+), 56 deletions(-)
Comment 10 Ulrich Müller gentoo-dev 2022-01-24 19:33:41 UTC
(In reply to Ben Kohler from comment #8)
> This seems to have broken mirrorselect since it is still using countryname

I have reverted the change in distfiles.xml and rsync.xml for now.

@tools-portage: This looks like a bug in mirrorselect. It shouldn't rely on the countryname attribute which was always optional:
https://gitweb.gentoo.org/archive/proj/gentoo.git/commit/xml/htdocs/dtd/mirror.dtd?id=186c6edecfe9be7d1840eac3a57c61623170226c
Comment 11 Ben Kohler gentoo-dev 2022-01-24 19:36:50 UTC
FWIW when this was crashing, simply editing mirrorparser3.py from

"country": mirrorgroup.get("countryname"),

to

"country": mirrorgroup.get("country"),

is enough to get it to run ok.
Comment 12 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-06 13:42:44 UTC
Peter, could I tempt you into taking a look at fixing this on the mirrorselect side?