Bug 204683 - New mirrors.xml format
Summary: New mirrors.xml format
Status: RESOLVED FIXED
Alias: None
Product: Mirrors
Classification: Unclassified
Component: Feature Request (show other bugs)
Hardware: All Linux
Importance: High normal
Assignee: Mirror Admins
URL:
Whiteboard:
Keywords:
Depends on: 204291
Blocks:
Reported: 2008-01-06 23:31 UTC by Robin Johnson
Modified: 2022-01-22 10:26 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Description Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-01-06 23:31:47 UTC
So we're moving to a new mirrors.xml format, to make maintenance and parsing easier.
Comment 1 Alex Howells (RETIRED) gentoo-dev 2008-01-06 23:34:23 UTC
We are?! Cool! Any further information? :P
Comment 2 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-01-06 23:35:29 UTC
Antarus has been putting in some nice work for us, on a new mirrors format.
I'm going to put the new XML in the tree as mirrors3.xml (antarus's original name). When mirrorselect is updated, we can move it over to mirrors.xml.
In the meantime, when updating CVS for mirror changes, please update both files.
Comment 3 Shyam Mani (RETIRED) gentoo-dev 2008-01-07 00:05:28 UTC
Would it be a good idea to add the rsync mirrors here as well? We can choose not to display that information to the public if we want to, but basically the current way of doing things is pretty bad. I'll give you more details over IRC.
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-01-07 00:50:23 UTC
Ok, the distfiles/source mirrors data is fully in CVS now.

neysx: Hi, I included you here just to get your opinion on an integration issue.
In writing the XSL for this, should we consider <mirrors> as our root element, or can/should we make it a subset of data under GuideXML's mainpage? That is, have a mirrors.xml that doesn't contain the data itself but is a GuideXML document, which includes the mirrors3.xml data when it builds the output. For this, we can either rename the <uri> element to <mirroruri> if we have to, or add its attributes to the main <uri> element. Likewise, <name> could become <mirrorname>.

fox2mike: It's a DTD, so I think we follow the same format but put it in a separate file. This should only be the public half of the information, however; I don't want to block it like we do with userinfo.xml.
Comment 5 Xavier Neys (RETIRED) gentoo-dev 2008-01-07 22:27:02 UTC
(In reply to comment #2)
> Antarus has been putting in some nice work for us, on a new mirrors format.
> I'm going to put the new XML in the tree as mirrors3.xml (antarus's original
> name). When mirrorselect is updated, we can move it over to mirrors.xml.
> In the meantime, when updating CVS for mirror changes, please update both
> files.

You'll have two files, one xml that contains the data and one GuideXML document. You want to keep mirrors.xml for the GuideXML doc and use something else for the data because it'll require fewer changes.


(In reply to comment #4)
> Ok, the distfiles/source mirrors data is fully in CVS now.
> 
> neysx: Hi, I included you here just to get your opinion on an integration
> issue.
> In writing the XSL for this, should we consider <mirrors> as our root element,

Use the mirrors3.xml mirror list almost as it is so that it can be used by mirrorselect or any other application that needs the list without having to parse GuideXML or HTML. I'll add a <mirrors> GuideXML tag to source it so that it can be used in a document without having to hardcode content in an XSL. I can generate a chapter per mirrorgroup/@region and a section per mirrorgroup[@region]/@country with IDs that allow #region or #country in URLs, e.g. www.g.o/mirrors.xml#germany
It'll need an attribute to select which mirrors should be displayed, e.g.
<mirrors select="full|partial"/>
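The grouping described here (one chapter per mirrorgroup/@region, one section per country, each with an ID usable as a URL fragment) can be illustrated outside XSL. This is only a rough Python sketch using xml.etree.ElementTree, not the site's actual XSL; the sample data, the hypothetical NAMES map and the anchor() id scheme (spaces replaced by underscores) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the region/country attributes matter here,
# so the <mirror> children are omitted.
SAMPLE = """<mirrors>
  <mirrorgroup region="Europe" country="DE"/>
  <mirrorgroup region="Europe" country="FI"/>
  <mirrorgroup region="Asia" country="SG"/>
</mirrors>"""

# Hypothetical code -> display name map.
NAMES = {"DE": "Germany", "FI": "Finland", "SG": "Singapore"}


def anchor(title):
    """Turn a title into a fragment id (assumed scheme: spaces -> underscores)."""
    return title.replace(" ", "_")


def toc(root, names):
    """One chapter per region, one section per country, in document order."""
    chapters = {}  # dicts keep insertion order in Python 3.7+
    for group in root.findall("mirrorgroup"):
        codes = chapters.setdefault(group.get("region"), [])
        if group.get("country") not in codes:
            codes.append(group.get("country"))
    rows = []
    for region, codes in chapters.items():
        rows.append(("chapter", anchor(region), region))
        for code in codes:
            title = names.get(code, code)  # fall back to the raw code
            rows.append(("section", anchor(title), title))
    return rows


if __name__ == "__main__":
    for level, frag, title in toc(ET.fromstring(SAMPLE), NAMES):
        print(f"{level:8} #{frag:12} {title}")
```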

Comments/questions on current dtd/xml:
. mirror.dtd should be renamed to mirrors.dtd (nitpicking, I know)
. comment in DTD has "uri link" instead of "uri protocol"
. do you really need ipv4/6 attributes on mirrorgroup? They are not used in the xml and there is no example in the dtd; it is easy to test with XSL whether a group is ipvX-only or has any ipvX, should we need the info.
. countryname could be removed: you already have the country code, and country names can be listed elsewhere, e.g. in a countries tag.
e.g. <mirrorgroup region="Asia" country="SG"> and at end of file
<countries>
 <country code="SG">Singapore</country>
 <country code="TW">Taiwan</country>
...
</countries>
making <!ELEMENT mirrors (mirrorgroup+,countries)> 
<!ELEMENT countries (country+)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST country code ID #REQUIRED>
and
<!ATTLIST mirrorgroup region CDATA #REQUIRED
                      country IDREF #REQUIRED>
in the DTD.
. <!ELEMENT mirror (name, uri*)>
can you have a mirror without any uri? I'd say uri+
. You might want to add an active attribute to make it easier to hide/suspend troublesome mirrors, deleting one is easy, putting it back is more work :)
I know you can use comments, but if you ever want to check whether inactive mirrors are back or fixed, this makes it easier to select them rather than having to parse some comments.
. Use a single attribute instead of ipv4/ipv6=y|n. At the moment, you can have ipv4=n and ipv6=n, and with ipv6=y, it's not clear whether the editor forgot to set ipv4=n or whether the mirror is ipv4&6. Better use ipv="4|6|46" (or any other value 4/6, 4+6..., avoid 4&6).
. Protocol is specified twice, once in the attribute and once in the uri. I find it more explicit like that, just FYI.
. How do you want the actual mirrors3.xml to be served? Raw xml, i.e. redirect to mirrors3.xml?passthru=1, or better imo, a slightly transformed version to help applications parse the data, e.g. with the country name put back in mirrorgroup and copied onto each mirror, inactive ones filtered out, grouped by ?, sorted on ?, comments stripped for apps that can't parse them out properly...
More formats are *trivial* to do, e.g. plain text, csv, :-separated like /etc/passwd...
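The countries proposal in this comment amounts to an ID/IDREF join: each mirrorgroup carries only a country code, and the names live once in a trailing list. A minimal Python sketch of that lookup with xml.etree.ElementTree follows; the sample mirrors, hostnames and attribute values are hypothetical, and in the real pipeline this would be done in XSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the proposed layout: codes on <mirrorgroup>,
# names listed once in a trailing <countries> block.
SAMPLE = """<mirrors>
  <mirrorgroup region="Asia" country="SG">
    <mirror>
      <name>Example Mirror SG</name>
      <uri protocol="http" ipv4="y" ipv6="n">http://mirror.example.sg/gentoo/</uri>
    </mirror>
  </mirrorgroup>
  <mirrorgroup region="Asia" country="TW">
    <mirror>
      <name>Example Mirror TW</name>
      <uri protocol="ftp" ipv4="y" ipv6="y">ftp://mirror.example.tw/gentoo/</uri>
    </mirror>
  </mirrorgroup>
  <countries>
    <country code="SG">Singapore</country>
    <country code="TW">Taiwan</country>
  </countries>
</mirrors>"""


def country_names(root):
    """Build a code -> name map from the <country> elements."""
    return {c.get("code"): c.text for c in root.iter("country")}


def groups_with_names(root):
    """Yield (region, country name, group element), resolving each code."""
    names = country_names(root)
    for group in root.findall("mirrorgroup"):
        code = group.get("country")
        # Fall back to the raw code if the countries list lacks an entry.
        yield group.get("region"), names.get(code, code), group


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for region, country, group in groups_with_names(root):
        print(region, country, len(group.findall("mirror")))
```

Declaring code as ID and mirrorgroup/@country as IDREF, as suggested, would also let a validating parser reject a group whose code has no matching <country> entry.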

You might even find it easier to use some plain text format for mirrorselect :)

See for instance
http://www.gentoo.org/proj/en/devrel/roll-call/devlist.xml
http://www.gentoo.org/proj/en/devrel/roll-call/devlist.xml?mode=xml
http://www.gentoo.org/proj/en/devrel/roll-call/devlist.xml?mode=kml
http://www.gentoo.org/proj/en/devrel/roll-call/devlist.xml?mode=yaml

BTW, even though it sounds like Austrailia, it's spelt Australia and KZ is in Asia.

Please comment. I can take over for the next round and prepare the XSLs.
Comment 6 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-01-08 14:10:21 UTC
(In reply to comment #5)
> You'll have two files, one xml that contains the data and one GuideXML
> document. You want to keep mirrors.xml for the GuideXML doc and use something
> else for the data because it'll require fewer changes.
+1 on that. We're going to be generating that datafile from some other ones in infra anyway, the others want to store some private fields like contacts.


> Use the mirrors3.xml mirror list almost as it is so that it can be used by
> mirrorselect or any other application that needs the list without having to
> parse GuideXML or HTML. I'll add a <mirrors> GuideXML tag to source it so that
> it can be used in a document without having to hardcode content in an XSL. I
> can generate a chapter per mirrorgroup/@region and a section per
> mirrorgroup[@region]/@country with IDs that allow #region or #country in URLs,
> e.g. www.g.o/mirrors.xml#germany
> It'll need an attribute to select which mirrors should be displayed, e.g.
> <mirrors select="full|partial"/>
That part sounds good to me.
full|partial and distfiles|portage are the options that are coming.
(portage rsync mirrors are going to be in a separate datafile).

> Comments/questions on current dtd/xml:
> . mirror.dtd should be renamed to mirrors.dtd (nitpicking, I know)
> . comment in DTD has "uri link" instead of "uri protocol"
+1 on both of these, I'll commit in a moment.

> . do you really need ipv4/6 attributes on mirrorgroup? Not used in xml, no
> example in dtd, it is easy to test whether a group is ipvX-only or has any ipvX
> with XSL should we need the info.
Not that I can see. Fixed.

> . countryname, could be removed, you already have the country code, country
> names can be listed elsewhere, e.g. in a countries tag.
[snip example]
Sure, if we want to do that, it makes less duplication. I only added it for now as I was using it for my prototype XSL, which specifically tried to print out the output as close to the old version as possible.

> . <!ELEMENT mirror (name, uri*)>
> can you have a mirror without any uri?, I'd say uri+
Changed to uri+, but also mirror* and mirrorgroup*, so we can have an empty mirrorgroup while a mirror is temporarily unavailable.

> . You might want to add an active attribute to make it easier to hide/suspend
> troublesome mirrors, deleting one is easy, putting it back is more work :)
> I know you can use comments, but if you ever want to check whether inactive
> mirrors are back or fixed, this makes it easier to select them rather than
> having to parse some comments.
I'll add an active attribute, but the file will never be written out with active=n mirrors from the private infra copy.

> . Use a single attribute instead of ipv4/ipv6=y|n. At the moment, you can have
> ipv4=n and ipv6=n, and with ipv6=y, it's not clear whether the editor forgot to
> set ipv4=n or whether the mirror is ipv4&6. Better use ipv="4|6|46" (or any
> other value 4/6, 4+6..., avoid 4&6).
ipv4=y ipv6=n is the default that covers 99% of the mirrors.
In XPath, isn't it easier to select mirror/uri[@ipv6='y'] than to try to detect 6 as a substring, or to OR two tests together? They are separate properties.
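One caveat with separate attributes plus defaults: a plain attribute test only sees attributes actually written in the file. ElementTree, for instance, does not load an external DTD such as mirrors.dtd, so a uri that omits ipv4 is not matched by [@ipv4='y'] even though the stated default is ipv4=y. The Python sketch below compares the two selections on hypothetical sample data; the DEFAULTS table mirrors the defaults quoted here and is otherwise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the ftp uri omits both ipv attributes.
SAMPLE = """<mirrors>
  <mirrorgroup region="Europe" country="DE">
    <mirror>
      <name>Example A</name>
      <uri protocol="http" ipv4="y" ipv6="y">http://a.example.de/gentoo/</uri>
      <uri protocol="ftp">ftp://a.example.de/gentoo/</uri>
    </mirror>
    <mirror>
      <name>Example B</name>
      <uri protocol="http" ipv4="n" ipv6="y">http://b.example.de/gentoo/</uri>
    </mirror>
  </mirrorgroup>
</mirrors>"""

# Defaults as stated in the thread: ipv4=y, ipv6=n.
DEFAULTS = {"ipv4": "y", "ipv6": "n"}


def select_xpath(root, attr):
    """Explicit-attribute match, as in mirror/uri[@ipv6='y']."""
    return [u.text for u in root.findall(f".//mirror/uri[@{attr}='y']")]


def select_with_defaults(root, attr):
    """Same selection, but a missing attribute takes its default value."""
    return [u.text for u in root.iter("uri") if u.get(attr, DEFAULTS[attr]) == "y"]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for attr in ("ipv6", "ipv4"):
        print(attr, select_xpath(root, attr), select_with_defaults(root, attr))
```

For ipv6 the two agree because the default is n; for ipv4 the explicit test misses the ftp uri.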

> . Protocol is specified twice, once in the attribute, once in the uri, I find
> it more explicit like that, just FYI.
There was a mirror that was https:// before, and it also lets you use the content of the uri node directly.

> . How do you want the actual mirrors3.xml to be served? Raw xml, i.e. redirect
> to mirrors3.xml?passthru=1, or better imo, a slightly transformed version to
> help applications parse the data with e.g. with country name put back in
> mirrorgroup and copied on each mirror, inactive ones filtered out, grouped by ?
> , sorted on ? comments out for apps that can't parse them out properly...
> More formats are *trivial* to do, e.g. plain text, csv, :-separated like
> /etc/passwd...
There were going to be no comments anyway, as those will only live in the gentoo-infra repo copy of the data (which we will use to build the main one).
- only active mirrors
- XML is the default format, but doing other outputs as you say, isn't hard. TSV (CSV but using tabs, as commas do exist in the data), YAML and JSON are the easy ones that come to mind.
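Why TSV rather than CSV: a mirror name can contain commas, and with a tab delimiter such names pass through without quoting. A minimal Python sketch of that flattening, one row per uri, follows; the column set, the sample mirror, and falling back to the ipv4=y/ipv6=n defaults for missing attributes are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample with a comma in the mirror name.
SAMPLE = """<mirrors>
  <mirrorgroup region="North America" country="US">
    <mirror>
      <name>Example University, Dept. of CS</name>
      <uri protocol="http" ipv4="y" ipv6="n">http://mirror.example.edu/gentoo/</uri>
    </mirror>
  </mirrorgroup>
</mirrors>"""

FIELDS = ["region", "country", "name", "protocol", "ipv4", "ipv6", "uri"]


def to_tsv(xml_text):
    """Flatten the mirror list to one tab-separated row per uri."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for group in root.findall("mirrorgroup"):
        for mirror in group.findall("mirror"):
            name = mirror.findtext("name", default="")
            for uri in mirror.findall("uri"):
                writer.writerow([
                    group.get("region"), group.get("country"), name,
                    uri.get("protocol"),
                    uri.get("ipv4", "y"), uri.get("ipv6", "n"),  # assumed defaults
                    (uri.text or "").strip(),
                ])
    return out.getvalue()


if __name__ == "__main__":
    print(to_tsv(SAMPLE), end="")
```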

> BTW, even though it sounds like Austrailia, it's spelt Australia and KZ is in
> Asia.
Fixed.

> Please comment. I can take over for the next round and prepare the XSLs.
I've been working on one already. I'll try to ping you on IRC about it.
Comment 7 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-01-08 14:11:12 UTC
Messed up and took out the other bug accidentally, sorry.
Comment 8 Xavier Neys (RETIRED) gentoo-dev 2008-01-13 16:19:17 UTC
(In reply to comment #6)
> > . countryname, could be removed, you already have the country code, country
> > names can be listed elsewhere, e.g. in a countries tag.

Done.

> [snip example]
> Sure if we want to do that, it makes less duplication. I only added it for now
> as I was using it for my prototype XSL - that specifically tried to print out
> the output as close to the old version as possible.

Textbook example of <xsl:key>.
You'll most likely want the same in your upstream XML and use an identity transform to copy the countries to mirrors3.xml

> > . <!ELEMENT mirror (name, uri*)>
> > can you have a mirror without any uri?, I'd say uri+
> Changed to uri+, but also mirror* and mirrorgroup*, so we can have an empty
> mirrorgroup while a mirror is temporarily unavailable.

Changed to mirror+, if the list is empty, we should not care anymore.
Changed to mirrorgroup+, empty groups should be filtered out upstream. Please do not generate empty mirrorgroups. Revert to * if you must but I will not test for empty ones.
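Since empty mirrorgroups are to be filtered out upstream, the step that writes mirrors3.xml from the private infra copy has to prune them. A minimal Python sketch of such a pass, assuming the private copy marks suspended mirrors with active="n" (the flag Robin describes keeping only there); the sample data and the prune() function are hypothetical. It drops inactive or uri-less mirrors, then any group left empty, and strips the active attribute because the public DTD no longer has it.

```python
import xml.etree.ElementTree as ET

# Hypothetical private copy: FI has only a suspended mirror, SE's mirror has no uri.
SAMPLE = """<mirrors>
  <mirrorgroup region="Europe" country="DE">
    <mirror active="y"><name>Kept</name><uri protocol="http">http://kept.example.de/</uri></mirror>
    <mirror active="n"><name>Suspended</name><uri protocol="http">http://suspended.example.de/</uri></mirror>
  </mirrorgroup>
  <mirrorgroup region="Europe" country="FI">
    <mirror active="n"><name>Down</name><uri protocol="ftp">ftp://down.example.fi/</uri></mirror>
  </mirrorgroup>
  <mirrorgroup region="Europe" country="SE">
    <mirror><name>No URI</name></mirror>
  </mirrorgroup>
</mirrors>"""


def prune(root):
    """Drop inactive or uri-less mirrors, then any mirrorgroup left empty."""
    for group in list(root.findall("mirrorgroup")):
        for mirror in list(group.findall("mirror")):
            inactive = mirror.get("active", "y") == "n"
            if inactive or mirror.find("uri") is None:
                group.remove(mirror)
            else:
                # The public DTD has no 'active' attribute, so strip it.
                mirror.attrib.pop("active", None)
        if group.find("mirror") is None:
            root.remove(group)
    return root


if __name__ == "__main__":
    print(ET.tostring(prune(ET.fromstring(SAMPLE)), encoding="unicode"))
```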

> I'll add an active attribute, but the file will never be written out with
> active=n mirrors from the private infra copy.

Removed. I do not want an attribute that is never used. I was not aware of the upstream xml.

> > . Use a single attribute instead of ipv4/ipv6=y|n. At the moment, you can have
> > ipv4=n and ipv6=n, and with ipv6=y, it's not clear whether the editor forgot to
> > set ipv4=n or whether the mirror is ipv4&6. Better use ipv="4|6|46" (or any
> > other value 4/6, 4+6..., avoid 4&6).
> ipv4=y ipv6=n is the default that covers 99% of the mirrors.
> In XPath, isn't it easier to select mirror/uri[@ipv6='y'] than try to detect 6
> as a substring or have to or them together? They are separate properties.

Testing on 6 would be trivial. One or two attributes are the same to me; I just won't test for (@ipv4='n' and @ipv6='n').

> > . How do you want the actual mirrors3.xml to be served? Raw xml, i.e. redirect
> > to mirrors3.xml?passthru=1, or better imo, a slightly transformed version to
> > help applications parse the data

You didn't say. IMO mirrorselect should not access the raw xml with ?passthru=1 but rather receive a transformed one that remains consistent should we change the internal format.

> > More formats are *trivial* to do, e.g. plain text, csv, :-separated like
> > /etc/passwd...
> - XML is the default format, but doing other outputs as you say, isn't hard.
> TSV (CSV but using tabs, as commas do exist in the data), YAML and JSON are the
> easy ones that come to mind.

Easy enough whenever the need for a different format arises.

> > Please comment. I can take over for the next round and prepare the XSLs.
>  I've been working on one already. I'll try to ping you on IRC about it.

Not very pingable these days. Please mail or attach what you have.
Comment 9 Xavier Neys (RETIRED) gentoo-dev 2008-01-17 22:24:27 UTC
http://www.gentoo.org/main/en/mirrors2.xml is the new mirror list, data comes from mirrors3.xml
Links can now point straight to regions or countries, e.g. http://www.gentoo.org/main/en/mirrors2.xml#Finland

If you can confirm mirrorselect uses mirrors.xml?passthru=1, I can redirect the current mirrors.xml to mirrors2.xml without disrupting it.

http://www.gentoo.org/main/en/mirrors3.xml serves the XML data; other formats can be thought of later on.

Important: do not forget to add the processing instruction to mirrors3.xml when you transform your upstream XML. If you need help with the upstream XSL, just ask.
Besides, make the new mirrorselect use mirrors3.xml, *not* mirrors3.xml?passthru=1

Enjoy!
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-01-17 22:46:08 UTC
neysx++: that looks absolutely stunning.

One very small glitch I see: for mirrors that have ipv6=y, can the rendering in mirrors2.xml show:
ipv4=y ipv6=y => "IPv4 & IPv6"
ipv4=n ipv6=y => "IPv6 only"

The current display doesn't tell you whether it's an IPv6-only address.
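The requested display rule is a small mapping over the two attributes. A Python sketch of it, applying the thread's stated defaults (ipv4=y, ipv6=n) when an attribute is missing; the ipv_label() name is hypothetical, the plain IPv4 default gets no note, and ipv4=n ipv6=n is not treated specially.

```python
import xml.etree.ElementTree as ET


def ipv_label(uri):
    """Return the display note for a <uri> element.

    Missing attributes take the defaults ipv4=y, ipv6=n. IPv4-only (the
    default) gets no note; ipv4=n ipv6=n is not treated specially.
    """
    ipv4 = uri.get("ipv4", "y") == "y"
    ipv6 = uri.get("ipv6", "n") == "y"
    if ipv6 and ipv4:
        return "IPv4 & IPv6"
    if ipv6:
        return "IPv6 only"
    return ""


if __name__ == "__main__":
    for attrs in ({"ipv4": "y", "ipv6": "y"}, {"ipv4": "n", "ipv6": "y"}, {}):
        print(attrs, repr(ipv_label(ET.Element("uri", attrs))))
```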
Comment 11 Xavier Neys (RETIRED) gentoo-dev 2008-01-17 23:05:54 UTC
(In reply to comment #10)
> neysx++: that looks absolutely stunning.

Thanks.

> One very small glitch I see, for mirrors that have ipv6=y, rendering in
> mirrors2.xml can you have:
> ipv4=y ipv6=y => "IPv4 & IPv6"
> ipv4=n ipv6=y => "IPv6 only"
> 
> As the current display doesn't tell you if it's an IPv6 only address.

Sure, I only just noticed the "ipv6 only" in mirrors.xml
Comment 12 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2009-12-28 01:47:54 UTC
Ok, this bug looks like we resolved it a while ago.