Created attachment 866176
0001-extractor.py-parse-proto-from-the-uri.patch

Currently, the XML "protocol" attribute of a mirror's "uri" tag in distfiles.xml denotes whether a URL is FTP, HTTP, or HTTPS. The attribute's value almost invariably repeats the URL's protocol name. By defaulting to the URL's protocol prefix when no "protocol" attribute is present, virtually all instances of the attribute can be removed. At present, all entries in distfiles.xml follow this convention, with the exception of "ftp.lysator.liu.se", which uses "http://ftp.lysator.liu.se" for HTTP, FTP, and rsync. This appears to be an error, though, as the URL is invalid when tested with rsync, and "ftp://" works fine when tested in a browser.
I think I'd feel better if we used urllib.parse.
Created attachment 866650
0001-extractor.py-parse-proto-from-the-uri.patch

Something like this?
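For readers without the attachment at hand, here is a minimal sketch of the idea under discussion. It is not the attached patch: the helper names `uri_protocol` and `protocols`, the sample XML, and the host `mirror.example.org` are all hypothetical, and the XML is a simplified stand-in for distfiles.xml. It only illustrates trusting a "protocol" attribute when present and otherwise falling back to the scheme reported by urllib.parse.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical, simplified stand-in for distfiles.xml (not real mirror data).
SAMPLE = """\
<mirrors>
  <mirrorgroup>
    <mirror>
      <name>Example mirror (hypothetical)</name>
      <uri protocol="http">http://mirror.example.org/gentoo/</uri>
      <uri>https://mirror.example.org/gentoo/</uri>
      <uri>rsync://mirror.example.org/gentoo/</uri>
    </mirror>
  </mirrorgroup>
</mirrors>
"""


def uri_protocol(uri_elem):
    """Return the explicit "protocol" attribute, else the URI's scheme."""
    explicit = uri_elem.get("protocol")
    if explicit:
        return explicit
    # No attribute: derive the protocol from the URI itself.
    return urlparse((uri_elem.text or "").strip()).scheme


def protocols(xml_text):
    """List (protocol, uri) pairs for every <uri> element in the document."""
    root = ET.fromstring(xml_text)
    return [(uri_protocol(u), (u.text or "").strip()) for u in root.iter("uri")]


if __name__ == "__main__":
    for proto, uri in protocols(SAMPLE):
        print(proto, uri)
```

Using urllib.parse here avoids hand-rolled string splitting on "://"; whether the real parser should also validate the scheme against a list of supported protocols is left open in this sketch.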
one side effect seems to be that

mirrorselect -s4 -S -o

returns only 1 address.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/mirrorselect.git/commit/?id=ac250b126b8de24276cd4e9bdc4afab14a9c41e7

commit ac250b126b8de24276cd4e9bdc4afab14a9c41e7
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-08-06 23:29:25 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-08-06 23:29:42 +0000

    extractor.py: cleanup py2 compat

    Bug: https://bugs.gentoo.org/911183
    Signed-off-by: Sam James <sam@gentoo.org>

 mirrorselect/mirrorparser3.py | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

https://gitweb.gentoo.org/proj/mirrorselect.git/commit/?id=fe715a306754a4df2d05a3a24034d015c16377bf

commit fe715a306754a4df2d05a3a24034d015c16377bf
Author:     Peter Levine <plevine457@gmail.com>
AuthorDate: 2023-07-24 22:40:02 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-08-06 23:28:34 +0000

    extractor.py: parse proto from the uri

    The protocol can be parsed from the URI so we can get rid of the
    protocol tag altogether.

    Bug: https://bugs.gentoo.org/911183
    Suggested-by: Florian Schmaus <flow@gentoo.org>
    Suggested-by: Sam James <sam@gentoo.org>
    Signed-off-by: Peter Levine <plevine457@gmail.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 mirrorselect/mirrorparser3.py | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
Thanks, looks good.
(In reply to Toralf Förster from comment #3)
> one side effect seems to be that
>
> mirrorselect -s4 -S -o
>
> returns only 1 address.

Peter, could you look at this bit?
(In reply to Toralf Förster from comment #3)
> one side effect seems to be that
>
> mirrorselect -s4 -S -o
>
> returns only 1 address.

Hmm. If you look at the "protocol=" attributes in https://api.gentoo.org/mirrors/distfiles.xml, you will see that none are "https" except for "https://ftp.agdsn.de", even where the URI begins with "https://". In other words, the patch is working as expected and distfiles.xml is incorrect. If a "protocol=" attribute is present, the parser trusts it. Otherwise, it parses the protocol from the URI. This could be resolved by correcting the "protocol=" attributes or, with this patch, removing them entirely.
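One way to check the claim above is to list every entry whose "protocol" attribute disagrees with the scheme of its own URI. The following is only a sketch under assumptions: the function name `mismatched_protocols`, the sample XML, and the host `mirror.example.org` are hypothetical, and the XML is a simplified stand-in rather than real distfiles.xml content.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical, simplified stand-in for distfiles.xml (not real mirror data).
# The first entry mimics the reported problem: an https:// URI labelled "http".
SAMPLE = """\
<mirrors>
  <mirrorgroup>
    <mirror>
      <name>Example mirror (hypothetical)</name>
      <uri protocol="http">https://mirror.example.org/gentoo/</uri>
      <uri protocol="http">http://mirror.example.org/gentoo/</uri>
      <uri>ftp://mirror.example.org/gentoo/</uri>
    </mirror>
  </mirrorgroup>
</mirrors>
"""


def mismatched_protocols(xml_text):
    """Return (uri, declared, scheme) for entries whose attribute and scheme differ."""
    root = ET.fromstring(xml_text)
    mismatches = []
    for uri in root.iter("uri"):
        declared = uri.get("protocol")
        url = (uri.text or "").strip()
        scheme = urlparse(url).scheme
        # Entries without an attribute cannot disagree, so they are skipped.
        if declared is not None and declared != scheme:
            mismatches.append((url, declared, scheme))
    return mismatches


if __name__ == "__main__":
    for url, declared, scheme in mismatched_protocols(SAMPLE):
        print(f"{url}: protocol={declared!r} but URI scheme is {scheme!r}")
```

Run against a local copy of the real file, a check like this would show which attributes need correcting before they can be dropped.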
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1b1ed59070124efda1fa7c738b038aeb006d5b85

commit 1b1ed59070124efda1fa7c738b038aeb006d5b85
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-08-27 00:06:22 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-08-27 00:06:43 +0000

    app-portage/mirrorselect: add 2.4.0

    Bug: https://bugs.gentoo.org/911183
    Signed-off-by: Sam James <sam@gentoo.org>

 app-portage/mirrorselect/Manifest                  |  1 +
 app-portage/mirrorselect/mirrorselect-2.4.0.ebuild | 52 ++++++++++++++++++++++
 app-portage/mirrorselect/mirrorselect-9999.ebuild  |  1 +
 3 files changed, 54 insertions(+)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/data/api.git/commit/?id=c7a6f7fc9611722f20de7b1c9aaad5009311f5b0

commit c7a6f7fc9611722f20de7b1c9aaad5009311f5b0
Author:     Peter Levine <plevine457@gmail.com>
AuthorDate: 2023-08-31 01:18:07 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-08-31 01:50:22 +0000

    distfiles: fix https protocol attributes where appropriate

    [sam: The plan, per the PR
    (https://github.com/gentoo/api-gentoo-org/pull/289#issuecomment-1694526475)
    is to remove the HTTPS attributes a while after new mirrorselect has
    been stabled + propagates into release media.]

    Bug: https://bugs.gentoo.org/911183
    Signed-off-by: Peter Levine <plevine457@gmail.com>
    Closes: https://github.com/gentoo/api-gentoo-org/pull/289
    Signed-off-by: Sam James <sam@gentoo.org>

 files/mirrors/distfiles.xml | 112 ++++++++++++++++++++++----------------------
 1 file changed, 56 insertions(+), 56 deletions(-)