Bug 909747 - [Tracker] >=dev-lang/python-3.11.4 urllib parser fix breaks tests with non-compliant input data
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Python Gentoo Team
URL: https://github.com/python/cpython/iss...
Whiteboard:
Keywords: Tracker
Depends on: 909553 909567 910742
Blocks:
Reported: 2023-07-06 02:11 UTC by Ninpo
Modified: 2023-07-24 04:05 UTC

See Also:
Package list:
Runtime testing required: ---


Description Ninpo 2023-07-06 02:11:21 UTC
In Python 3.11.4, urllib's parser received a fix that makes it correctly parse [ and ] in a url string, in that it checks that [] contains a properly formatted IPv4 or IPv6 address, as per the URL spec. Intended literal [ or ] after the // part of the URL should be passed as %5B and %5D when they do not enclose an IP address, e.g. when they are part of a password. Unencoded, they are invalid characters in that position, and any test that uses them should assert that parsing fails.

Reproducible: Always

Steps to Reproduce:
Example test from a randomly affected package, yarl:
    def test_weird_user3(self):
        u = URL("//[some]@host")
        assert u.scheme == ""
        assert u.user == "[some]"
        assert u.password is None
        assert u.host == "host"
        assert u.path == "/"
        assert u.query_string == ""
        assert u.fragment == ""
Actual Results:  
ValueError: 'some' does not appear to be an IPv4 or IPv6 address

Expected Results:  
These tests passed previously, but only because urllib handled [ and ] improperly.

Many of the tests can be fixed by passing in %5B and %5D as follows:

    def test_weird_user3(self):
        u = URL("//%5Bsome%5D@host")
        assert u.scheme == ""
        assert u.user == "[some]"
        assert u.password is None
        assert u.host == "host"
        assert u.path == "/"
        assert u.query_string == ""
        assert u.fragment == ""
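
The encoding does not have to be written by hand. A minimal sketch, assuming only the standard library: urllib.parse.quote with safe="" percent-encodes the brackets, and urlsplit then accepts the result. The helper name build_url_with_user is hypothetical and is not part of yarl or urllib.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url_with_user(user: str, host: str) -> str:
    """Build a scheme-less URL whose userinfo is percent-encoded.

    Hypothetical helper for illustration; not part of yarl or urllib.
    """
    # safe="" makes quote() encode every reserved character,
    # including "[" and "]".
    return "//" + quote(user, safe="") + "@" + host


url = build_url_with_user("[some]", "host")
parts = urlsplit(url)

# urlsplit() leaves the userinfo percent-encoded, so decode it explicitly.
decoded_user = unquote(parts.username)
```

Note that urlsplit itself does not decode the user name, whereas the yarl test above expects u.user to come back already decoded.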

However, tests such as this one simply use invalid data:
def test_ipv6_zone():
    url = URL("http://[fe80::822a:a8ff:fe49:470c%тест%42]:123")
    assert url.raw_host == "fe80::822a:a8ff:fe49:470c%тест%42"
    assert url.host == url.raw_host

Hostnames should not contain [ or ]; they are reserved characters for IP addresses.
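
To check test data before it reaches the parser, something like the following could be used. It is only a sketch built on the standard ipaddress module, under the assumption that bracket contents must parse as an IPv6 address. It is not the check urllib itself performs, and is_valid_bracketed_host is a hypothetical name.

```python
import ipaddress


def is_valid_bracketed_host(host: str) -> bool:
    """Return True if `host` (without the brackets) is an IPv6 address.

    Hypothetical helper for illustration; not urllib's own validation.
    """
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP address at all, e.g. "some".
        return False
    # IPv4 addresses parse fine but do not belong in brackets.
    return isinstance(address, ipaddress.IPv6Address)


candidates = [
    "::1",
    "fe80::822a:a8ff:fe49:470c",
    "127.0.0.1",
    "some",
    "fe80::822a:a8ff:fe49:470c%тест%42",
]
results = {host: is_valid_bracketed_host(host) for host in candidates}
```

With this helper, the host from test_ipv6_zone is rejected, as is the plain string "some".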
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2023-07-06 12:32:57 UTC
(In reply to Ninpo from comment #0)
> In Python 3.11.4, urllib's parser received a fix that makes it correctly
> parse [ and ] in a url string, in that it checks that [] contains a properly
> formatted IPv4 or IPv6 address, as per the URL spec. Intended literal [ or ]

FWICS, only IPv6 addresses are valid in brackets.
Comment 2 Ninpo 2023-07-06 14:39:55 UTC
(In reply to Michał Górny from comment #1)
> (In reply to Ninpo from comment #0)
> > In Python 3.11.4, urllib's parser received a fix that makes it correctly
> > parse [ and ] in a url string, in that it checks that [] contains a properly
> > formatted IPv4 or IPv6 address, as per the URL spec. Intended literal [ or ]
> 
> FWICS, only IPv6 addresses are valid in brackets.

Yeah, the ValueError that gets raised is a touch ambiguous on that point.
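
For reference, the wording of that error matches what the standard ipaddress module produces for any string that is not an IP address, which would explain why IPv4 is mentioned. A small sketch, assuming the message reported above originates from ipaddress.ip_address:

```python
import ipaddress

try:
    ipaddress.ip_address("some")
except ValueError as exc:
    # The generic message names both address families, regardless of
    # which of them the caller is prepared to accept.
    message = str(exc)
```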