
Bug 909747

Summary: [Tracker] >=dev-lang/python-3.11.4 urllib parser fix breaks tests with non-compliant input data
Product: Gentoo Linux Reporter: Ninpo <ninpo>
Component: Current packages Assignee: Python Gentoo Team <python>
Status: CONFIRMED ---    
Severity: normal CC: mgorny
Priority: Normal Keywords: Tracker
Version: unspecified   
Hardware: All   
OS: Linux   
URL: https://github.com/python/cpython/issues/103848
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on: 909553, 909567, 910742    
Bug Blocks:    

Description Ninpo 2023-07-06 02:11:21 UTC
In Python 3.11.4, urllib's parser received a fix that makes it correctly parse [ and ] in a URL string, in that it checks that [] contains a properly formatted IPv4 or IPv6 address, as per the URL spec. A literal [ or ] after the // part of the URL should be passed as %5B or %5D if it is not meant to delimit an IP address, e.g. if it is part of a password. Unescaped, they are invalid characters there, and any tests using them should check that parsing fails.

Reproducible: Always

Steps to Reproduce:
Example test from a randomly affected package, yarl:
    def test_weird_user3(self):
        u = URL("//[some]@host")
        assert u.scheme == ""
        assert u.user == "[some]"
        assert u.password is None
        assert u.host == "host"
        assert u.path == "/"
        assert u.query_string == ""
        assert u.fragment == ""
Actual Results:  
ValueError: 'some' does not appear to be an IPv4 or IPv6 address
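
The failure does not depend on yarl; it can be seen with urllib.parse alone. The following is a minimal sketch, not taken from any affected package: try_split is a hypothetical helper name, and on interpreters older than 3.11.4 the first call is expected to return a result instead of an error.

```python
from urllib.parse import urlsplit


def try_split(url):
    """Hypothetical helper: return ("ok", netloc) or ("error", message)."""
    try:
        return ("ok", urlsplit(url).netloc)
    except ValueError as exc:
        # Python >= 3.11.4 rejects bracketed text that is not an IP literal.
        return ("error", str(exc))


# Unescaped brackets in the userinfo part: rejected by fixed interpreters.
print(try_split("//[some]@host"))
# Percent-encoded brackets: accepted before and after the fix.
print(try_split("//%5Bsome%5D@host"))
```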

Expected Results:  
These tests passed previously, but only because urllib handled [ and ] improperly.

Many of the tests can be fixed by passing in %5B and %5D as follows:

    def test_weird_user3(self):
        u = URL("//%5Bsome%5D@host")
        assert u.scheme == ""
        assert u.user == "[some]"
        assert u.password is None
        assert u.host == "host"
        assert u.path == "/"
        assert u.query_string == ""
        assert u.fragment == ""
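
Where test input is built from a raw user name or password, the escaping can be done programmatically rather than by hand. A rough sketch using only urllib.parse follows; build_url_with_user is a hypothetical helper, not part of yarl or urllib.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url_with_user(user, host):
    """Hypothetical helper: percent-encode the user part before assembling."""
    # safe="" leaves no reserved character, including [ and ], unescaped.
    return "//" + quote(user, safe="") + "@" + host


url = build_url_with_user("[some]", "host")
parts = urlsplit(url)
# urlsplit does not decode the user name; unquote restores the literal text.
decoded_user = unquote(parts.username)
```

Here url is "//%5Bsome%5D@host", which parses the same way on interpreters with and without the fix.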

However, tests such as this one simply use bad data:
def test_ipv6_zone():
    url = URL("http://[fe80::822a:a8ff:fe49:470c%тест%42]:123")
    assert url.raw_host == "fe80::822a:a8ff:fe49:470c%тест%42"
    assert url.host == url.raw_host

Hostnames should not contain [ or ]; they are reserved characters for IP addresses.
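
For triaging affected tests, a quick check of whether the bracketed text is an IP literal at all can help. This is only a sketch using the standard ipaddress module; looks_like_ipv6_literal is a hypothetical helper and is an approximation, not the exact check urllib performs.

```python
import ipaddress


def looks_like_ipv6_literal(text):
    """Hypothetical helper: True if text parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True


print(looks_like_ipv6_literal("fe80::822a:a8ff:fe49:470c"))
print(looks_like_ipv6_literal("some"))
```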
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2023-07-06 12:32:57 UTC
(In reply to Ninpo from comment #0)
> In Python 3.11.4, urllib's parser received a fix that makes it correctly
> parse [ and ] in a url string, in that it checks that [] contains a properly
> formatted IPv4 or IPv6 address, as per the URL spec. Intended literal [ or ]

FWICS, only IPv6 addresses are valid in brackets.
Comment 2 Ninpo 2023-07-06 14:39:55 UTC
(In reply to Michał Górny from comment #1)
> (In reply to Ninpo from comment #0)
> > In Python 3.11.4, urllib's parser received a fix that makes it correctly
> > parse [ and ] in a url string, in that it checks that [] contains a properly
> > formatted IPv4 or IPv6 address, as per the URL spec. Intended literal [ or ]
> 
> FWICS, only IPv6 addresses are valid in brackets.

Yeah, the ValueError that gets raised is a touch ambiguous about that.