https://blogs.gentoo.org/ago/2020/07/04/gentoo-tinderbox/

Issue: dev-python/beautifulsoup4-4.11.1 fails tests.
Discovered on: amd64 (internal ref: ci)
Created attachment 769607: build.log (build log and emerge --info)
Error(s) that match a known pattern:

E AssertionError: assert '<foo>СТУФ</foo>' == '<foo>‘’“”</foo>'
E AssertionError: assert '<foo>СТУФ</foo>' == '<foo>‘’“”</foo>'
E AssertionError: assert '<foo>СТУФ</foo>' == '<foo>‘’“”</foo>'
E AssertionError: assert 'jkÃ??lm' == '‘’foo“”'

FAILED bs4/tests/test_dammit.py::TestEntitySubstitution::test_smart_quote_substitution
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_ascii
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_html_entities
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_unicode
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_xml_entities
According to build.log, the tinderbox has dev-python/charset_normalizer-2.0.12:0 installed. As seen in comment 2, charset_normalizer finds possible language matches for the short strings containing smart quotes, so it decodes them with the wrong code page. Extending the strings slightly makes charset_normalizer detect the encoding as Windows-1250 instead; that decoding is valid, but Beautiful Soup only performs smart-quote substitution when the encoding is detected as Windows-1252. Suggested fix: add dev-python/chardet as a test dependency, since Beautiful Soup prefers chardet over charset_normalizer when both are installed.
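A minimal, stdlib-only sketch of the mechanism described above, assuming the failing tests feed the bytes 0x91 to 0x94 (the Windows-1252 smart quotes) wrapped in `<foo>` tags to the detector. cp866 is used here only because it is one Cyrillic code page that reproduces the 'СТУФ' output; the log excerpt does not say which encoding charset_normalizer actually picked. `decode_as` and `smart_quotes_to_ascii` are hypothetical helpers, and the Windows-1252 gate is a simplified stand-in, not Beautiful Soup's implementation.

```python
"""Reproduce, with the standard library only, why the smart-quote tests fail
when the encoding detector guesses a different code page."""

# Assumed test input: bytes 0x91-0x94 are the Windows-1252 smart quotes.
SMART_QUOTE_BYTES = b"<foo>\x91\x92\x93\x94</foo>"

# ASCII replacements for left/right single and left/right double quotes.
ASCII_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def decode_as(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode the bytes with whatever encoding a
    detector guessed."""
    return data.decode(encoding)


def smart_quotes_to_ascii(data: bytes, detected: str) -> str:
    """Hypothetical, simplified gate: only rewrite smart quotes when the
    detected encoding is windows-1252. Not Beautiful Soup's actual code,
    which substitutes on the raw bytes before decoding."""
    text = decode_as(data, detected)
    if detected.lower() in ("windows-1252", "cp1252"):
        for fancy, plain in ASCII_QUOTES.items():
            text = text.replace(fancy, plain)
    return text
```

Decoding `SMART_QUOTE_BYTES` as cp866 gives `'<foo>СТУФ</foo>'`, the value seen in the failed assertions, while windows-1250 and windows-1252 both give `'<foo>‘’“”</foo>'`. Only the windows-1252 guess passes the gate, which is why a Windows-1250 detection still breaks the substitution tests even though the decoded text looks right.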
Heh, I wish I had read your comment before I spent my time figuring that out myself ;-). Either way, it's fixed in:

commit c804a3c3b5e127623237af64d6a3bbe1c50a3d76
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2022-05-11 20:06:23 +0200
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2022-05-11 22:11:09 +0200

    dev-python/beautifulsoup4: Enable py3.11

    Signed-off-by: Michał Górny <mgorny@gentoo.org>