Bug 837377 - dev-python/beautifulsoup4-4.11.1 fails tests
Summary: dev-python/beautifulsoup4-4.11.1 fails tests
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Python Gentoo Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-09 11:05 UTC by Agostino Sarubbo
Modified: 2022-05-15 06:46 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log (build.log,137.50 KB, text/plain)
2022-04-09 11:05 UTC, Agostino Sarubbo

Description Agostino Sarubbo gentoo-dev 2022-04-09 11:05:46 UTC
https://blogs.gentoo.org/ago/2020/07/04/gentoo-tinderbox/

Issue: dev-python/beautifulsoup4-4.11.1 fails tests.
Discovered on: amd64 (internal ref: ci)
Comment 1 Agostino Sarubbo gentoo-dev 2022-04-09 11:05:48 UTC
Created attachment 769607
build.log

build log and emerge --info
Comment 2 Agostino Sarubbo gentoo-dev 2022-04-09 11:05:49 UTC
Error(s) that match a known pattern:


E       AssertionError: assert '<foo>СТУФ</foo>' == '<foo>‘’“”</foo>'
E       AssertionError: assert '<foo>СТУФ</foo>' == '<foo>&lsquo;&rsquo;&ldquo;&rdquo;</foo>'
E       AssertionError: assert '<foo>СТУФ</foo>' == '<foo>&#x2018;&#x2019;&#x201C;&#x201D;</foo>'
E       AssertionError: assert 'jk&Atilde;??lm' == '&lsquo;&rsquo;foo&ldquo;&rdquo;'
FAILED bs4/tests/test_dammit.py::TestEntitySubstitution::test_smart_quote_substitution
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_ascii
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_html_entities
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_unicode
FAILED bs4/tests/test_dammit.py::TestUnicodeDammit::test_smart_quotes_to_xml_entities
Comment 3 Chris Mayo 2022-05-09 18:33:03 UTC
From build.log the tinderbox has dev-python/charset_normalizer-2.0.12:0 installed.
charset_normalizer finds possible language matches for the short strings containing smart quotes, as seen in comment 2. Extending the strings slightly makes charset_normalizer detect the encoding as Windows-1250, which is valid, but Beautiful Soup only performs quote substitution when the encoding is detected as Windows-1252.
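
To illustrate why a wrong encoding guess produces the output in comment 2, here is a minimal sketch using only the Python standard library. It assumes the failing tests feed in the Windows-1252 bytes for the four curly quotes (0x91 to 0x94). The build log excerpt does not say which encoding charset_normalizer actually picked; cp866 is used here only because it is one codec that maps those bytes to the same Cyrillic letters.

```python
# Bytes 0x91-0x94 are the curly quotes in Windows-1252.
# Assumption: this resembles the short input used by the failing tests.
raw = b"<foo>\x91\x92\x93\x94</foo>"

as_cp1252 = raw.decode("cp1252")  # the decoding the tests expect
as_cp1250 = raw.decode("cp1250")  # same quotes, but a different encoding name
as_cp866 = raw.decode("cp866")    # illustrative guess, not taken from the log

print(as_cp1252)
print(as_cp1250 == as_cp1252)
print(as_cp866)
```

For these four bytes Windows-1250 and Windows-1252 decode identically, so the text itself is not the problem in the Windows-1250 case; only the reported encoding name differs.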

Suggested fix: add dev-python/chardet as a test dependency.
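
A rough way to check which detector a test environment would end up with is sketched below. The preference order (cchardet, then chardet, then charset_normalizer) is an assumption about how Beautiful Soup picks a detection library and should be verified against bs4/dammit.py in the installed version; pick_detector is a hypothetical helper, not part of any library.

```python
import importlib.util

# Assumed preference order; verify against bs4/dammit.py before relying on it.
CANDIDATES = ("cchardet", "chardet", "charset_normalizer")


def pick_detector(is_available=None):
    """Return the first importable candidate name, or None if none is found."""
    if is_available is None:
        # Default check: is the top-level module importable here?
        def is_available(name):
            return importlib.util.find_spec(name) is not None
    for name in CANDIDATES:
        if is_available(name):
            return name
    return None


print(pick_detector())
```

If that order holds, installing chardet for the test phase would take precedence over an already installed charset_normalizer, which matches the suggested fix.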
Comment 4 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-05-15 06:46:54 UTC
Heh, I wish I had read your comment before I spent my time figuring that out myself ;-).

Either way, it's fixed in:

commit c804a3c3b5e127623237af64d6a3bbe1c50a3d76
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2022-05-11 20:06:23 +0200
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2022-05-11 22:11:09 +0200

    dev-python/beautifulsoup4: Enable py3.11
    
    Signed-off-by: Michał Górny <mgorny@gentoo.org>