http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup parses arbitrarily invalid XML- or HTML-like substance into a tree representation. It provides methods and Pythonic idioms that make it easy to search and modify the tree.

A well-formed XML/HTML document will yield a well-formed data structure. An ill-formed XML/HTML document will yield a correspondingly ill-formed data structure. If your document is only locally well-formed, you can use this library to find and process the well-formed part of it. The BeautifulSoup class has heuristics for obtaining a sensible parse tree in the face of common HTML errors.

Beautiful Soup has no external dependencies. It works with Python 2.2 and up.

Reproducible: Always
Created attachment 71145: beautifulsoup-2.1.1.ebuild
- #IUSE="doc" -> IUSE=""
* Testing BeautifulSoup installation:

# mkdir -p /usr/local/portage/dev-python/beautifulsoup
# cd /usr/local/portage/dev-python/beautifulsoup
# wget http://bugs.gentoo.org/attachment.cgi?id=71145 -O beautifulsoup-2.1.1.ebuild
# sed 's/^#IUSE.*/IUSE=""/' beautifulsoup-2.1.1.ebuild -i
# cd
# emerge beautifulsoup --digest -av

ALL Correct!

* Testing modules help:

$ ipython
: import BeautifulSoupTests
: help (BeautifulSoupTests)

OK

: import BeautifulSoup
: help (BeautifulSoup)

FAIL: TypeError: cannot concatenate 'str' and 'NullType' objects
Sorry, everything is fine!

* Testing modules help:

$ ipython
: from BeautifulSoup import BeautifulSoup
: help (BeautifulSoup)

OK
It should be in Portage; it's a very good HTML/XML parser. In addition, the Python community recommends using it instead of htmllib.HTMLParser and HTMLParser.HTMLParser.
(In reply to comment #3)
> : help (BeautifulSoup)
>
> FAIL: TypeError: cannot concatenate 'str' and 'NullType' objects

Python's help code explodes trying to document BeautifulSoup.Null, which has a somewhat peculiar type. Not really BeautifulSoup's fault, so committed it with some cleanup.
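For anyone curious how a "peculiar" type can trip up a documentation tool, here is a minimal sketch of one plausible mechanism. It is not BeautifulSoup's or pydoc's actual code. NullType, Null, Soup, describe and try_describe below are hypothetical stand-ins, and the assumption (not confirmed in this bug) is that the Null object answers every missing attribute lookup with itself, so a tool that expects a string ends up concatenating a str with a NullType.

```python
class NullType:
    """Hypothetical stand-in for a Null-object type (not the library's code)."""

    def __getattr__(self, name):
        # Called only when normal lookup fails: every missing attribute
        # resolves to the same Null-like instance instead of raising
        # AttributeError.
        return self

    def __repr__(self):
        return "Null"


Null = NullType()


class Soup:
    """Hypothetical ordinary class, used for comparison."""


def describe(obj):
    """Toy documenter (hypothetical): builds a heading from obj.__name__."""
    # A real documentation tool does far more; this only mimics the habit
    # of assuming that name-like attributes are strings.
    return "Help on " + obj.__name__


def try_describe(obj):
    """Run describe() and report a TypeError as text instead of raising."""
    try:
        return describe(obj)
    except TypeError as exc:
        return "TypeError: %s" % exc


if __name__ == "__main__":
    print(try_describe(Soup))  # an ordinary class has a string __name__
    print(try_describe(Null))  # Null.__name__ is Null itself, so "+" fails
```

Under that assumption, documenting a class directly goes through cleanly, while anything that makes the documenter walk into the Null-like instance raises a TypeError. That would be consistent with help() on the class working while help() on the whole module fails.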