The attached patch makes epkginfo use expat instead of sax to parse metadata.xml files, resulting in a big speed gain, as expat is a non-validating XML parser.
Created attachment 210154: Proposed patch
Thanks for the patch. However, epkginfo is now a wrapper that calls equery meta to produce the output. Unfortunately, equery meta appears to have the same performance as epkginfo. So equery meta will need to be investigated to see if it can be sped up in a similar fashion.
equery meta makes use of cElementTree, which is considered quite fast. During my testing of meta and epkginfo, equery meta was consistently faster or about the same speed, despite doing more work. On some queries epkginfo took up to 60 seconds to complete... not sure what was going on there (I don't think it was XML related). On my current system, equery m always returns in under a second.

metadata.xml files are generally quite small and we're only working on one at a time, not parsing a whole tree of XML files, so I doubt speed would increase noticeably by changing the parser in meta.

However, one interesting possibility would be to use the lxml.etree library provided by dev-python/lxml. It implements the ElementTree API, so there would be no refactoring needed, but it may provide increased performance [1]. Something like:

    try:
        from lxml import etree
    except ImportError:
        import xml.etree.cElementTree as etree

in gentoolkit/metadata.py.

If anyone is interested in testing the performance benefits of this, go ahead.

-Doug

[1] http://codespeak.net/lxml/performance.html
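For anyone who wants to try the comparison, here is a rough, standalone sketch of the fallback import together with a small timing loop. It is not the actual gentoolkit/metadata.py code: the sample XML, the parse_metadata helper, and the element names it reads are made up for illustration. Note also that cElementTree was removed in Python 3.9, so the fallback below uses xml.etree.ElementTree, which picks up the C accelerator automatically on current Python versions.

```python
import timeit

try:
    # Optional dependency (dev-python/lxml); implements the ElementTree API.
    from lxml import etree
except ImportError:
    # Standard-library fallback. cElementTree no longer exists on
    # Python >= 3.9; ElementTree uses the C implementation when available.
    import xml.etree.ElementTree as etree

# Hypothetical sample data, shaped loosely like a small metadata.xml file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
  <herd>example-herd</herd>
  <maintainer>
    <email>maintainer@example.org</email>
  </maintainer>
  <longdescription>Sample package metadata.</longdescription>
</pkgmetadata>
"""


def parse_metadata(data):
    """Parse metadata bytes and return the herd names and maintainer emails.

    Hypothetical helper for illustration only, not the gentoolkit API.
    """
    root = etree.fromstring(data)
    return {
        "herd": [e.text for e in root.findall("herd")],
        "emails": [e.text for e in root.findall("maintainer/email")],
    }


if __name__ == "__main__":
    print(parse_metadata(SAMPLE))
    # Time 1000 parses of the same small document with whichever
    # backend was imported above.
    seconds = timeit.timeit(lambda: parse_metadata(SAMPLE), number=1000)
    print("backend: %s, 1000 parses: %.4f s" % (etree.__name__, seconds))
```

Running it once with lxml installed and once without should give a rough idea of whether the parser backend matters for files of this size.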
The benefit from the proposed patch to 0.3.0_rc7 was in using a non-validating parser (expat) instead of a validating one (sax). As you said, it was only parsing a small XML file, so most of the time was spent downloading the files needed for validation. Since equery meta now uses etree, this issue is moot.
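For reference, a minimal sketch of how a sax-based parser can be told not to fetch external entities, which is one way to avoid the kind of download described above. Whether this alone would have fixed the old epkginfo slowness is an assumption, not something tested here. The HerdCollector class, the herds_without_network function, and the sample data are hypothetical names for illustration; feature_external_ges is the standard xml.sax.handler feature flag.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Hypothetical sample data with an external DTD reference.
SAMPLE_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
  <herd>example-herd</herd>
</pkgmetadata>
"""


class HerdCollector(ContentHandler):
    """Collect the text of every <herd> element (illustrative handler)."""

    def __init__(self):
        super().__init__()
        self.herds = []
        self._in_herd = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "herd":
            self._in_herd = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self._in_herd:
            self._buf.append(content)

    def endElement(self, name):
        if name == "herd":
            self.herds.append("".join(self._buf).strip())
            self._in_herd = False


def herds_without_network(data):
    """Parse with sax while refusing to load external general entities."""
    parser = xml.sax.make_parser()
    # Do not resolve external entities such as the referenced DTD.
    parser.setFeature(feature_external_ges, False)
    handler = HerdCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.herds


if __name__ == "__main__":
    print(herds_without_network(SAMPLE_DOC))
```

Recent Python versions (3.7.1 and later) already disable external general entities in xml.sax by default, so the explicit setFeature call mainly matters on older interpreters.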