Bug 293096

Summary: app-portage/gentoolkit-0.3.0_rc8 - Investigate if it is possible to speed up equery meta
Product: Portage Development
Reporter: Dror Levin (RETIRED) <spatz>
Component: Tools
Assignee: Portage Tools Team <tools-portage>
Status: VERIFIED NEEDINFO    
Severity: enhancement    
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Proposed patch

Description Dror Levin (RETIRED) gentoo-dev 2009-11-13 18:31:20 UTC
The attached patch makes epkginfo use expat instead of sax to parse metadata.xml files, resulting in a big speed gain, as expat is a non-validating XML parser.
Comment 1 Dror Levin (RETIRED) gentoo-dev 2009-11-13 18:31:44 UTC
Created attachment 210154 [details]
Proposed patch
Comment 2 Paul Varner (RETIRED) gentoo-dev 2010-01-07 17:03:35 UTC
Thanks for the patch. 

However, epkginfo is now a wrapper that calls equery meta to produce the output. Unfortunately, equery meta appears to have the same performance as epkginfo, so equery meta will need to be investigated to see if it can be sped up in a similar fashion.
Comment 3 Douglas Anderson 2010-01-14 22:46:11 UTC
equery meta makes use of cElementTree, which is considered quite fast. During my testing of meta and epkginfo, equery meta was consistently faster or about the same speed, despite doing more work. On some queries epkginfo took up to 60 seconds to complete... not sure what was going on there (I don't think it was XML-related). On my current system, equery m always returns in under a second.

metadata.xml files are generally quite small and we're only working on one at a time, not parsing a whole tree of XML files, so I doubt speed would increase noticeably by changing the parser in meta.

However, one interesting possibility would be to use the lxml.etree library provided by dev-python/lxml. It implements the ElementTree API, so there would be no refactoring needed, but it may provide increased performance [1].

Something like:

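try:
    # Prefer lxml when dev-python/lxml is installed
    from lxml import etree
except ImportError:
    # Fall back to the C ElementTree from the standard library
    import xml.etree.cElementTree as etree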

in gentoolkit/metadata.py

If anyone is interested in testing performance benefits of this, go ahead.

-Doug

[1] http://codespeak.net/lxml/performance.html
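
For anyone who wants to try the comparison suggested above, here is a rough timing sketch. It is not taken from gentoolkit: the sample document, the function names, and the repeat count are made up for illustration, and the sample leaves out the DOCTYPE line to keep external DTD handling out of the picture. It uses xml.etree.ElementTree because the separate cElementTree module was removed in Python 3.9; on the Python versions current when this bug was filed, cElementTree would be the equivalent import.

```python
import timeit
import xml.etree.ElementTree as stdlib_etree

try:
    # Optional: only available when dev-python/lxml is installed
    from lxml import etree as lxml_etree
except ImportError:
    lxml_etree = None

# Hypothetical sample, roughly the size of a typical metadata.xml
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<pkgmetadata>
  <herd>tools-portage</herd>
  <maintainer>
    <email>tools-portage@gentoo.org</email>
  </maintainer>
  <longdescription>Collection of administration scripts</longdescription>
</pkgmetadata>
"""


def time_parser(etree_module, data=SAMPLE, repeat=1000):
    """Return the seconds needed to parse `data` `repeat` times."""
    return timeit.timeit(lambda: etree_module.fromstring(data), number=repeat)


def compare():
    """Time each available ElementTree implementation on the sample."""
    results = {"xml.etree.ElementTree": time_parser(stdlib_etree)}
    if lxml_etree is not None:
        results["lxml.etree"] = time_parser(lxml_etree)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f}s for 1000 parses")
```

With files this small, any difference is likely to be dwarfed by interpreter startup and Portage imports, so it would also be worth timing the whole equery meta invocation.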
Comment 4 Dror Levin (RETIRED) gentoo-dev 2010-01-14 23:03:50 UTC
The benefit from the proposed patch to 0.3.0_rc7 was in using a non-validating parser (expat) instead of a validating one (sax). As you said, it was only parsing a small XML file and so most of the time was spent on downloading needed files for validation. As it's now using etree, this issue is moot.
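
To illustrate the point about non-validating parsing, here is a minimal sketch of reading a metadata.xml with expat directly. This is not the attached patch; the sample document, the e-mail address, and the function name are assumptions for illustration only. With no external entity handler installed, Python's expat binding does not load the DTD named in the DOCTYPE line, so nothing is downloaded.

```python
import xml.parsers.expat

# Hypothetical sample; the DOCTYPE is present but is never fetched here
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
  <maintainer>
    <email>tools-portage@gentoo.org</email>
  </maintainer>
</pkgmetadata>
"""


def maintainer_emails(data):
    """Collect the text of every <email> nested directly in <maintainer>."""
    emails = []
    stack = []   # names of the currently open elements
    buf = []     # text chunks of the <email> being read

    def start(name, attrs):
        stack.append(name)
        if name == "email":
            buf.clear()

    def chars(text):
        # expat may deliver element text in several chunks
        if stack and stack[-1] == "email":
            buf.append(text)

    def end(name):
        if name == "email" and stack[-2:] == ["maintainer", "email"]:
            emails.append("".join(buf).strip())
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(data, True)  # True marks this as the final chunk
    return emails


if __name__ == "__main__":
    print(maintainer_emails(SAMPLE))
```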