Bug 179206

Summary: Easy update_dbentry optimization
Product: Portage Development Reporter: Jason Lai <jason.lai>
Component: Core Assignee: Portage team <dev-portage>
Status: RESOLVED FIXED    
Severity: minor CC: ferringb
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 181949    

Description Jason Lai 2007-05-20 11:52:45 UTC
After profiling emerge, I noticed that count() was getting called quite a few times. Changing two lines in update_dbentry seems to produce a noticeable speedup.

In portage_update.py, in update_dbentry(), change (both occurrences):
  if mycontent.count(old_value):
to
  if old_value in mycontent:

I'm using portage 2.1.2.2 but this appears to be relevant to the current SVN as well. The new code should behave identically to the existing code.
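
A minimal sketch of why the two forms should agree, written for a current Python 3 interpreter rather than the Python 2.4 mentioned in this report. The function names are hypothetical and are not part of portage; only the two expressions inside them come from the proposed change.

```python
def contains_via_count(mycontent, old_value):
    # Original form: count() walks the whole string and tallies every
    # non-overlapping match, even though only "zero or non-zero" matters.
    return bool(mycontent.count(old_value))


def contains_via_in(mycontent, old_value):
    # Proposed form: a substring test, which can stop at the first match.
    return old_value in mycontent


# Illustrative data only; the package names are made up.
sample = "dev-libs/foo >=dev-libs/bar-1.0 dev-libs/foo"
for needle in ("dev-libs/foo", "dev-libs/missing", ""):
    assert contains_via_count(sample, needle) == contains_via_in(sample, needle)
```

Both helpers give the same answer for present, absent, and empty search strings, so the change only affects how much work is done per check.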

Reproducible: Always

Steps to Reproduce:
1. time emerge -p file
2. apply patch
3. time emerge -p file

Actual Results:  
Emerge is particularly slow on my system, but the patch improved dependency checking time considerably, from 35 seconds to 25 seconds. It may help on faster systems as well.


Python 2.4.
Comment 1 Brian Harring (RETIRED) gentoo-dev 2007-05-20 12:05:00 UTC
Offhand, this code path should only run as needed for emerge ops. Testing-wise, you also need to verify that the speedup you're seeing isn't due to kernel caching: pre-run the ops until you get a stable result, then patch and rerun, averaging the timings.

Either way, the count usage there is daft, as Jason pointed out...
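
The measurement procedure described in this comment (warm up until caches settle, then average several timed runs) might be sketched as follows. This assumes a current Python 3 interpreter; `average_runtime` and its parameters are hypothetical names, not part of portage or of any benchmarking tool.

```python
import time


def average_runtime(run, warmup=3, repeats=5):
    """Call run() `warmup` times untimed, then return the mean
    wall-clock time in seconds of `repeats` timed calls."""
    # Untimed warm-up calls so that disk and kernel caches are populated
    # before any measurement is taken.
    for _ in range(warmup):
        run()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

To time the command from the report, `run` could be something like `lambda: subprocess.run(["emerge", "-p", "file"])`, measured once before and once after applying the patch. A fixed warm-up count is a simplification; checking that successive timings have actually stabilized would be closer to what the comment asks for.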
Comment 2 Jason Lai 2007-05-20 12:44:05 UTC
It turns out that the "str in str" construct is only included in Python 2.3 and above. However, "if mycontent.find(old_value) != -1:" (the -1 is important!) works on Python 2.2 and seems to run even faster than "str in str", at least for me.
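
A short sketch of why the comparison against -1 matters, assuming a current Python 3 interpreter. The helper name is hypothetical; `str.find` returns the index of the first match, or -1 when there is none.

```python
def contains_via_find(mycontent, old_value):
    # find() returns 0 for a match at the very start of the string and
    # -1 for no match, so its result cannot be used as a truth value.
    return mycontent.find(old_value) != -1


# Illustrative data only; the package name is made up.
content = "dev-libs/foo some other text"

# Match at index 0: a bare "if content.find(...)" would wrongly be false.
assert content.find("dev-libs/foo") == 0
assert contains_via_find(content, "dev-libs/foo")

# No match: find() returns -1, which is truthy, so a bare test would
# wrongly be true.
assert content.find("dev-libs/missing") == -1
assert not contains_via_find(content, "dev-libs/missing")
```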
Comment 3 Brian Harring (RETIRED) gentoo-dev 2007-05-20 14:05:03 UTC
(In reply to comment #2)
> It turns out that the "str in str" construct is only included in Python 2.3 and
> above.
The prior form being has_key, rather than __contains__.
I hate has_key. :)

Meanwhile, portage requires Python >=2.3; if you look through the code, there's quite a bit of 'in' usage, so I wouldn't worry about that.

Meanwhile, didn't answer the question of why this codepath is getting invoked multiple times- should be some form of mtime checks to do it only when required iirc.
Comment 4 Zac Medico gentoo-dev 2007-05-20 20:17:41 UTC
(In reply to comment #0)
> In portage_update.py in update_dbentry() change (both occurrences):
>   if mycontent.count(old_value):
> to
>   if old_value in mycontent:

Thanks, that's in svn r6560.

(In reply to comment #3)
> Meanwhile, didn't answer the question of why this codepath is getting invoked
> multiple times- should be some form of mtime checks to do it only when required
> iirc.

As a temporary workaround for bug 122089, update_dbentry is used for ebuilds that do not exist in the portage tree.  It's done in memory, so the updates are not persistent.  Generally, it's only a corner case because usually there is a live ebuild in the portage tree that the metadata is pulled from.  In the future, we'll use metadata directly from the vdb, so the "in memory" approach will cease to be feasible.

I was planning to handle *DEPEND updates in the vdb via an emaint module, but now that *DEPEND from the vdb is cached in /var/cache/edb/vdb_metadata.pickle, it may be feasible to update the whole vdb on the fly (based on the $PORTDIR/profiles/updates mtime checks).  I'll try that and see how the performance measures up.
Comment 5 Zac Medico gentoo-dev 2007-05-25 09:09:21 UTC
This has been released in 2.1.2.8.