90444 – dblink.getcontents() speedup

Bug 90444 - dblink.getcontents() speedup

Summary: dblink.getcontents() speedup

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Core (show other bugs)
Hardware:	All Linux

Importance:	High enhancement
Assignee:	Portage team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	133068
	Show dependency tree

Reported:	2005-04-25 15:41 UTC by TGL
Modified:	2006-05-14 20:13 UTC (History)
CC List:	0 users

See Also:
Package list:
Runtime testing required:	---

Attachments
dblink_getcontents--remove_normpath_calls.patch (dblink_getcontents--remove_normpath_calls.patch,482 bytes, patch) 2005-04-25 15:42 UTC, TGL	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description TGL 2005-04-25 15:41:15 UTC

While profiling a script that reads CONTENTS files using 'dblink.getcontents()', i've seen that it was spending quite some time in ~400k calls to 'os.path.normpath()' (one per referenced element). I think there is nothing to normalize here (paths in CONTENTS files are already normal by construction) and thus this calls could be avoided. 

The one-line patch i will attach does this. It's effect is easy to check, for instance with equery:

Before:
% time equery belongs /bin/bash > /dev/null
real    0m13.583s
user    0m13.215s
sys     0m0.249s

After:
% time equery belongs /bin/bash > /dev/null
real    0m6.526s
user    0m6.218s
sys     0m0.246s


Reproducible: Always
Steps to Reproduce:

Comment 1 TGL 2005-04-25 15:42:48 UTC

Created attachment 57225 [details, diff]
dblink_getcontents--remove_normpath_calls.patch

(patch is against cvs HEAD)

Comment 2 Brian Harring (RETIRED) gentoo-dev

2005-04-25 16:05:13 UTC

This modification allows for /{2,} in file paths in contents, which is invalid (won't hurt anything, but is invalid).

We need to be defensive both in writing, and reading from contents imo.

Comment 3 TGL 2005-04-25 16:42:00 UTC

"normpath()" is just about removing '.', '..' and redundant '/', but it won't garantee anything else about the paths. So i don't think it adds any safeguard in the reading of CONTENTS file. The only thing that may need it is the 'root' variable, since it comes from the user, but that could be done once for all where the global is defined.

Comment 4 Jason Stubbs (RETIRED) gentoo-dev

2005-04-28 21:05:35 UTC

What is the need to be defensive in reading?

Comment 5 Jason Stubbs (RETIRED) gentoo-dev

2005-07-28 07:24:40 UTC

Putting a hold on feature requests for portage as they are drowning out the 
bugs. Most of these features should be available in the next major version of 
portage. But for the time being, they are just drowning out the major bugs and 
delaying the next version's progress. 
 
Any bugs that contain patches and any bugs for etc-update or dispatch-conf can 
be reopened. Sorry, I'm just not good enough with bugzilla. ;)

Comment 6 TGL 2005-07-28 07:44:42 UTC

Reopening since this one is not a feature request, but a simple improvement
proposal with its patch.

Comment 7 Alec Warner (RETIRED) archtester

2006-05-11 18:29:41 UTC

(In reply to comment #4)
> What is the need to be defensive in reading?
> 

technically the contents could be corrupt, however I think it would be up to the called to fix any errors in the contents, so I'm for removing it..

Comment 8 Zac Medico gentoo-dev

2006-05-14 20:13:55 UTC

It's difficult to justify a sacrifice of robustness for performance.  The contents could be slightly malformed and a sacrifice in robustness here could potentially cause a detectable file collision to go undetected.  In svn r3357 the code now uses both os.path.join and os.path.normpath in favor of maximum robustness.