While profiling a script that reads CONTENTS files using 'dblink.getcontents()', i've seen that it was spending quite some time in ~400k calls to 'os.path.normpath()' (one per referenced element). I think there is nothing to normalize here (paths in CONTENTS files are already normal by construction) and thus this calls could be avoided. The one-line patch i will attach does this. It's effect is easy to check, for instance with equery: Before: % time equery belongs /bin/bash > /dev/null real 0m13.583s user 0m13.215s sys 0m0.249s After: % time equery belongs /bin/bash > /dev/null real 0m6.526s user 0m6.218s sys 0m0.246s Reproducible: Always Steps to Reproduce:
Created attachment 57225 [details, diff] dblink_getcontents--remove_normpath_calls.patch (patch is against cvs HEAD)
This modification allows for /{2,} in file paths in contents, which is invalid (won't hurt anything, but is invalid). We need to be defensive both in writing, and reading from contents imo.
"normpath()" is just about removing '.', '..' and redundant '/', but it won't garantee anything else about the paths. So i don't think it adds any safeguard in the reading of CONTENTS file. The only thing that may need it is the 'root' variable, since it comes from the user, but that could be done once for all where the global is defined.
What is the need to be defensive in reading?
Putting a hold on feature requests for portage as they are drowning out the bugs. Most of these features should be available in the next major version of portage. But for the time being, they are just drowning out the major bugs and delaying the next version's progress. Any bugs that contain patches and any bugs for etc-update or dispatch-conf can be reopened. Sorry, I'm just not good enough with bugzilla. ;)
Reopening since this one is not a feature request, but a simple improvement proposal with its patch.
(In reply to comment #4) > What is the need to be defensive in reading? > technically the contents could be corrupt, however I think it would be up to the called to fix any errors in the contents, so I'm for removing it..
It's difficult to justify a sacrifice of robustness for performance. The contents could be slightly malformed and a sacrifice in robustness here could potentially cause a detectable file collision to go undetected. In svn r3357 the code now uses both os.path.join and os.path.normpath in favor of maximum robustness.