Summary: | app-portage/portage-utils-0.3.1: qfile no longer understand dirs symlinks | ||
---|---|---|---|
Product: | Portage Development | Reporter: | TGL <tom.gl> |
Component: | Tools | Assignee: | Portage Utils Team <portage-utils> |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | ||
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
qfile-fix-realpath-checks.patch
qfile-optimized-fix-for-realpath-checks.patch |
Description
TGL
2010-05-09 15:58:08 UTC
Created attachment 230867 [details, diff]
qfile-fix-realpath-checks.patch
Patch against CVS HEAD.
i dont think that's the way we want to address this. the old code was resolving things by accident because of the new root logic being added. i disabled that as an optimization. that block of code does a lot more than just resolve symlinks. i also dislike not being able to query for symlinks themselves. i guess we need a new option to control symlink behavior, and then key off of that in a new "else" case where it *only* does a realpath() ... not the ROOT related checks too. (In reply to comment #2) > the old code was resolving things by accident because of the new root logic being added. That's not true. I've introduced this "dir_name == realpath(dirname(CONTENTS))" check with bug #130004 (r1.30), and the goal was exactly that (not missing actual matches because of directories symlinks). I've then improved this check so that it also behaves correctly when "ROOT != /" in bug #142217 (r1.35), but it was not by accident that the "ROOT == /" was still correctly handled. > i disabled that as an optimization. that block of code does a lot more > than just resolve symlinks. I know what it does. Disabling it gives incorrect results, and is also not really a significant optimization. Remember that you only enter this block when the basenames of the CONTENTS entry and of target file do match, but not their dirnames. This is a rare corner case. On my system, the worst occurence is when searching for a file named "README.bz2" (802 entries in my VDB). Here are some benchs of this worst case: === without the attached patch === % sync; echo 3 > /proc/sys/vm/drop_caches % time ./q file /usr/share/doc/vlc-1.0.6/README.bz2 media-video/vlc (/usr/share/doc/vlc-1.0.6/README.bz2) real 0m13.696s user 0m0.152s sys 0m0.296s % time ./q file /usr/share/doc/vlc-1.0.6/README.bz2 media-video/vlc (/usr/share/doc/vlc-1.0.6/README.bz2) real 0m0.190s user 0m0.160s sys 0m0.024s === with the attached patch === % sync; echo 3 > /proc/sys/vm/drop_caches % time ./q file /usr/share/doc/vlc-1.0.6/README.bz2 media-video/vlc (/usr/share/doc/vlc-1.0.6/README.bz2) real 0m14.622s user 0m0.120s sys 0m0.356s % time ./q file /usr/share/doc/vlc-1.0.6/README.bz2 media-video/vlc (/usr/share/doc/vlc-1.0.6/README.bz2) real 0m0.182s user 0m0.156s sys 0m0.020s And here is an other bench with many files: % find /usr/lib* -type f -o -type l > /tmp/libs.list % wc -l /tmp/libs.list 70815 /tmp/libs.list === without the attached patch === % sync; echo 3 > /proc/sys/vm/drop_caches % time ./q file -o -f /tmp/libs.list | wc -l 29788 real 1m47.681s user 1m25.441s sys 0m1.308s % time ./q file -o -f /tmp/libs.list | wc -l 29788 real 1m23.295s user 1m22.545s sys 0m0.700s === with the attached patch === % sync; echo 3 > /proc/sys/vm/drop_caches % time ./q file -o -f /tmp/libs.list | wc -l 15137 real 1m55.669s user 1m27.021s sys 0m3.804s % time ./q file -o -f /tmp/libs.list | wc -l 15137 real 1m26.454s user 1m23.217s sys 0m3.132s Sure, your version is a bit faster here, but at the price of 50% false positives in its output. > i also dislike not being able to query for symlinks themselves. Huh? You can query for symlinks themselves, that's exactly what "qfile /path/to/the/symlink" does. What you can't do is automagically querying for its target. But I must be misunderstanding something here, because you know that already (bug #317471). > i guess we need a new option to control symlink behavior, and then key off of > that in a new "else" case where it *only* does a realpath() ... not the ROOT > related checks too. I don't understand what you're suggesting. Could you give an example of this alternative behavior? I've had an idea to reduce the slight performances penalty introduced by the "realpath(dirname(CONTENTS))" check I want to reactivate: reuse previously calculated realpath when the dirname has not yet changed. Thanks to CONTENTS files being sorted, it can save a good part of the "realpath()" calls. Here are the figures, to compare with the ones from my previous comment. === with new patch === % sync; echo 3 > /proc/sys/vm/drop_caches % time ./q file -o -f /tmp/libs.list | wc -l 15137 real 1m55.341s user 1m27.741s sys 0m1.596s % time ./q file -o -f /tmp/libs.list | wc -l 15137 real 1m25.486s user 1m24.477s sys 0m0.948s Created attachment 234585 [details, diff]
qfile-optimized-fix-for-realpath-checks.patch
The patch implementing the little optimization explained in my previous comment.
your assuming that i keep track of all this. i dont use/develop portage-utils every day anymore which means i forget history. every once in a while i pound through the bugs since people do use these utilities. there is a performance regression with the root code handling due to the constant alloc/string copies (xasprintf) which i fixed. that code branch looks like it existed only to support a ROOT which most commonly is nothing. you should probably add a comment as to the purpose of the branch if you dont want it being changed again. seems to work with portage-utils-0.11. if you're still seeing issues, please check out latest cvs. |