Hi Solar, about two weeks ago you sent a request to portage-utils@ for a qfile enhancement: adding a "-f file" option, so that the list of files to query is read from a file instead of the command line args (with "-f -" for stdin). You also asked for a bug report, to track progress. I'm sorry for the late reply, I've been busy these last weeks, and offline most of the time. Anyway, I finally found time to implement the feature yesterday. I can't send a patch now though, since I didn't have a recent original file to diff against... I'll download one from cvsweb today, and will attach a patch here the next time I get internet access, sometime next week.
Thanks TGL..
Here is (finally) the patch for --from-file support. Sorry for sending it late again, I've been offline longer than expected.

The patch is a bit big, because it also reorganizes some of the qfile code:
- the qfile(...) function now takes a structure as argument, instead of the numerous arguments it was using before. This struct (qfile_args_t) holds all the various arrays that are needed (the basenames of query items, their dirnames, etc.)
- whereas before all these arrays were prepared in qfile_main(...), this is now done in a new separate function, which is meant to fill the qfile_args_t struct from a list of query items (argv).
- there are also new functions to create and free such structures.
This helps keep the code a bit more readable I think, although it is a bit longer than before. I've also moved a few "free(something)" lines around, to make valgrind happy.

About --from-file support, the way it works is obvious: instead of using argv for query arguments, it first builds a list by calling fgets on the input file. The only trick is that there is a limit on the number of lines which are read from this file and processed at one time. This was required for handling huge "find ... | qfile -f -" queries, so that memory consumption stays bounded. To give you a rough idea, handling 100 000 files at a time would have consumed ~20MB RAM, meaning that a "find / | qfile -o -f -" on my system would have consumed ~100MB. Also, with overly long lists of query items, performance drops because of poor cache usage.

After running a few benchmarks (for which I will attach the results), I've chosen a default limit of 5 000. It keeps memory consumption very low, has reasonable performance here, and also has the benefit of displaying some results at a regular rate when doing a huge --orphans query (because with --orphans, results for a group of query items are all displayed at one time, when the search ends).
Anyway, this default value can be changed with a new option (--max-args), if anyone cares.

Talking about performance, I've noticed that there was a change in CVS (and 0.1.22) a few months ago which makes reading the vdb CONTENTS files quite slow:
http://sources.gentoo.org/viewcvs.py/gentoo-projects/portage-utils/main.c?r1=1.124&r2=1.125
This has a big influence on what a correct value for the above-mentioned --max-args limit is. My 5 000 value is based on an optimized version of this code chunk, for which I have opened bug #160725.

Finally, I've also written a new section about --from-file usage for the qfile.1 manpage.
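For anyone reading along without the patch at hand, the batching idea described above could be sketched roughly as follows. This is not the code from the attachment: the names query_batch_t, batch_create, batch_clear, batch_fill and batch_free are made up for illustration (the real struct is qfile_args_t, whose exact layout is not shown in this bug), and the 4096-byte line buffer is an arbitrary assumption. It only shows the principle: read at most max_args lines with fgets, split each one into dirname and basename, query that group, then reuse the same arrays for the next group.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container, loosely modelled on the struct described above. */
typedef struct {
    size_t length;     /* number of query items currently stored */
    char **basenames;  /* last path component of each item */
    char **dirnames;   /* leading directory of each item, NULL if no '/' */
} query_batch_t;

/* Copy the first n bytes of s into a new NUL-terminated string. */
static char *dup_range(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p == NULL)
        return NULL;
    memcpy(p, s, n);
    p[n] = '\0';
    return p;
}

/* Allocate a batch able to hold up to max_args items. */
query_batch_t *batch_create(size_t max_args)
{
    query_batch_t *b;
    assert(max_args > 0);
    b = calloc(1, sizeof(*b));
    if (b == NULL)
        return NULL;
    b->basenames = calloc(max_args, sizeof(char *));
    b->dirnames = calloc(max_args, sizeof(char *));
    if (b->basenames == NULL || b->dirnames == NULL) {
        free(b->basenames);
        free(b->dirnames);
        free(b);
        return NULL;
    }
    return b;
}

/* Drop the stored items but keep the arrays for the next group. */
void batch_clear(query_batch_t *b)
{
    size_t i;
    for (i = 0; i < b->length; i++) {
        free(b->basenames[i]);
        free(b->dirnames[i]);
        b->basenames[i] = b->dirnames[i] = NULL;
    }
    b->length = 0;
}

void batch_free(query_batch_t *b)
{
    if (b == NULL)
        return;
    batch_clear(b);
    free(b->basenames);
    free(b->dirnames);
    free(b);
}

/* Read up to max_args non-empty lines from fp into b (which must have been
 * created with at least that capacity). Returns the number of items read,
 * 0 at end of file. Lines longer than the buffer are split, which a real
 * implementation would have to handle. */
size_t batch_fill(query_batch_t *b, FILE *fp, size_t max_args)
{
    char line[4096];
    batch_clear(b);
    while (b->length < max_args && fgets(line, sizeof(line), fp) != NULL) {
        char *slash, *dir = NULL, *base;
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        if (line[0] == '\0')
            continue;                       /* skip empty lines */
        slash = strrchr(line, '/');
        if (slash == NULL) {
            base = dup_range(line, strlen(line));
        } else {
            dir = dup_range(line, (size_t)(slash - line));
            base = dup_range(slash + 1, strlen(slash + 1));
        }
        if (base == NULL || (slash != NULL && dir == NULL)) {
            free(base);                     /* out of memory: stop here */
            free(dir);
            break;
        }
        b->dirnames[b->length] = dir;
        b->basenames[b->length] = base;
        b->length++;
    }
    return b->length;
}
```

A caller would then loop with something like "while (batch_fill(b, fp, max_args) > 0) { /* run the query on b */ }", so memory use depends on max_args and not on the length of the input. Going by the rough figures above (~20MB for 100 000 items, i.e. about 200 bytes per item), a group of 5 000 items should stay around 1MB.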
Created attachment 105843 qfile--from-file.patch
Adds --from-file/-f (and --max-args/-m) support for qfile.

Created attachment 105845 man--move-config-orphans-script.patch
A small patch for qfile-02-orphans.include, which removes an example script that I've moved to a new man section.

Created attachment 105849 man/include/qfile-04-from-file.include
A new qfile.1 section about the --from-file option.
Created attachment 105855 bench.sh
For what it's worth, the script I've used to benchmark "qfile -f file -m XXX". Usage is as follows:
- create some "something.list" files in the current directory, which are lists of various sizes. I've used:
  # echo -e "/usr/bin/vi\n/usr/bin/vim" > 00-very-short.list
  # find /bin > 01-bin.list
  # find /usr/bin > 02-usr-bin.list
  # find /usr/share/man > 03-usr-share-man.list
  # find /usr/lib > 04-usr-lib.list
- launch the script, and go make some coffee if you have some lists with tens of thousands of entries. It will produce a "bench.log" file.
Created attachment 105859 bench.log.with-vdb-contents-whitespace-crap
Here are some benchmark results with the current main.c code for reading CONTENTS files (before bug #160725).

Created attachment 105867 bench.log.improved-vdb-contents-whitespace-crap
Here are some benchmark results with the slow chunk of main.c rewritten (after bug #160725).

Created attachment 105869 bench.log.without-vdb-contents-whitespace-crap
And for reference, here are benchmark results with the offending chunk removed (i.e., similar to 0.1.21).

Created attachment 105873 bench.log.short-lists
Finally, here is a quick benchmark I made to check that using too high a --max-args value doesn't kill the performance of small queries like it does on big ones.
I'm going to commit this with minor changes to the option name.
cvs ci -m "- qfile -f file support. TGL bug #158829"
? scripts
/var/cvsroot/gentoo-projects/portage-utils/qfile.c,v <-- qfile.c
new revision: 1.39; previous revision: 1.38
/var/cvsroot/gentoo-projects/portage-utils/man/mkman.sh,v <-- man/mkman.sh
new revision: 1.10; previous revision: 1.9
/var/cvsroot/gentoo-projects/portage-utils/man/qfile.1,v <-- man/qfile.1
new revision: 1.21; previous revision: 1.20
/var/cvsroot/gentoo-projects/portage-utils/man/include/qfile-02-orphans.include,v <-- man/include/qfile-02-orphans.include
new revision: 1.2; previous revision: 1.1
/var/cvsroot/gentoo-projects/portage-utils/man/include/qfile-04-from.include,v <-- man/include/qfile-04-from.include
initial revision: 1.1
/var/cvsroot/gentoo-projects/portage-utils/man/include/qfile-99-authors.include,v <-- man/include/qfile-99-authors.include
initial revision: 1.1