Once in a while I find that I have several GB of kernel source and decide to clean up a bit. The problem is that unmerging packages with lots of files is horribly slow. Manually deleting a kernel tree is several times faster than unmerging it. An strace of emerge shows things that I don't fully understand. During unmerging, the content of each file being deleted appears to be copied to a file named like /var/lib/portage/prelink-checksum.tmp.25978. Lock files are created as well, and I don't really understand what prelink has to do with all this either.

Reproducible: Always

Steps to Reproduce:
1. emerge -u world
2. emerge -P kernel-dev-source

Expected Results: Finish in a reasonable amount of time
When portage unmerges a package, it does a few checks (such as mtime and md5) on each file to make sure it is the same file that was installed. As for prelink, many people use it to speed up the execution of their binaries. Prelinking changes a file's MD5 (and possibly its mtime). On unmerge, portage runs prelink with a certain parameter to get the MD5 of the file *before* it was prelinked, which matches what portage recorded if the file hasn't been modified beyond prelinking. So, yes, portage can be painfully slow unmerging packages with lots of files, but the checks are necessary to ensure that files that shouldn't be deleted aren't.
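For illustration, the per-file check described above might look roughly like the following. This is a simplified sketch, not portage's actual code: the function names are hypothetical, it assumes both the recorded mtime and the recorded MD5 must match, and it ignores prelink entirely.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_as_installed(path, recorded_md5, recorded_mtime):
    """Hypothetical helper: True only if the file on disk still matches
    the mtime and MD5 that were recorded at install time."""
    try:
        st = os.stat(path)
    except OSError:
        return False  # file is already gone or unreadable
    # Cheap check first: a different mtime means the file was touched.
    if int(st.st_mtime) != int(recorded_mtime):
        return False
    # Expensive check: the whole file has to be read to compute the MD5.
    return md5_of_file(path) == recorded_md5
```

The MD5 step reads every byte of every file, which is why a package with tens of thousands of files takes much longer to unmerge than to delete by hand.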
Aha, that makes more sense now. But why does it check every file? Surely this can be optimized a bit. Like running it only on libraries, for instance. I suppose one problem is that portage doesn't seem to distinguish different kinds of files.
How do we identify *cleanly* what needs to be checked? Mass calls to file? Nasty, that means a fork/exec per file. Some integration of libmagic? Etc.
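One cheap way to approximate what file or libmagic would report, without a fork/exec per file, is to look at the first four bytes for the ELF magic number. This is only a sketch of that idea, not something proposed in this thread, and the function name is hypothetical; it says nothing about whether a file was actually prelinked.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF object


def looks_like_elf(path):
    """Hypothetical helper: True if the file starts with the ELF magic."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        return False  # unreadable or missing: treat as non-ELF
```

Only files for which this returns True would need to go through prelink at all; everything else could be checksummed in place.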
Any chance prelink --undo could be piped straight into the md5 code? That would avoid the intermediate file at least, potentially speeding things up.
Couldn't it check only/exclude certain paths? After all, libraries are supposed to be in specific places. If one somehow ends up in /usr/src that should be treated as a bug anyway.
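The path-based idea could be sketched as below. The directory list is purely an assumption for illustration (a real implementation would need to get it from somewhere authoritative), and the function name is hypothetical.

```python
import os

# Hypothetical list of directories where prelinked binaries and
# libraries are expected to live; not taken from any real config.
PRELINK_DIRS = ("/bin", "/sbin", "/lib", "/usr/bin", "/usr/sbin", "/usr/lib")


def may_be_prelinked(path, prelink_dirs=PRELINK_DIRS):
    """Return True if path lies under one of the given directories."""
    path = os.path.normpath(path)
    for d in prelink_dirs:
        # Append a separator so that /usr/lib does not match /usr/libexec.
        if path.startswith(d.rstrip("/") + "/"):
            return True
    return False
```

With a check like this, everything under /usr/src would skip the prelink step and be checksummed directly.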
(In reply to comment #4)
> Possibility of prelink --undo piped to the md5 code by chance? Would avoid the intermediate file at least (potentially speeding it up).

There is a prelink -o --undo-output=FILE option that could be used to direct the output to /dev/stdout. For non-ELF files, prelink --undo returns exit status 1.

(In reply to comment #5)
> Couldn't it check only/exclude certain paths?

Maybe it could use /etc/prelink.conf.
Created attachment 65504
optimization for in place checksum of files rejected by prelink

This patch uses prelink --undo-output=prelink_tmpfile in order to avoid extra copying. Files rejected by prelink are checksummed in place.
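The approach described for the patch can be sketched roughly as follows. This is not the attached patch itself: the function names and the prelink_bin parameter are hypothetical, and it assumes that prelink exits with status 0 only when it has written the un-prelinked file to the path given by --undo-output.

```python
import hashlib
import os
import subprocess
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_before_prelink(path, prelink_bin="prelink"):
    """Hypothetical helper: MD5 of the file as it was before prelinking.

    If prelink rejects the file (for example because it is not ELF),
    the file is checksummed in place and nothing is copied.
    """
    fd, tmp = tempfile.mkstemp(prefix="prelink-checksum.tmp.")
    os.close(fd)
    try:
        try:
            # Arguments are passed as a list, so no shell is involved.
            status = subprocess.call(
                [prelink_bin, "--undo", "--undo-output=" + tmp, path],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            status = 1  # prelink is not installed: treat as rejected
        if status == 0:
            return md5_of_file(tmp)
        return md5_of_file(path)
    finally:
        os.unlink(tmp)
```

For a tree with no ELF files, such as kernel sources, every file takes the in-place branch, so the only remaining overhead is spawning prelink once per file.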
Zac, speed improvement via the patch? Offhand, I don't see any issues with it (haven't done anything but read it over though). Aside from that, nice usage of spawn (broken-up args rather than forcing bash to do it).
Created attachment 65568
checksum benchmark utility

Example usage:
find /usr/src/linux -type f | time ./checksum.py -p

(In reply to comment #8)
> Zac, speed improvement via the patch?

I have measured a 7.75% decrease in time when this benchmark was performed on the 2.6.12 kernel sources (no ELF binaries, so prelink --undo rejects every file). Not much difference, but measurable.
*** This bug has been marked as a duplicate of 23851 ***