My last emerge of the gentoo-sources package has taken an extremely long time. Running strace on the emerge process told me that emerge was busy forking processes for "prelink --verify" which mostly died very soon after they were started. Quite a lot of process creation overhead there. I assume that I hadn't observed such behaviour in the past because I probably added prelink-checksums to my FEATURES only recently. So much for what I observed. Now what to do about this? I can think of several possible improvements, and we might need a little of each. First suggestion: only run prelink on ELF files. In bug #369227 I contributed to a patch for qcheck to handle prelink for its checksums. That patch does have a look at the file first, and only runs prelink if the magic bytes of the file indicate that it (very probably) is an ELF executable or shared object file. Latest versions of the patch uses structures from linux headers, but there is an early version which was looking at the raw bytes, which might be more suitable for python and portage. To be more precise, look at attachment 277091 [details, diff] and port that to python. Read the first 17 bytes of the file into a bytes list called magic and then check for magic[0] == '\177' && magic[1] == 'E' && magic[2] == 'L' && magic[3] == 'F' && (magic[16] == 2 /* exe */ || magic[16] == 3 /* so */) As we will be reading the file for its magic in any case, rading those bytes should be possible with virtually no penalty at all. Will try to come up with a patch. Second suggestion: Do not undo prelink for newly compiled files. Usually portage compiles files from source, installs them to a temporary image directory, and checksums them from there. Prelink is applied only after that. So by the time the checksums are computed, the file is guaranteed not to be prelinked. In such a setup, calling "prelink --verify" is pointless. Ebuilds should never try to prelink stuff themselves. The only exception that I can think of is quickpkg, which generates a binary package from a live system snapshot. Those files might have been prelinked, in which case portage might want to un-prelink them for checksumming. So one option would be to ignore the prelink-checksums FEATURE for packages compiled from source, and only honour it for binary packages. Another solution would be for quickpkg to undo prelink for the files it packages. Third suggestion: Clarify documentation. Currently the documentation in "man make.conf" reads: "This feature is useful only if [...] accurate checksums (despite prelinking) are needed [...] for checking the integrity of installed files". It was this documentation which made me enable the flag in the first place, despite the fact that I had successfully used integrity checks in the past. There is no indication that for all operations not involving quickpkg, the flag still isn't required, even if you use prelink and want to do integrity checks, the latter e.g. using "qcheck --prelink". I think that the documentation should mention that this only affects packages built from prelinked files, e.g. using quickpkg, and that users not using such processes don't need the flag either.
elf.h is from the C library, not linux headers we don't really have any C code at all in portage, so it probably better to use scanelf to locate the ELFs. something like: scanelf -BR -E ET_DYN,ET_EXEC "${D}"
Created attachment 290927 [details, diff] Look for magic bytes This patch looks for magic bytes directly from within python. Things could be even more efficient if we could make do with a single opening of the file. But I fear that that might involve a change of API, from file names to file objects, which could possibly break other things out there.
(In reply to comment #1) > elf.h is from the C library, not linux headers OK, my mistake, thanks for the clarification. > we don't really have any C code at all in portage, so it probably better to use > scanelf to locate the ELFs. something like: > scanelf -BR -E ET_DYN,ET_EXEC "${D}" Using scanelf would mean opening all files once for scanelf and a second time for checksumming. And the second pass would only be after the first pass had finished. So for large packages and small caches, this could mean that files would have to be read from disk over again. Not a solution I fancy. As inspecting a few bytes doesn't require C code, I'd rather do this in portage, even if the magic bytes look---well---magic without their macro names.
Created attachment 290933 [details, diff] Look for magic bytes (v2) Second iteration of the patch, after comments from zmedico on IRC.
Created attachment 290943 [details, diff] Look for magic bytes (v3) This time I actually tested the patch. I had though I'd done so before, but it had been using the system python modules, not those from my git working tree.
Created attachment 290945 [details, diff] Documentation change (v1) First stab at a change of the man page text. Not completely happy with this myself, but better than nothing, so unless someone has a better wording, I'd still propose this for merging.
(In reply to comment #5) > Created attachment 290943 [details, diff] > Look for magic bytes (v3) > > This time I actually tested the patch. I had though I'd done so before, but it > had been using the system python modules, not those from my git working tree. Thanks, this is in git: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=8b41c9adc96e7b2482be7a43e08f582cca96f5ca
(In reply to comment #6) > Created attachment 290945 [details, diff] > Documentation change (v1) > > First stab at a change of the man page text. Not completely happy with this > myself, but better than nothing, so unless someone has a better wording, I'd > still propose this for merging. Thanks, that's it git too: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=7dd07ac115fd787d34b79d626644c1feb3291e4b
This is fixed in 2.1.10.32 and 2.2.0_alpha72.