Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 369227 - app-portage/portage-utils-0.5 qcheck: handle prelink
Summary: app-portage/portage-utils-0.5 qcheck: handle prelink
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal enhancement (vote)
Assignee: Portage Utils Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-29 21:33 UTC by Martin von Gagern
Modified: 2011-10-03 17:14 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
First patch (bug369227-qcheck-prelink.patch,6.04 KB, patch)
2011-05-30 17:08 UTC, Martin von Gagern
Details | Diff
Second patch (bug369227-qcheck-prelink.patch,5.61 KB, patch)
2011-05-30 19:02 UTC, Martin von Gagern
Details | Diff
Third patch (bug369227.patch,6.53 KB, patch)
2011-06-15 09:41 UTC, Martin von Gagern
Details | Diff
Fourth patch (bug369227.patch,6.48 KB, patch)
2011-06-15 10:16 UTC, Martin von Gagern
Details | Diff
Fifth patch (bug369227.patch,8.34 KB, patch)
2011-06-24 13:32 UTC, Martin von Gagern
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin von Gagern 2011-05-29 21:33:35 UTC
It would be nice to have an option for qcheck to perform a prelink undo step if the MD5 digest fails and if the file is likely to be prelinked, e.g. if its magic bytes indicate an ELF file.
Comment 1 SpanKY gentoo-dev 2011-05-30 02:09:03 UTC
pretty sure portage already takes care of this

at any rate, i dont think invoking prelink blindly is a good idea.  `qcheck` shouldnt modify binaries on a fs.
Comment 2 Martin von Gagern 2011-05-30 15:22:22 UTC
(In reply to comment #1)
> pretty sure portage already takes care of this

portage itself does, when calculating checksums for unmerge, but qcheck doesn't.

$ qcheck sys-devel/flex
Checking sys-devel/flex-2.5.35_p10 ...
 MD5-DIGEST: /usr/bin/flex
  * 36 out of 37 files are good
$ grep ' /usr/bin/flex' /var/db/pkg/sys-devel/flex-2.5.35_p10/CONTENTS 
obj /usr/bin/flex b3083dc0dff357a86020d0c71ce31673 1290164438
$ md5sum /usr/bin/flex
b0a21e96db51fd2f15f693f4693b0bcd  /usr/bin/flex
$ /usr/sbin/prelink -u -o- /usr/bin/flex | md5sum -
b3083dc0dff357a86020d0c71ce31673  -

> at any rate, i dont think invoking prelink blindly is a good idea.  `qcheck`
> shouldnt modify binaries on a fs.

It wouldn't have to; it could undo the prelink to stdout as outlined above.
Comment 3 Martin von Gagern 2011-05-30 17:08:12 UTC
Created attachment 275203 [details, diff]
First patch

This patch does the trick for me. If you don't like the way I've changed the API in md5_sha1_sum.c, feel free to tweak as you see fit. If you want any other adjustments, let me know what bothers you, or adjust things to your liking.
Comment 4 SpanKY gentoo-dev 2011-05-30 17:25:02 UTC
Comment on attachment 275203 [details, diff]
First patch

ok, as long as things can be done without modifying the fs, i'm ok with implementing an optional flag

as for the patch:
 - don't hardcode the path to prelink.  in the setup phase, just run `prelink --verbose` to verify its existence.  or dont bother at all ... i'm ok with that.
 - your "else" style is incorrect.  you want to do: "} else {" on one line.
 - undo_prelink in qcheck_main needs to be indented
 - the bracket on "is_elf" needs to be on a newline
 - the prelink logic ignores the dfd which isnt going to scale.  put it into hash_file() if you can only support the full filename right now.
 - use "%#x" instead of "0x%x" in printf strings
Comment 5 Martin von Gagern 2011-05-30 19:02:08 UTC
Created attachment 275211 [details, diff]
Second patch

(In reply to comment #4)

I've adjusted the patch to conform to your coding style, and fixed the issue about prelink not taking dfd into account.

I've also caught two more issues while running a complete "qcheck --all" on my system:

- Some packages install relocatable objects (*.o), which trigger a prelink error message. We should avoid feeding these to prelink, by checking a few more bytes of magic to ensure we are dealing with an executable or a shared object.

- The debug objects generated from FEATURES=splitdebug will cause prelink to silently exit with a non-zero exit code. In this case we should fall back to hashing the file as it is.

In both cases we should not print our own error message in case prelink returns non-zero, in order to avoid cluttering the output. Prelink should know best what kinds of error it considers worthy of an error message.

If adjusted the patch to deal with these problems as well. So only one point remains from your list:

>  - don't hardcode the path to prelink.  in the setup phase, just run `prelink
> --verbose` to verify its existence.  or dont bother at all ... i'm ok with
> that.

Prelink is in /usr/sbin, which normal users won't have on their path. So I can't do an execlp, unless I accept the fact that normal users on prelinked systems won't be able to run qcheck without jumping through hoops. So in order to call it, I will need the hardcoded path. And if I use it there, then using it in the access call is only consistent.

I'm not sure what "setup phase" you refer to, but it sounds like a compile-time thing, which is really a bad idea unless you want to start declaring USE flags and conditional dependencies for this new feature. I hope you don't.
Comment 6 SpanKY gentoo-dev 2011-05-30 23:13:51 UTC
not searching $PATH by default isnt going to fly.  you've got two options:
 - append /usr/sbin:/sbin to $PATH when undo_prelink is set
 - first execute `prelink` and then fall back to `/usr/sbin/prelink` when it fails

that's not what i meant by setup, but it doesnt matter if you pick one of the two options i listed above.
Comment 7 Nikos Chantziaras 2011-06-15 06:06:47 UTC
I don't think you need to manually undo the actual prelink.  prelink has an "--md5" option:

  --md5      For verify print MD5 sum of original to standard
             output instead of content
Comment 8 Martin von Gagern 2011-06-15 09:38:52 UTC
(In reply to comment #7)
> I don't think you need to manually undo the actual prelink.  prelink has an
> "--md5" option:
> 
>   --md5      For verify print MD5 sum of original to standard
>              output instead of content

That option only works with (and automatically enables) the --verify option, which will cause prelink to not only undo the prelinking but to redo it again. This takes A LOT more time and will cause permission problems when run as a normal user. So I'd rather not do that.
Comment 9 Martin von Gagern 2011-06-15 09:41:44 UTC
Created attachment 277091 [details, diff]
Third patch

Now we search for the prelink binary in the environment PATH first and in /sbin:/usr/sbin second. That way we still get to check the binary for access permissions, and can take the system path into account nevertheless.
Comment 10 Martin von Gagern 2011-06-15 10:16:14 UTC
Created attachment 277097 [details, diff]
Fourth patch

Forgot to tabify my code.
Comment 11 SpanKY gentoo-dev 2011-06-17 05:51:07 UTC
Comment on attachment 277097 [details, diff]
Fourth patch

i'm still not liking this PATH logic.  how about:
 - punt find_sbin_program()
 - change hash_prelink_pipe() to call putenv("PATH=/sbin:/usr/sbin") when execl() fails and errno == ENOENT (or whatever you get when it fails), and then do the execl() once more

some other things:
 - the call to open() needs to use O_CLOEXEC
 - we're already using <elf.h> in qlop.c for __linux__ systems, so we can probably use that here too and put all the prelink logic behind __linux__
 - the partial read logic probably should be factored out into a helper
Comment 12 Martin von Gagern 2011-06-17 08:26:26 UTC
(In reply to comment #11)
> i'm still not liking this PATH logic.

What's your main reason for this dislike? The size of the involved code? It's somewhat unfortunate that there appears to be no glibc function to factor out this path search and return the result without executing it.

> how about:
>  - punt find_sbin_program()
>  - change hash_prelink_pipe() to call putenv("PATH=/sbin:/usr/sbin") when
> execl() fails and errno == ENOENT (or whatever you get when it fails),
> and then do the execl() once more

Main reasons why I don't like this approach is the fact that by switching to execlp, I won't be able to actually see the full path of the binary. It feels somewhat inefficient to have a full PATH search at every execution. Afaik execlp doesn't do caching the way e.g. bash does. So it would really iterate over all items in the PATH for every elf file it wants to check.

I'm also not particularly fond of executing prelink in order to detect its presence or absence, but that's not a real problem.

In the above, you almost certainly want to APPEND to the path, not replace it. After all, other modifications in the future might want to call other programs. And we'd need to store the information that we've appended to the path, in order to avoid adding to the path in every step in case that prelink isn't available (any more).

What I don't like about putenv is that it will affect other calls to exec*p* as well. Not a problem in the current code, as there are none (afaics), but that might change.

> some other things:
>  - the call to open() needs to use O_CLOEXEC

Agreed.

>  - we're already using <elf.h> in qlop.c for __linux__ systems, so we can
> probably use that here too and put all the prelink logic behind __linux__

OK, I could use that to read the header and access its fields. No strong opinion on this, but will implement it for the next iteration of the patch.

>  - the partial read logic probably should be factored out into a helper

Not sure what exactly you would like to see factored, and what the signature should be. As I see things, the partial read is closely interwoven with the pipe call as well as an early close & return.
Comment 13 SpanKY gentoo-dev 2011-06-22 23:13:45 UTC
i generally believe PATH parsing to be unportable and wrong.  the full path to a util should be wholly irrelevant.  has nothing to do with the actual code you've thrown at the issue.

you're right that execlp() will search PATH on every exec, but i'm OK with that since the prelink operation itself is pretty fat.  do you have numbers to show that this is more than system noise relative to time taken by prelink ?

yes, putenv() would blow away the active path, so it'd need to append.  we wouldnt need to track *what* was appended, just that we did an append.

as for the read helper, a lot of projects make something like read_safe() which handles partial reads (EINTR and such).  here's one i wrote for a diff project:
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t ret = 0, temp_ret;
    while (count > 0) {
        /*
         * clear errno to avoid confusing output with higher layers and
         * short reads (which aren't technically errors)
         */
        errno = 0;
        temp_ret = read(fd, buf, count);
        if (temp_ret > 0) {
            ret += temp_ret;
            buf += temp_ret;
            count -= temp_ret;
        } else if (temp_ret == 0) {
            break;
        } else {
            if (errno == EINTR)
                continue;
            ret = -1;
            break;
        }
    }
    return ret;
}

another option would be to use fopen() and then fread() as fread() itself takes care of the partial read.  and then you could use fileno() to extract the fd.
Comment 14 Martin von Gagern 2011-06-24 13:32:57 UTC
Created attachment 278003 [details, diff]
Fifth patch

OK, here goes. Prelink is now executed via execlp. is_elf uses structures from elf.h. And the read_retry is implemented mostly as suggested.

When you first wrote about "partial read", I had the "read part of the file then rewind" issue in mind, not the "read given number of bytes even if syscall reads only a part of it" scenario. Thanks for clarifying this and suggesting code.

I must confess I'm not sure about all the portability aspects here. On the one hand, we only have elf.h on linux, right? So we will only ever do all this prelink stuff on linux if we rely on that header. On the other hand, you repeatedly raised concerns about portability of my way of executing prelink. If my path walking code works all right on Linux, isn't that enough? I tried to keep portability in mind this time around, up to the point of allowing different searchpath separators. On the other hand I doubt that "%PATH%;/sbin;/usr/sbin" will likely make much sense on any future version of Windows. The same goes for opening "/dev/null". So how far do we want to take portability there?

By the way, I don't have numbers about the performance impact caused by the repeated path search for the prelink binary. Probably not a real issue, and outweighted by system caches and other external factors in any case.
Comment 15 SpanKY gentoo-dev 2011-10-03 00:24:25 UTC
after wasting an embarrassing amount of time figuring out why `qcheck -a` was throwing MD5 errors for an inordinate number of packages (and re-emerging them), i found one which happened to have a binpkg and i realized prelink screwed me.

so i took your latest patch, reworked it a bit more to generalize it further to the point where the hash framework takes a callback to tweak the fd, and the prelink code provides a hash_cb_prelink_undo() helper which takes care of setting up the prelink pipe and returning the stdout side of it.  the prelink code also operates only on the fd's and not the actual filename (otherwise it'd break when using hash_fd_at and the files were all opened relatively).

this will come in handy when (if?) we add prelink support to the qmerge code.  or maybe the next crazy thing where we want to modify the hash stream on the fly.  it also speeds up things a bit because we don't have to wait for prelink to write *all* of its contents into the pipe before we start reading it and hashing the results.  and on some systems, this could be a significant memory burden ...

so please give portage-utils-0.6 a spin and see if it works for you.

http://sources.gentoo.org/gentoo-projects/portage-utils/qcheck.c?r1=1.47&r2=1.48
http://sources.gentoo.org/gentoo-projects/portage-utils/libq/libq.c?r1=1.24&r2=1.25
http://sources.gentoo.org/gentoo-projects/portage-utils/libq/md5_sha1_sum.c?r1=1.7&r2=1.8
http://sources.gentoo.org/gentoo-projects/portage-utils/libq/prelink.c?rev=1.1
Comment 16 Martin von Gagern 2011-10-03 08:11:59 UTC
(In reply to comment #15)
> after wasting an embarrassing amount of time figuring out why `qcheck -a` was
> throwing MD5 errors for an inordinate number of packages (and re-emerging
> them), i found one which happened to have a binpkg and i realized prelink
> screwed me.

I'm somewhat glad that you're more likely to value my patch after this experience. ;-)

> the prelink
> code also operates only on the fd's and not the actual filename (otherwise
> it'd break when using hash_fd_at and the files were all opened relatively).

That looks like a definite improvement. Is /dev/stdin portable enough for this, though?

> this will come in handy when (if?) we add prelink support to the qmerge code. 
> or maybe the next crazy thing where we want to modify the hash stream on the
> fly.

For that you'd want some way to chain callbacks, so that you can have that crazy thing AND prelink at the same time. Which means you'd probably rather pass a NULL-terminated array of callbacks, and do the "fd = cb(fd, filename);" thing for each of its elements. As you wouldn't need a default callback in that case, you might as well allow passing NULL as the callback array, as having to create an empty array feels kind of weird.

>  it also speeds up things a bit because we don't have to wait for prelink
> to write *all* of its contents into the pipe before we start reading it and
> hashing the results.  and on some systems, this could be a significant memory
> burden ...

I don't see my patch doing any of this either. So why exactly do you need the monitor process? Just to implement the fallback to "cat", right? Doubling the number of forked processes just for that seems a bit drastic, even if it does make the code look more elegant in other places.

> so please give portage-utils-0.6 a spin and see if it works for you.

Will do as soon as it hits my regular sync schedule, i.e. probably some time today.
Comment 17 SpanKY gentoo-dev 2011-10-03 16:27:09 UTC
`prelink` is inherently linux-specific, so i only need worry about those systems.  and on linux, all systems should have /dev/stdin.  if they don't, they're broken, and i don't care about them ;).

i might need chained callbacks ... but i'll worry about that when it comes up.

hmm, you're right, yours starts reading the pipe before the call to waitpid().  i misread it the first time around.

i still need the double fork to get correct behavior with some files.  for example, if you run `prelink` on some .debug files, prelink will exit(1) without writing any data.  so the exec() succeeded, which means that grandchild no longer exists, but the grandparent still wants the data.  thus the parent (monitor) process takes care of falling back to `cat` if prelink failed.  this is needed in order to support the "passthru" mode.  i agree that it's a bit undesirable to fork twice, but i prefer that due to the simplicity gained higher up in the code.
Comment 18 Martin von Gagern 2011-10-03 17:14:50 UTC
(In reply to comment #15)
> so please give portage-utils-0.6 a spin and see if it works for you.

It does. Thanks!