You might want to get in touch with Mandriva's devs. They've been working on some of this stuff for the past couple of releases. It might be a good opportunity to collaborate instead of fixing this on our own.
@Diego, I think you might be interested in this bug.
Indeed I am! Thanks Rémi, I'll check this out and try to get in touch with somebody at Mandriva next week :)
(In reply to comment #2)
> Indeed I am! Thanks Rémi, I'll check this out and try to get in touch with
> somebody at Mandriva next week :)
Great. Would you please reassign this bug appropriately, then? I was thinking toolchain@ but I'm not sure.
Copy from bug 246757 (with more sane names):
I am testing how packages link symbols in shared libraries. I need to
build some of the packages on a system without the Linux-like dynamic loading
behavior (so no loose kind of dynamic linking, where a symbol only has to be
found in any one of the ELF files once everything is loaded). In my
situation a library/program must know where it needs to search
for a symbol.
And these are not only possible --as-needed bugs but also bad behavior that
other packages have to work around. Let's say library libmain uses
library libuseful but doesn't link against it. Now program prog_z wants to use library libmain. The natural way would be for prog_z to link against libmain, and everything would be fine. But in this situation it must link against both libmain and libuseful, even though prog_z itself does not use libuseful.
Now let's think about the case where libmain's authors notice that libuseful is completely broken and reimplement its functions in a saner way. But prog_z is still linked against libuseful. Which symbol do you think will be used for the previously faulty functions from libuseful? I would bet on Murphy and say that prog_z will still use the faulty functions.
Or the other way around: libmain starts using new functions from a library libextra but doesn't link against it. Yes, your prog_z binary will be instantly broken, even though the use of libmain's API should be transparent to prog_z.
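To make the "which symbol wins" question concrete, here is a rough Python sketch of how I understand default ELF symbol lookup: objects are searched breadth-first in DT_NEEDED order, starting at the executable, and the first definition found is used. This is only a toy model under that assumption. It ignores symbol versioning, visibility, -Bsymbolic and dlopen(), and the dictionaries below are made-up stand-ins for the example project, not output from any real tool.

```python
from collections import deque


def load_order(root, needed):
    """Breadth-first walk of the DT_NEEDED graph, starting at the executable."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in needed.get(obj, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def resolve(symbol, root, needed, defines):
    """Return the first object in load order that defines `symbol`, else None."""
    for obj in load_order(root, needed):
        if symbol in defines.get(obj, set()):
            return obj
    return None


# Hypothetical state after libmain reimplemented useful_function itself,
# while prog_z is still linked against the old libuseful.
defines = {
    "libmain": {"main_function", "useful_function"},
    "libuseful": {"useful_function"},
}
useful_first = {"prog_z": ["libuseful", "libmain"]}
main_first = {"prog_z": ["libmain", "libuseful"]}

print(resolve("useful_function", "prog_z", useful_first, defines))  # libuseful
print(resolve("useful_function", "prog_z", main_first, defines))    # libmain
```

In this model, whether the old faulty definition is picked up depends only on the order in which prog_z lists its libraries, which is not something libmain's authors control.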
What should be done? Relatively easy: check whether it is a Gentoo-only
problem (a patch or misuse of the ebuild system) or whether it comes from upstream. If
it comes from upstream, inform them that they forgot to link their library x
against lib y and show them how they can test it. If they release a patch for it,
integrate that patch into your ebuilds.
But of course it is possible that it was built that way and
must be used that way for a good reason.
My gut feeling tells me this should be reassigned to QA, but I think this really is something everyone should be involved with. => email@example.com :)
Created attachment 171799 [details]
Example project with broken buildsystem
A build on a normal i386 Gentoo system without special settings will run "fine":
cc -fPIC -shared main.c -o libmain.so -Wl,-rpath,/home/test/libtest
cc -fPIC -shared useful.c -o libuseful.so -Wl,-rpath,/home/test/libtest
cc prog_z.c -o prog_z -rdynamic libmain.so libuseful.so -Wl,-rpath,/home/test/libtest
When we simulate stricter linking it will break:
$ make LDFLAGS="-Wl,-z,-defs -Wl,--no-undefined"
cc -fPIC -shared -Wl,-z,-defs -Wl,--no-undefined main.c -o libmain.so -Wl,-rpath,/home/test/libtest
/tmp/ccW1jr9I.o: In function `main_function':
main.c:(.text+0xa): undefined reference to `useful_function'
collect2: ld returned 1 exit status
make: *** [libmain.so] Error 1
Created attachment 171800 [details, diff]
Fix for example project
This small patch fixes the build system of the example project:
$ make LDFLAGS="-Wl,-z,-defs -Wl,--no-undefined"
cc -fPIC -shared -Wl,-z,-defs -Wl,--no-undefined useful.c -o libuseful.so -Wl,-rpath,/home/test/libtest
cc -fPIC -shared -Wl,-z,-defs -Wl,--no-undefined main.c -o libmain.so libuseful.so -Wl,-rpath,/home/test/libtest
cc -Wl,-z,-defs -Wl,--no-undefined prog_z.c -o prog_z -rdynamic libmain.so -Wl,-rpath,/home/test/libtest
Robert, is there a way for ld to only _warn_ about undefined references instead of an error?
This way, we could see the full build and all potential undefined refs instead of just the first one.
Could be worth it, what do you think?
Yes, it would be worth it. I saw that there are Gentoo QA checks for various compiler warnings (implicit declaration of functions, for example). I will check whether there is a way to make ld's undefined reference output non-fatal.
As far as I can see, binutils 2.19 does not have it available as a warning; could be worth requesting such a warning upstream.
Sorry, but I must correct you. Try
make LDFLAGS="-Wl,--no-undefined -Wl,--warn-unresolved-symbols"
with my test project.
It has been in binutils since 2.15.
Oh, I missed the combined use. I did know about --warn-unresolved-symbols (which alone does not warn about this), but I never tried combining the two. Quite useful indeed.
Great, it's definitely worth making a QA test for this. And since it needs to be added to the user's LDFLAGS, we'll likely not get a flood of bugs :)
Do we ping Zac?
I can add a build like check, but first I need some sample output of what the warning looks like.
Created attachment 172129 [details]
listing of warnings
Only the output?
My test project, for example, will give the following warning:
/tmp/cc4UXMlj.o: In function `main_function':
main.c:(.text+0xa): warning: undefined reference to `useful_function'
Other examples are from the logs of alsa-lib and openjpeg.
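For what it's worth, a build-log check for this could look roughly like the Python sketch below. It is only an illustration keyed to the warning format quoted above. The names `UNDEF_RE` and `undefined_symbols` are made up here, and this is not how Portage actually implements its QA checks.

```python
import re

# Matches ld warning lines in the format quoted above, e.g.
#   main.c:(.text+0xa): warning: undefined reference to `useful_function'
# Fatal "undefined reference" errors lack the "warning:" prefix and are skipped.
UNDEF_RE = re.compile(r"warning: undefined reference to [`'](?P<symbol>[^`']+)'")


def undefined_symbols(log_text):
    """Return the sorted, de-duplicated symbols ld warned about in a build log."""
    return sorted({m.group("symbol") for m in UNDEF_RE.finditer(log_text)})


sample_log = """\
/tmp/cc4UXMlj.o: In function `main_function':
main.c:(.text+0xa): warning: undefined reference to `useful_function'
"""

print(undefined_symbols(sample_log))  # ['useful_function']
```

Such a check can only report that a warning was printed. Deciding whether a given symbol is a real problem still needs someone to look at the package.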
i dont think that's a good idea. having undefined refs in a shared library is not necessarily a bug, and some packages do this on purpose with their plugins or libraries.
actually, an even better case is where a plugin cannot define all references. for example, if a program itself loads a plugin on the fly, and the plugin uses symbols which are only available in the program. theres nothing to link against in order to satisfy those symbol references.
Then perhaps we should do what Mandriva is currently trying to do: set the default symbol visibility to none.
Anyhow, this is a really big QA topic, I'm not sure bugzilla is the best place to discuss all this :)
how is this a "really big QA topic" ? name one package where these undefined references in reality caused a problem. every bug linked here will not show up at runtime (mostly because it's undefined references in plugins, not a shared library itself).
if packages arent actually breaking, then this is merely an educational exercise.
as for visibility, i dont follow. what is this "none" you speak of ? there is no such gcc visibility. plus, some links would be useful ...
The cases where this is going to be a problem are the ones that are going to be reported as --as-needed failures, actually, which includes, for instance, allegro (bug #247256) which I reported today.
When I suggested using --no-undefined, I was sincerely thinking of using it as a way to show upstream the way, to make sure they don't introduce further bugs after fixing --as-needed, rather than an exercise to put in use in the tree.
Certainly, it takes much more source-digging than reporting --as-needed failures since you have to decide on a case by case basis whether the undefined symbols are an issue or not.
For instance, just looking at a couple of the bugs still open that are linked here, the alsa-lib one (#246715) is certainly not fatal (alsa-lib itself links to libdl); while it would be a very good idea to get upstream to fix their buildsystem so that it actually links to libdl, it's not something we should spend time on, IMHO.
On the other hand bug #246727 shows that openjpeg fails to link to libm, which _is_ an issue, which should be solved; packages linking to openjpeg are most likely to fail when built with forced --as-needed, unless they use libm themselves so it works incidentally.
So yeah, it takes quite a bit more time to identify which failures/warnings are issues and which ones are "false positives", like glibc or zsh.
yes, the openjpeg bug could be problematic. but otherwise, your comment shows that this is indeed a minor issue. i dont consider --as-needed to be a front & center issue that people should be falling over themselves to fix.
(In reply to comment #20)
> how is this a "really big QA topic" ?
I meant "big" as in scope, not as in priority.
As for the visibility, I mean "hidden". That's an interesting goal because it reduces the number of exported symbols and thus reduces run-time linking time. However, this needs to be explicitly supported by libraries. So this is a whole other issue.
yes, forcing visibility globally is broken. if you want to start an initiative to "clean up library symbols", then more power to you as i'm sure there are a ton of people who'd appreciate it, but that is way more work than we have man power for.
Zac: the other thing to keep in mind is that there is no passive message portage can search for. the logs provided show messages that only exist when a link failure occurs. and when said link failure occurs, the emerge fails already. simply outputting a slightly nicer message in that case seems pretty redundant. the only way you'd be able to actively detect things is to do ELF symbol resolution the same way as the linker: scan every ELF for undefined symbols, load and scan every shared object referred to by the DT_NEEDED tag (as well as PT_INTERP), and do this recursively.
in other words, a ton of overhead for negligible return as well as plenty of false positives (which is the worst thing we can do to a developer -- make them research a "bug" which is no bug at all).
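To illustrate the recursive scan described here and why it produces false positives, a minimal Python sketch follows. It is a toy model and does not read ELF files: the `needed`, `defined` and `undefined` dictionaries are hand-written stand-ins for what one would extract from real objects, and `plugin.so`, `host_app` and `host_register` are hypothetical names for the plugin case mentioned earlier.

```python
def closure(obj, needed):
    """All objects reachable from `obj` via DT_NEEDED, including `obj` itself."""
    seen, stack = set(), [obj]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(needed.get(cur, []))
    return seen


def unresolved(obj, needed, defined, undefined):
    """Undefined symbols of `obj` that nothing in its DT_NEEDED closure defines."""
    available = set()
    for dep in closure(obj, needed):
        available |= defined.get(dep, set())
    return undefined.get(obj, set()) - available


# Hand-written model data (assumptions for illustration only).
needed = {
    "libmain.so": [],                     # broken: forgot to link libuseful
    "libmain_fixed.so": ["libuseful.so"],
    "plugin.so": [],                      # gets its symbols from the host program
}
defined = {
    "libuseful.so": {"useful_function"},
    "host_app": {"host_register"},        # the program that dlopen()s plugin.so
}
undefined = {
    "libmain.so": {"useful_function"},
    "libmain_fixed.so": {"useful_function"},
    "plugin.so": {"host_register"},
}

for name in ("libmain.so", "libmain_fixed.so", "plugin.so"):
    print(name, sorted(unresolved(name, needed, defined, undefined)))
# libmain.so ['useful_function']
# libmain_fixed.so []
# plugin.so ['host_register']
```

The last line is the false positive: the plugin is reported although its symbol is supplied by the program that loads it at run time, and nothing in the plugin's own DT_NEEDED entries can tell the scanner that.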
No, actually there is a passive message (I also didn't know about it before, and thought it could only be a failure message).
But I agree, it's a bit of a problem, since it's going to ask people to look for bugs if the warning is there for everybody; and if one has to add the warning flags to one's own flags or spec file, then one can also take care of adding a bashrc snippet to find the affected packages more quickly.
It seems that you haven't read everything: it doesn't fail. It is a warning.
Okay, I'll just remove dev-portage from CC for now and you guys can ping me if you find some sort of QA test that doesn't have lots of false positives.
yes, the flag in question was mentioned before i noticed this bug
while it makes the process simpler, it still doesnt address the last point i raised. if QA team wishes to perform audits and do some research, then great. but dumping bugs that arent bugs on end developers is bad.
certainly a page covering this process would be good. or perhaps integrating it into the current as-needed document. that way we have accepted and documented methods to assist people in the audit procedure and once a potential real bug has been deduced, the page exists to pass knowledge on to the developer and/or maintainer who will be actually fixing / reviewing things.
I'll try to add a paragraph about this to the --as-needed documentation tomorrow, then.
(I'm on qa@)