So - for some time I've been working on a nice linker acceleration. This gives the biggest win for dlopened libraries - as used extensively by OO.o / KParts / GNOME etc. Some measurements of the previous speedups available here: http://sourceware.org/ml/binutils/2005-10/msg00436.html

The current patches are more conservative, but still provide a measurable win. Of course - prelinking provides a better speedup for the non-dlopened case; but if prelinking is not used this will also provide a nice win. Patches are here:

http://go-oo.org/~michael/glibc-bdirect-2.diff
http://go-oo.org/~michael/binutils-bdirect-2.diff

With the current setup, to get maximum usefulness from -Bdirect it's necessary to re-compile much of the system with it since it has a cascading effect from bottom to top. To examine the results:

$ readelf -y -W foo.so

will show the .direct linkage table. To actually use direct linkage (and aid debugging) you need to

$ export LD_BIND_DIRECT=1

It is important *not* to compile some libraries & packages with -Bdirect - glibc would be a good example that extensively uses the ELF interposing features (that -Bdirect overrides) in several areas - eg. pthread_ library selection etc. ie. adding -Wl,-Bdirect to a glibc build itself is likely to completely hose the system ;-) OTOH - lots of other stuff doesn't rely on this; eg. the GNOME stack, things above Qt (in general) and all of OO.o - where the wins are real.

It'd be great to get this into wider use - the impact is hopefully small and the potential benefit large & I know Gentoo users love to try new things to get some extra speed :-)
As this seems to be non-harmful to other stuff, could this be implemented through an additional use flag? Also just want to add this would be a big win for OOo on Gentoo, cutting the startup time in half.
Just updated the binutils patch to handle an interesting copy-reloc case I was unaware of, which makes it far more robust. In case anyone wants it, the old version is at: http://go-oo.org/~michael/binutils-bdirect-2.old.diff
Did you just read my mind, the post on the mailinglist, or just generally think in a good way? ;) Some questions however: Does this conflict with the prelink solution, or can they nicely coexist? What are the situations you should look for when deciding if something does (not) require -Wl,-Bdirect ? Is it possible to script such a detection? (Please note that I'm not very hot on the internals of the ELF format, though I'm not a stranger to breaking things in order to learn, I prefer not to have that thrust upon unsuspecting users ;) And lastly, what would be a (Workable ;) measure for performance here? What metrics and tests did you use on the original post to the mailinglist?
glibc patch:
- elf/dl-close.c: is this a bugfix unrelated to the rest of the patch ?
- the .direct/DT_DIRECT stuff wont actually be utilized unless LD_BIND_DIRECT is set right ? so if the user builds their glibc with .direct support, it wont break anything until they set that env var ?

binutils patch:
- bfd/elflink.c: second hunk is useless whitespace change

i have no problem adding binutils patch to binutils-2.16.1 now since it's harmless w/out the glibc change, but i dont want to keep updating it every few days ...
This looks very useful to hardened, where we avoid prelink (prelink undoes the randomisation of load addresses we go to some effort to obtain).
Spider: thanks for your interest :-)

> Does this conflict with the prelink solution, or can they nicely coexist?

There should be no conflict here.

> What are the situations you should look for when deciding if something does
> (not) require -Wl,-Bdirect ? Is it possible to script such a detection?

Probably not - or at least - perhaps :-) We can write a simple tool that uses the objdump -T <foo.so> output to detect duplicate *defined* symbols between libraries [ and I guess non-data-object duplicate symbols in apps ;-] to see which libs really genuinely use interposing (some of course may do so without meaning to ;-). To see the glibc pthread_ foo just do an objdump -T libc.so | grep pthread_cond & the same for libpthread.so

> And lastly, what would be a (Workable ;) measure for performance here? What
> metrics and tests did you use on the original post to the mailinglist?

My metrics are based on 'speedprof' runs of OO.o startup, also adding 'gettimeofday' calls at top/tail of the C++ 'dlopen' calls [ which force a ton of relocations to be performed ]. Also based on common sense / investigation of what's going on - there is a paper (which is slightly stale) with some of the background at http://go-oo.org/~michael/OOoStartup.pdf

SpanKY: thanks for your review

> glibc patch:
> - elf/dl-close.c: is this a bugfix unrelated to the rest of the patch ?

Nah - it frees the list of dt_needed libraries we allocate later in the patch to do the direct indirection through; it's an integral part of it.

> - the .direct/DT_DIRECT stuff wont actually be utilized unless LD_BIND_DIRECT
> is set right ?

Correct - reduces the risk etc.

> so if the user builds their glibc with it .direct support, it
> wont break anything until they set that env var ?

Quite right; similarly, as you say, the binutils change is harmless without the glibc change & the env. var being turned on. So - hopefully the risk is low all around :-)

> binutils patch:
> - bfd/elflink.c: second hunk is useless whitespace change

Yes, sorry - I'm doing some other performance work in there & this stuff tends to drift in.

> i have no problem adding binutils patch to binutils-2.16.1 now since it's
> harmless w/out the glibc change, but i dont want to keep updating it every
> few days ...

Sure - well, it's currently in quite good shape I think, and the above change is rather an aberration; I don't anticipate daily changes ;-)

Kevin: yes it's potentially an alternative to prelink - but I'd get shot for suggesting that that is a legitimate use-case by Ulrich/Jakub so ... prelink is wonderful you know ... ;-)
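The duplicate-symbol check mentioned in the reply to Spider could be sketched roughly like this. It is only an illustration in Python, not the tool itself: the function names are made up for this example, and the parsing assumes the usual `objdump -T` layout (hex address first, symbol name last, `*UND*` marking undefined symbols). It does not try to tell plugins or LD_PRELOAD shims apart from genuine interposers.

```python
import subprocess
from collections import defaultdict


def defined_symbols(objdump_output):
    """Return the names of defined dynamic symbols in `objdump -T` text."""
    names = set()
    for line in objdump_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank lines and short header lines
        try:
            int(fields[0], 16)  # symbol rows start with a hex address
        except ValueError:
            continue
        if "*UND*" in fields:
            continue  # undefined here: an import, not a definition
        names.add(fields[-1])
    return names


def duplicate_definitions(outputs_by_library):
    """Map each symbol defined in more than one library to those libraries."""
    owners = defaultdict(list)
    for library, text in sorted(outputs_by_library.items()):
        for name in defined_symbols(text):
            owners[name].append(library)
    return {name: libs for name, libs in owners.items() if len(libs) > 1}


def run_objdump(path):
    """Run `objdump -T` on one library and return its text output."""
    result = subprocess.run(["objdump", "-T", path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Feeding it `{"libc.so": run_objdump(...), "libpthread.so": run_objdump(...)}` should list the pthread_cond symbols that both libraries define, which is the glibc case described above.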
(In reply to comment #5)
> This looks very useful to hardened, where we avoid prelink (prelink undoes the
> randomisation of load addresses we go to some effort to obtain).

For the case of hardened it's my understanding that prelink on its own has no real advantage. -Wl,-O1 was more effective at reducing startup times for hardened users, with only the ELF being slightly larger in size. But I don't think prelink itself makes ASLR any less effective.
(In reply to comment #7)
> (In reply to comment #5)
> > This looks very useful to hardened, where we avoid prelink (prelink undoes the
> > randomisation of load addresses we go to some effort to obtain).
>
> For the case of hardened it's my understanding that prelink only has no
> real advantage. -Wl,-O1 was more effective to reduce startup times for
> hardened users with only the ELF being slightly larger in size.
> But I don't think prelink itself makes ASLR any less effective.

it's true that the randomization in PaX overrides prelinked library bases, but it's easy to allow them by disabling ASLR on the given binary.

however i don't understand the benefit of -Bdirect over prelink. in particular, to me it seems that the sole claimed benefit is that it can be applied to dlopen'd libraries whereas prelink cannot. which is not true of course, there's nothing to prevent one from running 'prelink <exe> <list of all libs including those opened via dlopen>'. so what exactly is the problem with prelink/openoffice?
throwing in my 2 cents here: prelink is in my opinion horrible, since having to rerun a time-consuming process after every update is a burden. Having something that's easily applied to the whole system would in my opinion be a much cleaner solution.
binutils-2.16.1-r1 now in portage with patch
How does -Wl,-Bdirect compare to -Wl,-O1? Can/Should they be used together?
In your original mail to the binutils list you said that performance actually got worse after patching glibc without recompiling anything with -Bdirect. Is that still the case?

> Times in ms to fully loaded:
>
> Old glibc:       3968, 3978, 3983  Avg: 3980
> Just new glibc:  4224, 4238, 4250  Avg: 4240  [260ms slower - hmm]
> all -Bdirected:  2148, 2168, 2215  Avg: 2180  [1800ms faster - 45%]
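For anyone wanting to re-check the arithmetic behind the quoted figures, here is a small Python sketch. The run times are copied from the quote; the dictionary and the helper name `relative_change` are just for this example.

```python
from statistics import mean

# Run times in ms, copied from the quoted mail.
runs_ms = {
    "old glibc": [3968, 3978, 3983],
    "just new glibc": [4224, 4238, 4250],
    "all -Bdirected": [2148, 2168, 2215],
}


def relative_change(baseline, other):
    """Fractional change of mean(other) relative to mean(baseline)."""
    return (mean(other) - mean(baseline)) / mean(baseline)


baseline = runs_ms["old glibc"]
for label, runs in runs_ms.items():
    print(f"{label:15s} mean {mean(runs):7.1f} ms  "
          f"change {relative_change(baseline, runs):+.1%}")
```

The exact means come out slightly below the rounded averages in the quote (roughly 3976, 4237 and 2177 ms), and the fully -Bdirected case is about 45% faster than the old glibc, in line with the quoted figure.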
Pax:

> however i don't understand the benefit of -Bdirect over prelink. in
> particular, to me it seems that the sole claimed benefit is that it can be
> applied to dlopen'd libraries whereas prelink cannot. which is not true of
> course, there's nothing to prevent one from running 'prelink <exe> <list of
> all libs including those opened via dlopen>'. so what exactly is the
> problem with prelink/openoffice?

My understanding based on E-mails with Jakub is that prelink is not useful for dlopen situations - and indeed fundamentally cannot cover all such situations - since you can dlopen arbitrary component libraries, which cannot be known at prelink time.

neuron:

> throwing in my 2 cents here, prelink is in my opinion horrible,

Amen

SpanKY: thanks so much :-) I'll write a tool next week if I can to detect all genuine uses of interposing to get a list of packages to exclude from -Wl,-Bdirect.

Sebastian:

> How does -Wl,-Bdirect compare to -Wl,-O1? Can/Should they be used together?

-Wl,-O1 somewhat accelerates the lookup of a symbol in a library's hash table. -Wl,-Bdirect can dramatically reduce the number of those lookups => they are complementary.

Simon:

> In your original mail to the binutils list you said that performance actually
> got worse after patching glibc without recompiling anything with -Bdirect.
> Is that still the case?

I haven't re-run measurements since then; however since then I've added some __builtin_expect annotation that should take the bdirect stuff out of the common case (no -Bdirect) code path, so it has less impact; not re-measured it though.
(In reply to comment #13)
> Pax:
>
> > however i don't understand the benefit of -Bdirect over prelink. in
> > particular, to me it seems that the sole claimed benefit is that it can be
> > applied to dlopen'd libraries whereas prelink cannot. which is not true of
> > course, there's nothing to prevent one from running 'prelink <exe> <list of
> > all libs including those opened via dlopen>'. so what exactly is the
> > problem with prelink/openoffice?
>
> My understanding based on E-mails with Jakub is that prelink is not useful
> for dlopen situations - and indeed fundamentally cannot cover all such
> situations - since you can dlopen arbitrary component libraries - which cannot
> be known at prelink time.

your understanding is correct - as far as the very generic case is concerned where prelink is fed with binaries and is left to figure out all dependent libraries based on DT_NEEDED (as it can't otherwise predict/analyse all execution paths to figure out dlopen usage). however that is not the case with openoffice (or any particular package), you (the developer) know exactly what dependent libraries you can potentially open via dlopen, therefore you can simply tell prelink to take them into account as well. just as an illustration:

prelink -c /dev/null -v -n -q /bin/cat /usr/lib/libsctp.so.1.0.2 /lib
Laying out 3 libraries in virtual address space 41000000-50000000
Assigned virtual address space slots for libraries:
/lib/ld-linux.so.2 41000000-410174f4
/lib/tls/libc.so.6 4101a000-41133c3c
/usr/lib/libsctp.so.1.0.2 41136000-4113800c

i just picked a random library to prelink it with /bin/cat. as you can see there's no problem in laying it out in the address space. so the question stays: what is the real problem with prelink/openoffice? has anyone tried to prelink it at all (with an explicit library list similarly to the example above)?
(In reply to comment #0)
> It is important *not* to compile some libraries & packages with -Bdirect - glibc
> would be a good example that extensively uses the ELF interposing features (that
> -Bdirect overrides) in several areas - eg. pthread_ library selection etc. ie.
> adding -Wl,-Bdirect to a glibc build itself is likely to completely hose the
> system ;-)
>

I know you said you will write a tool to check, but knowing some Gentoo users, the first thing they are going to do is stick -Bdirect into LDFLAGS and rebuild world. So I would suggest a few things before we actually add the remaining bits to glibc:

1) Get that tool done
2) Maybe the more important one, as it should prevent hosing the system: get glibc's configure to strip -Bdirect from LDFLAGS.
we also need to filter -Wl,-Bdirect from CFLAGS and CXXFLAGS for glibc.
Perhaps we should adjust 'filter-ldflags' in flag-o-matic to remove linker flags from CFLAGS/CXXFLAGS as well.
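To make the filtering idea concrete, here is a rough Python sketch of stripping one linker option out of a flags string, whether it sits in LDFLAGS or has leaked into CFLAGS/CXXFLAGS. This is not how flag-o-matic is implemented (that is a bash eclass); `strip_linker_flag` is a hypothetical helper, and it assumes flags are whitespace-separated with linker options passed as `-Wl,opt[,opt...]`.

```python
def strip_linker_flag(flags, unwanted="-Bdirect"):
    """Remove `unwanted` from a flags string, including inside -Wl,... groups."""
    kept = []
    for flag in flags.split():
        if flag.startswith("-Wl,"):
            # -Wl,a,b,c passes a, b and c to the linker; drop only the match.
            parts = [p for p in flag.split(",")[1:] if p != unwanted]
            if parts:
                kept.append("-Wl," + ",".join(parts))
        elif flag != unwanted:
            kept.append(flag)
    return " ".join(kept)


print(strip_linker_flag("-Wl,-O1 -Wl,-Bdirect -Wl,-z,now"))
print(strip_linker_flag("-O2 -pipe -Wl,-O1,-Bdirect"))
```

The same function can be applied to each of LDFLAGS, CFLAGS and CXXFLAGS before building a package such as glibc that must not be linked with -Bdirect.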
So - I have a tool: http://go-oo.org/ooo-build/bin/finterpose that tries to identify genuine uses of interposing - of course, it's hard to detect plugins which don't require interposing and just have 'pam_foo_baa' in lots of modules deliberately. Looking over my /opt/gnome/lib I see no problems with this; running it on /lib shows what we know already - that libc can't use -Bdirect and that pam implements a load of plugins. Am running over /usr/lib/*.so now as well ...
Does this mean that things using plugin architectures like xine, vlc, transcode would be a problem for -Bdirect?
flameyes: no on the contrary it's good for apps that use plugins, they get faster. The thing that's hard to detect is whether something is a plugin, an LD_PRELOAD shim or a genuine interposing user :-)
I was wondering about the fact that the script finds the same symbol duplicated in some xine and vlc plugins, not sure if that would have been a problem (not the symbols of the interface but more some duplicate ones internally). Good to hear for me anyway, as always more of multimedia stuff is becoming plugin-based lately ;)
http://docs.sun.com/app/docs/doc/817-1984/6mhm7pl1f?a=view#chapter3-15 might be useful to people trying to get a handle on all this. "Interposition can still be achieved in a direct binding environment, on a per-object basis, if an object is identified as an interposer. Any object loaded using the environment variable LD_PRELOAD or created with the link-editor's -z interpose option, is identified as an interposer. When the runtime linker searches for a directly bound symbol, it first looks in any object identified as an interposer before it looks in the object that supplies the symbol definition." Does the glibc patch honour the above behaviour? If so perhaps we could affirmatively set that for objects that need it rather than filtering "-B direct".
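As a way of picturing the lookup order in the quoted passage, here is a toy Python model. It is not the Solaris or glibc implementation, and all names in it (`LoadedObject`, `resolve`, the library names) are invented for illustration. A directly bound symbol is looked up in interposers first, then in the object recorded at link time; without direct binding, the first definition in load order wins.

```python
from dataclasses import dataclass


@dataclass
class LoadedObject:
    name: str
    symbols: set
    interposer: bool = False  # e.g. loaded via LD_PRELOAD or -z interpose


def resolve(symbol, load_order, direct_target=None):
    """Return the name of the object that would supply `symbol`, or None."""
    if direct_target is not None:
        # Direct binding: interposers get the first chance ...
        for obj in load_order:
            if obj.interposer and symbol in obj.symbols:
                return obj.name
        # ... then the object recorded at link time.
        if symbol in direct_target.symbols:
            return direct_target.name
    # Traditional search: first definition in load order wins.
    for obj in load_order:
        if symbol in obj.symbols:
            return obj.name
    return None


shim = LoadedObject("shim.so", {"open"}, interposer=True)
other = LoadedObject("libother.so", {"read"})
libc = LoadedObject("libc.so", {"open", "read"})

print(resolve("read", [other, libc]))                            # traditional
print(resolve("read", [other, libc], direct_target=libc))        # direct
print(resolve("open", [shim, other, libc], direct_target=libc))  # interposer
```

In this model an accidental duplicate of `read` in libother.so stops shadowing libc once the reference is directly bound, while a shim marked as an interposer still takes precedence.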
(In reply to comment #22) > http://docs.sun.com/app/docs/doc/817-1984/6mhm7pl1f?a=view#chapter3-15 > might be useful to people trying to get a handle on all this. > > "Interposition can still be achieved in a direct binding environment, on a > per-object basis, if an object is identified as an interposer. Any object loaded > using the environment variable LD_PRELOAD or created with the link-editor's -z > interpose option, is identified as an interposer. When the runtime linker > searches for a directly bound symbol, it first looks in any object identified as > an interposer before it looks in the object that supplies the symbol definition." > > Does the glibc patch honour the above behaviour? If so perhaps we could > affirmatively set that for objects that need it rather than filtering "-B direct". > Err, so is this going to affect sandbox ? Have to admit my elf knowledge is more from a user than implementation POV.
(In reply to comment #17) > Perhaps we should adjust 'filter-ldflags' in flag-o-matic to remove linker flags > from CFLAGS/CXXFLAGS as well. Right, but im just thinking that if we can get the hardasses over from libc-alpha to accept this in time, that it should do the right thing from glibc side at some stage ...
(In reply to comment #23) > Err, so is this going to affect sandbox ? Have to admit my elf knowledge is > more from a user than implementation POV. Well, if the glibc support for '-B direct' doesn't honour the idea that LD_PRELOAD implies "INTERPOSE", it may be necessary to link the sandbox with '-z interpose' (which is just a flag the run-time linker (/lib/ld-linux.so.2) uses to decide whether to do direct binding or not). If the run-time linker doesn't support the INTERPOSE flag either then perhaps it would be a good idea to add that first. Best to wait for Michael to confirm one way or another - I'm not 100% sure I've understood everything properly... If I have, then it's not necessary to avoid '-B direct' on glibc completely, just to set '-z interpose' on whichever libraries are intended to be interposers.
I have compiled a gentoo system from scratch using nxsty's glibc overlay (http://forums.gentoo.org/viewtopic-t-376943.html), which supports -Bdirect and filters it of course. My system is running really well: all programs start up faster, and it is a better speed boost than prelink, at least on the load time. I have installed kde-3.5, xdtv, avidemux and lots of other software; openoffice-bin loads a lot faster. No regressions detected.

This is my emerge info:

Portage 2.0.53 (default-linux/amd64/2005.1, gcc-4.0.2-20051110, glibc-2.3.6-r1, 2.6.15-rc5LN64 x86_64)
=================================================================
System uname: 2.6.15-rc5LN64 x86_64 AMD Athlon(tm) 64 Processor 2800+
Gentoo Base System version 1.6.13
dev-lang/python: 2.3.5-r2, 2.4.2
sys-apps/sandbox: 1.2.12
sys-devel/autoconf: 2.13, 2.59-r6
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils: 2.16.1-r1
sys-devel/libtool: 1.5.20
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=athlon64 -ftree-loop-ivcanon -pipe -fno-ident"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=athlon64 -ftree-loop-ivcanon -pipe -fno-ident -fvisibility-inlines-hidden -fno-enforce-eh-specs"
DISTDIR="/mnt/data/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.switch.ch/ftp/mirror/gentoo"
LDFLAGS="-Wl,-O1 -Wl,-Bdirect -Wl,-z,now"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 X a52 aac acml alsa aotuv aqua_theme arts audiofile avi berkdb bitmap-fonts bzip2 cairo cdr crypt cups curl dbus dga dts dv dvb dvd dvdr dvdread eds emboss encode esd exif expat fam ffmpeg firefox foomaticdb fortran gif glut gmp gnome gpm gstreamer gtk gtk2 hal idn ieee1394 imagemagick imlib ipv6 java jpeg kde kdeenablefinal lcms lm_sensors lzw lzw-tiff mad mjpeg mng motif mp3 mpeg ncurses nls nptl nptlonly nvidia offensive ogg oggvorbis opengl optimize oss pam pcre pdflib perl png python qt quicktime readline recode rtc samba scanner sdl spell ssl tcpd threads tiff truetype truetype-fonts type1-fonts udev usb userlocales v4l v4l2 vcd vorbis x264 xine xml2 xpm xv xvid zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LC_ALL, LINGUAS
unless upstream built openoffice-bin with -Bdirect, there should be no speedup with that package
(In reply to comment #26) > My system is running really well all programs start up faster, it is a better > speed boost than prelink, at least on the load time. > I have installed, kde-3.5 xdtv avidemux and lots of other software, > openoffice-bin loads a lot faster. could you show us actual timing info (old vs. new system)?
Hello everyone. I just recompiled my system with -Bdirect and have some positive news to report: everything on my system compiled without issue and all of the apps that I normally use load noticeably faster. Rebooting into my system also works fine. I clocked the following openoffice load times:

Before -Bdirect: 0m2.530s
After -Bdirect: 0m1.153s

I know I should clock this one, but Maple10 loads at least twice as fast as before. It used to take more than 10 seconds to load, now it takes less than five. BTW, all load times are taken from at least the second launch of each app.

Here is my emerge info:

Portage 2.0.53 (default-linux/x86/2005.1, gcc-3.4.4, glibc-2.3.6-r1, 2.6.14-archck5 i686)
=================================================================
System uname: 2.6.14-archck5 i686 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.12.0_pre11
ccache version 2.4 [enabled]
dev-lang/python: 2.3.5, 2.4.2
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils: 2.16.1-r1
sys-devel/libtool: 1.5.20-r1
virtual/os-headers: 2.6.11-r3
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -mtune=athlon-xp -msse3 -m3dnow -pipe -O3 -fweb -frename-registers -fforce-addr -fomit-frame-pointer -ftracer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-xp -mtune=athlon-xp -msse3 -m3dnow -pipe -O3 -fweb -frename-registers -fforce-addr -fomit-frame-pointer -ftracer -fvisibility-inlines-hidden"
DISTDIR="/in/portage"
FEATURES="autoconfig ccache distlocks sandbox sfperms strict userpriv usersandbox"
GENTOO_MIRRORS="http://gentoo.mirrors.tds.net/gentoo/ http://ccccom.com"
LDFLAGS="-Wl,-Bdirect"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X a52 aac aalib alsa amuled apache2 apm arts audiofile avi berkdb bitmap-fonts bzip2 cdr crypt cups curl divx4linux dvd dvdr dvdread eds emboss encode esd ethereal exif expat fam fame ffmpeg flac foomaticdb fortran gdbm gif glut gmp gnome gpm gstreamer gtk gtk2 idn imagemagick imlib ipv6 java jpeg jpeg2k kde lcms libg++ libwww mad matroska mikmod mjpeg mng motif mozilla mp3 mpeg ncurses nls nptl ogg oggvorbis openal opengl oss pam pcre pdflib perl png python qt quicktime rar readline remote remote-gui sdl spell sse sse2 ssl svga tcltk tcpd theora tiff truetype truetype-fonts type1-fonts udev unicode usb vorbis win32codecs wxwindows xml2 xmms xv xvid zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LC_ALL, LINGUAS

Also, this may be relevant: I run a RAID0 config on reiserfs 3.6. I have also overclocked my CPU to 1980 MHz from 1800.
Martin - wrt. the tool - it is http://go-oo.org/ooo-build/bin/finterpose - of course this also requires filter files / suppressions for known plug-ins which don't use interposing but have common entry-point names :-) Either way - using this over my system - I've detected a ton of places where there are potential bugs related to bogus interposing & ~no valid uses outside of glibc => ie. we spend all this CPU time on a linking feature that people use only to create bugs ;-)

Kevin - that's a great link; I hadn't seen the -z interpose functionality - that's an interesting solution to the problem. Of course - wrt. glibc it prolly won't help that much in practice - since it's at the bottom of the lib stack anyway. Best just not to -Bdirect it until we have better tools that can flag the few interposed symbols.

Andre/Miguel/SpanKY - if Andreas has packaged a recent(ish) ooo-build you'll see that during the configure it checks for -Wl,-Bdirect in the linker & uses it if it's there. Please do check though: 'readelf -y /usr/lib/ooo*/program/libsvx*' to see if there is actually a '.direct' section. If there is - it would be great if you could do 2 things for me:

a) export RTL_LOGFILE=/tmp/startup.nopid ; rm -f "$RTL_LOGFILE" ; oowriter ; grep '}' "$RTL_LOGFILE" | grep 'OpenCli'

That should give you a reliable machine-measured startup time metric - worth doing 3+ runs & averaging with and without export LD_BIND_DIRECT=1. I'd be interested in the results from that & some idea of hardware (CPU & cache-size).

b) grab my 'go-faster' tool: http://go-oo.org/ooo-build/patches/test/redirect-bdirect.c, compile it [ warning - back your .so's up first - that tool can mangle .so in pathological cases ], run it over your OO.o .so's and repeat the timings - I'd be most interested in the results. [ it marks _ZThn symbols as non-vague ]
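The `readelf -y` option comes from the patched binutils. On a system with only stock binutils, a rough way to check whether a library carries the section at all is to look through the section headers printed by `readelf -S -W`. The Python sketch below assumes the section is literally named `.direct`, as stated in this thread; the helper names are invented for the example.

```python
import re
import subprocess

# Matches section rows such as "  [12] .direct   PROGBITS ...".
_SECTION_ROW = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)")


def section_names(readelf_output):
    """Collect the first token after the index on each `readelf -S` row.

    The null section (index 0) has an empty name, so its type is picked
    up instead; that is harmless when only looking for a named section.
    """
    names = []
    for line in readelf_output.splitlines():
        match = _SECTION_ROW.match(line)
        if match:
            names.append(match.group(1))
    return names


def has_direct_section(path):
    """Return True if `readelf -S -W` lists a .direct section for `path`."""
    result = subprocess.run(["readelf", "-S", "-W", path],
                            capture_output=True, text=True, check=True)
    return ".direct" in section_names(result.stdout)
```

Calling `has_direct_section` on each library under the OO.o program directory would show which ones were actually linked with -Bdirect before any timing is attempted.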
a)

biter@Cago-Ma3oxuct ~ $ sh oowriter-bench.sh
001375 1 } PERFORMANCE - DesktopOpenClients_Impl()
biter@Cago-Ma3oxuct ~ $ sh oowriter-bench.sh
001411 1 } PERFORMANCE - DesktopOpenClients_Impl()
biter@Cago-Ma3oxuct ~ $ sh oowriter-bench.sh
001398 1 } PERFORMANCE - DesktopOpenClients_Impl()

With export LD_BIND_DIRECT=1:

biter@Cago-Ma3oxuct ~ $ sh oowriter-bench.sh
001367 1 } PERFORMANCE - DesktopOpenClients_Impl()
biter@Cago-Ma3oxuct ~ $ sh oowriter-bench.sh
001353 1 } PERFORMANCE - DesktopOpenClients_Impl()
biter@Cago-Ma3oxuct ~ $ sh oowriter-bench.sh
001363 1 } PERFORMANCE - DesktopOpenClients_Impl()

My CPU is an AMD64 3000+ with 512KB L2 cache. I am overclocking it from 1800 MHz to 1980.

b) redirect-bdirect.c does not compile for me, with this error message:

biter@Cago-Ma3oxuct ~ $ g++ redirect-bdirect.c
redirect-bdirect.c: In function `int redirect_symbols(bfd*, asection*, bfd_byte*, int (*)(const char*))':
redirect-bdirect.c:31: error: invalid conversion from `void*' to `asymbol**'

I have not been able to compile nxsty's memcpy.c either, so it might not be a fault of yours, but mine.
it isnt a c++ file so you shouldnt be using `g++`; use `gcc`
I did, but it's even worse:

gcc redirect-bdirect.c
/tmp/cc6nozi3.o: In function `main':
redirect-bdirect.c:(.text+0x2be): undefined reference to `bfd_openr'
redirect-bdirect.c:(.text+0x329): undefined reference to `bfd_check_format_matches'
redirect-bdirect.c:(.text+0x383): undefined reference to `bfd_get_section_by_name'
redirect-bdirect.c:(.text+0x3c6): undefined reference to `bfd_malloc_and_get_section'
redirect-bdirect.c:(.text+0x44e): undefined reference to `bfd_close'
collect2: ld returned 1 exit status

Anyway, this stuff should not fill up this bug report. You can contact me on irc on freenode on channel #Ma3oxuct.
Andrey - your with/without numbers are identical within statistical error - which makes me suspect that perhaps you had LD_BIND_DIRECT=1 set anyway for your system; any chance you can repeat with

$ unset LD_BIND_DIRECT

first ? :-)

Also - the version of redirect-bdirect.c I download has this helpful comment at the top:

 * Compile with:
 * gcc -Wall -lbfd -o redirect redirect.c

I suggest you try using that ;-)
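For comparing such runs, a short Python sketch can pull the leading counter out of each RTL_LOGFILE line and report mean and spread per configuration. The sample lines are the ones Andrey posted; the thread does not state the unit of the counter, and the function names here are made up for the example.

```python
import re
from statistics import mean, stdev

# Matches lines like "001375 1 } PERFORMANCE - DesktopOpenClients_Impl()".
_LINE = re.compile(
    r"^(\d+)\s+\d+\s+\}\s+PERFORMANCE - DesktopOpenClients_Impl\(\)")


def startup_values(log_text):
    """Extract the leading counter from each matching log line."""
    values = []
    for line in log_text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            values.append(int(match.group(1)))
    return values


without = startup_values("""
001375 1 } PERFORMANCE - DesktopOpenClients_Impl()
001411 1 } PERFORMANCE - DesktopOpenClients_Impl()
001398 1 } PERFORMANCE - DesktopOpenClients_Impl()
""")
with_direct = startup_values("""
001367 1 } PERFORMANCE - DesktopOpenClients_Impl()
001353 1 } PERFORMANCE - DesktopOpenClients_Impl()
001363 1 } PERFORMANCE - DesktopOpenClients_Impl()
""")

for label, values in (("LD_BIND_DIRECT unset", without),
                      ("LD_BIND_DIRECT=1", with_direct)):
    print(f"{label}: mean {mean(values):.1f}, "
          f"stdev {stdev(values):.1f} over {len(values)} runs")
```

With only three runs per configuration the spread is a rough guide at best, so more runs are worth collecting before reading anything into a small difference between the means.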
(In reply to comment #30)
> Andre/Miguel/SpanKY - if Andreas has packaged a recent(ish) ooo-build you'll see
> that during the configure it checks for -Wl,-Bdirect in the linker & uses it if
> it's there.

The current version in portage is not very up-to-date, as it is based on the ooo-build-2.0-branch, for obvious reasons. If you want the latest stuff, use the 2.0.1-RC4 ebuild from my overlay, which is based on ooo-build-src680.143.0: http://dev.gentoo.org/~suka/overlay/
With unset LD_BIND_DIRECT:

001406 1 } PERFORMANCE - DesktopOpenClients_Impl()
001448 1 } PERFORMANCE - DesktopOpenClients_Impl()
001387 1 } PERFORMANCE - DesktopOpenClients_Impl()
001380 1 } PERFORMANCE - DesktopOpenClients_Impl()
001774 1 } PERFORMANCE - DesktopOpenClients_Impl()

b) Well it turns out that it was a problem between keyboard and chair after all, and hopefully this is the last time such a problem shall arise. I ran over all .so files with this script (only four had "direct sections") and here are the timings:

With LD_BIND_DIRECT unset:

001397 1 } PERFORMANCE - DesktopOpenClients_Impl()
001824 1 } PERFORMANCE - DesktopOpenClients_Impl()
001389 1 } PERFORMANCE - DesktopOpenClients_Impl()
001407 1 } PERFORMANCE - DesktopOpenClients_Impl()
001399 1 } PERFORMANCE - DesktopOpenClients_Impl()

With LD_BIND_DIRECT set:

001379 1 } PERFORMANCE - DesktopOpenClients_Impl()
001359 1 } PERFORMANCE - DesktopOpenClients_Impl()
001378 1 } PERFORMANCE - DesktopOpenClients_Impl()
001622 1 } PERFORMANCE - DesktopOpenClients_Impl()

Here is the script I used to ./redirect all the files:

#!/bin/bash
# Run the redirect tool over every OO.o shared library.
for file in /usr/lib/openoffice/program/*.so; do
    ./redirect "$file"
done

I am going to start compiling your overlay, Andreas, and see the results that I get then (without -Bdirect and then with -Bdirect; hopefully this will work by just unsetting LDFLAGS before compiling ooffice?).
Andrey - there is no speedup there as you can see :-) clearly without compiling with -Wl,-Bdirect you'll not get any win. However - I've not seen OO.o start so fast ever before :-) and given that the faster the CPU - the greater the proportion of the time spent linking - when you get it to work, it should give a nice win. Your redirect script looks great - but will have no effect until there is a .direct section there.
Micheal, I have determined that my ooffice build is not new enough to utilize -bdirect. But I still want to test -bdirect and provide you with, hopefully the results that you are looking for. I first need to compile the latest openoffice, since it is what can utilize -bdirect. Andreas, your openoffice overlay does not work for me: emerge -pv openoffice These are the packages that I would merge, in order: Calculating dependencies ...done! [ebuild U ] app-office/openoffice-2.0.1_rc4 [2.0.0] +curl +eds +gnome +gtk +java -kde -ldap +mozilla +xml2 +zlib 0 kB [1] It fails with the following error: ------------------------------ Making: ../../../unxlngi6.pro/slo/webdavprovider.obj g++ -Wreturn-type -fmessage-length=0 -c -I. -I. -I../inc -I../../../inc -I../../../unx/inc -I../../../unxlngi6.pro/inc -I. -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solver/680/unxlngi6.pro/inc/stl -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solver/680/unxlngi6.pro/inc/external -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solver/680/unxlngi6.pro/inc -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solenv/unxlngi6/inc -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solenv/inc -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/res -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solver/680/unxlngi6.pro/inc/stl -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solenv/inc/Xp31 -I/opt/blackdown-jdk-1.4.2.03/include -I/opt/blackdown-jdk-1.4.2.03/include/linux -I/opt/blackdown-jdk-1.4.2.03/include/native_threads/include -I/usr/include -I. -I../../../res -I. 
-Os -fno-strict-aliasing -Wuninitialized -I/usr/include/libxml2 -I/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/solver/680/unxlngi6.pro/inc/external/neon -I/usr/include/libxml2 -pipe -march=athlon-xp -mtune=athlon-xp -msse3 -m3dnow -pipe -O2 -fweb -frename-registers -fforce-addr -fno-strict-aliasing -Wno-ctor-dtor-privacy -fvisibility-inlines-hidden -fexceptions -fno-enforce-eh-specs -fpic -DLINUX -DUNX -DVCL -DGCC -DC341 -DINTEL -DGXX_INCLUDE_PATH=/usr/lib/gcc/i686-pc-linux-gnu/3.4.5/include/g++-v3 -DCVER=C341 -D_USE_NAMESPACE -DNPTL -DGLIBC=2 -DX86 -D_PTHREADS -D_REENTRANT -DNEW_SOLAR -D_USE_NAMESPACE=1 -DSTLPORT_VERSION=400 -DHAVE_GCC_VISIBILITY_FEATURE -D__DMAKE -DUNIX -DCPPU_ENV=gcc3 -DSUPD=680 -DPRODUCT -DNDEBUG -DPRODUCT_FULL -DOSL_DEBUG_LEVEL=0 -DOPTIMIZE -DEXCEPTIONS_ON -DCUI -DSOLAR_JAVA -DSRC680 -DSHAREDLIB -D_DLL_ -DMULTITHREAD -o ../../../unxlngi6.pro/slo/webdavprovider.o /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavprovider.cxx In file included from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVProperties.hxx:42, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVSession.hxx:56, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVResourceAccess.hxx:72, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavcontent.hxx:58, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavprovider.cxx:50: /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:40:47: neon/ne_session.h: No such file or directory 
/var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:43:44: neon/ne_utils.h: No such file or directory /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:46:57: neon/ne_basic.h: No such file or directory /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:49:66: neon/ne_props.h: No such file or directory In file included from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVProperties.hxx:42, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVSession.hxx:56, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVResourceAccess.hxx:72, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavcontent.hxx:58, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavprovider.cxx:50: /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:52: error: `ne_session' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:53: error: `ne_status' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:54: error: `ne_server_capabilities' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:56: error: `ne_propname' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonTypes.hxx:57: error: 
`ne_prop_result_set' does not name a type In file included from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVSession.hxx:56, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVResourceAccess.hxx:72, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavcontent.hxx:58, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavprovider.cxx:50: /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVProperties.hxx:64: error: `NeonPropName' has not been declared /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVProperties.hxx:64: error: ISO C++ forbids declaration of `rName' with no type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVProperties.hxx:69: error: expected `,' or `...' 
before '&' token /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVProperties.hxx:69: error: ISO C++ forbids declaration of `NeonPropName' with no type In file included from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVResourceAccess.hxx:81, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavcontent.hxx:58, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavprovider.cxx:50: /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:39:25: neon/ne_uri.h: No such file or directory In file included from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/DAVResourceAccess.hxx:81, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavcontent.hxx:58, from /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/webdavprovider.cxx:50: /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:70: error: `ne_uri' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:71: error: `ne_uri' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:72: error: `ne_uri' does not name a type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:74: error: expected `,' or `...' 
before '*' token /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:74: error: ISO C++ forbids declaration of `ne_uri' with no type /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:79: error: expected `,' or `...' before '*' token /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav/NeonUri.hxx:79: error: ISO C++ forbids declaration of `ne_uri' with no type dmake: Error code 1, while making '../../../unxlngi6.pro/slo/webdavprovider.obj' '---* tg_merge.mk *---' ERROR: Error 65280 occurred while making /var/tmp/portage/openoffice-2.0.1_rc4/work/ooo-build-src680.143.0/build/src680-m145/ucb/source/ucp/webdav make: *** [stamp/build] Error 1 !!! ERROR: app-office/openoffice-2.0.1_rc4 failed. !!! Function src_compile, Line 205, Exitcode 2 !!! Build failed !!! If you need support, post the topmost build error, NOT this status message.
Opening a gnome-terminal with two tabs, one with --- $ export | grep LD declare -x LD_BIND_DIRECT="1" --- and one without, gives me a nice difference for evolution --force-shutdown: the tab without -Bdirect enabled gives the standard output about things being shut down, whereas the other one gives me --- 19333: broken: off end of map 0x1b8aa --- where the last address differs but nothing else. This happens with a lot of programs after an emerge world -ev.

Gentoo Base System version 1.12.0_pre11 Portage 2.0.53 (default-linux/x86/2005.1, gcc-4.0.2, glibc-2.3.6-r1, 2.6.14-gentoo-r4 i686) ================================================================= System uname: 2.6.14-gentoo-r4 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz dev-lang/python: 2.4.2 sys-apps/sandbox: 1.2.17 sys-devel/autoconf: 2.13, 2.59-r7 sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1 sys-devel/binutils: 2.16.1-r1 sys-devel/libtool: 1.5.20 virtual/os-headers: 2.6.11-r2 ACCEPT_KEYWORDS="x86" AUTOCLEAN="yes" CBUILD="i686-pc-linux-gnu" CFLAGS="-march=pentium4 -pipe -O2 -fomit-frame-pointer" CHOST="i686-pc-linux-gnu" CONFIG_PROTECT="/etc /opt/glftpd/etc /opt/glftpd/ftp-data /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control" CONFIG_PROTECT_MASK="/etc/gconf /etc/splash /etc/terminfo /etc/env.d" CXXFLAGS="-march=pentium4 -pipe -O2 -fomit-frame-pointer -fvisibility-inlines-hidden -fno-enforce-eh-specs" DISTDIR="/usr/portage/distfiles" FEATURES="autoconfig ccache distlocks sandbox sfperms strict" GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo" LC_ALL="sv_SE.UTF-8" LDFLAGS="-Wl,-O1 -Wl,--sort-common -z combreloc
-Wl,--enable-new-dtags -Wl,-Bdirect" MAKEOPTS="-j5" PKGDIR="/usr/portage/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/local/overlays/portage /usr/local/overlays/gentopia" SYNC="rsync://rsync.gentoo.org/gentoo-portage" USE="x86 X a52 aac acpi alsa asf audiofile avi bash bash-completion berkdb bitmap-fonts bmp browserplugin bzip2 cairo cdr crypt cups curl dbus dri dts dvd dvdr eds emboss encode esd evo exif expat fam firefox flac fortran freetype gcj gd gdbm gif gimpprint glitz glut gmp gnome gpm gstreamer gtk gtk2 gtkhtml gxl hal howl idn imagemagick imlib ipv6 java jikes joystick jpeg lcms libg++ libwww lm_sensors mad matroska mikmod mmx mng mono moznocompose moznoirc moznomail mp3 mpeg ncurses network nls nptl nptlonly ntp nvidia offensive ogg oggvorbis openal opengl pam pcre pdflib perl pic png ppds python quicktime readline real recode rtc sdl smp sox spell sqlite sse sse2 ssl svg symlink tcltk tcpd tetex theora threads tiff truetype truetype-fonts type1-fonts udev unicode usb userlocales utf8 vorbis win32codecs wxwindows xinetd xml2 xosd xprint xscreensaver xv xvid zlib userland_GNU kernel_linux elibc_glibc" Unset: ASFLAGS, CTARGET, LANG, LINGUAS
Created an attachment (id=74762) pkglist.txt

Question: do I need to apply this patch to build OOo with -Bdirect support? http://www.go-oo.org/patches/src680/speed-bdirect.diff I've tried to rebuild a whole system in a chroot with -Bdirect and without that patch, but without much luck:

readelf -y /usr/lib/openoffice/program/libsvx680li.so
<nil />
readelf -y /usr/lib/libIDL-2.so
[261: rows ...]

This is confirmed by the results below:

startup.1.nopid:004086 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.1nobd.nopid:004081 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.2.nopid:004136 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.2nobd.nopid:004091 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.3.nopid:004147 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.3nobd.nopid:004116 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.4.nopid:004126 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.4nobd.nopid:004079 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.5.nopid:004090 1 } PERFORMANCE - DesktopOpenClients_Impl()
startup.5nobd.nopid:004148 1 } PERFORMANCE - DesktopOpenClients_Impl()

If someone is interested, the attachment contains the list of packages on the system; the rows starting with "0" were compiled without the -Bdirect LDFLAGS.
<code>
# Count the occurrences of "Bdirect" in each installed package's saved build environment.
rm -f pkglist.txt
for i in /var/db/pkg/*/* ; do
    echo "$(bzcat "$i/environment.bz2" | grep -c Bdirect) | $i" >> pkglist.txt
done
sort -n --output=pkglist.txt pkglist.txt
</code>
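To put numbers on "no much luck", the ten timestamps above can be averaged per configuration. This is only a sketch: it assumes the "nobd" files are the runs without -Bdirect and that the number before "1 }" is the elapsed time logged at DesktopOpenClients_Impl(); neither is stated explicitly in the comment.

```python
from statistics import mean

# Values copied from the startup.N logs quoted above.
# Assumption: "nobd" marks the runs without -Bdirect.
with_bdirect = [4086, 4136, 4147, 4126, 4090]      # startup.N.nopid
without_bdirect = [4081, 4091, 4116, 4079, 4148]   # startup.Nnobd.nopid

mean_bd = mean(with_bdirect)
mean_nobd = mean(without_bdirect)

# Run-to-run spread of the baseline, to judge whether the difference matters.
spread_nobd = max(without_bdirect) - min(without_bdirect)

print("mean with -Bdirect:   ", mean_bd)
print("mean without -Bdirect:", mean_nobd)
print("difference:           ", mean_bd - mean_nobd)
print("baseline spread:      ", spread_nobd)
```

The difference between the two means is smaller than the spread within either set of runs, which fits the empty readelf -y output for libsvx680li.so: without the speed-bdirect.diff patch the OOo libraries apparently carry no .direct table to benefit from.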
If the questions about prelink vs. -Bdirect are still valid, I think I can elaborate on some parts of it, as I tried to analyze their effect on KDE.

Strictly technically speaking, in an ideal world there would be absolutely no benefit of -Bdirect over prelink. Both prelink and -Bdirect reduce symbol lookups for relocations, in different ways (which means they should also be able to coexist perfectly, with prelink taking precedence). Prelink tries to avoid relocations altogether by analysing the binary and all its dependencies and modifying them so that each is placed at an exact location in the address space. -Bdirect does the analysis at link time, so for every library it can find symbols only in that library's dependencies, which breaks interposing. That's a deliberate design decision, as for the majority of symbols no interposing is needed, and for the rest -Bdirect can be selectively disabled.

Since prelink knows the whole situation, it can avoid symbol lookup processing altogether; it only has to fix symbols that point to different places when a library is used by different binaries (i.e. conflicts resulting from interposing). -Bdirect still does symbol lookups, but it knows in which library to start for every symbol, so each symbol should need a lookup in only one library instead of a search through all of them until the symbol is found. The CPU cost in prelink's case drops to almost zero, while with -Bdirect it's significantly reduced (roughly speaking, assume 50 libraries are used and every symbol is found after searching 25 of them on average in the normal case; then -Bdirect reduces the time spent relocating to 1/25 = 4%).

Besides the CPU cost there's also a memory cost. Relocation processing modifies the binary in memory, which causes copy-on-write (COW) of the pages where the results of such modifications are written. This can easily waste a megabyte of memory per process.
While -Bdirect doesn't avoid it, prelink already modifies binaries and so could theoretically avoid this completely.

So much for theory. I hope there aren't any significant omissions or mistakes (I'm sure Michael could describe -Bdirect better :) ). In practice, however, there are various issues with prelink:
- One has to run the prelink tool (which shouldn't happen that often for the normal user, but anyway).
- It increases disk fragmentation, resulting in worse I/O (and there's no usable defrag tool).
- The memory savings aren't that great: in practice there are always conflicts that prelink has to fix, and applications with extensive relocation processing usually have many conflicts, so currently prelink only somewhat reduces the memory impact of relocations. I have some numbers at http://ktown.kde.org/~seli/performance/prelink_vs_bdirect.txt describing the effect on KDE (it's cut from a larger text so some parts may feel slightly out of place). This could be improved in gcc/prelink/applications but I have no idea how much effort that would require.
- Prelink doesn't work with dlopen. However, comment #14 is correct here: it could. It could find proper address slots for all libraries, maybe with some help from the applications. The only other reason I know of why prelink currently doesn't work with dlopen is that, since in the dlopen case prelink cannot know the whole situation as it does when prelinking binaries, it cannot resolve the binding of symbols from the DSO to the binary (i.e. interposing, more or less). But interposing needs special care with -Bdirect anyway, so the same care could be taken for prelinking (in fact, since prelink does the analysis later than -Bdirect, it should be slightly better).

In short, prelinked dlopen could work roughly like this: There would be a special flag for the linker to mark some DSOs at link time as suitable for this (the same way one has to use -Wl,-Bdirect).
Prelink would find proper address slots even for these DSOs and would prelink them as they are. On a dlopen of such a DSO, the prelinked processing would be done instead of normal relocation processing. Interposing problems could be detected the same way as with -Bdirect and could be recorded as special conflicts (they'd actually require symbol lookup, so -Bdirect would be useful here even with prelink). I have patched my glibc to turn off relocation processing altogether for dlopen of prelinked DSOs, thus more or less faking prelink for dlopen, and KDE was usable with it (after fixing some DSOs with undefined symbols). "Usable" as in "it started up without crashing"; I didn't do extensive checking, but I expect I would have run into some interposing problems soon.

To sum it up: in practice, at the present state of things, -Bdirect is about as good as prelink, depending on which benefits one wants to trade for which disadvantages. For KDE it currently might be simpler to go with -Bdirect and still keep kdeinit. Most of prelink's problems seem to be fixable, but given that prelink is a more complex solution I have no idea how much effort that would require. While I think prelink should be the right solution in an ideal world, -Bdirect can simply be a practical solution for this reality (or of course they could be combined).
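The back-of-the-envelope CPU estimate in this comment (50 libraries, about 25 searched per symbol, so roughly 4% of the lookup work with -Bdirect) can be reproduced with a toy model. It is a sketch only: the library names are invented, one symbol is placed in each library, and a "probe" stands for searching one library's symbol table; real lookup cost also depends on hashing, symbol-name length and versioning.

```python
# Toy model: 50 hypothetical libraries in load order, one symbol defined in each.
libs = [f"lib{i}.so" for i in range(50)]
symbols = {f"sym{i}": lib for i, lib in enumerate(libs)}

def probes_global(defining_lib):
    """Libraries searched when scanning the load order until the symbol is found."""
    return libs.index(defining_lib) + 1

def probes_direct(defining_lib):
    """With a direct binding the loader starts in the recorded library."""
    return 1

avg_global = sum(probes_global(lib) for lib in symbols.values()) / len(symbols)
avg_direct = sum(probes_direct(lib) for lib in symbols.values()) / len(symbols)

print("average probes, normal lookup:", avg_global)
print("average probes, direct lookup:", avg_direct)
print("remaining lookup work: {:.1%}".format(avg_direct / avg_global))
```

With symbols spread evenly the normal case averages 25.5 probes, close to the 25 assumed in the comment, so the direct case is left with about 1/25 of the work.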
With this additional information it seems more and more that interposing is not that good an idea. After all, it has security problems, it has namespace problems, it has speed problems, etc. What are the actual benefits of interposing, apart from hacks related to LD_PRELOAD? It does give power, for example by allowing the Gentoo sandbox to work, but is it really what one wants? One example of interposing breaking things is related to Berkeley DB. This package has many "incompatible" versions. It is also often pulled in as a dependency of libraries like apr, openldap and others. If I would like to write an application that uses two different libraries that link to two different db versions, that creates problems when linking. The libraries are not going to play nice with each other (in short, things break) because the symbol names are duplicated. This is especially problematic for a source distribution, where this makes stability hard to achieve in an ever-changing user environment. From that perspective I think that direct linking is a great advantage that would solve real bugs (and at the same time speed things up).
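The Berkeley DB scenario can be illustrated with a small model of symbol resolution. Everything here is hypothetical: the library names, the load order and the tables are made up for illustration, and the model ignores symbol versioning; it only contrasts a single global search order with per-library direct bindings.

```python
# Which symbols each (hypothetical) object defines.
defines = {
    "app": set(),
    "libA.so": {"a_init"},
    "libB.so": {"b_init"},
    "libdb-4.2.so": {"db_create"},
    "libdb-4.3.so": {"db_create"},
}

# One global search order shared by everything in the process.
search_order = ["app", "libA.so", "libB.so", "libdb-4.2.so", "libdb-4.3.so"]

# Direct bindings as they would be recorded at link time: (requester, symbol) -> provider.
direct = {
    ("libA.so", "db_create"): "libdb-4.2.so",
    ("libB.so", "db_create"): "libdb-4.3.so",
}

def resolve_global(symbol):
    """First definition in the global search order wins, whoever is asking."""
    for lib in search_order:
        if symbol in defines[lib]:
            return lib
    return None

def resolve_direct(requester, symbol):
    """Use the recorded provider if there is one, else fall back to the global search."""
    target = direct.get((requester, symbol))
    if target is not None and symbol in defines[target]:
        return target
    return resolve_global(symbol)

# Without direct bindings, libB.so also ends up with the 4.2 implementation.
print("global lookup of db_create:", resolve_global("db_create"))
print("libA.so, direct:", resolve_direct("libA.so", "db_create"))
print("libB.so, direct:", resolve_direct("libB.so", "db_create"))
```

In this model the global lookup hands both libraries the first db_create it finds, while the direct lookup gives each library the version it was linked against.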
that's really not a question for us to handle; it's in the ELF spec, so if you want information on why the ELF designers thought it was a good idea, then you should read the ELF specs: http://www.uclibc.org/cgi-bin/viewcvs.cgi/trunk/docs/?rev=12948 and the lists where the spec was designed. at any rate, i've found LD_PRELOAD to be quite useful as a debug mechanism
IMO whilst interposing is useful, it is definitely something that shouldn't be allowed willy-nilly on a production system (similarly LD_DEBUG). Personally I've always thought LD_PRELOAD a dangerous facility; note it already has restrictions built in to protect suid binaries. As I mentioned in comment #22, Solaris overrides direct linkage for libraries loaded via LD_PRELOAD and for those marked with the INTERPOSE flag, which seems like a good idea. This would give us the best of both worlds - direct linkage without loss of flexibility. On hardened we could even insist on interposers having the INTERPOSE flag set and skip the LD_PRELOAD handling, so that the sysadmin can control what can be preloaded (by controlling which libraries have the INTERPOSE flag set). The loader just needs to search identified interposers before processing direct linkages.
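The lookup order proposed here (identified interposers first, then the direct binding, then the ordinary search) can be sketched as a small model. This is not loader code: the library names, the tables and the hardened switch are hypothetical, chosen only to show how the two policies would differ.

```python
def interposer_list(preloaded, flagged, hardened=False):
    """Libraries allowed to override direct bindings, in search order."""
    if hardened:
        # Hardened policy from the comment: only INTERPOSE-flagged libraries count.
        return list(flagged)
    return list(preloaded) + [lib for lib in flagged if lib not in preloaded]

def resolve(symbol, requester, defines, interposers, direct, search_order):
    # 1. Identified interposers are searched before anything else.
    for lib in interposers:
        if symbol in defines.get(lib, ()):
            return lib
    # 2. Otherwise honour the direct binding recorded at link time.
    target = direct.get((requester, symbol))
    if target is not None and symbol in defines.get(target, ()):
        return target
    # 3. Fall back to the ordinary search order.
    for lib in search_order:
        if symbol in defines.get(lib, ()):
            return lib
    return None

# Hypothetical process: a preloaded sandbox library wraps open().
defines = {
    "libsandbox.so": {"open"},
    "libc.so": {"open", "read"},
    "libfoo.so": {"foo"},
}
direct = {("libfoo.so", "open"): "libc.so"}
search_order = ["libsandbox.so", "libfoo.so", "libc.so"]

normal = interposer_list(["libsandbox.so"], [])
hardened = interposer_list(["libsandbox.so"], [], hardened=True)

print(resolve("open", "libfoo.so", defines, normal, direct, search_order))
print(resolve("open", "libfoo.so", defines, hardened, direct, search_order))
```

In the normal case the preloaded library still wins over the direct binding; under the hardened policy an unflagged preload is ignored and the direct binding to libc.so is used.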
Grief, bugzilla gets a little unwieldy with a lot of comments - clearly we need some threading mechanism ;-)

Peter: wrt. evolution & a warning - any warning like that is not a good sign :-) It's possible it's a bogus <= in the comparison there, though using the 1st entry of the map is highly unlikely. Can you generate a log with LD_DEBUG=all for the failing case & attach it (bzipped)? Evolution in fact does some funky things wrt. included static versions of this & that, so perhaps this causes problems.

Francesco: yes, you need the speed-bdirect.diff patch applied - quite right.

Lubos: thanks for your write-up, I learned even more about prelink.

Paul: yes, interposing is almost never a good idea - however it's easy to support LD_PRELOAD without supporting full interposing - we just don't do it yet ;-) I keep meaning to understand the l_searchlist generation well enough to do that efficiently, but I'm beginning to think we should just dup the preload list per map & have done with it ;-) simple & reliable. I can hack that up later this week perhaps.

SpanKY: of course we need to support LD_PRELOAD, but -Bdirect turns interposing off - & whatever the designers say, it's extremely badly performing, almost always mis-used & a source of bugs - so my tooling says [ and I haven't started examining binaries ;-].

Kevin: you're quite right - of course, to emulate Solaris' -z interpose behavior we need a separate, global interposers list, which we currently don't have (partly perhaps because of the library dependency issue there), but it's quite doable.
Created an attachment (id=75104) output

Michael: As said before, it is not only evolution; almost every command does this, including nano, su, and so on. Here, however, is the output of "evolution --force-shutdown && LD_DEBUG=all evolution --force-shutdown > Desktop/ld_debug_evolution 2>&1", bzipped. If there is something else you'd like, just shout.
Peter - this is very odd:

28169: relocation processing: /usr/lib/libcamel-1.2.so.0 (lazy)
28169: dynamic symbol index 200 from '/usr/lib/libcamel-1.2.so.0' for camel_charmap base direct 0x428d8 start 0x453aa000
28169: broken: off end of map 0x42a68

is printed from:

    direct = D_PTR (undef_map, l_info[VERSYMIDX(DT_DIRECT)]);
    ...
    idx = *ref - symtab;
    if (__builtin_expect ((GLRO(dl_debug_mask) & DL_DEBUG_DIRECT) != 0, 0))
      _dl_debug_printf ("dynamic symbol index %u from '%s' for %s base direct 0x%x start 0x%x\n",
                        idx,
                        undef_map->l_name ? undef_map->l_name : "<noname>",
                        undef_name ? undef_name : "<undef>",
                        (int) direct, (int) undef_map->l_map_start);
    direct += idx * 2;
    if (direct >= undef_map->l_map_end || direct <= undef_map->l_map_start)
      _dl_debug_printf ("broken: off end of map 0x%x\n", (int) direct);

which looks entirely proper. It *looks* as if the l_info is not being translated correctly on load - which is rather strange - that is done by this fragment:

--- glibc-pristine/elf/dynamic-link.h 2005-11-17 17:48:13.000000000 +0000
+++ glibc-2.3/elf/dynamic-link.h 2005-10-19 21:01:06.000000000 +0100
@@ -94,6 +94,7 @@
       ADJUST_DYN_INFO (DT_PLTGOT);
       ADJUST_DYN_INFO (DT_STRTAB);
       ADJUST_DYN_INFO (DT_SYMTAB);
+      ADJUST_DYN_INFO (VERSYMIDX(DT_DIRECT));
 # if ! ELF_MACHINE_NO_RELA
       ADJUST_DYN_INFO (DT_RELA);
 # endif

Are you certain that part of the patch applied correctly?
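The numbers in the debug output can be checked by hand, and they support the suspicion that the DT_DIRECT pointer was never relocated. The sketch below redoes the arithmetic. Two values are assumptions not taken from the log: map_end is invented, and the load bias is taken to be equal to l_map_start, which is typical for a shared object whose first segment starts at address 0 but is not guaranteed.

```python
ENTRY_SIZE = 2  # the loader code advances the pointer by idx * 2

def direct_entry_address(direct_base, idx):
    """Address of the .direct entry for dynamic symbol number idx."""
    return direct_base + idx * ENTRY_SIZE

def in_map(addr, map_start, map_end):
    """Mirror of the quoted check: it complains if addr <= start or addr >= end."""
    return map_start < addr < map_end

# Values from the LD_DEBUG output.
direct_base = 0x428d8
idx = 200
map_start = 0x453aa000

# Hypothetical: the log does not show l_map_end.
map_end = map_start + 0x100000

addr = direct_entry_address(direct_base, idx)
print(hex(addr), in_map(addr, map_start, map_end))

# Assumption: load bias == l_map_start. Adding it is what ADJUST_DYN_INFO would do.
relocated = map_start + addr
print(hex(relocated), in_map(relocated, map_start, map_end))
```

The computed address matches the 0x42a68 in the "broken" message and lies far below the start of the mapping, which is what an unadjusted, file-relative DT_DIRECT value would look like; with the bias added it falls inside the (assumed) mapping.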
I am using nxsty's overlay from http://forums.gentoo.org/viewtopic-t-376943-postdays-0-postorder-asc-start-0.html and as far as I know I am the only one from that thread experiencing this problem. binutils-2.15.1-r1 from portage. I will try to refetch his overlay and re-emerge to see if that fixes it, but I do not think he has updated it since I last did that.
(In reply to comment #47)
> Are you certain that part of the patch applied correctly ?

The patch didn't apply to Gentoo's glibc 2.3.6-r1, so I had to resync it (this is the glibc overlay Peter is talking about). I might have done something wrong, but the part you are talking about is included. Here is the resynced patch: http://snigel.no-ip.com/~nxsty/linux/3000_all_glibc-2.3.6-r1-bdirect-2.patch Does it look OK?
I still have to retest, but I think there are at least two apps that don't work well after setting export LD_BIND_DIRECT=1 at boot:
- Kopete can't find its internal plugins. I cannot connect to any IM account.
- Konqueror cannot load the "cookie manager". I think it's a plugin too, so the problem could be related to at least some of the KDE plugins.
If anyone needs me to do some tests... P.S.: The entire recompile was done without LD_BIND_DIRECT set. I only set it today, just in case that matters. Cheers.
Yes, I rebooted without export LD_BIND_DIRECT=1 and everything is working OK (at least the two things I had seen not working). Any tips?
Luis - this is interesting; my guess is that perhaps there is some child shlib that defines the plugin entry points causing some problem. Either way - I'd love it if you could send me the output from: LD_DEBUG=symbols:direct kopete [ that is if you can stop it using the kdeinit thing ]. With and without LD_BIND_DIRECT set - and then it'll be easy to diagnose the problem.
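Since this kind of trace is very long, one way to collect it is to write the loader output straight into a bzip2 file, once with and once without LD_BIND_DIRECT, and attach both files. The helper below is a hypothetical sketch: the function name and output file names are invented, and the "direct" LD_DEBUG category exists only in a glibc carrying the -Bdirect patch.

```python
import bz2
import os
import subprocess

def capture_loader_debug(cmd, out_path, bind_direct, debug="symbols:direct"):
    """Run cmd with LD_DEBUG set and store its combined output bzip2-compressed."""
    env = dict(os.environ)
    if debug:
        env["LD_DEBUG"] = debug
    if bind_direct:
        env["LD_BIND_DIRECT"] = "1"
    else:
        env.pop("LD_BIND_DIRECT", None)
    # The loader writes its debug trace to stderr, so merge it into stdout.
    proc = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, check=False)
    with bz2.open(out_path, "wb") as fh:
        fh.write(proc.stdout)
    return proc.returncode
```

For the case discussed here it could be called as capture_loader_debug(["kopete"], "kopete-direct.log.bz2", bind_direct=True) and again with bind_direct=False and a second file name, giving two compact logs to compare.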
Hi Michael, how could I send you the output? I think hundreds and hundreds of lines are printed to the screen... Thanks and cheers.
By the way, I would need to boot the entire KDE session with export LD_BIND_DIRECT=1, because running only kopete with it set makes it work. I think the problem is with some libraries that "link" KDE to kopete. But anyway, tell me a good way to get a log to you, because it prints literally _a lot_ of text.