Bug 273279 - Gdb fails to access debug symbols (for libxul.so in galeon & libglib.so in firefox).
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Development
Hardware: x86 Linux
Importance: High normal
Assignee: Mozilla Gentoo Team
Reported: 2009-06-09 01:42 UTC by Robert Bradbury
Modified: 2010-12-29 04:12 UTC
Attachments
emerge --info output (emrginfo.lst, 3.84 KB, text/plain)
2009-06-09 01:45 UTC, Robert Bradbury
Backtrace (thread apply all bt) for galeon showing missing symbols (btall.lst, 20.37 KB, text/plain)
2009-06-09 01:53 UTC, Robert Bradbury
Example of C program calling read() and poll() w/ gdb trace (example1.c, 2.17 KB, text/plain)
2009-07-21 11:27 UTC, Robert Bradbury
Gdb trace (edited) showing lack of arguments in back traces (gdb.args.lst, 44.56 KB, text/plain)
2009-07-21 11:39 UTC, Robert Bradbury

Description Robert Bradbury 2009-06-09 01:42:12 UTC
Over the last month or so I've been switching my system over to a state where the default system libraries and error prone programs (e.g. firefox, galeon, etc.) have full debugging capabilities.  So when these programs and the required libraries are emerged, I use -ggdb for CFLAGS & CXXFLAGS and the debug files are placed on a non-critical disk symlinked to /usr/lib/debug.  This generally works fine (and my gracious thanks and doffed hat to the people who made this possible).

Except (there are always one or two, aren't there...) there seems to be a problem with getting symbols (files, line numbers, argument names, etc.) for a subset of the required libraries.

When debugging firefox or seamonkey, I can't seem to get symbols for libglib.so.  When debugging galeon I can't get symbols for libxul.so (which is kind of strange since emerge reports galeon was built with "debug -seamonkey -xulrunner").  The debug files are there:
-rw-r--r-- 1 root root 343584 May 12 12:33 /usr/lib/debug/usr/bin/galeon.debug
-rw-r--r-- 1 root root 6187897 May 12 08:21 /usr/lib/debug/usr/lib/xulrunner-1.9/libxul.so.debug

And gdb "thinks" it loaded the symbols, e.g.
Reading symbols from /usr/lib/xulrunner-1.9/libxul.so...Reading symbols from /root3/usr/lib/debug/usr/lib/xulrunner-1.9/libxul.so.debug...done.
done.
Loaded symbols for /usr/lib/xulrunner-1.9/libxul.so
(in the middle of loading at least 149 shared libraries -- and one wonders why programs take so long to start!!)

For almost *all* of these shared libraries the gdb backtraces print what one would expect.  But for any functions in libxul.so: nothing.  The same applies to libglib.so functions when debugging firefox.

I've tried playing around with info sharedlibrary and add-symbol-file but these appear to have no effect.  I did notice that the loading of libxul.so symbols in galeon takes place after some additional threads have been started but don't think this should make a difference.  I don't think that is the case with firefox.

Reproducible: Always

Steps to Reproduce:
1. Emerge galeon (or firefox) and associated system libraries with -ggdb flags.
2. Run these programs under gdb and set a breakpoint @ main().
3. When main() is reached, set a breakpoint in an often-used function; poll() is a good example (or you can set the poll() breakpoint at startup, taking into account that glibc isn't loaded yet).
4. Wait until a poll() is hit; or run the program for a while and then signal it with a "kill -13 galeon-PID-number" and do a thread apply all bt.
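
Steps 2 to 4 can be scripted so that each run produces a comparable trace. The following is only a sketch: the function name write_gdb_script and the file name bt.gdb are made up for illustration, and galeon stands in for whichever program is being debugged.

```shell
# Print a gdb command file that follows the reproduction steps:
# stop at main(), then at the first poll(), then dump every thread's backtrace.
write_gdb_script() {
    cat <<'EOF'
break main
run
break poll
continue
thread apply all bt
EOF
}

# Example use (not executed here):
#   write_gdb_script > bt.gdb
#   gdb -batch -x bt.gdb --args galeon
```

Running gdb in batch mode this way stops at the first poll() only; to reach the later, multi-threaded poll() calls the program would have to be continued further by hand.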

Actual Results:  
One gets a mixed bag of functions with and functions without symbols.  In the example I'm looking at it appears that libxul, libglib, libbonobo are missing symbols in spite of the fact that the debug libraries all exist.

I've tried linking the "version" named debug files to the generic versions, e.g.
/usr/lib/debug/usr/lib/libglib-2.0.so.0.2000.2.debug links to
/usr/lib/debug/usr/lib/libglib-2.0.so.0.debug
but that doesn't appear to do anything.

Gdb acts as if it doesn't have any idea about certain library symbols.  For example, gpoll.c:g_poll() calls poll() but I can't list gpoll.c.

Expected Results:  
Gdb should locate and provide access to all symbols.

See attachments.
Comment 1 Robert Bradbury 2009-06-09 01:45:12 UTC
Created attachment 193956
emerge --info output

Emerge --info for gdb/galeon/firefox system missing symbols
Comment 2 Robert Bradbury 2009-06-09 01:53:05 UTC
Created attachment 193958
Backtrace (thread apply all bt) for galeon showing missing symbols

This is a thread apply all bt for galeon, interrupted during a relatively large "restart" that was stuck in an endless, non-productive CPU loop.  (Galeon, firefox and opera are all problematic with older and/or larger "restarts": they max out the CPU and/or IP I/O in varying ways, related to their use of poll() calls and to what happens when timeouts on various descriptors take place.)  This backtrace resulted from interrupting galeon with a SIGPIPE signal.  Similar results are obtained if one sets a breakpoint at poll(), proceeds through the first few simple cases of poll() being called, and gets to the point where multiple threads (doing DNS lookups?) are running and one is attempting to poll() 10+ fds (sockets, pipes & files).
Comment 3 SpanKY gentoo-dev 2009-07-05 19:35:11 UTC
post the build log for the packages in question.  most likely they aren't respecting the debug flag and instead are stripping debug information.  gdb is most likely correct -- it is reading the symbols *that are available* in the split debug file.
Comment 4 Robert Bradbury 2009-07-21 11:22:10 UTC
There are two separate bugs here.
1) Lack of symbol information from specific libraries (e.g. libgtk-2.0, libgdk-2.0, libxul and some others).  The packages gtk+ and mozilla-firefox are two that I'm dealing with right now.  They *do* however produce "local" debug information; for example, the difference between a "nm" and a "nm --extern-only" on /usr/lib/debug/usr/lib/libgtk-x11-2.0.so.0.1600.1.debug is 18,000+ symbols.  And an "info shared" in gdb indicates that it has the symbols for "/usr/lib/libgtk-x11-2.0.so.0" which is symlinked to libgtk-x11-2.0.so.1600.1.  Gdb has the global symbols, which is what you would expect, but is missing the line numbers and the argument symbols.  This does not apply to all shared libraries, which is the strange part.

2) Building a glibc with debug symbols fails to create argument symbols for some functions.  For example poll() gets symbols while read() does not.  This is presumably a library build problem.  I'm relatively convinced that this involves the use of strong_alias / weak_alias function name mapping (which may or may not imply saving argument symbol names & # for the aliased functions).

Attachments forthcoming.
Comment 5 Robert Bradbury 2009-07-21 11:27:48 UTC
Created attachment 198694
Example of C program calling read() and poll() w/ gdb trace

Program and trace thereof showing that in gdb with glibc functions one can get valid traces for some system calls (e.g. poll) and not for others (e.g. read).

Try it on your system with a glibc built for debugging.
Comment 6 Robert Bradbury 2009-07-21 11:39:30 UTC
Created attachment 198696
Gdb trace (edited) showing lack of arguments in back traces

The back traces (bt) show the mix of functions, most of which are from shared libraries (most of which are compiled for debugging).  Some of them provide argument lists (or function line numbers) and some do not.  The only explanation that I have is a gdb bug.  I don't know if there is a way to pass an argument to "nm" to confirm that the line numbers & argument symbols are indeed in the *.debug libraries.  The "info shared" output seems to suggest the symbols are present.
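
One way to check this outside gdb is to list the section headers of the split debug file with readelf (from binutils) instead of nm: DWARF keeps argument and type information in .debug_info and line tables in .debug_line. The following is a minimal sketch, assuming uncompressed DWARF sections; the function names are made up for illustration.

```shell
# Succeeds if the section listing read from stdin (the output of
# `readelf -S -W FILE`) names both .debug_info and .debug_line.
has_dwarf_sections() {
    listing=$(cat)
    printf '%s\n' "$listing" | grep -q '[.]debug_info[[:space:]]' &&
        printf '%s\n' "$listing" | grep -q '[.]debug_line[[:space:]]'
}

# Report on one split debug file given as the first argument.
check_debug_file() {
    if readelf -S -W "$1" | has_dwarf_sections; then
        echo "$1: DWARF info and line tables present"
    else
        echo "$1: missing .debug_info or .debug_line"
    fi
}
```

For example, check_debug_file /usr/lib/debug/usr/lib/xulrunner-1.9/libxul.so.debug. Note that the presence of both sections only shows that the file was not stripped of DWARF data; it does not prove that every function in the library is covered by it.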
Comment 7 Hugo Mildenberger 2010-07-30 13:50:17 UTC
While trying to analyze core dumps generated from xserver-1.8.0.901 using gdb-7.1, I observed that bypassing the splitdebug machinery by copying over libglx.so from portage's build directory to /usr/lib/opengl/xorg-x11/extensions/libglx.so brought back line numbers and source file information. I'm therefore suspecting another problem in elfutils/debugedit (like the one fixed in bug #288977).
Comment 8 Hugo Mildenberger 2010-08-07 09:21:59 UTC
While analysing an x11-base/xorg-server-1.8.2 core file for an opengl related segfault, I think I figured out a part of the problem:

# gdb $(which X) X-6362-6-1281112323 
GNU gdb (Gentoo 7.1 p1) 7.1 
[...]
(gdb) bt
[...]
#9  <signal handler called>
#10 0x4ebc8f46 in ?? () from /usr/lib/xorg/modules/extensions/libglx.so
#11 0x1102dbef in FreeClientResources (client=0x11421bf0) at resource.c:818
#12 0x11009fad in CloseDownClient (client=0x11421bf0) at dispatch.c:3631
#13 0x1100f071 in KillAllClients () at dispatch.c:3655
#14 Dispatch () at dispatch.c:468
#15 0x1100345a in main (argc=10, argv=0x5aecaee4, envp=0x5aecaf10) at main.c:286

==> no symbols for libglx.so

(gdb) info shared 
[...]
0x4efa0d90  0x4efa364c  Yes       /usr/lib/xorg/modules/extensions/libdbe.so
0x4ec1c9c0  0x4ec21d74  Yes (*)   /usr/lib/xorg/modules/extensions/libdri.so
0x4ebe9000  0x4ebee124  Yes       /usr/lib/libdrm.so.2
0x4ec15320  0x4ec17d54  Yes (*)   /usr/lib/xorg/modules/extensions/libdri2.so
0x4eb8f500  0x4ebd5520  Yes (*)   /usr/lib/xorg/modules/extensions/libglx.so
0x4ec08570  0x4ec10d5c  Yes       /usr/lib/xorg/modules/input/synaptics_drv.so
[...]
(*): Shared library is missing debugging information


--> except for libdri.so, libdri2.so and libglx.so, all symbols are available. But these three shared objects are the ones that are shuffled around using symlinks by "eselect opengl set xorg-x11":

# ls /usr/lib/xorg/modules/extensions -l
-rwxr-xr-x 1 root root 17896  6. Aug 20:59 libdbe.so
lrwxrwxrwx 1 root root    46  6. Aug 21:00 libdri2.so -> ../../../opengl/xorg-x11/extensions/libdri2.so
lrwxrwxrwx 1 root root    45  6. Aug 21:00 libdri.so -> ../../../opengl/xorg-x11/extensions/libdri.so
-rwxr-xr-x 1 root root 96440  6. Aug 20:59 libextmod.so
lrwxrwxrwx 1 root root    45  6. Aug 21:00 libglx.so -> ../../../opengl/xorg-x11/extensions/libglx.so
-rwxr-xr-x 1 root root 26132  6. Aug 20:59 librecord.so


Looking into the equivalent debug symbol path in /usr/src/debug:

# ls -l /usr/src/debug/usr/lib/xorg/modules/extensions
-rw-r--r-- 1 root root  67112  6. Aug 20:59 libdbe.so.debug
-rw-r--r-- 1 root root 277363  6. Aug 20:59 libextmod.so.debug
-rw-r--r-- 1 root root  71076  6. Aug 20:59 librecord.so.debug

So, where are libglx.so.debug, libdri.so.debug and libdri2.so.debug? Well, here:

# ls -l /usr/src/debug/usr/lib/opengl/xorg-x11/extensions/ 
-rw-r--r-- 1 root root   69468  6. Aug 20:59 libdri2.so.debug
-rw-r--r-- 1 root root  111270  6. Aug 20:59 libdri.so.debug
-rw-r--r-- 1 root root 1514581  6. Aug 20:59 libglx.so.debug


Now testing the hypothesis by linking libglx.so.debug manually:

# pwd
/usr/src/debug/usr/lib/xorg/modules/extensions
# ln -s ../../../opengl/xorg-x11/extensions/libglx.so.debug libglx.so.debug
# nm --line-numbers libglx.so.debug | head
000484f9 t .L10 /mnt/hda1/tmp/portage/x11-base/xorg-server-1.8.2/work/xorg-server-1.8.2/glx/singlesize.c:156
0004848b t .L11 /mnt/hda1/tmp/portage/x11-base/xorg-server-1.8.2/work/xorg-server-1.8.2/glx/singlesize.c:159
[...]


# gdb $(which X) X-6362-6-1281112323 
GNU gdb (Gentoo 7.1 p1) 7.1
[...]
(gdb) info shared
[...]
0x4efa0d90  0x4efa364c  Yes       /usr/lib/xorg/modules/extensions/libdbe.so
0x4ec1c9c0  0x4ec21d74  Yes (*)   /usr/lib/xorg/modules/extensions/libdri.so
0x4ebe9000  0x4ebee124  Yes       /usr/lib/libdrm.so.2
0x4ec15320  0x4ec17d54  Yes (*)   /usr/lib/xorg/modules/extensions/libdri2.so
0x4eb8f500  0x4ebd5520  Yes       /usr/lib/xorg/modules/extensions/libglx.so
0x4ec08570  0x4ec10d5c  Yes       /usr/lib/xorg/modules/input/synaptics_drv.so
[...]


(gdb) bt
[...]
#9  <signal handler called>
#10 0x4ebc8f46 in DrawableGone (glxPriv=0x11ed1710, xid=20971624) 
    at glxext.c:133
#11 0x1102dbef in FreeClientResources (client=0x11421bf0) at resource.c:818
#12 0x11009fad in CloseDownClient (client=0x11421bf0) at dispatch.c:3631
#13 0x1100f071 in KillAllClients () at dispatch.c:3655
#14 Dispatch () at dispatch.c:468
#15 0x1100345a in main (argc=10, argv=0x5aecaee4, envp=0x5aecaf10) 
    at main.c:286


That however does not resolve the other issues:

1.) The gdb symbol-file command does not work correctly, possibly because gdb is unable to relate the debug symbols to the already loaded object file due to the same pathname mismatch. If you try to use symbol-file with gdb-7.1 it will even destroy the other symbols loaded.

2.) The missing argument issue seen by Robert, which he suspects has to do with weak symbols.

Robert, can you try gdb-7.1 to see if there are also shared objects marked with (*), and then inspect the corresponding symbol files very carefully to check that they really are in the right relative place? From your logs I would assume they are, or at least that they were in the correct location at some point. Maybe all you had to do was restart the gdb session after copying the symbol files?
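
The path mismatch found in this comment can be made visible for any library with a small helper. This is a sketch of the hypothesis only, not of gdb's actual lookup code: it assumes the layout shown above (debug files stored under a root that mirrors the library's absolute path) and GNU readlink for the -f option, and both function names are made up for illustration.

```shell
# Build the path of a split debug file under a debug root for a library path,
# e.g. /usr/lib/foo.so -> <root>/usr/lib/foo.so.debug
debug_candidate() {
    printf '%s%s.debug\n' "$1" "$2"
}

# Print the candidate for the path as given and, if the library is a symlink
# that resolves elsewhere, the candidate for its target as well.
report_candidates() {
    root=$1
    lib=$2
    echo "as loaded: $(debug_candidate "$root" "$lib")"
    target=$(readlink -f "$lib") || return 0
    if [ "$target" != "$lib" ]; then
        echo "resolved:  $(debug_candidate "$root" "$target")"
    fi
}
```

For example, report_candidates /usr/src/debug /usr/lib/xorg/modules/extensions/libglx.so. If two lines are printed and only the "resolved" file exists on disk, the library is in the same situation as libglx.so here.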

Comment 9 Hugo Mildenberger 2010-08-07 13:48:44 UTC
By using "add-symbol-file" instead of "symbol-file" and specifying the module load address taken from the "info shared" output, one can cope with such situations.

(gdb) info shared
0x4eb8f500  0x4ebd5520  Yes (*)     /usr/lib/xorg/modules/extensions/libglx.so
[...]
(gdb) add-symbol-file /usr/src/debug/usr/lib/opengl/xorg-x11/extensions/libglx.so.debug 0x4eb8f500

Looking at the (horrible) gdb code in symfile.c it seems as if the command "symbol-file" simply overwrites all symbolic information that has been loaded before:

  /* Currently we keep symbols from the add-symbol-file command.
     If the user wants to get rid of them, they should do "symbol-file"
     without arguments first.  Not sure this is the best behavior
     (PR 2207).  */
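
The add-symbol-file workaround above can be generated for every library that "info shared" flags with (*). The following is a sketch under the same assumptions as the example in this comment: the debug root mirrors absolute library paths, and the first column of each row is the address to pass to add-symbol-file. The function name symbol_commands is made up for illustration.

```shell
# Read `info shared` output on stdin and print one add-symbol-file command
# for every library flagged "(*)" (missing debugging information).
# $1 is the debug root; the load address is the first column of each row.
symbol_commands() {
    root=$1
    while read -r from to syms flag path; do
        [ "$flag" = "(*)" ] || continue
        # Follow symlinks when the library exists locally, so the debug
        # file is looked up next to the real file rather than the link.
        if [ -e "$path" ]; then
            path=$(readlink -f "$path")
        fi
        printf 'add-symbol-file %s%s.debug %s\n' "$root" "$path" "$from"
    done
}
```

Saving the "info shared" output to a file and running symbol_commands /usr/src/debug on it yields commands that can be pasted into the gdb session, or written to a file and loaded with gdb's source command.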



Comment 10 Jory A. Pratt gentoo-dev 2010-12-29 04:12:51 UTC
4 months with no info, closing. Feel free to reopen if you continue to have issues.