Summary: | sys-libs/uclibc-0.9.33.2-r14 segfault in dlclose() | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | David Flogeras <dflogeras2> |
Component: | [OLD] Core system | Assignee: | Anthony Basile <blueness> |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | embedded |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Bug Depends on: | |||
Bug Blocks: | 570544 | ||
Attachments: | Workaround for segfault; valgrind log from segfault | ||
Description
David Flogeras
2015-03-18 01:18:55 UTC
I rebuilt syslog-ng, glib, and uClibc with debugging symbols and stepped through with gdb. If I let it simply crash, the stack is an unusable mess. This is the stack trace directly before the crash. It is the second library that is being unloaded (/usr/lib/syslog-ng/libsyslog-ng-crypto.so). The first one (/usr/lib/syslog-ng/libdbparser.so) unloaded without event. If I step into _dl_run_fini_array(tpnt); on line 843 it segfaults.

#0 do_dlclose (vhandle=0x3e680, need_fini=1) at ldso/libdl/libdl.c:843
#1 0xb6c73f38 in dlclose (vhandle=0x3e680) at ldso/libdl/libdl.c:1063
#2 0xb6ceb2a8 in _g_module_close (handle=0x3e680, is_unref=1) at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/gmodule/gmodule-dl.c:136
#3 0xb6cec224 in g_module_close (module=0x46240) at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/gmodule/gmodule.c:759
#4 0xb6f7ac0c in plugin_load_candidate_modules (cfg=0x19010) at lib/plugin.c:465
#5 0xb6f4fe48 in cfg_load_candidate_modules (self=0x19010) at lib/cfg.c:389
#6 0xb6f530f0 in cfg_lexer_lex (self=0x21f10, yylval=0xbeffd238, yylloc=0xbeffd25c) at lib/cfg-lexer.c:897
#7 0xb6f548a0 in main_lex (yylval=0xbeffd238, yylloc=0xbeffd25c, lexer=0x21f10) at lib/cfg-parser.c:173
#8 0xb6f90e9c in main_parse (lexer=0x21f10, dummy=0xbefff274, arg=0x0) at lib/cfg-grammar.c:3041
#9 0xb6f4eca0 in cfg_parser_parse (self=0xb6fedb08 <main_parser>, lexer=0x21f10, instance=0xbefff274, arg=0x0) at ./lib/cfg-parser.h:83
#10 0xb6f4fd80 in cfg_run_parser (self=0x19010, lexer=0x21f10, parser=0xb6fedb08 <main_parser>, result=0xbefff274, arg=0x0) at lib/cfg.c:371
#11 0xb6f5002c in cfg_read_config (self=0x19010, fname=0x17830 "/etc/syslog-ng/syslog-ng.conf", syntax_only=1, preprocess_into=0x0) at lib/cfg.c:443
#12 0xb6f71400 in main_loop_read_and_init_config () at lib/mainloop.c:445
#13 0x00009650 in main (argc=1, argv=0xbefff4d4) at syslog-ng/main.c:248

It is calling the dtor of syslog-ng-3.6.2/lib/crypto.c, which in turn uses _dl_linux_resolve, and eventually crashes inside _dl_find_hash (uClibc-0.9.33.2/ldso/ldso/arm/elfinterp.c:72). I don't pretend to understand the inner workings. Here is the backtrace at the segfault:

#0 _dl_lookup_sysv_hash (type_class=<optimized out>, undef_name=<optimized out>, hash=252420148, symtab=0xb6b62f38, tpnt=0x3e540) at ldso/ldso/dl-hash.c:260
#1 _dl_find_hash (name=name@entry=0xb6b551f8 "crypto_deinit", scope=<optimized out>, mytpnt=0x44890, type_class=type_class@entry=1, sym_ref=sym_ref@entry=0x0) at ldso/ldso/dl-hash.c:339
#2 0xb6ff22fc in _dl_linux_resolver (tpnt=<optimized out>, reloc_entry=<optimized out>) at ldso/ldso/arm/elfinterp.c:72
#3 0xb6ff6584 in _dl_linux_resolve () at ldso/ldso/arm/resolve.S:126
#4 0xb6ff6584 in _dl_linux_resolve () at ldso/ldso/arm/resolve.S:126
#5 0xb6ff6584 in _dl_linux_resolve () at ldso/ldso/arm/resolve.S:126
.....

The following compiles and runs without issue on the same system:

    #include <dlfcn.h>
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>

    int main()
    {
        void* v = dlopen( "/usr/lib/syslog-ng/libsyslog-ng-crypto.so", RTLD_NOW );
        if( ! v )
            fprintf( stderr, "failed to open" );
        if( 0 != dlclose( v ))
        {
            fprintf( stderr, "%s\n", strerror( errno ));
        }
        return 0;
    }

compiled with:

    gcc test.c -ldl -ggdb -lglib-2.0 -levtlog

It appears that at the time of the segfault, in ldso/ldso/dl-hash.c:296, it is walking the linked list "scope". For each entry in scope, it walks another linked list, loop_scope->r_list. It walked over the first two elements of the outer loop, calling _dl_lookup_sysv_hash() (line 339) each time through. On the third iteration, the 0th element of loop_scope->r_list has libname = "", which is pretty suspicious. I'm not sure if the memory got clobbered, and valgrind does not support armv6j.

This seems unrelated to ARM, which is good. Here's how you can reproduce in the comfort of your own home.

1. Boot a VM (I used virtualbox) with gentoo livecd. YMMV.
2. Partition, and unpack the latest amd64-uclibc-vanilla stage3. No need to install a kernel since you can just chroot in. Do that now.
3. Install some dev tools. I put [c]gdb, valgrind, vim, strace, etc.
4. Edit make.conf to use FEATURES="nostrip" and CFLAGS="-ggdb -fno-omit-frame-pointer -pipe" (no optimization).
5. emerge/re-emerge syslog-ng and its deps.
6. Using ebuild, prepare the sources for uclibc, but reconfigure it and turn on debugging information in its menuconfig. Also turn on the debugging info related to dlopen.
7. Merge the modified, debug version of uclibc.

Since uclibc doesn't seem to support elfutils, you cannot use splitdebug/installsources. Instead just unpack the pertinent sources in /var/tmp/portage: "ebuild /usr/portage/CAT/PKG/PKG.ebuild prepare". I did this for uclibc, glib, and syslog-ng.

Run syslog-ng in the debugger: start with "[c]gdb syslog-ng", and start it inside gdb with "r -s -f /etc/syslog-ng/syslog-ng.conf" (this is how rc invokes it). It will segfault in _dl_lookup_gnu_hash().

Created attachment 399244 [details, diff]
Workaround for segfault
blueness showed me how to work around this issue. Here's a patch that removes the call to unmap the mapped region. This is NOT a fix, but may help someone to diagnose.
Created attachment 399246 [details]
valgrind log from segfault
Here's a log showing the invalid memory accesses prior to segfaulting. This is using glib-2.42.2, uclibc-0.9.33.2-r14, and syslog-ng-3.6.2 on amd64-uclibc-vanilla.
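For readers following the lookup described in the earlier comments (each entry of "scope" is visited, and for each one the elements of its r_list are searched), here is a toy model of that nested walk. It is only a sketch and is not uClibc's code: the toy_object and toy_scope types and the toy_find function are hypothetical names, and treating r_list as an array with a count is an assumption based on the "0th element" wording in the report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the loader's structures. */
struct toy_object {
    const char *libname;         /* name of the loaded library */
    const char **symbols;        /* NULL-terminated list of exported names */
};

struct toy_scope {
    struct toy_object **r_list;  /* objects searched in this scope (assumed array) */
    size_t r_nlist;              /* number of entries in r_list */
    struct toy_scope *next;      /* next scope in the chain */
};

/* Walk every scope, then every object in that scope, and return the
 * first object that exports `name`, or NULL if none does. */
static const struct toy_object *
toy_find(const struct toy_scope *scope, const char *name)
{
    for (const struct toy_scope *s = scope; s != NULL; s = s->next) {
        for (size_t i = 0; i < s->r_nlist; i++) {
            const struct toy_object *obj = s->r_list[i];
            for (const char **sym = obj->symbols; *sym != NULL; sym++) {
                if (strcmp(*sym, name) == 0)
                    return obj;
            }
        }
    }
    return NULL;
}
```

In a model like this, every r_list entry is dereferenced during the walk, so an entry that still points at memory that has since been released or overwritten would be read as if it were valid. Whether that is what happens in the real loader is not established in this report.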
(In reply to David Flogeras from comment #5)
> Created attachment 399244 [details, diff]
> Workaround for segfault
>
> blueness showed me how to work around this issue. Here's a patch that
> removes the call to unmap the mapped region. This is NOT a fix, but may
> help someone to diagnose.

Actually it is a fix because POSIX does not mandate that dlclose() actually do the unmappings. It may, but it need not.

@Dave. Rich Felker suggested the bug might be in the dynamic linker and not in dlclose() per se. Can you try this patch and see if it fixes things:

http://git.alpinelinux.org/cgit/aports/commit/main/libc0.9.32/uclibc-dlclose-fix.patch?h=2.7-stable&id=d36e402fae2b31ca2bf6eafbafa77d716ea99b15

Of course, undo my patch.

Rebuilt with just the above patch. Segfaults in the same spot as originally.

Also, I should clarify comment #6. The valgrind output happens when I comment out the call to unmap as well. It just doesn't segfault.

    #include <dlfcn.h>
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>

    int main()
    {
        void* v = dlopen( "/usr/lib/syslog-ng/libdbparser.so", RTLD_LAZY | RTLD_GLOBAL );
        if( ! v )
            fprintf( stderr, "failed to open" );
        if( 0 != dlclose( v ))
        {
            fprintf( stderr, "%s\n", strerror( errno ));
        }
        return 0;
    }

The program above causes it. Compile as in comment #2. Also, you have to revert the latest change to syslog-ng-3.6.2.ebuild (--with-embedded-crypto). I just put a copy of the old ebuild from CVS in an overlay.

(In reply to David Flogeras from comment #10)
I tried several times and I'm not able to hit it with your example. However, I use kvm. I'm going to reproduce your steps in comment #4 precisely and see if I can hit it then.

sys-libs/uclibc has been removed from the tree, replaced by sys-libs/uclibc-ng. If this is still a problem on uclibc-ng, please open a new bug.
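For anyone retrying the test programs from this report, the sketch below is a slightly more defensive variant. It is not the reporter's code: the open_then_close helper and its return codes are hypothetical. It differs in two ways: it does not pass a NULL handle to dlclose() when dlopen() fails, and it reports failures with dlerror(), which is the interface POSIX specifies for dlopen()/dlclose() diagnostics, rather than strerror(errno).

```c
#include <dlfcn.h>
#include <stdio.h>

/* Open a shared object and close it again right away.
 * Returns 0 on success, -1 if dlopen() failed, -2 if dlclose() failed. */
static int open_then_close(const char *path, int flags)
{
    void *handle = dlopen(path, flags);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                msg ? msg : "unknown error");
        return -1;
    }

    if (dlclose(handle) != 0) {
        const char *msg = dlerror();
        fprintf(stderr, "dlclose(%s) failed: %s\n", path,
                msg ? msg : "unknown error");
        return -2;
    }

    return 0;
}
```

To mirror the case in this report, call open_then_close("/usr/lib/syslog-ng/libdbparser.so", RTLD_LAZY | RTLD_GLOBAL) from main() and link with -ldl as in the earlier compile line. A segfault inside dlclose() would of course still terminate the process before any error code is returned.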