Created attachment 503552
Example of shared object which causes crash

After the upgrade to glibc-2.25 (a long, long while ago), or at approximately that time, I started to get sporadic segfaults in ld.so. After a scan with revdep-rebuild (now revdep-rebuild.sh), for example, my syslog gets spammed with tons of messages like:

[1985463.850366] ld-linux.so.2[2352]: segfault at 4 ip 00000000f77602c7 sp 00000000ffd1b700 error 4 in ld-2.25.so[f7754000+23000]
[1985468.718062] show_signal_msg: 177 callbacks suppressed

Segfaults happen at different ip values and at different addresses.

Today I finally looked into the crash.

1. I managed to reduce the reproduction to this simple command (see the attachment):

$ /lib64/ld-2.25.so --list vdso64.so
Segmentation fault

2. Backtrace:

(gdb) where
#0  0x00007ffff7ddfca1 in elf_get_dynamic_info (temp=0x0, l=0x7ffff7ffe120) at get-dynamic-info.h:102
#1  _dl_map_object_from_fd (name=name@entry=0x7fffffffe12e "/tmp/vdso64.so", origname=origname@entry=0x0, fd=<optimized out>, fbp=<optimized out>, realname=<optimized out>, loader=loader@entry=0x0, l_type=0, mode=536870912, stack_endp=0x7fffffffc888, nsid=0) at dl-load.c:1200
#2  0x00007ffff7de2524 in _dl_map_object (loader=loader@entry=0x0, name=0x7fffffffe12e "/tmp/vdso64.so", type=type@entry=0, trace_mode=trace_mode@entry=0, mode=mode@entry=536870912, nsid=nsid@entry=0) at dl-load.c:2199
#3  0x00007ffff7ddece0 in dl_main (phdr=<optimized out>, phnum=6, user_entry=0x7fffffffdd38, auxv=0x7fffffffdfc8) at rtld.c:1037
#4  0x00007ffff7df114e in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffdde0, dl_main=dl_main@entry=0x7ffff7ddbd70 <dl_main>) at ../elf/dl-sysdep.c:253
#5  0x00007ffff7ddb909 in _dl_start_final (arg=0x7fffffffdde0) at rtld.c:399
#6  _dl_start (arg=0x7fffffffdde0) at rtld.c:505
#7  0x00007ffff7ddab48 in _start ()

3. I have one Gentoo host which produces these crashes (in normal everyday work) and one (also Gentoo) which doesn't. Both are amd64, both are mostly on stable with the same packages installed in the @system set, and both have the same versions of gcc and binutils. The difference is the CPU: the "bad" one is a K8 while the "good" one is a Core i5. CFLAGS on both include "-march=native".

If I copy an .so from the "bad" host to the "good" one and run ld.so --list <this_bad_so_specimen>, then I reproduce the crash on the "good" host. That is, what causes the crash is the .so file, not the environment.

Another difference is the kernel configs, and the .so files that produced the crashes I have caught so far all belong to /usr/src/linux-*.

4. The corresponding snippet of code which produces the segfault is:

# define DT_HASH 4

# define ADJUST_DYN_INFO(tag) \
      do                                                                      \
        if (info[tag] != NULL)                                                \
          {                                                                   \
            if (temp)                                                         \
              {                                                               \
                temp[cnt].d_tag = info[tag]->d_tag;                           \
                temp[cnt].d_un.d_ptr = info[tag]->d_un.d_ptr + l_addr;        \
                info[tag] = temp + cnt++;                                     \
              }                                                               \
            else                                                              \
              info[tag]->d_un.d_ptr += l_addr;                                \
          }                                                                   \
      while (0)

      ADJUST_DYN_INFO (DT_HASH);

And in assembly:

   0x00007ffff7ddfc97 <+2055>:  mov    0x60(%r12),%rax
   0x00007ffff7ddfc9c <+2060>:  test   %rax,%rax
   0x00007ffff7ddfc9f <+2063>:  je     0x7ffff7ddfca5 <_dl_map_object_from_fd+2069>
=> 0x00007ffff7ddfca1 <+2065>:  add    %rcx,0x8(%rax)

What puzzles me is:

(gdb) p temp
$27 = (Elf64_Dyn *) 0x0
(gdb) p &info[4]
$28 = (Elf64_Dyn **) 0x7ffff7ffe180
(gdb) p &info[4]->d_un.d_ptr
$29 = (Elf64_Addr *) 0x7ffff7ff7370
(gdb) i r rax
rax            0x7ffff7ff7368   140737354101608
(gdb) p *info[4]
$30 = {d_tag = 4, d_un = {d_val = 288, d_ptr = 288}}
(gdb) p info[4]->d_un.d_ptr += l_addr
$31 = 140737354101024
(gdb) p *info[4]
$32 = {d_tag = 4, d_un = {d_val = 140737354101024, d_ptr = 140737354101024}}

So the memory (despite the VDSO page address) seems to be readable and writable. I see no obvious reason for the segmentation fault. I assume that after the access violation this page somehow becomes RW (maybe it is re-mapped in a signal handler), but I don't know how to quickly check this assumption.

5. Rebuilding glibc, gcc, binutils does not make the issue disappear.

6. On both ("good" and "bad") hosts:

>grep CONFIG_COMPAT_VDSO /usr/src/linux/.config
# CONFIG_COMPAT_VDSO is not set

7. emerge --info (on the "bad" host, to be specific)

$ emerge --info
Portage 2.3.8 (python 3.4.5-final-0, default/linux/amd64/13.0, gcc-5.4.0, glibc-2.25-r8, 4.9.57-alb x86_64)
=================================================================
System uname: Linux-4.9.57-alb-x86_64-Dual_Core_AMD_Opteron-tm-_Processor_290-with-gentoo-2.4.1
KiB Mem:    16536472 total,   2109828 free
KiB Swap:   50331636 total,  48188980 free
Timestamp of repository gentoo: Fri, 10 Nov 2017 15:15:01 +0000
Head commit of repository gentoo: 9974fd94f4fdc69679834e441c7f86a787effe8e
sh bash 4.3_p48-r1
ld GNU ld (Gentoo 2.28.1 p1.0) 2.28.1
app-shells/bash:          4.3_p48-r1::gentoo
dev-java/java-config:     2.2.0-r3::gentoo
dev-lang/perl:            5.24.3::gentoo
dev-lang/python:          2.7.14::gentoo, 3.4.5::gentoo, 3.5.4::gentoo
dev-util/cmake:           3.8.2::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.4.1-r2::gentoo
sys-apps/openrc:          0.32.1::gentoo
sys-apps/sandbox:         2.10-r4::gentoo
sys-devel/autoconf:       2.13::gentoo, 2.69::gentoo
sys-devel/automake:       1.11.6-r1::gentoo, 1.15-r2::gentoo
sys-devel/binutils:       2.28.1::gentoo
sys-devel/gcc:            5.4.0-r3::gentoo
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1::gentoo
sys-kernel/linux-headers: 4.9::gentoo (virtual/os-headers)
sys-libs/glibc:           2.25-r8::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage
    priority: -1000

x-portage
    location: /usr/local/portage
    masters: gentoo
    priority: 0

vmware
    location: /var/lib/layman/vmware
    masters: gentoo
    priority: 50

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA sun-bcla-java-vm Oracle-BCLA-JavaSE dlj-1.1 skype-eula skype-4.0.0.7-copyright googleearth AdobeFlash-11.x Intel-SDP TeamViewer NVIDIA-CUDA NVIDIA-gdk ACML-EULA OPERA-12 RAR"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer -finline-functions-called-once -ftree-vectorize"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /etc/stunnel/stunnel.conf /usr/lib64/libreoffice/program/sofficerc /usr/share/gnupg/qualified.txt /usr/share/maven-bin-3.3/conf /usr/share/themes/oxygen-gtk/gtk-2.0"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.0/ext-active/ /etc/php/cgi-php7.0/ext-active/ /etc/php/cli-php7.0/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer -finline-functions-called-once -ftree-vectorize"
DISTDIR="/scratch/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer -finline-functions-called-once -ftree-vectorize -fprefetch-loop-arrays -funroll-loops -fno-stack-protector"
GENTOO_MIRRORS="http://mirror.yandex.ru/gentoo-distfiles/ http://trumpetti.atm.tut.fi/gentoo/ http://mirror.qubenet.net/mirror/gentoo/ ftp://mirror.yandex.ru/gentoo-distfiles/"
LANG="en_US.UTF-8"
LC_ALL=""
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/tmp"
USE="X a52 aac aacs acl acpi alsa amd64 amr apng asm bash-completion berkdb bidi bluray bundled-libs bzip2 caps cdda cdr celt cjk cli cracklib crypt cryptsetup cscope cups cxx dbus dirac djvu dri dv dvb dvd dvdr dvdread eselect exif faac ffmpeg flac fontconfig fortran g726 g729 gdbm gif gimp gmp gpm gsm gsm-nonstandard gtk http iconv icu idn ieee1394 ilbc jpeg jpeg2k lame lcms ldap ldapsam libnotify lm_sensors lock logrotate mad matroska mmap mms mng modules mp3 mpeg multilib musepack ncurses nls nodrm nptl nsplugin numa ogg opencl opengl openmp opus pam pcre pkcs11 png qt5 readline samba seccomp session silk srtp ssl startup-notification taglib tcpd theora threads thunar tiff timidity truetype udev unicode usb vcd vdpau vim-syntax visio vorbis vpx wavpack winbind wmf wpg x264 xattr xcomposite xinerama xmp xv xvid xvmc zlib" ABI_X86="32 64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="alias auth_basic auth_digest authn_alias authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_user autoindex dir env expires filter headers deflate info log_config logio mime mime_magic negotiation status unique_id userdir rewrite reqtimeout proxy proxy_connect proxy_http authn_core authz_core unixd socache_shmcb" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="3dnow 3dnowext mmx mmxext sse sse2 sse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="evdev wacom" KERNEL="linux" L10N="en fa ru" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="en fa ru" NGINX_MODULES_HTTP="auth_pam access auth_basic autoindex browser charset fastcgi fancyindex geoip gzip headers_more limit_conn limit_req proxy referer rewrite scgi stub_status" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-0" POSTGRES_TARGETS="postgres9_5" PYTHON_SINGLE_TARGET="python3_4" PYTHON_TARGETS="python2_7 python3_4" RUBY_TARGETS="ruby22" USERLAND="GNU" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
I realized what made this vdso64.so from linux kernel "toxic". It is

  CONFIG_LEGACY_VSYSCALL_EMULATE=y

on the "bad" host. On the "good" host the selection is

  CONFIG_LEGACY_VSYSCALL_NONE=y

(Both settings are on purpose.)

This also probably explains why memory becomes read-write after the fault.

Anyway, in my opinion, ld.so should not segfault with any input.
Yep, that looks like a bug.
(In reply to Alexander Bezrukov from comment #1)
> I realized what made this vdso64.so from linux kernel "toxic". It is
> 
>   CONFIG_LEGACY_VSYSCALL_EMULATE=y
> 
> on the "bad" host. On the "good" host the selection is
> 
>   CONFIG_LEGACY_VSYSCALL_NONE=y
> 
> (Both settings are on purpose.)
> 
> This also probably explains why memory becomes read-write after the fault.
> 
> Anyway, in my opinion, ld.so should not segfault with any input.

I suggest reporting the bug directly upstream. I would be very wary of adding Gentoo-specific code to the early startup phase. It's very easy to break.

ld.so does minimal to no validation of mapped ELF files: libc is not loaded, relocations are not yet processed. It's a very sensitive piece of code from both the performance and the fragility standpoints.

But maybe it can be tweaked for this particular case.
    https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-load.c;h=1220183ce29f83668d2044dc25093c08184335fe;hb=HEAD#l857

Note how little it does before actually crashing:

$ LD_DEBUG=all /lib/ld-2.26.so --list ./vdso64.so
     24623:     file=./vdso64.so [0];  generating link map
<SIGSEGV>

$ strace -f /lib/ld-2.26.so --list ./vdso64.so
execve("/lib/ld-2.26.so", ["/lib/ld-2.26.so", "--list", "./vdso64.so"], 0x7ffe60180838 /* 80 vars */) = 0
brk(NULL)                               = 0x7fa529126000
openat(AT_FDCWD, "./vdso64.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\t\0\0\0\0\0\0"..., 832) = 832
lseek(3, 1960, SEEK_SET)                = 1960
read(3, "\6\0\0\0\4\0\0\0\0\0\0\0Linux\0\0\0<\t\4\0\4\0\0\0\24\0\0\0"..., 60) = 60
fstat(3, {st_mode=S_IFREG|0755, st_size=4512, ...}) = 0
getcwd("/home/slyfox/Downloads", 128)   = 23
mmap(NULL, 3209, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa5277bc000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fa5277bc370} ---
+++ killed by SIGSEGV (core dumped) +++
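The strace output above shows the file being mapped PROT_READ|PROT_EXEC and the fault being reported as SEGV_ACCERR. Purely as an illustration of the kind of check that would catch such a file (this is not glibc code and not a proposed patch), one could look at the program headers and test whether PT_DYNAMIC sits inside a PT_LOAD segment that has PF_W set. The function name dynamic_segment_is_writable is made up for this sketch, and it assumes a 64-bit ELF image in host byte order that has already been read into memory.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.
   Returns 1 if the PT_DYNAMIC segment lies inside a PT_LOAD segment that
   has PF_W set, 0 if it does not (including when no PT_LOAD covers it),
   and -1 if IMAGE is not a usable 64-bit ELF image or has no PT_DYNAMIC.
   Assumption: the image uses the host's byte order.  */
int
dynamic_segment_is_writable (const unsigned char *image, size_t size)
{
  Elf64_Ehdr ehdr;
  if (size < sizeof ehdr)
    return -1;
  memcpy (&ehdr, image, sizeof ehdr);

  if (memcmp (ehdr.e_ident, ELFMAG, SELFMAG) != 0
      || ehdr.e_ident[EI_CLASS] != ELFCLASS64
      || ehdr.e_phentsize != sizeof (Elf64_Phdr))
    return -1;

  /* Reject a program header table that runs past the end of the image.  */
  if (ehdr.e_phoff > size
      || (size - ehdr.e_phoff) / sizeof (Elf64_Phdr) < ehdr.e_phnum)
    return -1;

  const unsigned char *table = image + ehdr.e_phoff;
  Elf64_Phdr dyn, load;
  int have_dyn = 0;

  /* First pass: find PT_DYNAMIC.  */
  for (unsigned i = 0; i < ehdr.e_phnum; i++)
    {
      memcpy (&dyn, table + i * sizeof dyn, sizeof dyn);
      if (dyn.p_type == PT_DYNAMIC)
        {
          have_dyn = 1;
          break;
        }
    }
  if (!have_dyn)
    return -1;

  /* Second pass: find the PT_LOAD that covers it and check PF_W.  */
  for (unsigned i = 0; i < ehdr.e_phnum; i++)
    {
      memcpy (&load, table + i * sizeof load, sizeof load);
      if (load.p_type != PT_LOAD || dyn.p_vaddr < load.p_vaddr)
        continue;
      Elf64_Xword off = dyn.p_vaddr - load.p_vaddr;
      if (off <= load.p_memsz && dyn.p_memsz <= load.p_memsz - off)
        return (load.p_flags & PF_W) != 0;
    }
  return 0;
}
```

A result of 0 would mean that adjusting the dynamic entries in place, as ADJUST_DYN_INFO does when temp is NULL, writes to a page without write permission. I have not run this against the attached vdso64.so, so what it reports for that file is untested.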
From comment #3 I assume that this is not restricted to 2.25 ... When you file an upstream bug report, please link to it here!
If you are still interested in making ld.so more robust, please file an upstream bug report at:
    https://sourceware.org/bugzilla/enter_bug.cgi?product=glibc

Gentoo will not add extra downstream validation to the loader.