dhcpcd versions 8.0.0 through 8.0.2 abort on startup with the error message "malloc(): invalid size (unsorted)". I do not see the issue when running dhcpcd directly from the command-line. rc.log is below: * Bringing up interface eno1 * Caching network module dependencies * dhcp ... * Running dhcpcd ... eno1: waiting for carrier eno1: carrier acquired DUID 00:01:00:01:20:77:1e:bc:e0:3f:49:19:ab:d0 eno1: IAID 49:19:ab:d0 eno1: adding address fe80::e79c:4bc:7909:4d9c eno1: soliciting an IPv6 router eno1: rebinding lease of 172.16.0.5 eno1: probing address 172.16.0.5/27 eno1: Router Advertisement from fe80::72f2:20ff:fee5:4250 eno1: adding address <removed public IPv6 addresss> eno1: adding address fd00::6918:2056:dae:3912/64 eno1: adding route to <removed public IPv6 route> eno1: adding route to fd00::/64 eno1: adding default route via fe80::72f2:20ff:fee5:4250 eno1: requesting DHCPv6 information malloc(): invalid size (unsorted) /lib/netifrc/net/dhcpcd.sh: line 68: 3294 Aborted dhcpcd -t 60 -L -m 2 eno1 [ !! ] [ !! ] * ERROR: net.eno1 failed to start /etc/conf.d/net: dhcpcd_eno1="-t 60 -L" emerge --info: $ emerge --info Portage 2.3.70 (python 3.6.9-final-0, default/linux/amd64/17.1/desktop, gcc-9.1.0, glibc-2.29-r3, 5.2.5-gentoo x86_64) ================================================================= System uname: Linux-5.2.5-gentoo-x86_64-Intel-R-_Core-TM-_i7-4771_CPU_@_3.50GHz-with-gentoo-2.6 KiB Mem: 32633968 total, 25639108 free KiB Swap: 2097148 total, 2097148 free Timestamp of repository gentoo: Sun, 04 Aug 2019 15:27:55 +0000 Head commit of repository gentoo: c6a54080d52de135697665953f348c3c11452d34 Timestamp of repository brother-overlay: Thu, 18 Jul 2019 16:23:03 +0000 Head commit of repository brother-overlay: a6885c69a0c5a9c2c11ee3abe91ba77ec003d88a Head commit of repository nuntoo: 74a482ca8d371849fd7b49d19e223e320ad3e4bb sh bash 5.0_p7 ld GNU ld (Gentoo 2.32 p2) 2.32.0 app-shells/bash: 5.0_p7::gentoo dev-java/java-config: 2.2.0-r4::gentoo dev-lang/perl: 5.30.0::gentoo dev-lang/python: 2.7.16::gentoo, 3.6.9::gentoo dev-util/cmake: 3.15.1::gentoo dev-util/pkgconfig: 0.29.2::gentoo sys-apps/baselayout: 2.6-r1::gentoo sys-apps/openrc: 0.41.2::gentoo sys-apps/sandbox: 2.18::gentoo sys-devel/autoconf: 2.13-r1::gentoo, 2.69-r4::gentoo sys-devel/automake: 1.11.6-r3::gentoo, 1.16.1-r1::gentoo sys-devel/binutils: 2.32-r1::gentoo sys-devel/gcc: 9.1.0-r1::gentoo sys-devel/gcc-config: 2.0::gentoo sys-devel/libtool: 2.4.6-r5::gentoo sys-devel/make: 4.2.1-r4::gentoo sys-kernel/linux-headers: 5.1::gentoo (virtual/os-headers) sys-libs/glibc: 2.29-r3::gentoo Repositories: gentoo location: /var/db/repos/gentoo sync-type: git sync-uri: https://github.com/gentoo-mirror/gentoo.git priority: -1000 brother-overlay location: /var/db/repos/brother-overlay sync-type: git sync-uri: https://github.com/gentoo-mirror/brother-overlay.git masters: gentoo nuntoo location: /var/db/repos/nuntoo sync-type: git sync-uri: https://github.com/nvinson/nuntoo.git masters: gentoo private location: /var/db/repos/private masters: gentoo ACCEPT_KEYWORDS="amd64 ~amd64" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=native -O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /opt/brother/scanner/brscan4/brsanenetdevice4.cfg /usr/lib64/libreoffice/program/sofficerc /usr/share/gnupg/qualified.txt" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c" CXXFLAGS="-march=native -O2 -pipe" DISTDIR="/var/cache/portage/distfiles" EMERGE_DEFAULT_OPTS="--jobs=8 --load-average=8" ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR" FCFLAGS="-march=native -O2 -pipe" FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr" FFLAGS="-march=native -O2 -pipe" GENTOO_MIRRORS="http://distfiles.gentoo.org" LANG="en_US.utf8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" LINGUAS="en_US en" MAKEOPTS="-j8 -l8" PKGDIR="/var/cache/binpkgs" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git" PORTAGE_TMPDIR="/var/tmp" USE="X a52 aac acl acpi alsa amd64 berkdb branding bzip2 cairo cdda cdr cleartype cli consolekit corefonts crypt cups cxx dbus dri dts dvd dvdr emboss encode exif fam flac fortran gdbm gif glamor gpm gtk iconv icu ipv6 jpeg lcms ldap libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp pam pango pcre pdf png policykit ppds qt5 readline sdl seccomp spell split-usr ssl startup-notification svg tcpd theora tiff truetype udev udisks unicode upower usb vaapi vorbis vpx wxwidgets x264 xattr xcb xml xv xvid zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="evdev" KERNEL="linux" L10N="en-US en" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby24" USERLAND="GNU" VIDEO_CARDS="intel i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Can you reproduce it with the following command? dhcpcd -t 60 -L -m 2 -dB6 eno1 You will have to ctrl-C dhcpcd or kill it from another terminal once done testing.
(In reply to Roy Marples from comment #1) > Can you reproduce it with the following command? > > dhcpcd -t 60 -L -m 2 -dB6 eno1 > > You will have to ctrl-C dhcpcd or kill it from another terminal once done > testing. I tried several times before posting this bug. I could not reproduce it that way. It only appeared via /etc/init.d/net.eno1.
(In reply to nvinson234 from comment #2) > (In reply to Roy Marples from comment #1) > > Can you reproduce it with the following command? > > > > dhcpcd -t 60 -L -m 2 -dB6 eno1 > > > > You will have to ctrl-C dhcpcd or kill it from another terminal once done > > testing. > > I tried several times before posting this bug. I could not reproduce it > that way. It only appeared via /etc/init.d/net.eno1. I'm somewhat surprised by that. Anyway, can you capture a stack trace please? Gentoo has a guide here: https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces I would be able to fix this without that. Maybe a raw tcpdump I can view via wireshark would help also. Email that to me privately if you don't want to attach it here.
One stack trace: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007efced05c535 in __GI_abort () at abort.c:79 #2 0x00007efced0b8947 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7efced1cdac8 "%s\n") at ../sysdeps/posix/libc_fatal.c:181 #3 0x00007efced0c0778 in malloc_printerr (str=str@entry=0x7efced1cfb18 "malloc(): invalid size (unsorted)") at malloc.c:5361 #4 0x00007efced0c364c in _int_malloc (av=av@entry=0x7efced201c40 <main_arena>, bytes=bytes@entry=32) at malloc.c:3750 #5 0x00007efced0c60b6 in __libc_calloc (n=n@entry=1, elem_size=<optimized out>) at malloc.c:3446 #6 0x0000559e3872603f in ipv6nd_advertise (ia=ia@entry=0x559e38a90c60) at ipv6nd.c:497 #7 0x0000559e387262a0 in ipv6nd_dadcallback (arg=0x559e38a90c60) at ipv6nd.c:910 #8 0x0000559e387241c4 in ipv6_handleifa_addrs (cmd=cmd@entry=20, addrs=addrs@entry=0x559e38a90a90, addr=addr@entry=0x559e38a92380, pid=pid@entry=0) at ipv6.c:1757 #9 0x0000559e38726c07 in ipv6nd_handleifa (cmd=cmd@entry=20, addr=addr@entry=0x559e38a92380, pid=pid@entry=0) at ipv6nd.c:1496 #10 0x0000559e3872382c in ipv6_handleifa (ctx=ctx@entry=0x7ffc28744220, cmd=<optimized out>, ifs=<optimized out>, ifs@entry=0x0, ifname=ifname@entry=0x559e38aa3ee8 "eno1", addr=addr@entry=0x7ffc2873ff70, prefix_len=<optimized out>, addrflags=0, pid=0) at ipv6.c:1190 #11 0x0000559e38715bdd in link_addr (ifp=0x7ffc28740188, nlm=0x7ffc28740140, ctx=0x7ffc28744220) at if-linux.c:647 #12 link_netlink (ctx=ctx@entry=0x7ffc28744220, arg=arg@entry=0x0, nlm=nlm@entry=0x7ffc28740140) at if-linux.c:733 #13 0x0000559e38715ffa in get_netlink (ctx=0x7ffc28744220, iov=iov@entry=0x7ffc28740130, arg=arg@entry=0x0, fd=5, flags=flags@entry=64, callback=callback@entry=0x559e38715820 <link_netlink>) at if-linux.c:420 #14 0x0000559e38716769 in if_handlelink (ctx=<optimized out>) at if-linux.c:833 #15 0x0000559e38706fb9 in dhcpcd_handlelink (arg=0x7ffc28744220) at dhcpcd.c:1083 #16 0x0000559e3870878b in eloop_start (eloop=0x559e38aa3dd0, signals=0x7ffc28744330) at eloop.c:979 #17 0x0000559e387034ea in main (argc=<optimized out>, argv=<optimized out>) at dhcpcd.c:2101
Created attachment 585736 [details, diff] debug Please add this patch to get more information. The output of `ip a` would also help.
(In reply to Roy Marples from comment #5) > Created attachment 585736 [details, diff] [details, diff] > debug > > Please add this patch to get more information. > The output of `ip a` would also help. % sudo /etc/init.d/net.eno1 restart * Caching service dependencies ... [ ok ] * Bringing down interface eno1 * Stopping dhcpcd on eno1 ... sending signal TERM to pid 30713 dhcpcd not running [ !! ] * Bringing up interface eno1 * dhcp ... * Running dhcpcd ... eno1: waiting for carrier eno1: carrier acquired DUID 00:01:00:01:20:77:1e:bc:e0:3f:49:19:ab:d0 eno1: IAID 49:19:ab:d0 eno1: adding address fe80::e79c:4bc:7909:4d9c eno1: soliciting an IPv6 router eno1: rebinding lease of 172.16.0.5 eno1: probing address 172.16.0.5/27 eno1: Router Advertisement from fe80::72f2:20ff:fee5:4250 eno1: adding address <removed> eno1: adding address fd00::6918:2056:dae:3912/64 eno1: adding route to <removed> eno1: adding route to fd00::/64 eno1: adding default route via fe80::72f2:20ff:fee5:4250 eno1: requesting DHCPv6 information MAKING NA FOR eno1 malloc(): invalid size (unsorted) /lib/netifrc/net/dhcpcd.sh: line 68: 1664 Aborted (core dumped) dhcpcd -t 60 -L -m 2 eno1 [ !! ] * 172.16.0.30/27 ... [ ok ] * Waiting for tentative IPv6 addresses to complete DAD (5 seconds) . [ ok ] ----------------------------- % ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 brd 127.255.255.255 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether e0:3f:49:19:ab:d0 brd ff:ff:ff:ff:ff:ff inet 172.16.0.30/27 brd 172.16.0.31 scope global eno1 valid_lft forever preferred_lft forever inet6 fd00::6918:2056:dae:3912/64 scope global dynamic mngtmpaddr noprefixroute valid_lft 3526sec preferred_lft 526sec inet6 <removed> scope global dynamic mngtmpaddr noprefixroute valid_lft 3526sec preferred_lft 526sec inet6 fe80::e79c:4bc:7909:4d9c/64 scope link valid_lft forever preferred_lft forever 3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000 link/sit 0.0.0.0 brd 0.0.0.0
(In reply to nvinson234 from comment #6) > eno1: requesting DHCPv6 information > MAKING NA FOR eno1 > malloc(): invalid size (unsorted) > /lib/netifrc/net/dhcpcd.sh: line 68: 1664 Aborted (core > dumped) dhcpcd -t 60 -L -m 2 eno1 There should be two more diagnostic lines before calloc is even called. Lets rule out an OpenRC logging issue and add this to the top of /etc/dhcpcd.conf debug logfile /var/log/dhcpcd.log Make it crash again and attach that file please.
Created attachment 585740 [details] dhcpcd_debug_on.log The log as requested. However, dhcpcd doesn't crash when debug is enabled.
Created attachment 585742 [details] dhcpcd_debug_off.log A dhcpcd run without debugging enabled. This one crashes.
(In reply to nvinson234 from comment #9) > Created attachment 585742 [details] > dhcpcd_debug_off.log > > A dhcpcd run without debugging enabled. This one crashes. Aug 04 19:44:42 [4142]: MAKING NA FOR eno1 Aug 04 19:44:42 [4142]: NA LEN 24 (6) Aug 04 19:44:42 [4142]: NA LEN FINAL 32 Over to toolchain to explain why a calloc of 32 bytes fails in netifrc only without dhcpcd debug enabled.
(In reply to Roy Marples from comment #10) > (In reply to nvinson234 from comment #9) > > Created attachment 585742 [details] > > dhcpcd_debug_off.log > > > > A dhcpcd run without debugging enabled. This one crashes. > > Aug 04 19:44:42 [4142]: MAKING NA FOR eno1 > Aug 04 19:44:42 [4142]: NA LEN 24 (6) > Aug 04 19:44:42 [4142]: NA LEN FINAL 32 > > Over to toolchain to explain why a calloc of 32 bytes fails in netifrc only > without dhcpcd debug enabled. Hard to say without looking at both binaries. Assuming generated code is identical. Stripped and unstripped binaries have different number of sections in the ELF header. This causes early startup code to allocate different amount of memory at program startup. This might result inn different heap layout and different allocation order. From "malloc(): invalid size (unsorted)" I am guessing we are dealing with memory corruption here (run out of buffer?) that corrupts internal malloc() data structures. Thus initial heap layout is different enough not to trigger a problem. I would suggest running dhcpcd against asan or valgrind in hopes to catch earlier corruption. Someting like: # valgrind dhcpcd -t 60 -L -m 2 -dB6 eno1
Created attachment 585750 [details] valgrind.log
(In reply to nvinson234 from comment #12) > Created attachment 585750 [details] > valgrind.log ... > ==23041== Command: /etc/init.d/net.eno1 restart If you run valgrind against the shell script you also need to use --trace-children=yes to get to emulation of dhcpcd: # valgrind --trace-children=yes /etc/init.d/net.eno1 restart
Created attachment 585758 [details] valgrind.log
You might have to edit dhcpcd.sh of the netifrc package to add valgrind around line 69: https://gitweb.gentoo.org/proj/netifrc.git/tree/net/dhcpcd.sh#n69 eval valgrind dhcpcd "${args}" "${IFACE}"
Or add -fsanitize=address to CFLAGS and -lasan to LDFLAGS and re-emerge dhcpcd with compiler address sanitisation. You can check this with ldd /sbin/dhcpcd and ensure it's linked to the asan library.
Created attachment 585766 [details] asan_enabled_run.log
That log is perfect! Show's exactly where the error is, but why it crashes there needs more analysis. Can you email me the DHCPv6 lease file in /var/lib/dhcpcd/eno1.lease6 please? dhcpcd should create or modify that file before crashing with the information needed.
dhcpcd -U6 eno1 should also crash dhcpcd?
(In reply to Roy Marples from comment #19) > dhcpcd -U6 eno1 > > should also crash dhcpcd? It does not. I'll send the lease directly to you.
forgot to mention that dhcpcd required some patching to get it to build with libasan. The patch is to modify configure so that it checks for dlsym(NULL, "dlsym"). Otherwise, it'll drop the -ldl flag because libasan defines dlopen() as well.
Created attachment 585976 [details] asan_enabled_dhcpcd_and_openrc_run.log A new run where both openrc and dhcpcd were build with libasan instrumentation.
Created attachment 586102 [details, diff] more debug I've examined the lease extensively and concluded it is fine and on 4 different OS by itself it won't crash dhcpcd or trigger any asan issues. Valgrind also reports no errors. Please add this patch and post a new log.
Created attachment 586106 [details] dhcpcd_w_patches.log Requested log added.
> Aug 05 00:35:33 [15756]: dhcp6_listen: Address already in use Looks like another dhcpcd instance is already running? Unsure how it's related, but can you crash it at all with this debug patch?
Created attachment 586108 [details] dhcpcd_w_patches.log updated the log and made sure dhcpcd wasn't running.
Everything looks normal and no crash? Can you repeat the actions that caused the crash please? Kill all dhcpcd and restart net.eno1
That is a log of a crashed session. I still have the sanitizer library built into dhcpcd. Do you want me to rebuild without it and try again?
Created attachment 586110 [details] openrc_output.log I pulled the openrc output of that run from my screen buffer. I'm attaching it.
Created attachment 586126 [details, diff] Even more debug So what seems to be happening is that we count the NULLs in a first pass and ensure we have enough space to store a pointer for each + 1 (NULL terminator). Then we loop through the same data, again looking for NULLs and assinging the pointers. But even though we are aborting the same loop one character earlier, we are couting an extra NULL which is causing this issue. This is backed up by the below debug which counts 17 in the first pass but the difference count is 18. nenv 17 envlen 14 nenv 17 envlen 17 buf 0x625000002900 endp 0x625000002b04 script 0x60d000002810 end 0x60d000002898 envp 0x60d0000028a0 diff 18 This new patch shows where it thinks the NULLs are within the data AND the data itself (twice for validation). So it's a lot more debug, but will hopefully show where it's going wrong.
Created attachment 586130 [details] openrc_20190808T141433.log
Created attachment 586132 [details] dhcpcd_20190808T141433.log
Created attachment 586138 [details, diff] Fix the issue Many thanks for the prompt reports! So the problem was that an env string was not terminated correctly. I've added assertations to ensure everything IS terminated correctly and fixed the underlying issue. Let me know if this fixes it for you!
The patch appears to be working. /etc/init.d/net.eno1 runs successfully now. Thanks.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2ce3a8e57adc2ea9389c35301ce081ab89b42931 commit 2ce3a8e57adc2ea9389c35301ce081ab89b42931 Author: Lars Wendler <polynomial-c@gentoo.org> AuthorDate: 2019-08-09 15:16:40 +0000 Commit: Lars Wendler <polynomial-c@gentoo.org> CommitDate: 2019-08-09 15:17:03 +0000 net-misc/dhcpcd: Revbump to fix incorrect string termination Bug: https://bugs.gentoo.org/691426 Package-Manager: Portage-2.3.71, Repoman-2.3.17 Signed-off-by: Lars Wendler <polynomial-c@gentoo.org> ...{dhcpcd-8.0.2.ebuild => dhcpcd-8.0.2-r1.ebuild} | 4 ++ .../files/dhcpcd-8.0.2-string_termination.patch | 80 ++++++++++++++++++++++ 2 files changed, 84 insertions(+)