Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 691426 - >=net-misc/dhcpcd-8.0.0 aborts on net.eno startup when running IPv6 scripts
Summary: >=net-misc/dhcpcd-8.0.0 aborts on net.eno startup when running IPv6 scripts
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: William Hubbs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-04 16:10 UTC by nvinson234
Modified: 2020-04-22 17:30 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
debug (dhcpcd-nadebug.diff,539 bytes, patch)
2019-08-05 00:00 UTC, Roy Marples
Details | Diff
dhcpcd_debug_on.log (dhcpcd_debug_on.log,7.49 KB, text/x-log)
2019-08-05 02:49 UTC, nvinson234
Details
dhcpcd_debug_off.log (dhcpcd_debug_off.log,1.12 KB, text/x-log)
2019-08-05 02:50 UTC, nvinson234
Details
valgrind.log (valgrind.log,2.85 KB, text/x-log)
2019-08-05 07:49 UTC, nvinson234
Details
valgrind.log (valgrind.log,12.46 KB, text/x-log)
2019-08-05 09:03 UTC, nvinson234
Details
asan_enabled_run.log (run.log,3.93 KB, text/x-log)
2019-08-05 13:36 UTC, nvinson234
Details
asan_enabled_dhcpcd_and_openrc_run.log (openrc.log,9.00 KB, text/x-log)
2019-08-07 14:25 UTC, nvinson234
Details
more debug (dhcpcd-crash-debug.diff,1008 bytes, patch)
2019-08-08 00:05 UTC, Roy Marples
Details | Diff
dhcpcd_w_patches.log (dhcpcd.log,89.55 KB, text/x-log)
2019-08-08 01:28 UTC, nvinson234
Details
dhcpcd_w_patches.log (dhcpcd.log,3.94 KB, text/x-log)
2019-08-08 02:18 UTC, nvinson234
Details
openrc_output.log (openrc_output.log,6.54 KB, text/x-log)
2019-08-08 02:41 UTC, nvinson234
Details
Even more debug (dhcpcd-more-debug.diff,1.48 KB, patch)
2019-08-08 13:34 UTC, Roy Marples
Details | Diff
openrc_20190808T141433.log (openrc.log,14.85 KB, text/x-log)
2019-08-08 14:15 UTC, nvinson234
Details
dhcpcd_20190808T141433.log (dhcpcd.log,108.10 KB, text/x-log)
2019-08-08 14:16 UTC, nvinson234
Details
Fix the issue (dhcpcd-fix.diff,1.75 KB, patch)
2019-08-08 15:35 UTC, Roy Marples
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description nvinson234 2019-08-04 16:10:20 UTC
dhcpcd versions 8.0.0 through 8.0.2 abort on startup with the error message "malloc(): invalid size (unsorted)".  I do not see the issue when running dhcpcd directly from the command-line.

rc.log is below:

 * Bringing up interface eno1
 *   Caching network module dependencies
 *   dhcp ...
 *     Running dhcpcd ...
eno1: waiting for carrier
eno1: carrier acquired
DUID 00:01:00:01:20:77:1e:bc:e0:3f:49:19:ab:d0
eno1: IAID 49:19:ab:d0
eno1: adding address fe80::e79c:4bc:7909:4d9c
eno1: soliciting an IPv6 router
eno1: rebinding lease of 172.16.0.5
eno1: probing address 172.16.0.5/27
eno1: Router Advertisement from fe80::72f2:20ff:fee5:4250
eno1: adding address <removed public IPv6 addresss>
eno1: adding address fd00::6918:2056:dae:3912/64
eno1: adding route to <removed public IPv6 route>
eno1: adding route to fd00::/64
eno1: adding default route via fe80::72f2:20ff:fee5:4250
eno1: requesting DHCPv6 information
malloc(): invalid size (unsorted)
/lib/netifrc/net/dhcpcd.sh: line 68:  3294 Aborted                 dhcpcd -t 60 -L -m 2 eno1
 [ !! ]
 [ !! ]
 * ERROR: net.eno1 failed to start

/etc/conf.d/net:

dhcpcd_eno1="-t 60 -L"

emerge --info:
$ emerge --info
Portage 2.3.70 (python 3.6.9-final-0, default/linux/amd64/17.1/desktop, gcc-9.1.0, glibc-2.29-r3, 5.2.5-gentoo x86_64)
=================================================================
System uname: Linux-5.2.5-gentoo-x86_64-Intel-R-_Core-TM-_i7-4771_CPU_@_3.50GHz-with-gentoo-2.6
KiB Mem:    32633968 total,  25639108 free
KiB Swap:    2097148 total,   2097148 free
Timestamp of repository gentoo: Sun, 04 Aug 2019 15:27:55 +0000
Head commit of repository gentoo: c6a54080d52de135697665953f348c3c11452d34

Timestamp of repository brother-overlay: Thu, 18 Jul 2019 16:23:03 +0000
Head commit of repository brother-overlay: a6885c69a0c5a9c2c11ee3abe91ba77ec003d88a

Head commit of repository nuntoo: 74a482ca8d371849fd7b49d19e223e320ad3e4bb

sh bash 5.0_p7
ld GNU ld (Gentoo 2.32 p2) 2.32.0
app-shells/bash:          5.0_p7::gentoo
dev-java/java-config:     2.2.0-r4::gentoo
dev-lang/perl:            5.30.0::gentoo
dev-lang/python:          2.7.16::gentoo, 3.6.9::gentoo
dev-util/cmake:           3.15.1::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.41.2::gentoo
sys-apps/sandbox:         2.18::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake:       1.11.6-r3::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils:       2.32-r1::gentoo
sys-devel/gcc:            9.1.0-r1::gentoo
sys-devel/gcc-config:     2.0::gentoo
sys-devel/libtool:        2.4.6-r5::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.1::gentoo (virtual/os-headers)
sys-libs/glibc:           2.29-r3::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo.git
    priority: -1000

brother-overlay
    location: /var/db/repos/brother-overlay
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/brother-overlay.git
    masters: gentoo

nuntoo
    location: /var/db/repos/nuntoo
    sync-type: git
    sync-uri: https://github.com/nvinson/nuntoo.git
    masters: gentoo

private
    location: /var/db/repos/private
    masters: gentoo

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /opt/brother/scanner/brscan4/brsanenetdevice4.cfg /usr/lib64/libreoffice/program/sofficerc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/var/cache/portage/distfiles"
EMERGE_DEFAULT_OPTS="--jobs=8 --load-average=8"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en_US en"
MAKEOPTS="-j8 -l8"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi alsa amd64 berkdb branding bzip2 cairo cdda cdr cleartype cli consolekit corefonts crypt cups cxx dbus dri dts dvd dvdr emboss encode exif fam flac fortran gdbm gif glamor gpm gtk iconv icu ipv6 jpeg lcms ldap libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp pam pango pcre pdf png policykit ppds qt5 readline sdl seccomp spell split-usr ssl startup-notification svg tcpd theora tiff truetype udev udisks unicode upower usb vaapi vorbis vpx wxwidgets x264 xattr xcb xml xv xvid zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="evdev" KERNEL="linux" L10N="en-US en" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby24" USERLAND="GNU" VIDEO_CARDS="intel i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Roy Marples 2019-08-04 18:31:04 UTC
Can you reproduce it with the following command?

dhcpcd -t 60 -L -m 2 -dB6 eno1

You will have to ctrl-C dhcpcd or kill it from another terminal once done testing.
Comment 2 nvinson234 2019-08-04 19:49:06 UTC
(In reply to Roy Marples from comment #1)
> Can you reproduce it with the following command?
> 
> dhcpcd -t 60 -L -m 2 -dB6 eno1
> 
> You will have to ctrl-C dhcpcd or kill it from another terminal once done
> testing.

I tried several times before posting this bug.  I could not reproduce it that way.  It only appeared via /etc/init.d/net.eno1.
Comment 3 Roy Marples 2019-08-04 21:37:44 UTC
(In reply to nvinson234 from comment #2)
> (In reply to Roy Marples from comment #1)
> > Can you reproduce it with the following command?
> > 
> > dhcpcd -t 60 -L -m 2 -dB6 eno1
> > 
> > You will have to ctrl-C dhcpcd or kill it from another terminal once done
> > testing.
> 
> I tried several times before posting this bug.  I could not reproduce it
> that way.  It only appeared via /etc/init.d/net.eno1.

I'm somewhat surprised by that.
Anyway, can you capture a stack trace please?
Gentoo has a guide here: https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces

I would be able to fix this without that.
Maybe a raw tcpdump I can view via wireshark would help also.
Email that to me privately if you don't want to attach it here.
Comment 4 nvinson234 2019-08-04 23:24:52 UTC
One stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007efced05c535 in __GI_abort () at abort.c:79
#2  0x00007efced0b8947 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7efced1cdac8 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007efced0c0778 in malloc_printerr (str=str@entry=0x7efced1cfb18 "malloc(): invalid size (unsorted)") at malloc.c:5361
#4  0x00007efced0c364c in _int_malloc (av=av@entry=0x7efced201c40 <main_arena>, bytes=bytes@entry=32) at malloc.c:3750
#5  0x00007efced0c60b6 in __libc_calloc (n=n@entry=1, elem_size=<optimized out>) at malloc.c:3446
#6  0x0000559e3872603f in ipv6nd_advertise (ia=ia@entry=0x559e38a90c60) at ipv6nd.c:497
#7  0x0000559e387262a0 in ipv6nd_dadcallback (arg=0x559e38a90c60) at ipv6nd.c:910
#8  0x0000559e387241c4 in ipv6_handleifa_addrs (cmd=cmd@entry=20, addrs=addrs@entry=0x559e38a90a90, addr=addr@entry=0x559e38a92380, pid=pid@entry=0) at ipv6.c:1757
#9  0x0000559e38726c07 in ipv6nd_handleifa (cmd=cmd@entry=20, addr=addr@entry=0x559e38a92380, pid=pid@entry=0) at ipv6nd.c:1496
#10 0x0000559e3872382c in ipv6_handleifa (ctx=ctx@entry=0x7ffc28744220, cmd=<optimized out>, ifs=<optimized out>, ifs@entry=0x0, ifname=ifname@entry=0x559e38aa3ee8 "eno1", addr=addr@entry=0x7ffc2873ff70, 
    prefix_len=<optimized out>, addrflags=0, pid=0) at ipv6.c:1190
#11 0x0000559e38715bdd in link_addr (ifp=0x7ffc28740188, nlm=0x7ffc28740140, ctx=0x7ffc28744220) at if-linux.c:647
#12 link_netlink (ctx=ctx@entry=0x7ffc28744220, arg=arg@entry=0x0, nlm=nlm@entry=0x7ffc28740140) at if-linux.c:733
#13 0x0000559e38715ffa in get_netlink (ctx=0x7ffc28744220, iov=iov@entry=0x7ffc28740130, arg=arg@entry=0x0, fd=5, flags=flags@entry=64, callback=callback@entry=0x559e38715820 <link_netlink>) at if-linux.c:420
#14 0x0000559e38716769 in if_handlelink (ctx=<optimized out>) at if-linux.c:833
#15 0x0000559e38706fb9 in dhcpcd_handlelink (arg=0x7ffc28744220) at dhcpcd.c:1083
#16 0x0000559e3870878b in eloop_start (eloop=0x559e38aa3dd0, signals=0x7ffc28744330) at eloop.c:979
#17 0x0000559e387034ea in main (argc=<optimized out>, argv=<optimized out>) at dhcpcd.c:2101
Comment 5 Roy Marples 2019-08-05 00:00:41 UTC
Created attachment 585736 [details, diff]
debug

Please add this patch to get more information.
The output of `ip a` would also help.
Comment 6 nvinson234 2019-08-05 01:54:22 UTC
(In reply to Roy Marples from comment #5)
> Created attachment 585736 [details, diff] [details, diff]
> debug
> 
> Please add this patch to get more information.
> The output of `ip a` would also help.

% sudo /etc/init.d/net.eno1 restart 
 * Caching service dependencies ...                                                                                                                                                                              [ ok ]
 * Bringing down interface eno1
 *   Stopping dhcpcd on eno1 ...
sending signal TERM to pid 30713
dhcpcd not running                                                                                                                                                                                               [ !! ]
 * Bringing up interface eno1
 *   dhcp ...
 *     Running dhcpcd ...
eno1: waiting for carrier
eno1: carrier acquired
DUID 00:01:00:01:20:77:1e:bc:e0:3f:49:19:ab:d0
eno1: IAID 49:19:ab:d0
eno1: adding address fe80::e79c:4bc:7909:4d9c
eno1: soliciting an IPv6 router
eno1: rebinding lease of 172.16.0.5
eno1: probing address 172.16.0.5/27
eno1: Router Advertisement from fe80::72f2:20ff:fee5:4250
eno1: adding address <removed>
eno1: adding address fd00::6918:2056:dae:3912/64
eno1: adding route to <removed>
eno1: adding route to fd00::/64
eno1: adding default route via fe80::72f2:20ff:fee5:4250
eno1: requesting DHCPv6 information
MAKING NA FOR eno1
malloc(): invalid size (unsorted)
/lib/netifrc/net/dhcpcd.sh: line 68:  1664 Aborted                 (core dumped) dhcpcd -t 60 -L -m 2 eno1                                                                                                       [ !! ]
 *   172.16.0.30/27 ...                                                                                                                                                                                          [ ok ]
 *   Waiting for tentative IPv6 addresses to complete DAD (5 seconds) .                                                                                                                                          [ ok ]

-----------------------------

% ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether e0:3f:49:19:ab:d0 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.30/27 brd 172.16.0.31 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fd00::6918:2056:dae:3912/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 3526sec preferred_lft 526sec
    inet6 <removed> scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 3526sec preferred_lft 526sec
    inet6 fe80::e79c:4bc:7909:4d9c/64 scope link 
       valid_lft forever preferred_lft forever
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
Comment 7 Roy Marples 2019-08-05 02:16:39 UTC
(In reply to nvinson234 from comment #6)
> eno1: requesting DHCPv6 information
> MAKING NA FOR eno1
> malloc(): invalid size (unsorted)
> /lib/netifrc/net/dhcpcd.sh: line 68:  1664 Aborted                 (core
> dumped) dhcpcd -t 60 -L -m 2 eno1                                           

There should be two more diagnostic lines before calloc is even called.
Lets rule out an OpenRC logging issue and add this to the top of /etc/dhcpcd.conf

debug
logfile /var/log/dhcpcd.log

Make it crash again and attach that file please.
Comment 8 nvinson234 2019-08-05 02:49:18 UTC
Created attachment 585740 [details]
dhcpcd_debug_on.log

The log as requested.  However, dhcpcd doesn't crash when debug is enabled.
Comment 9 nvinson234 2019-08-05 02:50:13 UTC
Created attachment 585742 [details]
dhcpcd_debug_off.log

A dhcpcd run without debugging enabled.  This one crashes.
Comment 10 Roy Marples 2019-08-05 03:52:43 UTC
(In reply to nvinson234 from comment #9)
> Created attachment 585742 [details]
> dhcpcd_debug_off.log
> 
> A dhcpcd run without debugging enabled.  This one crashes.

Aug 04 19:44:42 [4142]: MAKING NA FOR eno1
Aug 04 19:44:42 [4142]: NA LEN 24 (6)
Aug 04 19:44:42 [4142]: NA LEN FINAL 32

Over to toolchain to explain why a calloc of 32 bytes fails in netifrc only without dhcpcd debug enabled.
Comment 11 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-05 06:32:41 UTC
(In reply to Roy Marples from comment #10)
> (In reply to nvinson234 from comment #9)
> > Created attachment 585742 [details]
> > dhcpcd_debug_off.log
> > 
> > A dhcpcd run without debugging enabled.  This one crashes.
> 
> Aug 04 19:44:42 [4142]: MAKING NA FOR eno1
> Aug 04 19:44:42 [4142]: NA LEN 24 (6)
> Aug 04 19:44:42 [4142]: NA LEN FINAL 32
> 
> Over to toolchain to explain why a calloc of 32 bytes fails in netifrc only
> without dhcpcd debug enabled.

Hard to say without looking at both binaries. Assuming generated code is identical.

Stripped and unstripped binaries have different number of sections in the ELF header. This causes early startup code to allocate different amount of memory at program startup. This might result inn different heap layout and different allocation order.

From "malloc(): invalid size (unsorted)" I am guessing we are dealing with memory corruption here (run out of buffer?) that corrupts internal  malloc() data structures. Thus initial heap layout is different enough not to trigger a problem.

I would suggest running dhcpcd against asan or valgrind in hopes to catch earlier corruption. Someting like:

    # valgrind dhcpcd -t 60 -L -m 2 -dB6 eno1
Comment 12 nvinson234 2019-08-05 07:49:30 UTC
Created attachment 585750 [details]
valgrind.log
Comment 13 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-05 07:58:53 UTC
(In reply to nvinson234 from comment #12)
> Created attachment 585750 [details]
> valgrind.log
...
> ==23041== Command: /etc/init.d/net.eno1 restart

If you run valgrind against the shell script you also need to use --trace-children=yes to get to emulation of dhcpcd:

    # valgrind --trace-children=yes /etc/init.d/net.eno1 restart
Comment 14 nvinson234 2019-08-05 09:03:08 UTC
Created attachment 585758 [details]
valgrind.log
Comment 15 Roy Marples 2019-08-05 12:07:39 UTC
You might have to edit dhcpcd.sh of the netifrc package to add valgrind around line 69:
https://gitweb.gentoo.org/proj/netifrc.git/tree/net/dhcpcd.sh#n69

eval valgrind dhcpcd "${args}" "${IFACE}"
Comment 16 Roy Marples 2019-08-05 12:25:43 UTC
Or add -fsanitize=address to CFLAGS and -lasan to LDFLAGS and re-emerge dhcpcd with compiler address sanitisation. You can check this with ldd /sbin/dhcpcd and ensure it's linked to the asan library.
Comment 17 nvinson234 2019-08-05 13:36:20 UTC
Created attachment 585766 [details]
asan_enabled_run.log
Comment 18 Roy Marples 2019-08-05 14:16:02 UTC
That log is perfect! Show's exactly where the error is, but why it crashes there needs more analysis. Can you email me the DHCPv6 lease file in /var/lib/dhcpcd/eno1.lease6 please? dhcpcd should create or modify that file before crashing with the information needed.
Comment 19 Roy Marples 2019-08-05 15:19:04 UTC
dhcpcd -U6 eno1

should also crash dhcpcd?
Comment 20 nvinson234 2019-08-05 19:22:18 UTC
(In reply to Roy Marples from comment #19)
> dhcpcd -U6 eno1
> 
> should also crash dhcpcd?

It does not.  I'll send the lease directly to you.
Comment 21 nvinson234 2019-08-07 14:15:56 UTC
forgot to mention that dhcpcd required some patching to get it to build with libasan.

The patch is to modify configure so that it checks for dlsym(NULL, "dlsym").  Otherwise, it'll drop the -ldl flag because libasan defines dlopen() as well.
Comment 22 nvinson234 2019-08-07 14:25:36 UTC
Created attachment 585976 [details]
asan_enabled_dhcpcd_and_openrc_run.log

A new run where both openrc and dhcpcd were build with libasan instrumentation.
Comment 23 Roy Marples 2019-08-08 00:05:04 UTC
Created attachment 586102 [details, diff]
more debug

I've examined the lease extensively and concluded it is fine and on 4 different OS by itself it won't crash dhcpcd or trigger any asan issues. Valgrind also reports no errors.

Please add this patch and post a new log.
Comment 24 nvinson234 2019-08-08 01:28:29 UTC
Created attachment 586106 [details]
dhcpcd_w_patches.log

Requested log added.
Comment 25 Roy Marples 2019-08-08 01:46:13 UTC
> Aug 05 00:35:33 [15756]: dhcp6_listen: Address already in use

Looks like another dhcpcd instance is already running? Unsure how it's related, but can you crash it at all with this debug patch?
Comment 26 nvinson234 2019-08-08 02:18:31 UTC
Created attachment 586108 [details]
dhcpcd_w_patches.log

updated the log and made sure dhcpcd wasn't running.
Comment 27 Roy Marples 2019-08-08 02:24:03 UTC
Everything looks normal and no crash?
Can you repeat the actions that caused the crash please? Kill all dhcpcd and restart net.eno1
Comment 28 nvinson234 2019-08-08 02:33:56 UTC
That is a log of a crashed session.  I still have the sanitizer library built into dhcpcd.  Do you want me to rebuild without it and try again?
Comment 29 nvinson234 2019-08-08 02:41:30 UTC
Created attachment 586110 [details]
openrc_output.log

I pulled the openrc output of that run from my screen buffer.  I'm attaching it.
Comment 30 Roy Marples 2019-08-08 13:34:06 UTC
Created attachment 586126 [details, diff]
Even more debug

So what seems to be happening is that we count the NULLs in a first pass and ensure we have enough space to store a pointer for each + 1 (NULL terminator).

Then we loop through the same data, again looking for NULLs and assinging the pointers. But even though we are aborting the same loop one character earlier, we are couting an extra NULL which is causing this issue. This is backed up by the below debug which counts 17 in the first pass but the difference count is 18.

nenv 17 envlen 14
nenv 17 envlen 17
buf 0x625000002900 endp 0x625000002b04
script 0x60d000002810 end 0x60d000002898 envp 0x60d0000028a0
diff 18 

This new patch shows where it thinks the NULLs are within the data AND the data itself (twice for validation). So it's a lot more debug, but will hopefully show where it's going wrong.
Comment 31 nvinson234 2019-08-08 14:15:39 UTC
Created attachment 586130 [details]
openrc_20190808T141433.log
Comment 32 nvinson234 2019-08-08 14:16:14 UTC
Created attachment 586132 [details]
dhcpcd_20190808T141433.log
Comment 33 Roy Marples 2019-08-08 15:35:40 UTC
Created attachment 586138 [details, diff]
Fix the issue

Many thanks for the prompt reports!

So the problem was that an env string was not terminated correctly.
I've added assertations to ensure everything IS terminated correctly and fixed the underlying issue.

Let me know if this fixes it for you!
Comment 34 nvinson234 2019-08-08 19:26:58 UTC
The patch appears to be working.  /etc/init.d/net.eno1 runs successfully now.  Thanks.
Comment 35 Larry the Git Cow gentoo-dev 2019-08-09 15:17:09 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2ce3a8e57adc2ea9389c35301ce081ab89b42931

commit 2ce3a8e57adc2ea9389c35301ce081ab89b42931
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-08-09 15:16:40 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-08-09 15:17:03 +0000

    net-misc/dhcpcd: Revbump to fix incorrect string termination
    
    Bug: https://bugs.gentoo.org/691426
    Package-Manager: Portage-2.3.71, Repoman-2.3.17
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 ...{dhcpcd-8.0.2.ebuild => dhcpcd-8.0.2-r1.ebuild} |  4 ++
 .../files/dhcpcd-8.0.2-string_termination.patch    | 80 ++++++++++++++++++++++
 2 files changed, 84 insertions(+)