Since upgrading to net-misc/dhcpcd-8.1.1, I have noticed that dhcpcd sometimes uses 100% CPU after resuming from suspend-to-RAM. Killing and restarting dhcpcd always makes the problem go away, until a couple of suspends later when the same thing happens again.

I rebuilt dhcpcd with debugging symbols, waited for the problem to occur, attached GDB, and saw this:

(gdb) bt
#0  ipv4ll_drop (ifp=0x56244c322100) at ipv4ll.c:496
#1  0x000056244b928151 in dhcpcd_drop (ifp=0x56244c322100, stop=<optimized out>) at dhcpcd.c:403
#2  0x000056244b938f99 in link_netlink (arg=0x7ffced429ef0, nlm=0x7ffced42a010, ctx=0x7ffced42e0f0) at if-linux.c:861
#3  link_netlink (ctx=ctx@entry=0x7ffced42e0f0, arg=arg@entry=0x0, nlm=nlm@entry=0x7ffced42a010) at if-linux.c:764
#4  0x000056244b9392ea in get_netlink (ctx=0x7ffced42e0f0, iov=iov@entry=0x7ffced42a000, arg=arg@entry=0x0, fd=8, flags=flags@entry=64, callback=callback@entry=0x56244b938b20 <link_netlink>) at if-linux.c:466
#5  0x000056244b939b9f in if_handlelink (ctx=<optimized out>) at if-linux.c:876
#6  0x000056244b92a1b9 in dhcpcd_handlelink (arg=0x7ffced42e0f0) at dhcpcd.c:1083
#7  0x000056244b92b98b in eloop_start (eloop=0x56244c317dd0, signals=0x7ffced42e200) at eloop.c:979
#8  0x000056244b9265a4 in main (argc=<optimized out>, argv=<optimized out>) at dhcpcd.c:2104
(gdb) info locals
ia = 0x56244c326b20
ian = 0x56244c326b20
state = <optimized out>
dropped = false
istate = <optimized out>
(gdb) print *ia
$21 = {next = {tqe_next = 0x56244c326b20, tqe_prev = 0x56244c326b20}, addr = {s_addr = 521350666}, mask = {s_addr = 16777215}, brd = {
    s_addr = 4279447050}, iface = 0x56244c322100, addr_flags = 1684086842, flags = 4294967295, vltime = 7200, pltime = 6300,
  saddr = "10.46.19.31/24\000\066.19"}
(gdb) step
52        return __builtin_bswap32 (__bsx);
(gdb) step
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb) step
52        return __builtin_bswap32 (__bsx);
(gdb) step
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb) step
52        return __builtin_bswap32 (__bsx);
(gdb) step
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb) print ia
$22 = (struct ipv4_addr *) 0x56244c326b20

The structure at 0x56244c326b20 never changes, and ia and ian continue to point to it in every iteration of the loop.

This is a simple laptop with one wired interface (unplugged) and one WiFi interface, which has the IP address seen in the structure above:

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s25: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    inet 10.46.19.31/24 brd 10.46.19.255 scope global dynamic noprefixroute wlp3s0
       valid_lft 3900sec preferred_lft 3000sec
4: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0

Reproducible: Sometimes

gcc (Gentoo Hardened 9.2.0-r1 p2) 9.2.0

Portage 2.3.78 (python 3.6.9-final-0, default/linux/amd64/17.1/hardened, gcc-9.2.0, glibc-2.29-r6, 5.3.7-gentoo x86_64)
=================================================================
System Settings
=================================================================
System uname: Linux-5.3.7-gentoo-x86_64-Intel-R-_Core-TM-_i7-2620M_CPU_@_2.70GHz-with-gentoo-2.6
KiB Mem: 16334084 total, 5666240 free
KiB Swap: 16777212 total, 16777212 free
Timestamp of repository gentoo: Tue, 29 Oct 2019 04:15:01 +0000
Head commit of repository gentoo: adc0493532194f5c4b33a3ae3d2b0fefe21a7d2b
Head commit of repository creideiki: cecb6b7d9b7fbe6a599943114069d2c13c2bad89
sh bash 5.0_p11
ld GNU ld (Gentoo 2.32 p2) 2.32.0
app-shells/bash: 5.0_p11::gentoo
dev-java/java-config: 2.2.0-r4::gentoo
dev-lang/perl: 5.30.0::gentoo
dev-lang/python: 2.7.16::gentoo, 3.6.9::gentoo
dev-util/cmake: 3.15.4::gentoo
dev-util/pkgconfig: 0.29.2::gentoo
sys-apps/baselayout: 2.6-r1::gentoo
sys-apps/openrc: 0.42.1::gentoo
sys-apps/sandbox: 2.18::gentoo
sys-devel/autoconf: 2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake: 1.13.4-r2::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils: 2.32-r1::gentoo
sys-devel/gcc: 9.2.0-r1::gentoo
sys-devel/gcc-config: 2.1::gentoo
sys-devel/libtool: 2.4.6-r5::gentoo
sys-devel/make: 4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.3::gentoo (virtual/os-headers)
sys-libs/glibc: 2.29-r6::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-metamanifest: yes
    sync-rsync-extra-opts: --timeout=10
    sync-rsync-verify-max-age: 24

creideiki
    location: /usr/local/portage
    sync-type: git
    sync-uri: https://github.com/creideiki/portage
    masters: gentoo

rion
    location: /var/lib/layman/rion
    masters: gentoo
    priority: 50

seden
    location: /var/lib/layman/seden
    masters: gentoo
    priority: 50

steam-overlay
    location: /var/lib/layman/steam-overlay
    masters: gentoo
    priority: 50

torbrowser
    location: /var/lib/layman/torbrowser
    masters: gentoo
    priority: 50

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--alphabetical --keep-going --quiet-build=n --verbose-conflicts"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://mirror.mdfnet.se/gentoo http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo http://distfiles.gentoo.org"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-O1 -Wl,--hash-style=gnu -Wl,--enable-new-dtags"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--timeout=10"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X acl alsa amd64 bzip2 cairo consolekit crypt cups cxx dbus dri dri3 egl flac fontconfig gif hardened iconv ipv6 jpeg kde libtirpc lm-sensors mp3 multilib ncurses nls nptl ogg opengl openmp pam pcre pie png policykit qt3support qt5 readline seccomp split-usr ssl ssp tiff truetype udisks unicode upower vaapi vorbis xattr xcb xkb xtpax zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="hda-intel" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" CAMERAS="canon" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="pc" INPUT_DEVICES="evdev wacom" KERNEL="linux" L10N="en en-US en-GB sv sv-SE" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" QEMU_SOFTMMU_TARGETS="i386 x86_64" RUBY_TARGETS="ruby26" SANE_BACKENDS="hp" USERLAND="GNU" VIDEO_CARDS="intel" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS

=================================================================
Package Settings
=================================================================

net-misc/dhcpcd-8.1.1::gentoo was built with the following:
USE="embedded ipv6 udev" ABI_X86="(64)"
CFLAGS="-ggdb -march=native -O2 -pipe"
CXXFLAGS="-ggdb -march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news nostrip parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
Is this reproducible with the 9999 ebuild?
(In reply to Roy Marples from comment #1)
> Is this reproducible with the 9999 ebuild?

I'll check that next. I currently have a build of 8.1.1 without optimisation, and intend to let that run for a day or two to see whether I can reproduce it on that.
(In reply to Karl-Johan Karlsson from comment #2)
> (In reply to Roy Marples from comment #1)
> > Is this reproducible with the 9999 ebuild?
>
> I'll check that next. I currently have a build of 8.1.1 without
> optimisation, and intended to let that run for a day or two and see if I can
> reproduce it on that.

I could. On 8.1.1 built without optimisation, the infinite loop looks like this:

(gdb) bt
#0  ipv4ll_drop (ifp=0x56309c303b20) at ipv4ll.c:496
#1  0x000056309b5e1bed in dhcpcd_drop (ifp=0x56309c303b20, stop=0) at dhcpcd.c:403
#2  0x000056309b5e2906 in dhcpcd_handlecarrier (ctx=0x7fff575a2870, carrier=-1, flags=4099, ifname=0x7fff5759e5f0 "wlp3s0") at dhcpcd.c:752
#3  0x000056309b5fb106 in link_netlink (ctx=0x7fff575a2870, arg=0x0, nlm=0x7fff5759e700) at if-linux.c:861
#4  0x000056309b5fa2e1 in get_netlink (ctx=0x7fff575a2870, iov=0x7fff5759e6f0, arg=0x0, fd=7, flags=64, callback=0x56309b5fad43 <link_netlink>) at if-linux.c:466
#5  0x000056309b5fb1a8 in if_handlelink (ctx=0x7fff575a2870) at if-linux.c:876
#6  0x000056309b5e36e7 in dhcpcd_handlelink (arg=0x7fff575a2870) at dhcpcd.c:1083
#7  0x000056309b5e811d in eloop_start (eloop=0x56309c2f1dd0, signals=0x7fff575a2980) at eloop.c:979
#8  0x000056309b5e6287 in main (argc=2, argv=0x7fff575a2c28) at dhcpcd.c:2104
(gdb) info locals
ia = 0x56309c2fcf90
ian = 0x56309c2fcf90
state = 0x0
dropped = false
istate = 0x56309c306080
(gdb) print *ia
$7 = {next = {tqe_next = 0x56309c2fcf90, tqe_prev = 0x56309c2fcf90}, addr = {s_addr = 655568394}, mask = {s_addr = 16777215}, brd = {
    s_addr = 4279447050}, iface = 0x56309c303b20, addr_flags = 1701584954, flags = 4294967295, vltime = 7200, pltime = 6300,
  saddr = "10.46.19.39/24\000econ"}
(gdb) step
497             if (IN_LINKLOCAL(ntohl(ia->addr.s_addr))) {
(gdb)
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb)
497             if (IN_LINKLOCAL(ntohl(ia->addr.s_addr))) {
(gdb)
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb)
497             if (IN_LINKLOCAL(ntohl(ia->addr.s_addr))) {
(gdb)
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {

Trying the -9999 version now.
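For anyone reading the dumps: GDB prints the s_addr fields as raw integers in the machine's byte order, which makes them hard to compare with the `ip addr` output. Below is a small sketch for decoding them, assuming the values were printed on a little-endian host (this laptop is x86_64). The function names saddr_le_to_str and saddr_le_is_linklocal are made up for this example and are not part of dhcpcd; the second one only mirrors the 169.254.0.0/16 test that IN_LINKLOCAL performs at ipv4ll.c:497.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format an s_addr value exactly as GDB printed it on a little-endian
 * host: the lowest byte of the integer is the first octet of the address.
 * Hypothetical helper for reading the dumps, not dhcpcd code.
 */
int
saddr_le_to_str(uint32_t v, char *buf, size_t len)
{
	return snprintf(buf, len, "%u.%u.%u.%u",
	    v & 0xffu, (v >> 8) & 0xffu, (v >> 16) & 0xffu, (v >> 24) & 0xffu);
}

/*
 * Return non-zero if the address is in 169.254.0.0/16 (IPv4 link-local),
 * using the same little-endian interpretation as above.
 */
int
saddr_le_is_linklocal(uint32_t v)
{
	return (v & 0xffu) == 169u && ((v >> 8) & 0xffu) == 254u;
}
```

With this, 655568394 decodes to 10.46.19.39 and 521350666 to 10.46.19.31, i.e. the DHCP-assigned addresses on wlp3s0, and 16777215 is the /24 mask. Neither address is link-local, so the branch at line 497 is presumably never taken and the loop body does nothing to the entry it keeps revisiting.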
Happens on 9999 as well. Specifically, this version:

 * Checking out https://roy.marples.name/git/dhcpcd.git to /var/tmp/portage/net-misc/dhcpcd-9999/work/dhcpcd-9999 ...
git checkout --quiet refs/git-r3/HEAD
GIT update -->
   repository: https://roy.marples.name/git/dhcpcd.git
   at the commit: a6edfed6e331d3716815a91e9233af52a2f39510
>>> Source unpacked in /var/tmp/portage/net-misc/dhcpcd-9999/work

GDB still looks the same:

(gdb) bt
#0  0x0000557867f20984 in ipv4ll_drop (ifp=0x55786813eb60) at ipv4ll.c:496
#1  0x0000557867ef4c43 in dhcpcd_drop (ifp=0x55786813eb60, stop=0) at dhcpcd.c:409
#2  0x0000557867ef595c in dhcpcd_handlecarrier (ctx=0x7ffe3a0c6180, carrier=-1, flags=4099, ifname=0x7ffe3a0c1f00 "wlp3s0") at dhcpcd.c:758
#3  0x0000557867f0ea84 in link_netlink (ctx=0x7ffe3a0c6180, arg=0x0, nlm=0x7ffe3a0c2010) at if-linux.c:895
#4  0x0000557867f0dbe6 in get_netlink (ctx=0x7ffe3a0c6180, iov=0x7ffe3a0c2000, arg=0x0, fd=7, flags=64, callback=0x557867f0e6c1 <link_netlink>) at if-linux.c:476
#5  0x0000557867f0eb26 in if_handlelink (ctx=0x7ffe3a0c6180) at if-linux.c:910
#6  0x0000557867ef673d in dhcpcd_handlelink (arg=0x7ffe3a0c6180) at dhcpcd.c:1089
#7  0x0000557867efb3d2 in eloop_start (eloop=0x55786812cdd0, signals=0x7ffe3a0c6290) at eloop.c:997
#8  0x0000557867ef92dd in main (argc=2, argv=0x7ffe3a0c6538) at dhcpcd.c:2110
(gdb) step
497             if (IN_LINKLOCAL(ntohl(ia->addr.s_addr))) {
(gdb)
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb)
497             if (IN_LINKLOCAL(ntohl(ia->addr.s_addr))) {
(gdb)
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb)
497             if (IN_LINKLOCAL(ntohl(ia->addr.s_addr))) {
(gdb)
496       TAILQ_FOREACH_SAFE(ia, &istate->addrs, next, ian) {
(gdb) info locals
ia = 0x557868137fe0
ian = 0x557868137fe0
state = 0x0
dropped = false
istate = 0x557868140720
__PRETTY_FUNCTION__ = "ipv4ll_drop"
(gdb) print *ia
$1 = {next = {tqe_next = 0x557868137fe0, tqe_prev = 0x557868137fe0}, addr = {s_addr = 655568394}, mask = {s_addr = 16777215}, brd = {
    s_addr = 4279447050}, iface = 0x55786813eb60, addr_flags = 1679833648, flags = 4294967295, vltime = 7200, pltime = 6300,
  saddr = "10.46.19.39/24\000/24"}
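In all three dumps the entry's tqe_next points back at the entry itself, which is enough on its own to make TAILQ_FOREACH_SAFE spin forever: the saved next pointer (ian) is always the current element (ia). The following is a minimal standalone sketch of that effect, not dhcpcd code. The struct demo_addr type and the walk_bounded, demo_healthy and demo_self_linked functions are invented for illustration, the iteration cap exists only so the demo terminates, and how the real list ended up in this state is not something this sketch tries to explain.

```c
#include <stddef.h>
#include <sys/queue.h>

/* glibc's <sys/queue.h> lacks the _SAFE variant; use the usual BSD form. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, next)			\
	for ((var) = TAILQ_FIRST(head);					\
	    (var) != NULL && ((next) = TAILQ_NEXT(var, field), 1);	\
	    (var) = (next))
#endif

/* Hypothetical stand-in for struct ipv4_addr; only the list linkage matters. */
struct demo_addr {
	TAILQ_ENTRY(demo_addr) next;
	unsigned int addr;
};
TAILQ_HEAD(demo_addr_head, demo_addr);

/* Walk the list like ipv4ll_drop's loop does, but give up after cap steps. */
size_t
walk_bounded(struct demo_addr_head *head, size_t cap)
{
	struct demo_addr *ia, *ian;
	size_t n = 0;

	TAILQ_FOREACH_SAFE(ia, head, next, ian) {
		if (++n >= cap)
			break;	/* would loop forever without this cap */
	}
	return n;
}

/* A well-formed one-element list: the walk visits exactly one entry. */
size_t
demo_healthy(void)
{
	struct demo_addr_head head;
	struct demo_addr a = { .addr = 1 };

	TAILQ_INIT(&head);
	TAILQ_INSERT_TAIL(&head, &a, next);
	return walk_bounded(&head, 1000);
}

/* Same list, but with tqe_next pointing at the entry itself, as in the dumps. */
size_t
demo_self_linked(void)
{
	struct demo_addr_head head;
	struct demo_addr a = { .addr = 1 };

	TAILQ_INIT(&head);
	TAILQ_INSERT_TAIL(&head, &a, next);
	a.next.tqe_next = &a;	/* deliberately corrupt the linkage */
	return walk_bounded(&head, 1000);
}
```

demo_healthy() stops after one entry, while demo_self_linked() only stops because of the cap. Note that every access in the corrupted walk touches valid memory, so a tool that only reports invalid accesses would have nothing to flag here.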
OK. Can you add -fsanitize=address to CFLAGS and LDFLAGS please, and then re-emerge dhcpcd? Then restart it and hopefully it will give more information about the problem.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=549bd76ddcf0c425e58ed38325b8d5d947be21d9

commit 549bd76ddcf0c425e58ed38325b8d5d947be21d9
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2019-11-03 18:10:47 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2019-11-03 18:10:47 +0000

    package.mask: Mask =net-misc/dhcpcd-8.1.1

    Bug: https://bugs.gentoo.org/698856
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 profiles/package.mask | 5 +++++
 1 file changed, 5 insertions(+)
(In reply to Roy Marples from comment #5)
> Ok. Can you add -fsanitize=address to CFLAGS and LDFLAGS please and then
> re-emerge dhcpcd. Then restart it and hopefully it will give more
> information as to the problem.

LDFLAGS needed "-ldl" as well, but I now have a "dhcpcd --nobackground" running under "screen". It'll probably take a day or two to exhibit the problem.
(In reply to Karl-Johan Karlsson from comment #7)
> (In reply to Roy Marples from comment #5)
> > Ok. Can you add -fsanitize=address to CFLAGS and LDFLAGS please and then
> > re-emerge dhcpcd. Then restart it and hopefully it will give more
> > information as to the problem.
>
> LDFLAGS needed "-ldl" as well, but I now have a "dhcpcd --nobackground"
> running under "screen". It'll probably take a day or two to exhibit the
> problem.

That was fast, but unfortunately it gave no extra information. dhcpcd printed:

wlp3s0: carrier lost
wlp3s0: executing `/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
wlp3s0: deleting address 2001:XXXX/64
wlp3s0: deleting route to 2001:XXXX
wlp3s0: deleting default route via fe80::YYYY
wlp3s0: executing `/lib/dhcpcd/dhcpcd-run-hooks' ROUTERADVERT

and then hung. AddressSanitizer printed nothing.
This should be fixed in https://roy.marples.name/cgit/dhcpcd.git/commit/src?id=73ac184333f77b38a8b4c4202c2928278e2237ca which you can find in the 9999 ebuild. Let me know if it now works please!
(In reply to Roy Marples from comment #9)
> This should be fixed in
> https://roy.marples.name/cgit/dhcpcd.git/commit/src?id=73ac184333f77b38a8b4c4202c2928278e2237ca
> which you can find in the 9999 ebuild.
>
> Let me know if it now works please!

Great, thanks! I am running this version now, and will see if it hangs again in the next couple of days.
I've had two machines running 73ac184333f77b38a8b4c4202c2928278e2237ca for two days now, with no hangs. That is not the world's greatest collection of data points, but I'm feeling optimistic.
I can add 20 machines, with no problems yet. But it will take 72h for me to confirm, because in the past the problem occurred after 48-72h of runtime.
Looking good for me. No issues this week with -9999.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=44d7b48c59c5fba4fca240029f886f7832ed87e3

commit 44d7b48c59c5fba4fca240029f886f7832ed87e3
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-11-13 11:07:29 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-11-13 11:12:46 +0000

    net-misc/dhcpcd: Bump to version 8.1.2

    Bug: https://bugs.gentoo.org/698856
    Package-Manager: Portage-2.3.79, Repoman-2.3.18
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 net-misc/dhcpcd/Manifest            |   1 +
 net-misc/dhcpcd/dhcpcd-8.1.2.ebuild | 144 ++++++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+)
I have not seen this problem reappear for several weeks with either 9999 or 8.1.2. I believe it to be solved.