Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 697672

Summary: net-misc/dhcpcd-8.1.0: checksum failure from dhcp.server.ip.addy
Product: Gentoo Linux Reporter: Tsukasa <mcp_reznor>
Component: Current packagesAssignee: William Hubbs <williamh>
Status: RESOLVED FIXED    
Severity: critical CC: 85nrdzgv, base-system, casta, christian, creideiki+gentoo-bugzilla, hydrapolic, it, jwbraun, kfm, lekto, marduk, mcp_reznor, moiman, rei4dan, roy, slyfox, toolchain, volkov, vorner, whissi, xaviermiller
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 915000    
Attachments: Force boolean
Fixes issues
Fix aliasing and sanity.
dhcp.c

Description Tsukasa 2019-10-13 21:06:13 UTC
During standard emerge updates, went from dhcpcd-8.0.6 to 8.1.0, upon reboot received messages stating checksum failure from dhcp.server.ip.addy, it kept nfs mounts from working, etc.

downgrading to 8.0.6 restored working functionality.  Please remove/hardmask/fix/report to author that 8.1.0 doesn't function properly

Thanks
Comment 1 Tsukasa 2019-10-13 21:08:42 UTC
Portage 2.3.76 (python 3.6.9-final-0, default/linux/amd64/17.1/desktop, gcc-9.2.0, glibc-2.29-r5, 5.3.4-gentoo x86_64)
=================================================================
System uname: Linux-5.3.4-gentoo-x86_64-AMD_FX-tm-8350_Eight-Core_Processor-with-gentoo-2.6
KiB Mem:     8117860 total,   6131180 free
KiB Swap:    2097148 total,   2097148 free
Timestamp of repository gentoo: Sun, 13 Oct 2019 19:15:47 +0000
Head commit of repository gentoo: 41bc579929e8b0cbcbb0ac695b03c4fc20ed5a34

Timestamp of repository eclipse: Fri, 13 Sep 2019 19:44:02 +0000
Head commit of repository eclipse: e35205466a070b3d0976ad3652ff150120ce8b15

Head commit of repository steam-overlay: 5b099b835b5b8dc7f37390e37e44cebba7cc315d

Head commit of repository x11: afbaafaa912dc09a19fee7ee28ef94c44a134a06

sh bash 5.0_p11
ld GNU ld (Gentoo 2.32 p2) 2.32.0
app-shells/bash:          5.0_p11::gentoo
dev-java/java-config:     2.2.0-r4::gentoo
dev-lang/perl:            5.30.0::gentoo
dev-lang/python:          2.7.16::gentoo, 3.6.9::gentoo
dev-util/cmake:           3.15.4::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.42.1::gentoo
sys-apps/sandbox:         2.18::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake:       1.11.6-r3::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils:       2.32-r1::gentoo
sys-devel/gcc:            9.2.0-r1::gentoo
sys-devel/gcc-config:     2.1::gentoo
sys-devel/libtool:        2.4.6-r5::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.3::gentoo (virtual/os-headers)
sys-libs/glibc:           2.29-r5::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: git
    sync-uri: https://anongit.gentoo.org/git/repo/gentoo.git
    priority: -1000

eclipse
    location: /var/db/repos/eclipse
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/eclipse.git
    masters: gentoo

localrepo
    location: /usr/local/portage
    masters: gentoo

steam-overlay
    location: /usr/local/steam-overlay
    sync-type: git
    sync-uri: https://github.com/anyc/steam-overlay/
    masters: gentoo

x11
    location: /usr/local/x11-overlay
    sync-type: git
    sync-uri: https://anongit.gentoo.org/git/proj/x11.git
    masters: gentoo

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=native -mtune=native"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=native -mtune=native"
DISTDIR="/usr/portagedistfiles"
EMERGE_DEFAULT_OPTS="--jobs 2 --load-average 4 --nospinner --fail-clean --with-bdeps=y --autounmask-keep-masks"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fail-clean fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="en_US.utf8"
LDFLAGS="-O2 -march=native -mtune=native"
MAKEOPTS="-j8 -l8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi amd64 berkdb bluetooth branding bzip2 cairo cdda cdr cli consolekit crypt cups cxx dbus dri dts dvd dvdr emboss encode exif fam flac fortran gdbm gif glamor gpm gtk iconv icu introspection jpeg lcms ldap libnotify libtirpc lm_sensors mad mng mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp pam pango pcre pdf png policykit ppds pulseaudio qt5 readline sdl seccomp spell split-usr ssl startup-notification svg tcpd threads tiff truetype udev udisks unicode upower usb vaapi vdpau vorbis wxwidgets x264 xattr xcb xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx f16c fma3 fma4 mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 xop" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby24 ruby25" SANE_BACKENDS="pixma" USERLAND="GNU" VIDEO_CARDS="amdgpu radeonsi" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 2 Roy Marples 2019-10-13 21:41:22 UTC
Checksums are working fine here.
Can you email me private a wireshark trace of the DHCP transaction please?
Comment 3 Eric Benoit 2019-10-14 08:44:13 UTC
I too have run into this same issue, at least on x86_64. Rebuilding dhcpcd with CFLAGS set to -O0 or -O1 yields a working dhcpcd, but with -O2 we get a checksum failure.
Comment 4 Eric Benoit 2019-10-14 11:58:41 UTC
It seems it may have something to do with the inlining of in_cksum. If I add the volatile keyword to sum within (or otherwise force it to not be inlined) the issue disappears.
Comment 5 Christian Bricart 2019-10-14 14:33:34 UTC
ARCH=~x86 is also affected
Comment 6 Tsukasa 2019-10-14 21:14:03 UTC
I finally have taken a tshark capture 172MB, sent to Roy Marples.

Glad someone else was affected, I was wondering if somehow my systems were unique in some way that was changing the outcome (running nfsroot systems)
Comment 7 Tomáš Mózes 2019-10-15 04:14:57 UTC
dhcpcd-8.0.6 works
dhcpcd-8.1.0 fails with checksum failure


Portage 2.3.76 (python 3.6.9-final-0, default/linux/amd64/17.0, gcc-9.2.0, glibc-2.29-r5, 4.19.66-gentoo x86_64)
=================================================================
                         System Settings
=================================================================
System uname: Linux-4.19.66-gentoo-x86_64-Intel-R-_Core-TM-_i5-3570_CPU_@_3.40GHz-with-gentoo-2.6
KiB Mem:    16318796 total,  14139304 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Tue, 15 Oct 2019 00:15:01 +0000
Head commit of repository gentoo: 491c3b2c497837cff2b3bc2d7da90154da353e41
sh bash 5.0_p11
ld GNU ld (Gentoo 2.32 p2) 2.32.0
app-shells/bash:          5.0_p11::gentoo
dev-java/java-config:     2.2.0-r4::gentoo
dev-lang/perl:            5.30.0::gentoo
dev-lang/python:          2.7.16::gentoo, 3.5.7::gentoo, 3.6.9::gentoo, 3.7.4-r1::gentoo
dev-util/cmake:           3.15.4::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.42.1::gentoo
sys-apps/sandbox:         2.18::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake:       1.13.4-r2::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils:       2.32-r1::gentoo
sys-devel/gcc:            9.2.0-r1::gentoo
sys-devel/gcc-config:     2.1::gentoo
sys-devel/libtool:        2.4.6-r5::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.3::gentoo (virtual/os-headers)
sys-libs/glibc:           2.29-r5::gentoo

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-mtune=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.2/ext-active/ /etc/php/apache2-php7.3/ext-active/ /etc/php/cgi-php7.2/ext-active/ /etc/php/cgi-php7.3/ext-active/ /etc/php/cli-php7.2/ext-active/ /etc/php/cli-php7.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-mtune=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://tux.rainside.sk/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl amd64 berkdb bzip2 cli crypt cxx dri fortran gdbm iconv ipv6 libtirpc multilib ncurses nptl openmp pam pcre readline seccomp split-usr ssl tcpd unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_core authn_dbm authn_file authz_core authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif socache_shmcb speling status unique_id unixd userdir usertrack vhost_alias watchdog proxy proxy_connect proxy_fcgi proxy_http" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 aes avx f16c popcnt sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64 xen pc" INPUT_DEVICES="evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" NGINX_MODULES_HTTP="access charset gzip limit_conn limit_req log proxy rewrite gunzip upstream_check stub_status geoip2" NGINX_MODULES_STREAM="geoip2" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2 php7-3" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6 python3_7" RUBY_TARGETS="ruby25 ruby26" USERLAND="GNU" VIDEO_CARDS="intel i915 i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

=================================================================
                        Package Settings
=================================================================

net-misc/dhcpcd-8.0.6::gentoo was built with the following:
USE="embedded ipv6 udev" ABI_X86="(64)"
Comment 8 Tomáš Mózes 2019-10-15 04:42:57 UTC
Just to note that the dhcp server runs on Microsoft Windows servers.
Comment 9 Christian Bricart 2019-10-15 07:48:55 UTC
there is some difference in declaration of »uh_sum« and »csum« which comparison finally gets »return«ed in checksums_valid():

checksums_valid(...)
{
  ...
  uint16_t ..., uh_sum;
  ...
  uint32_t csum;
  ...
  return csum == uh_sum;
}

might that be something that is affected to be "optimized" by choosing a higher "-O" in CFLAGS, as this checksum failure seems to happen only with -O2 rather than -O0 or -O1 

(also GCC-9.2.0 with -O2 here)
Comment 10 Roy Marples 2019-10-15 08:02:45 UTC
Created attachment 592746 [details, diff]
Force boolean

I cannot repliate this problem at my end.
Does this patch help at all?
Comment 11 Roy Marples 2019-10-15 09:09:56 UTC
Created attachment 592750 [details, diff]
Fixes issues

This patch should fix things.
Comment 12 Andrey Volkov 2019-10-15 10:30:43 UTC
(In reply to Roy Marples from comment #11)
> Created attachment 592750 [details, diff] [details, diff]
> Fixes issues
> 
> This patch should fix things.

The patch didn't fix checksum error for me
(GCC-9.2.0-r1, CFLAGS=-O2)

dnsmasq at openwrt as DHCP server here
Comment 13 Ionen Wolkens gentoo-dev 2019-10-15 11:12:26 UTC
Just here to confirm same issues, updated earlier and this started happening. Tried with -O1 and it now works again, the above patch with -O2 didn't help.

I would suggest masking 8.1.0 as soon as possible so fewer affected people are left attempting to fix things without easy connectivity.
Comment 14 Christian Bricart 2019-10-15 11:27:43 UTC
the patch does not fix for me either.

however, I second the proposal that code inlining might be the root cause from comment 4 

if I add "-fno-inline-small-functions" to CFLAGS while keeping "-O2", the resulting binary also works - just like using "-O1".
Comment 15 David Seifert gentoo-dev 2019-10-15 11:59:01 UTC
(In reply to Roy Marples from comment #11)
> Created attachment 592750 [details, diff] [details, diff]
> Fixes issues
> 
> This patch should fix things.

I see things like

  udp = (struct udphdr *)(void *)((char *)ip + ip_hlen);

in checksums_valid(). Are you sure that cast is valid and does not violate strict-aliasing rules?
Comment 16 Ionen Wolkens gentoo-dev 2019-10-15 12:09:03 UTC
(In reply to David Seifert from comment #15)
> in checksums_valid(). Are you sure that cast is valid and does not violate
> strict-aliasing rules?
Likely on to something because -O2 -fno-strict-aliasing produces a working dhcpcd for me.
Comment 17 Christian Bricart 2019-10-15 12:43:18 UTC
I've just checked: it DOES help to prevent in_cksum(..) being inlined automagically by implicit "-finline-small-functions" set by -O2+

changing the declaration from:
> static uint16_t in_cksum(const void *data, size_t len, uint32_t *isum) { .. }
to:
> static __attribute__ ((noinline)) uint16_t in_cksum(const void *data, size_t len, uint32_t *isum) { .. }

produced a working binary - though this stanza is GCC specific.

googled for a portable way and found it might be called through a pointer will also keep it from being inlined:

> static uint16_t in_cksum_ptr(const void *data, size_t len, uint32_t *isum) { .. }
> static uint16_t (*in_cksum)() = in_cksum_ptr;

this also works for me.
Comment 18 Roy Marples 2019-10-15 13:18:20 UTC
Created attachment 592758 [details, diff]
Fix aliasing and sanity.

New patch also copies out UDP header to avoid any aliasing issue.

I don't know why this just broke ..... the UDP reference has been working fine for ages.
Comment 19 Ionen Wolkens gentoo-dev 2019-10-15 13:30:06 UTC
Unfortunately the problem appears to be persisting for me with the new patch, unless I pass -fno-strict-aliasing still.
Comment 20 Roy Marples 2019-10-15 14:33:44 UTC
Is everone here using gcc-9? The newest gcc I have is 8.3.
Comment 21 Ionen Wolkens gentoo-dev 2019-10-15 14:53:05 UTC
(In reply to Roy Marples from comment #20)
> Is everyone here using gcc-9? The newest gcc I have is 8.3.
I would assume so, otherwise they'd "likely" be using packages marked stable which mean they'd also still be using dhcpcd 7.2.3 instead. I don't have 8.3 around anymore but let me get this setup and I'll check if it works with it.
Comment 22 Albert W. Hopkins 2019-10-15 15:07:51 UTC
(In reply to Roy Marples from comment #20)
> Is everone here using gcc-9? The newest gcc I have is 8.3.

I am on gcc 9.2.  However I havent yet had time to evaluate the patch(es) to see if anything changed.
Comment 23 Ionen Wolkens gentoo-dev 2019-10-15 15:12:11 UTC
(In reply to Roy Marples from comment #20)
> Is everone here using gcc-9? The newest gcc I have is 8.3.
Well there we have it, it works entirely fine with 8.3 without any patches and -O2. So this is gcc-9 specific.
Comment 24 Roy Marples 2019-10-15 16:44:49 UTC
I'll need someone to explain to me why this fails with gcc-9 AND if the expectation is on me to fix it and if so how.

For the time being I'll assume this is a gcc bug as it works fine with all other compilers thus far.
Comment 25 David Seifert gentoo-dev 2019-10-15 17:36:48 UTC
(In reply to Roy Marples from comment #24)
> I'll need someone to explain to me why this fails with gcc-9 AND if the
> expectation is on me to fix it and if so how.
> 
> For the time being I'll assume this is a gcc bug as it works fine with all
> other compilers thus far.

Have you tried running this code under UBSAN? It is very likely that dhcpcd triggers undefined behavior, given how GCC has been aggressively optimizing code in the presence of undefined behavior in recent releases. Disabling inlining seems to support this argument.
Comment 26 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2019-10-15 19:15:16 UTC
Let's ask toolchain team for their opinion.
Comment 27 Sergei Trofimovich (RETIRED) gentoo-dev 2019-10-15 21:07:51 UTC
I don't thing existing -fsanitize= would catch the bug. Compiler assumes loads and stores to different types are independent.

llvm has something as a work in progress for that: https://reviews.llvm.org/D32199

Does dhcpcd have minimal unit test that exposes failure with and without -fno-strict-aliasing?

When I build dhcp.c locally with and without -fno-strict-aliasing I see a lot of loads and stores reordered across local function calls.

Having concrete function call would help a lot.
Comment 28 Yuriy Dmitriev 2019-10-15 21:29:42 UTC
Confirm problem.


Please mask =net-misc/dhcpcd-8.1.0

HW: Mikrotik RB951Ui-2HnD with latest stable firmware(6.45.6) <=======> Intel Centrino Advanced-N 6205


Prev version dhcpcd-8.0.6 work nice.

With best wishes - Y.D.
Comment 29 Guilherme Thomazi Bonicontro 2019-10-15 21:55:31 UTC
Confirmed on my end too. Was going crazy debugging it, should have come here sooner.
Comment 30 Roy Marples 2019-10-15 23:50:07 UTC
(In reply to Sergei Trofimovich from comment #27)
> Does dhcpcd have minimal unit test that exposes failure with and without
> -fno-strict-aliasing?
> 
> When I build dhcp.c locally with and without -fno-strict-aliasing I see a
> lot of loads and stores reordered across local function calls.
> 
> Having concrete function call would help a lot.

I edited dhcp.c to save the packet and captured an invalid checksum.
I copied the checksum code to a small test case file verbtaim and the passed the saved packet to it. Checksum code worked fine, but fails on exactly the same data in dhcpcd. I cannot explain this.

I do have another data point though:
Alpine Linux with gcc-9.2 works fine with dhcpcd-8.1.0.
Arch Linux with gcc-9.2 works fine with dhcpcd-8.1.0.
Devuan Linux with gcc-9.2 fails with dhcpcd-8.1.0.
Comment 31 Tomáš Mózes 2019-10-16 03:25:48 UTC
Thank you for trying to debug and fix the issue, but since several people ran into this, could you please mask the package for the time being? Thanks
Comment 32 Sergei Trofimovich (RETIRED) gentoo-dev 2019-10-16 07:12:12 UTC
(In reply to Roy Marples from comment #30)
> (In reply to Sergei Trofimovich from comment #27)
> > Does dhcpcd have minimal unit test that exposes failure with and without
> > -fno-strict-aliasing?
> > 
> > When I build dhcp.c locally with and without -fno-strict-aliasing I see a
> > lot of loads and stores reordered across local function calls.
> > 
> > Having concrete function call would help a lot.
> 
> I edited dhcp.c to save the packet and captured an invalid checksum.
> I copied the checksum code to a small test case file verbtaim and the passed
> the saved packet to it. Checksum code worked fine, but fails on exactly the
> same data in dhcpcd. I cannot explain this.

Looking at generated dhcp.c code everything is inlined into dhcp_readbpf(). Thus you might need to call it as-is.

If I read correctly a diff of '-fverbose-asm' vs '-fverbose-asm -fno-strict-aliasing' checksums_valid() lost reads of ip->ip_src / ip->ip_dst around:

  static bool
  checksums_valid(void *packet,
    struct in_addr *from, unsigned int flags)
  {
    struct ip *ip = packet;
    struct ip pseudo_ip = {
        .ip_p = IPPROTO_UDP,
        .ip_src = ip->ip_src, // <- this 32-bit store
        .ip_dst = ip->ip_dst  // <- and this 32-bit store
    };
    ...
    in_cksum(&pseudo_ip, sizeof(pseudo_ip), &csum);
    ...

and left them uninitialized on stack.

in_cksum() treats 'pseudo_ip' effectively as unint16_t[]. I could see why gcc might decide to consider 32-bit stores and 16-bit loads as not affecting one another and removing "dead" stored of 'pseudo_ip.ip_src'.

I think the only valid interpretation of struct is a uint8_t[].
Comment 33 Larry the Git Cow gentoo-dev 2019-10-16 08:00:19 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c0cd07cf03f3024c83749097d9e70cae0c28e41b

commit c0cd07cf03f3024c83749097d9e70cae0c28e41b
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-10-16 07:58:58 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-10-16 07:58:58 +0000

    package.mask: Masked =net-misc/dhcpcd-8.1.0
    
    Bug: https://bugs.gentoo.org/697672
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 profiles/package.mask | 5 +++++
 1 file changed, 5 insertions(+)
Comment 34 Roy Marples 2019-10-16 10:37:28 UTC
(In reply to Sergei Trofimovich from comment #32)
> Looking at generated dhcp.c code everything is inlined into dhcp_readbpf().
> Thus you might need to call it as-is.
> 
> If I read correctly a diff of '-fverbose-asm' vs '-fverbose-asm
> -fno-strict-aliasing' checksums_valid() lost reads of ip->ip_src /
> ip->ip_dst around:
> 
>   static bool
>   checksums_valid(void *packet,
>     struct in_addr *from, unsigned int flags)
>   {
>     struct ip *ip = packet;
>     struct ip pseudo_ip = {
>         .ip_p = IPPROTO_UDP,
>         .ip_src = ip->ip_src, // <- this 32-bit store
>         .ip_dst = ip->ip_dst  // <- and this 32-bit store
>     };
>     ...
>     in_cksum(&pseudo_ip, sizeof(pseudo_ip), &csum);
>     ...
> 
> and left them uninitialized on stack.

What is not initialised exactly? We have a compile time assert at the top to ensure that there is no padding in the structure and C member initialisation ensures that all members are set to zero if not specified.

> 
> in_cksum() treats 'pseudo_ip' effectively as unint16_t[]. I could see why
> gcc might decide to consider 32-bit stores and 16-bit loads as not affecting
> one another and removing "dead" stored of 'pseudo_ip.ip_src'.
> 
> I think the only valid interpretation of struct is a uint8_t[].

in_cksum is based on RFC 1071 section 4.1 which you'll find in many samples like here:
https://gist.github.com/fakessh/4137434#file-client-c-L34

Are suggesting all these implementations are broken?
Comment 35 David Seifert gentoo-dev 2019-10-16 11:46:17 UTC
(In reply to Roy Marples from comment #34)
> (In reply to Sergei Trofimovich from comment #32)
> > Looking at generated dhcp.c code everything is inlined into dhcp_readbpf().
> > Thus you might need to call it as-is.
> > 
> > If I read correctly a diff of '-fverbose-asm' vs '-fverbose-asm
> > -fno-strict-aliasing' checksums_valid() lost reads of ip->ip_src /
> > ip->ip_dst around:
> > 
> >   static bool
> >   checksums_valid(void *packet,
> >     struct in_addr *from, unsigned int flags)
> >   {
> >     struct ip *ip = packet;
> >     struct ip pseudo_ip = {
> >         .ip_p = IPPROTO_UDP,
> >         .ip_src = ip->ip_src, // <- this 32-bit store
> >         .ip_dst = ip->ip_dst  // <- and this 32-bit store
> >     };
> >     ...
> >     in_cksum(&pseudo_ip, sizeof(pseudo_ip), &csum);
> >     ...
> > 
> > and left them uninitialized on stack.
> 
> What is not initialised exactly? We have a compile time assert at the top to
> ensure that there is no padding in the structure and C member initialisation
> ensures that all members are set to zero if not specified.
> 
> > 
> > in_cksum() treats 'pseudo_ip' effectively as unint16_t[]. I could see why
> > gcc might decide to consider 32-bit stores and 16-bit loads as not affecting
> > one another and removing "dead" stored of 'pseudo_ip.ip_src'.
> > 
> > I think the only valid interpretation of struct is a uint8_t[].
> 
> in_cksum is based on RFC 1071 section 4.1 which you'll find in many samples
> like here:
> https://gist.github.com/fakessh/4137434#file-client-c-L34
> 
> Are suggesting all these implementations are broken?

It seems like you are ultimately overlaying a uint16_t pointer over struct Packet, which definitely violates strict-aliasing rules. Just because you ensure fields are padded doesn't make this valid. C is not portable assembly.
Comment 36 Larry the Git Cow gentoo-dev 2019-10-16 15:11:14 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=06a772cc26a47136b71d2d0a9eaec5bf09b817dd

commit 06a772cc26a47136b71d2d0a9eaec5bf09b817dd
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-10-16 15:10:17 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-10-16 15:11:09 +0000

    package.mask: Removed dhcpcd-8.1.0 mask
    
    Bug: https://bugs.gentoo.org/697672
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 profiles/package.mask | 5 -----
 1 file changed, 5 deletions(-)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1b2a8af882b4b47b2c4189a52b4bd9f3308666fe

commit 1b2a8af882b4b47b2c4189a52b4bd9f3308666fe
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-10-16 15:09:50 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-10-16 15:11:08 +0000

    net-misc/dhcpcd: Bump to version 8.1.1. Removed old
    
    Bug: https://bugs.gentoo.org/697672
    Package-Manager: Portage-2.3.77, Repoman-2.3.17
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 net-misc/dhcpcd/Manifest                                     | 2 +-
 net-misc/dhcpcd/{dhcpcd-8.1.0.ebuild => dhcpcd-8.1.1.ebuild} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
Comment 37 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2019-10-16 15:12:59 UTC
Please test dhcpcd-8.1.1 and report back your results.
Comment 38 Roy Marples 2019-10-16 15:26:11 UTC
(In reply to David Seifert from comment #35)
> It seems like you are ultimately overlaying a uint16_t pointer over struct
> Packet, which definitely violates strict-aliasing rules. Just because you
> ensure fields are padded doesn't make this valid. C is not portable assembly.

That's a fair point.

Using a union with struct ip and uint16_t[sizeof(struct ip)] members allow strict aliasing to work with in_cksum.

dhcpcd-8.1.1 has been pushed with this change so should fix this.
At least it does on the one box I can repliate this on.
Comment 39 Thomas Deutschmann (RETIRED) gentoo-dev 2019-10-16 16:20:18 UTC
=net-misc/dhcpcd-8.1.1 works for me, thanks.
Comment 40 Yuriy Dmitriev 2019-10-16 18:10:50 UTC
It seems like working)))
Comment 41 Sergei Trofimovich (RETIRED) gentoo-dev 2019-10-16 18:22:04 UTC
It's probably worth providing a related upstream change for posterity:
    https://github.com/rsmarples/dhcpcd/commit/0f8f4506858ed4d3e029054ce74f7fc3be7112b3

[ on unrelated note 'uint16_t w[sizeof(struct ip)];' likely allocates twice as much space than you need ]

(In reply to Roy Marples from comment #34)
> (In reply to Sergei Trofimovich from comment #32)
> > Looking at generated dhcp.c code everything is inlined into dhcp_readbpf().
> > Thus you might need to call it as-is.
> > 
> > If I read correctly a diff of '-fverbose-asm' vs '-fverbose-asm
> > -fno-strict-aliasing' checksums_valid() lost reads of ip->ip_src /
> > ip->ip_dst around:
> > 
> >   static bool
> >   checksums_valid(void *packet,
> >     struct in_addr *from, unsigned int flags)
> >   {
> >     struct ip *ip = packet;
> >     struct ip pseudo_ip = {
> >         .ip_p = IPPROTO_UDP,
> >         .ip_src = ip->ip_src, // <- this 32-bit store
> >         .ip_dst = ip->ip_dst  // <- and this 32-bit store
> >     };
> >     ...
> >     in_cksum(&pseudo_ip, sizeof(pseudo_ip), &csum);
> >     ...
> > 
> > and left them uninitialized on stack.
> 
> What is not initialised exactly? We have a compile time assert at the top to
> ensure that there is no padding in the structure and C member initialisation
> ensures that all members are set to zero if not specified.

GCC's optimiser effectively deletes the lines I marked as "// <- ". I suspect it's type based alias analysis but for my own curiosity will check in more detail. The presence of loop also seems to have the effect.

> > in_cksum() treats 'pseudo_ip' effectively as unint16_t[]. I could see why
> > gcc might decide to consider 32-bit stores and 16-bit loads as not affecting
> > one another and removing "dead" stored of 'pseudo_ip.ip_src'.
> > 
> > I think the only valid interpretation of struct is a uint8_t[].
> 
> in_cksum is based on RFC 1071 section 4.1 which you'll find in many samples
> like here:
> https://gist.github.com/fakessh/4137434#file-client-c-L34
> 
> Are suggesting all these implementations are broken?

Yes, I think this code snippet is broken at:
    https://gist.github.com/fakessh/4137434#file-client-c-L101

as it also casts a struct* to uint16_t*.
Comment 42 Roy Marples 2019-10-16 21:55:00 UTC
(In reply to Sergei Trofimovich from comment #41)
> [ on unrelated note 'uint16_t w[sizeof(struct ip)];' likely allocates twice
> as much space than you need ]

https://roy.marples.name/cgit/dhcpcd.git/commit/?id=f77022dbce8890eb5c6597af99490852aa4cd5ec

Fixed, thanks
Comment 43 Ionen Wolkens gentoo-dev 2019-10-16 23:45:03 UTC
Can confirm fixed for me as well, great to see this was figured out rather than worked around.
Comment 44 Tomáš Mózes 2019-10-18 06:55:53 UTC
Thanks for the fix, 8.1.1 works here too.
Comment 45 Sergei Trofimovich (RETIRED) gentoo-dev 2019-10-27 19:23:16 UTC
Created attachment 594184 [details]
dhcp.c

(In reply to Sergei Trofimovich from comment #41)
> > What is not initialised exactly? We have a compile time assert at the top to
> > ensure that there is no padding in the structure and C member initialisation
> > ensures that all members are set to zero if not specified.
> 
> GCC's optimiser effectively deletes the lines I marked as "// <- ". I
> suspect it's type based alias analysis but for my own curiosity will check
> in more detail. The presence of loop also seems to have the effect.

Added trimmed down dhcp.c example. Looks like the loop was not the major issue here. The type cast is enough for gcc to throw away most of struct initializer.

    $ gcc dhcp.c -o dhcp -O2                      && ./dhcp
    packet sum: 0x011f5b
    $ gcc dhcp.c -o dhcp -O2 -fno-strict-aliasing && ./dhcp
    packet sum: 0x005555

Fun fact: the code had a chance to mis-compile even by gcc-4.0.4:

gcc-3.3.6: packet sum: 0x005555
gcc-3.4.6: packet sum: 0x005555
gcc-4.0.4: packet sum: 0x001180
gcc-4.1.2: packet sum: 0x005555
gcc-4.2.4: packet sum: 0x005555
gcc-4.3.6: packet sum: 0x005555
gcc-4.4.7: packet sum: 0x0011b0
gcc-4.5.4: packet sum: 0x0011b0
gcc-4.6.4: packet sum: 0x0010c0
gcc-4.7.4: packet sum: 0x0011d0
gcc-4.8.5: packet sum: 0x0011d0
gcc-4.9.4: packet sum: 0x001210
gcc-5.4.0: packet sum: 0x005555
gcc-5.5.0: packet sum: 0x005555
gcc-6.4.0: packet sum: 00000000
gcc-6.5.0: packet sum: 00000000
gcc-7.3.0: packet sum: 00000000
gcc-7.3.1: packet sum: 00000000
gcc-7.4.0: packet sum: 00000000
gcc-8.2.0: packet sum: 0x00bb1e
gcc-8.3.0: packet sum: 0x00acf4
gcc-8.3.1: packet sum: 0x0109f1
gcc-9.1.0: packet sum: 0x016d51
gcc-9.2.0: packet sum: 0x01b295