With current stable GCC 4.7, and depending on kernel configuration (more on this later), trying to debug a program with GDB immediately ends with:

(gdb) run
Starting program: /usr/bin/bogus-program
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000000000 in ?? ()
(gdb)

This issue has plagued ia64 for quite some time now. It was first, and incorrectly, reported on the Debian BTS against the Linux kernel [1]. As alluded to there by Ben Hutchings on behalf of Will Deacon, "gcc's code generation for ia64 has regressed in 4.6 or earlier" [2].

I thus ran git bisect on GCC's git mirror and found the bad commit during the GCC 4.6 development cycle: SVN r165240 [3] (which aimed to fix PR rtl-optimization/33721 [4]). I've also found that this issue was fixed during the GCC 4.8 development cycle: SVN r191928 [5] (which aimed to fix PR rtl-optimization/54457 [6]).

I've no idea whether it was expected that SVN r165240 could break GCC on ia64. Similarly, I've no idea whether it was expected that SVN r191928 could fix it. It's also possible that both these SVN revisions are only side-effects of something more subtly broken elsewhere. As I'm not an ia64 or GCC guru, please discuss these aspects in upstream bug #61799 [7].

Finally, the GDB error is only one manifestation of this GCC breakage. Who knows which other ia64 bugs could be explained by it too (e.g. bug #510136 [8] or bug #497514 [9])?

Émeric

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=691576
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=691576#116
[3] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=165240
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721
[5] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=191928
[6] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54457
[7] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61799
[8] https://bugs.gentoo.org/show_bug.cgi?id=510136
[9] https://bugs.gentoo.org/show_bug.cgi?id=497514
emerge --info output
Portage 2.2.8-r1 (default/linux/ia64/13.0/desktop/gnome/systemd, gcc-4.7.3, glibc-2.17, 3.12.21-gentoo-r1 ia64)
=================================================================
System uname: Linux-3.12.21-gentoo-r1-ia64-Madison-with-gentoo-2.2
KiB Mem: 25053312 total, 16999296 free
KiB Swap: 524224 total, 524224 free
Timestamp of tree: Thu, 24 Jul 2014 21:30:01 +0000
ld GNU ld (GNU Binutils) 2.23.2
app-shells/bash: 4.2_p45
dev-java/java-config: 2.1.12-r1
dev-lang/python: 2.7.6, 3.3.3
dev-util/cmake: 2.8.12.2
dev-util/pkgconfig: 0.28-r1
sys-apps/baselayout: 2.2
sys-apps/openrc: 0.12.4
sys-apps/sandbox: 2.6-r1
sys-devel/autoconf: 2.13, 2.69
sys-devel/automake: 1.11.6, 1.13.4
sys-devel/binutils: 2.23.2
sys-devel/gcc: 4.7.3
sys-devel/gcc-config: 1.7.3
sys-devel/libtool: 2.4.2-r1
sys-devel/make: 3.82-r4
sys-kernel/linux-headers: 3.13 (virtual/os-headers)
sys-libs/glibc: 2.17
Repositories: gentoo my_ebuilds
ACCEPT_KEYWORDS="ia64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="ia64-unknown-linux-gnu"
CFLAGS="-mtune=itanium2 -O2 -pipe"
CHOST="ia64-unknown-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-mtune=itanium2 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="ftp://mirrors.linuxant.fr/distfiles.gentoo.org/"
LANG="fr_FR.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/my_ebuilds"
USE="X a52 aac acl acpi alsa berkdb branding bzip2 cairo cdda cdr cli colord cracklib crypt cups cxx dbus dri dts dvdr eds encode evo exif fam firefox flac fortran gdbm gif gnome gnome-keyring gnome-online-accounts gpm gstreamer gtk ia64 iconv introspection ipv6 jpeg lcms ldap libnotify libsecret mad mng modules mp3 mp4 mpeg nautilus ncurses nls nptl ogg opengl openmp pam pango pcre pdf png policykit ppds pulseaudio qt3support qt4 readline sdl session socialweb spell ssl startup-notification svg systemd tcpd tiff truetype udev udisks unicode upower usb vorbis wxwidgets xcb xml xv xvid zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author"
CAMERAS="ptp2"
COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog"
ELIBC="glibc"
GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx"
INPUT_DEVICES="evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer"
LINGUAS="fr"
OFFICE_IMPLEMENTATION="libreoffice"
PHP_TARGETS="php5-5"
PYTHON_SINGLE_TARGET="python2_7"
PYTHON_TARGETS="python2_7 python3_3"
RUBY_TARGETS="ruby19 ruby20"
USERLAND="GNU"
VIDEO_CARDS="radeon fbdev modesetting"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, SYNC, USE_PYTHON
i can confirm that rebuilding the kernel with gcc-4.8 makes strace/gdb work again. unfortunately we can't stabilize gcc-4.8+ yet because of bug 503838 :(.
gcc-4.8/gcc-4.9 are now stable on ia64, so this issue should resolve itself
(In reply to SpanKY from comment #3)
> gcc-4.8/gcc-4.9 are now stable on ia64, so this issue should resolve itself

I bet that an updated glibc (https://bugs.gentoo.org/show_bug.cgi?id=503838#c16) is also required to effectively fix the problem, right? Indeed, with the current ia64-stable =sys-libs/glibc-2.21-r1, I'm still having this issue when compiling =sys-kernel/gentoo-sources-4.1.12 with the now ia64-stable =sys-devel/gcc-4.9.3. There is no problem when the kernel is recompiled with =sys-devel/gcc-4.5.4.

Émeric
I'm sorry, but as I reported in https://bugs.gentoo.org/show_bug.cgi?id=518130#c4, that's definitely not the case at the moment. Isn't an updated glibc also required to fix this issue?

BTW, while =sys-devel/gdb-7.9.1 was failing with SIGTRAP at address 0 with a kernel built with >sys-devel/gcc-4.5, the new =sys-devel/gdb-7.10.1 simply says (e.g. with /bin/ls):

Starting program: /bin/ls
Failed to read a valid object file image from memory.

and hangs there forever. Booting a kernel compiled with =sys-devel/gcc-4.5.4 brings gdb back to normal.

Émeric
(In reply to Émeric Maschino from comment #4)

your original report said the bug exists in gcc-4.6 & gcc-4.7, and that the issue was fixed in gcc-4.8. i too built the kernel w/gcc-4.8 and couldn't reproduce. since the only thing holding back gcc-4.8+ from going stable was the broken glibc, once we addressed the glibc issue, it was all marked stable & fixed.

more directly, the fix to glibc is entirely irrelevant to this bug. you do not need any specific version of glibc to reproduce or fix this bug. you only need to use a specific gcc version.

now you're saying, contrary to your original comment #0, building the kernel with a newer gcc does not seem to help with this bug.
I've been reading along in the various bugs. Just a quick test, trying to compile a newer kernel with GCC 4.9.

--------------------------------------------
ia64 linux-4.1.15-gentoo-r1 # date
Tue Feb 2 09:00:18 HKT 2016
ia64 linux-4.1.15-gentoo-r1 # pwd
/usr/src/linux-4.1.15-gentoo-r1
ia64 linux-4.1.15-gentoo-r1 # gcc-config -l
 [1] ia64-unknown-linux-gnu-4.5.4
 [2] ia64-unknown-linux-gnu-4.7.4
 [3] ia64-unknown-linux-gnu-4.9.3 *
ia64 linux-4.1.15-gentoo-r1 #
--------------------------------------------

----------------------------------------------------------------------------
gcc: internal compiler error: Segmentation fault (program as)
0x400000000001cfaf execute
	../../gcc/gcc.c:2823
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
scripts/Makefile.build:258: recipe for target 'drivers/rtc/rtc-lib.o' failed
make[2]: *** [drivers/rtc/rtc-lib.o] Error 4
scripts/Makefile.build:403: recipe for target 'drivers/rtc' failed
make[1]: *** [drivers/rtc] Error 2
Makefile:947: recipe for target 'drivers' failed
make: *** [drivers] Error 2
----------------------------------------------------------------------------

Not sure if the above is related to what Émeric Maschino saw. I will try building the same kernel with GCC 4.7.
(In reply to Brendan Horan from comment #7)

this bug has nothing to do with the actual segfaults. it's purely about gdb itself misbehaving in trying to trace programs. if gcc itself is crashing, then please file a new bug.
(In reply to SpanKY from comment #8)
> (In reply to Brendan Horan from comment #7)
>
> this bug has nothing to do with the actual segfaults. it's purely about gdb
> itself misbehaving in trying to trace programs. if gcc itself is crashing,
> then please file a new bug.

Will do :) Thanks
(In reply to SpanKY from comment #6)
>
> your original report said the bug exists in gcc-4.6 & gcc-4.7, and that the
> issue was fixed in gcc-4.8. i too built the kernel w/gcc-4.8 and couldn't
> reproduce. since the only thing holding back gcc-4.8+ going stable was the
> broken glibc, once we addressed the glibc issue, it was all marked stable &
> fixed.
>
> more directly, the fix to glibc is entirely irrelevant to this bug. you do
> not need any specific version of glibc to reproduce or fix this bug. you
> only need to use a specific gcc version.
>
> now you're saying, contrary to your original comment #0, building the kernel
> with a newer gcc does not seem to help with this bug.

What I was saying is that I ran a bisect and found the commit that introduced the breakage and the one that fixed it. But as I outlined there [1]:

"Back to the original GDB issue, can it now be explained (breakage and fix) by retrospectively looking at the code modified by revisions 165240 and 191928? Or does it make no sense at all and I'm simply observing random side-effects of some more subtle breakage elsewhere?"

The answer to this question was pretty laconic [2], and I fear that we're indeed "observing random side-effects of some more subtle breakage elsewhere."

Émeric

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61799#c2
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61799#c3
(In reply to Émeric Maschino from comment #10)
>
> Answer to this question was pretty laconic [2] and I fear that we're indeed
> "observing random side-effects of some more subtle breakage elsewhere."
>
> Émeric
>
>
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61799#c3

Unless yet another bad commit in gcc-4.8+ triggered the GDB bug again...

Émeric
(In reply to Émeric Maschino from comment #11)
>
> Unless yet another bad commit in gcc-4.8+ triggered the GDB bug again...

And while I can't check the currently stable gcc 4.9.4 because of bug #601014, with a kernel built with gcc 4.9.3, gdb no longer stops with SIGTRAP at address 0.

Émeric
(In reply to Émeric Maschino from comment #12)
> (In reply to Émeric Maschino from comment #11)
> >
> > Unless yet another bad commit in gcc-4.8+ triggered the GDB bug again...
>
> And while I can't check for currently stable gcc 4.9.4 because of bug
> #601014, gcc 4.9.3 no more stops with SIGTRAP at 0 address.
>
> Émeric

So with gcc 4.9.4 and 5.3.0, gdb no longer stops with SIGTRAP, but hangs indefinitely at address 0 (until interrupted with Ctrl+C):

(gdb) run
Starting program: /usr/bin/bogus-program
^C
Program received signal SIGINT, Interrupt.
0x0000000000000000 in ?? ()
(gdb)

Émeric
(In reply to Émeric Maschino from comment #13)
> (In reply to Émeric Maschino from comment #12)
> > (In reply to Émeric Maschino from comment #11)
> > >
> > > Unless yet another bad commit in gcc-4.8+ triggered the GDB bug again...
> >
> > And while I can't check for currently stable gcc 4.9.4 because of bug
> > #601014, gcc 4.9.3 no more stops with SIGTRAP at 0 address.
> >
> > Émeric
>
> So with gcc 4.9.4 and 5.3.0, gdb no more stops with SIGTRAP, but stalls
> indefinitely at 0 address (until interrupted with Ctrl+C):
>
> (gdb) run
> Starting program: /usr/bin/bogus-program
> ^C
> Program received signal SIGINT, Interrupt.
> 0x0000000000000000 in ?? ()
> (gdb)
>
> Émeric

You should probably file a fresh upstream bug then, since they are unaware it's still broken.
After upgrading the kernel on guppy
from 3.14.14-gentoo (GCC: Gentoo 4.9.3 p1.2, pie-0.6.3)
to 4.9.72-gentoo (GCC: Gentoo 6.4.0-r1 p1.3)
I see the same problem:

gdb-8.0.1 hangs on every program (gdb ls; run)
strace hangs on every program (strace ls)

I think it started happening with the kernel upgrade and not before.
(In reply to Sergei Trofimovich from comment #15)
> After upgrading kernel on guppy
> from 3.14.14-gentoo (GCC: Gentoo 4.9.3 p1.2, pie-0.6.3)
> to 4.9.72-gentoo (GCC: Gentoo 6.4.0-r1 p1.3)
> I see the same problem:
>
> gdb-8.0.1 hangs on every program (gdb ls; run)
> strace hangs on every program (strace ls)
>
> I think it started happening with kernel upgrade and not before.

The fun thing is that it depends on how you compile your kernel. Sometimes, simply building a feature as a module rather than built-in (or vice versa) makes GDB work again. Or not. But I've always failed to discern a clear pattern here.

Émeric
It looks like strace fails to decode some machine states:

$ strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 10131, active tcbs:1
strace: [wait(0x80137f) = 10131] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 10131 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 10131] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 10131] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 10131] WIFSTOPPED,sig=133

133 is 0x85, or 0x80 (PT_TRACESYSGOOD) | 0x5 (SIGTRAP). Looks legitimate, but for some reason it is not decoded by strace.
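To make the raw status words above easier to read, here is a minimal sketch of how they break down. It uses the standard <sys/wait.h> macros plus the Linux ptrace convention that an event code occupies bits 16..23 of the status. The helper names and the local EVENT_STOP constant are hypothetical; 128 is the value of PTRACE_EVENT_STOP in the Linux ptrace ABI, and with PTRACE_O_TRACESYSGOOD set, syscall-stops report SIGTRAP | 0x80. Signal numbers assume Linux on common architectures (SIGTRAP = 5, SIGCONT = 18, SIGSTOP = 19).

```c
#include <signal.h>
#include <sys/wait.h>

/* Local copy of PTRACE_EVENT_STOP (128 in the Linux ptrace ABI), so the
   sketch does not depend on a particular libc's <sys/ptrace.h>. */
#define EVENT_STOP 128

/* Signal that stopped the tracee: bits 8..15 of the status word. */
static int stop_signal(int status)
{
    return WSTOPSIG(status);
}

/* ptrace event code (e.g. EVENT_STOP): bits 16..23 of the status word;
   0 for a plain signal-delivery stop or a syscall-stop. */
static int ptrace_event(int status)
{
    return (status >> 16) & 0xff;
}

/* With PTRACE_O_TRACESYSGOOD, syscall-stops report SIGTRAP | 0x80 (133). */
static int is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

Applied to the trace, 0x80137f is a SIGSTOP with event 128, 0x00127f a plain SIGCONT stop, and 0x00857f a syscall-stop with no event, which supports the reading that the last status is well-formed.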
Filed an upstream bug, https://github.com/strace/strace/issues/33, to clean up the debugging output and get some hints on what is obviously wrong here.
ldv++ pointed out that the status decoding is OK and that the real problem is an always-failing PTRACE_GETREGS:

"""
./strace: get_regs: get_regs_error: Input/output error
????
Looks like ptrace(PTRACE_GETREGS) always fails with EIO on this new kernel.
"""
The suspect is another compiler bug (or, less likely, aliasing effects in the kernel). Disabling inlining of init_unwind_table in the Linux kernel unbreaks at least strace:

--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -2071,21 +2071,21 @@ EXPORT_SYMBOL(unw_init_frame_info);
-static void
+static noinline void
 init_unwind_table (struct unw_table *table, const char *name,
 		   unsigned long segment_base, unsigned long gp,
 		   const void *table_start, const void *table_end)
 {
 	const struct unw_table_entry *start = table_start, *end = table_end;
 
 	table->name = name;
 	table->segment_base = segment_base;
 	table->gp = gp;
 	table->start = segment_base + start[0].start_offset;
 	table->end = segment_base + end[-1].end_offset;

gcc manages to miscompile this:

    table->end = segment_base + end[-1].end_offset;

into something that does not refer to the end of the unwind table. Un-inlining works around the breakage.
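For reference, the value the un-inlined function is supposed to compute can be checked with a small userspace sketch that mirrors init_unwind_table from the diff. It is a simplification under stated assumptions: struct unw_table_entry is assumed to hold three 64-bit offsets (kernel u64 replaced by uint64_t), the kernel's struct unw_table has more fields than shown, and the kernel's noinline macro is spelled out as GCC's __attribute__((noinline)). It illustrates the intended arithmetic only and does not reproduce the miscompilation itself.

```c
#include <stdint.h>

/* Assumed layout: three 64-bit offsets per unwind table entry. */
struct unw_table_entry {
    uint64_t start_offset;
    uint64_t end_offset;
    uint64_t info_offset;
};

/* Simplified, hypothetical stand-in for the kernel's struct unw_table. */
struct unw_table {
    const char *name;
    unsigned long segment_base;
    unsigned long gp;
    unsigned long start;
    unsigned long end;
};

/* noinline keeps the caller's knowledge of table_end (in the kernel, the
   linker symbol __end_unwind) out of this body, as in the workaround. */
static __attribute__((noinline)) void
init_unwind_table(struct unw_table *table, const char *name,
                  unsigned long segment_base, unsigned long gp,
                  const void *table_start, const void *table_end)
{
    const struct unw_table_entry *start = table_start, *end = table_end;

    table->name = name;
    table->segment_base = segment_base;
    table->gp = gp;
    table->start = segment_base + start[0].start_offset;
    /* table_end points one past the last entry, so end[-1] is the last one. */
    table->end = segment_base + end[-1].end_offset;
}
```

Called with an array of entries and a pointer one past its last element, table->end should be segment_base plus the end_offset of the final entry; that is the value the inlined kernel code failed to produce.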
In our case the reproducer is:

    __end_unwind[-1].end_offset

It should generate a ((const char*)__end_unwind - 24) reference, but GCC generates something amazing and scary. Filed an upstream GCC bug with a minimal reproducer: https://gcc.gnu.org/PR84184

To work around it, we should prevent gcc from seeing (and optimising) 'symbol[negative-literal]' references.
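The "- 24" follows from the entry layout. Assuming struct unw_table_entry holds three 64-bit offsets, end[-1] starts 24 bytes before the end symbol and its end_offset field sits 8 bytes into that entry. The sketch below checks this arithmetic and shows one generic way to keep the compiler from folding 'symbol[negative-literal]': pass the pointer through an empty inline asm, in the spirit of the kernel's OPTIMIZER_HIDE_VAR(). hide_ptr() and last_end_offset() are hypothetical names, and this is not necessarily what the submitted kernel patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the kernel's entry layout: three 64-bit offsets. */
struct unw_table_entry {
    uint64_t start_offset;
    uint64_t end_offset;
    uint64_t info_offset;
};

/* The empty asm claims to modify p, so the compiler must treat the result
   as an opaque runtime value instead of "symbol + constant". */
static const void *hide_ptr(const void *p)
{
    __asm__("" : "+r"(p));
    return p;
}

/* Reads end[-1].end_offset, i.e. the 8 bytes at (const char *)end - 24 + 8. */
static uint64_t last_end_offset(const void *table_end)
{
    const struct unw_table_entry *end = hide_ptr(table_end);
    return end[-1].end_offset;
}
```

The barrier costs no instructions. It only removes what the optimiser knows about the pointer's origin, which is why this kind of barrier is commonly used to sidestep folding bugs like the one described above.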
Proposed the Linux kernel patch upstream as: https://lkml.org/lkml/2018/2/2/914
Created attachment 517626
0001-ia64-fix-ptrace-PTRACE_GETREGS-unbreaks-strace-gdb.patch
Hi kernel@!

Can you apply the patch 0001-ia64-fix-ptrace-PTRACE_GETREGS-unbreaks-strace-gdb.patch to gentoo-sources? It unbreaks the ptrace() syscall and unbreaks booting for some users.

The patch is not upstream yet but was tested on real hardware (sent as https://lkml.org/lkml/2018/2/2/914).

ia64@ would like to see this patch in the next to-be-stable kernels so that the autobuild install CDs pick up that version. I guess it's 4.9.* and above. The patch should apply as-is to a wide range of versions.

Thank you!
added patch to
gentoo-sources-4.9.85
gentoo-sources-4.14.23
gentoo-sources-4.15.7
Thanks, Alice. Closing as the patch exists in released kernels. Please re-open or tell me if I am misunderstanding the fix in place.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ff2c7b91695f91aa82e5cba3e10e56086f4ab74f

commit ff2c7b91695f91aa82e5cba3e10e56086f4ab74f
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2018-03-18 23:25:41 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2018-03-18 23:25:41 +0000

    sys-kernel/gentoo-sources: ia64 stable, bug #518130

    Stabilize kernel with ptrace() fix as it fixes boot for some
    types of ia64 machines.

    Bug: https://bugs.gentoo.org/579278
    Bug: https://bugs.gentoo.org/518130
    Package-Manager: Portage-2.3.24, Repoman-2.3.6

 sys-kernel/gentoo-sources/gentoo-sources-4.9.85.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Explained a few details of the breakage mechanics at: https://trofi.github.io/posts/210-ptrace-and-accidental-boot-fix-on-ia64.html and sent the link to the mailing list to clarify why the patch is needed.