Bug 528690 - app-emulation/xen-tools-4.4.1-r3 - xl: segmentation fault in execute_stack_op() at .../work/gcc-4.8.3/libgcc/unwind-dw2.c:516
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal with 3 votes
Assignee: Gentoo Xen Devs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-08 21:11 UTC by cyberbat
Modified: 2015-03-10 05:35 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
full backtrace log with debug symbols (file_528690.txt, 7.96 KB, text/plain)
2014-11-08 21:11 UTC, cyberbat
xen-tools.build.log.gz (xen-tools.build.log.gz, 71.28 KB, application/gzip)
2014-11-08 22:07 UTC, cyberbat
backtrace for xen-4.3.3-r1 (backtrace-xen-4.3.3-r1, 11.13 KB, text/plain)
2014-11-09 22:44 UTC, KK
gdb backtrace log (xen_log.txt, 3.99 KB, text/plain)
2014-11-14 06:54 UTC, Yixun Lan
patch for gcc-4.8.3 to disable -stack-check (02_gcc-4.8.3-disable_stack_check.patch, 1.03 KB, patch)
2014-11-29 23:31 UTC, Yixun Lan
build gcc libgcc without -fstack-check (libgcc_nofstack-check.patch, 490 bytes, patch)
2014-11-30 19:47 UTC, Magnus Granberg

Description cyberbat 2014-11-08 21:11:51 UTC
Created attachment 388900 [details]
full backtrace log with debug symbols

I'm not quite sure, but it seems this can happen only on a hardened system.

I use app-emulation/xen-4.4.1-r2, sys-kernel/hardened-sources-3.14.17-r1, sys-devel/gcc-4.8.3.

app-emulation/xen-tools-4.4.1-r3  USE="hvm pam pygrub python qemu screen"

After compiling xen-tools with gcc-4.8.3 I began to get segfaults on almost every xl command I use. I don't use HVM, only paravirtualized domUs. I get a segmentation fault on xl create, xl block-attach and xl destroy. Despite the segfault, the actions seem to complete: the domain starts, the block device attaches or the domain is destroyed.

I found these two threads, which seem similar to my problem:
https://forums.gentoo.org/viewtopic-p-7647392.html
http://lists.xen.org/archives/html/xen-devel/2014-10/msg03352.html

Top of backtrace is

#0  0x00006ddf38775824 in execute_stack_op (op_ptr=0x6ddf3a211b83 "w\240\001\006\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300", 
    op_end=0x6ddf3a211b87 "\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300", context=context@entry=0x6ddf3add9190, initial=initial@entry=0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:516
        stack = {0 <repeats 44 times>, 120805532733600, 120805492489619, 120805532733616, 120805532733920, 120805520366588, 120805520368872, 
          120805532733744, 120805492494635, 120805532733760, 120804545134595, 120805520369416, 352, 10, 167, 220, 0, 0, 0, 0, 120805533860168}
        stack_elt = <optimized out>
#1  0x00006ddf3877628c in uw_update_context_1 (context=context@entry=0x6ddf3add95a0, fs=fs@entry=0x6ddf3add92f0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:1424
        exp = <optimized out>
        len = <optimized out>
        orig_context = {reg = {0x6ddf3add9698, 0x6ddf3add96a0, 0x0, 0x6ddf3add96a8, 0x0, 0x0, 0x6ddf3add96f0, 0x6ddf3add9180, 0x0, 0x0, 0x0, 0x0, 
            0x6ddf3add96b0, 0x6ddf3add96b8, 0x6ddf3add96c0, 0x6ddf3add96c8, 0x6ddf3add96f8, 0x0}, cfa = 0x6ddf3add9700, ra = 0x6ddf3a20ae00 <__restore_rt>, 
          lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0, func = 0x6ddf3a20adff}, flags = 4611686018427387904, version = 0, args_size = 0, 
          by_value = '\000' <repeats 17 times>}
        cfa = <optimized out>
        i = <optimized out>
        tmp_sp = {ptr = 120805532735232, word = 120805532735232}
#2  0x00006ddf38776605 in uw_update_context (context=context@entry=0x6ddf3add95a0, fs=fs@entry=0x6ddf3add92f0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:1506
No locals.

emerge --info
Portage 2.2.8-r2 (hardened/linux/amd64, gcc-4.8.3, glibc-2.19-r1, 3.14.17-hardened-r1 x86_64)
=================================================================
System uname: Linux-3.14.17-hardened-r1-x86_64-Intel-R-_Core-TM-_i7-4770_CPU_@_3.40GHz-with-gentoo-2.2
KiB Mem:     4030288 total,   2773828 free
KiB Swap:   33554428 total,  33554428 free
Timestamp of tree: Unknown
ld GNU ld (Gentoo 2.24 p1.4) 2.24
app-shells/bash:          4.2_p53
dev-lang/perl:            5.18.2-r2
dev-lang/python:          2.7.7, 3.3.5-r1, 3.4.1
dev-util/cmake:           2.8.12.2-r1
dev-util/pkgconfig:       0.28-r1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.12.4
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.69
sys-devel/automake:       1.13.4
sys-devel/binutils:       2.24-r3
sys-devel/gcc:            4.7.3-r1, 4.8.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4.2-r1
sys-devel/make:           4.0-r1
sys-kernel/linux-headers: 3.13 (virtual/os-headers)
sys-libs/glibc:           2.19-r1
Repositories: gentoo mylay
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -mtune=native -ggdb"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -mtune=native -ggdb"
DISTDIR="/var/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo"
LANG="C"
LDFLAGS="-Wl,--hash-style=gnu -Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j8"
PKGDIR="/var/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/portage/mylay"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="acl amd64 bash-completion berkdb bzip2 caps cli cracklib crypt cxx device-mapper dri gdbm gmp gnutls hardened iconv icu idn ipv6 justify mmx modules multilib ncurses nls nptl openmp pam pax_kernel pcre readline session sse sse2 ssl ssse3 unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="en ru" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_3" RUBY_TARGETS="ruby19 ruby20" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal 
rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2014-11-08 21:47:58 UTC
Please attach the entire build log to this bug report.
Comment 2 cyberbat 2014-11-08 22:07:15 UTC
Created attachment 388906 [details]
xen-tools.build.log.gz

Here it is.
Comment 3 KK 2014-11-08 23:04:58 UTC
Same thing here (I am the original author of the thread in the gentoo forum and the xen list) with the difference that I am using xen-4.3.3-r1 which is the latest stable version in the tree.
I can neither confirm nor deny that the issue experienced is confined to hardened only but I have since reverted back to the previously working xen-4.3.2-r5 (which I had to bring in from the attic) and gcc-4.7.3 and now still experience the segfaults. The situation is even worse with HVM and PCI passthrough as the segfault prevents PCI devices from being passed through and therefore renders any such domUs completely useless.

It's probably worth mentioning that my system (with vt-d support) had worked for almost a year without any hiccups before the updates to xen-4.3.3 and gcc-4.8.3, which happened a day apart (first the gcc update, the next day the xen update and then a reboot).
The rest of the system is still working flawlessly and is absolutely stable. I have also re-compiled the toolchain and then both system and world after the upgrade to gcc-4.8.3 just to be on the safe side.
After downgrading gcc to test again with 4.7.3 I have only re-emerged xen-* and glibc (all with -ggdb and splitdebug) and rebooted the system.

======================================================
emerge --info (after downgrade to 4.7.3 with 4.7.3 selected in gcc-config):

Portage 2.2.8-r2 (hardened/linux/amd64, gcc-4.7.3, glibc-2.19-r1, 3.15.10-hardened-r1 x86_64)
=================================================================
System uname: Linux-3.15.10-hardened-r1-x86_64-Intel-R-_Xeon-R-_CPU_E31260L_@_2.40GHz-with-gentoo-2.2
KiB Mem:     4033480 total,   3756384 free
KiB Swap:   16777148 total,  16777148 free
Timestamp of tree: Sat, 08 Nov 2014 00:45:01 +0000
ld GNU ld (Gentoo 2.24 p1.4) 2.24
app-shells/bash:          4.2_p53
dev-lang/perl:            5.18.2-r2
dev-lang/python:          2.7.7, 3.4.1
dev-util/cmake:           2.8.12.2-r1
dev-util/pkgconfig:       0.28-r1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.12.4
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.69
sys-devel/automake:       1.13.4
sys-devel/binutils:       2.24-r3
sys-devel/gcc:            4.7.3-r1, 4.8.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4.2-r1
sys-devel/make:           4.0-r1
sys-kernel/linux-headers: 3.13 (virtual/os-headers)
sys-libs/glibc:           2.19-r1
Repositories: gentoo x-portage
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--quiet-build=y --buildpkg-exclude sys-kernel/hardened-sources"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs buildpkg config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://gd.tuwien.ac.at/opsys/linux/gentoo/ ftp://gd.tuwien.ac.at/opsys/linux/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j9"
PKGDIR="/usr/portage/packages"
PORTAGE_COMPRESS=""
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--quiet --progress"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage/"
USE="acl amd64 avx bash-completion berkdb bzip2 cli cracklib crypt cxx gdbm hardened iconv justify lm_sensors mmx mmxext modules multilib ncurses nls nptl openmp pam pax_kernel pcre readline session sse sse2 sse3 sse4.1 sse4_1 ssl ssse3 tcpd unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" ELIBC="glibc" KERNEL="linux" LINGUAS="en" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7" RUBY_TARGETS="ruby20" USERLAND="GNU" VIDEO_CARDS="intel i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
USE_PYTHON="2.7"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS_FLAGS
Comment 4 KK 2014-11-08 23:15:05 UTC
(In reply to Jeroen Roovers from comment #1)
> Please attach the entire build log to this bug report.

Just to be sure that everybody is on the same page here: It's not the build process emerging gcc-4.8.3 that fails but rather the binary 'xl' that is created by emerging xen-tools now segfaults when being started to create a domU.
Comment 5 KK 2014-11-09 22:43:12 UTC
In addition to the backtrace for xen-4.4.1-r2 (~amd64) provided by the OP I have also attached a gdb session including full backtrace for xen-4.3.3-r1 (stable,  emerged with gcc-4.8.3) for the command
     xl create pfsense -c
where pfsense is an HVM domU with PCI passthrough:

In case you want me to file a separate bug for the stable xen-4.3.3 version please let me know.

Tx KK
Comment 6 KK 2014-11-09 22:44:17 UTC
Created attachment 388978 [details]
backtrace for xen-4.3.3-r1
Comment 7 Yixun Lan archtester gentoo-dev 2014-11-10 10:22:39 UTC
hello, thanks for reporting this
but none of us (@xen team -> idella4 and me) have a hardened system installed...

@KK, could you also attach the pfsense file (xen configuration) here? I'll try to set up a hardened system and see if I can reproduce it. (note, I'm rather busy these days, so I can't guarantee when I'll get this done)
Comment 8 KK 2014-11-10 11:27:14 UTC
(In reply to Yixun Lan from comment #7)
> hello, thanks for reporting this
> but none of us (@xen team -> idella4 and me) have a hardened system
> installed...
Hi Yixun,
I am happy to help out with my installed system (trying patches or whatever else you think I can help with), as long as it is within my limited abilities compared to your knowledge; I am also happy to join a session via IRC if you deem that beneficial.
> 
> @KK, could you also attach the pfsense file (xen configuration) here? I'll
> try to set up a hardened system and see if I can reproduce it. (note, I'm
> rather busy these days, so I can't guarantee when I'll get this done)
Sure - here you go (please note that my system does support vt-d):

===== pfsense xen configuration file ========
builder                 = 'hvm'
cpus                    = '2-7'
vcpus                   = 1
cpu_weight              = 512
memory                  = 512
name                    = 'pfsense'
disk                    = [ 'phy:/etc/xen/guests/disk.d/pfsense.disk,sda,w' ]
vif                     = [ 'mac=00:16:3e:a1:64:01,bridge=xenbr0,model=e1000' ]
on_poweroff             = 'destroy'
on_reboot               = 'restart'
on_crash                = 'restart'
localtime               = 0
boot                    = 'c'
vnc                     = 0
nographic               = 1
serial                  = 'pty'
nx                      = 1
pci                     = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
======= end of configuration file ==========

relevant lspci output for passed through devices (NOTE: only 0a:08.0 is currently used in the domU [this device is connected to the Ethernet port of my ADSL modem], but all were successfully passed through and visible in the domU before the problem started):
04:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
0a:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 10)
0a:0b.0 Network controller: Qualcomm Atheros AR922X Wireless Network Adapter (rev 01)

If you need any more information please let me know; answers should usually be pretty quick

Thanks KK
Comment 9 Romain Bezut 2014-11-10 12:53:33 UTC
Hi !

Not sure if it helps, but I get the exact same problem on my system.
I am on Gentoo Hardened with xen-tools 4.4.1-r3.
It used to work fine before I upgraded to GCC 4.8.3 (and removed gcc-4.7...),
but since the upgrade I get a segfault on each "xl create" command (although the domU runs well).

Here is my domU configuration:

======= start of configuration file ==========
name        = 'yadh'
kernel      = '/data/xen/kernels/generic/kernel'
ramdisk     = '/data/xen/kernels/generic/initrd'
extra       = 'root=/dev/yodh/root ro dolvm'
vcpus       = 2
memory      = 4096
disk        = [ 'phy:/dev/zvol/rpool/VM/yadh,xvda1,w' ]
vif         = [
  'mac=00:16:3e:6c:24:e7,script=vif-bridge,vifname=tap.yadh.in',
]
pvh         = 1
on_xend_stop= 'shutdown'
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
======= end of configuration file ==========

I tried adding -fno-aggressive-loop-optimizations in xen-4.4.1/config/StdGNU.mk but it didn't help.
I also tried with pvh disabled and still get the segfault in libgcc_s.so.1.
Trying to re-emerge gcc 4.7 and switch back to it gave me a "C compiler cannot create executables" error
for an unknown reason.

If you want any additional details please ask.

Romain
Comment 10 Yixun Lan archtester gentoo-dev 2014-11-10 13:55:13 UTC
I have no clue what the root cause is here; it seems the hardened gcc-4.8.3 triggers the problem.

I'm also CCing the @hardened team to see if they can shed some light on this.
Comment 11 Yixun Lan archtester gentoo-dev 2014-11-14 06:54:12 UTC
Created attachment 389280 [details]
gdb backtrace log

it seems the problem lies in gcc-4.8.3: if I replace its libgcc_s.so.1 with the one from gcc-4.7.3[1], the segmentation fault is gone and the guest vm starts successfully.

[1] overriding libgcc_s.so.1; note, please don't do this on a production system, I'm not sure whether it may break your system
cp \
/usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.4/libgcc_s.so.1 \
/usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1


btw, the reason I'm using "cp" to override libgcc_s.so.1 is that I had problems downgrading to gcc-4.7.3; see the attached log, which is the gdb output I got after doing the following:
1) install gcc-4.7.3, because the default hardened stage3 image comes with gcc-4.8.3 installed
2) switch to the gcc-4.7.3 hardened toolchain
3) emerge sys-libs/glibc, app-emulation/xen-tools, app-emulation/xen
4) run xl create guest_vm and get a backtrace with gdb; it still calls 4.8.3 symbols
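
A quick way to see which libgcc_s.so.1 copies the dynamic linker cache knows about is to filter the `ldconfig -p` listing. This is only a minimal sketch: the function name `list_libgcc` is made up for illustration, and it assumes the usual `name (flags) => path` output format of glibc's ldconfig.

```shell
# Hypothetical helper: read an `ldconfig -p` style listing on stdin and print
# the path of every libgcc_s.so.1 entry, in the order the listing gives them.
list_libgcc() {
    awk '$1 == "libgcc_s.so.1" { print $NF }'
}
```

On a live system this would be run as `ldconfig -p | list_libgcc`. If a 4.8.3 path is listed ahead of the 4.7.3 one, that would be consistent with xl still picking up 4.8.3 symbols after switching compilers.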
Comment 12 Yixun Lan archtester gentoo-dev 2014-11-14 06:56:11 UTC
got @toolchain CCed see if they can give any hint, thanks
Comment 13 KK 2014-11-14 09:32:51 UTC
(In reply to Yixun Lan from comment #11)
> btw, the reason I'm using "cp" to override libgcc_s.so.1 is that I had
> problems downgrading to gcc-4.7.3; see the attached log, which is the gdb
> output I got after doing the following:
> 1) install gcc-4.7.3, because the default hardened stage3 image comes with
> gcc-4.8.3 installed
> 2) switch to the gcc-4.7.3 hardened toolchain
> 3) emerge sys-libs/glibc, app-emulation/xen-tools, app-emulation/xen
> 4) run xl create guest_vm and get a backtrace with gdb; it still calls
> 4.8.3 symbols
Thanks Yixun for working on confirming this issue - it's very much appreciated.

Further to your findings, I can confirm that downgrading gcc didn't work for me either: after an attempted downgrade to gcc-4.7.3 the segfault was still happening in libgcc_s.so.1, and it was still gcc-4.8.3's version of libgcc_s.so.1. I then decided not to follow that route any further. Additional details about that failed downgrade attempt can be found at https://forums.gentoo.org/viewtopic-t-1003746.html in the 6th post from the start of the thread.

Given that there's currently no sane workaround and all HVM PCI passthrough VMs simply fail to start/work, I'd further suggest increasing the importance/severity of this bug, ideally to blocker status (which in my view it is).

Thanks KK
Comment 14 SpanKY gentoo-dev 2014-11-14 17:55:44 UTC
does the issue resolve itself if you upgrade to gcc-4.8, and then rebuild all of the packages in question w/gcc-4.8 ?
Comment 15 cyberbat 2014-11-14 18:27:41 UTC
(In reply to SpanKY from comment #14)
> does the issue resolve itself if you upgrade to gcc-4.8, and then rebuild
> all of the packages in question w/gcc-4.8 ?

I've rebuilt all packages in @world with gcc-4.8.
Comment 16 KK 2014-11-14 20:06:00 UTC
(In reply to SpanKY from comment #14)
> does the issue resolve itself if you upgrade to gcc-4.8, and then rebuild
> all of the packages in question w/gcc-4.8 ?
SpanKY, thanks for looking into this. 

I have also rebuilt not only all packages in question after the gcc-4.8.3 upgrade but even @system, and then @world - the issue unfortunately persists.

Thanks KK
Comment 17 SpanKY gentoo-dev 2014-11-14 23:01:28 UTC
i don't suppose there's a simpler test case here that doesn't involve running Xen (as a hypervisor/etc...).  i don't use Xen anywhere and can't really patch kernels to do so.
Comment 18 cyberbat 2014-11-14 23:20:54 UTC
(In reply to SpanKY from comment #17)
> i don't suppose there's a simpler test case here that doesn't involve
> running Xen (as a hypervisor/etc...).  i don't use Xen anywhere and can't
> really patch kernels to do so.

I can provide my hardware for running Xen to help fix this bug. If you contact me by email I can give you root access to one of my servers.
Comment 19 KK 2014-11-15 19:59:56 UTC
(In reply to SpanKY from comment #17)
> i don't use Xen anywhere and can't really patch kernels to do so.
Just to clear up any confusion that might still be out there: Xen support has been in the mainline kernel since version 3.0 (mid 2011), so it no longer requires any kernel patches.

To use Linux under the Xen hypervisor it's just a matter of setting the right kernel configuration parameters, re-compiling the kernel, emerging xen-* and booting the hypervisor with the Linux kernel as a module to the hypervisor (which obviously also requires a change to the bootloader configuration).

It's probably also worth pointing out that once the kernel has been re-compiled, the exact same kernel image may be booted either on bare metal (i.e. without the Xen hypervisor) or as the privileged dom0 running under the Xen hypervisor.

If the offer from cyberbat to use one of his servers for fixing the bug does not work out for you for whatever reason, I am (and, I am sure, so is cyberbat or any of the capable guys from gentoo's @xen team [idella4 or dlan]) more than happy to help you come up with the required kernel configuration parameters to get Xen up and running on your system.

Thanks Atom2
Comment 20 Sergey Anufrienko 2014-11-26 18:57:18 UTC
Same problem: xl segfaults in libgcc_s.so on almost every action, PCI passthrough doesn't work as domain initialization doesn't finish (domain remains in paused state).

GCC 4.8.3 (USE="cxx hardened (multilib) nls nptl openmp")
Xen 4.3.3-r1 (USE="hvm pam pygrub python qemu")
Xen-tools 4.3.3-r1
Kernel 3.15.10-hardened-r1
Comment 21 Sergey Anufrienko 2014-11-27 07:18:39 UTC
I found that, while GCC 4.8.3 is installed in my system, the active GCC profile is 4.7.3, and Glibc, Xen-tools, Xen were all compiled with GCC 4.7.3, so perhaps this bug also affects GCC 4.7.3.
Comment 22 Sergey Anufrienko 2014-11-27 07:40:40 UTC
So, there was a file on my system:
/etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf, having the following contents, which may explain why libgcc_s.so.1 from 4.8.3 was loaded instead of the one from 4.7.3:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/32
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3

After moving 4.7.3 to be before 4.8.3, doing env-update, and restarting xl, the segfaults are gone.
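
The reordering described above can be sketched as a small filter that also drops the repeated entries. This is a hedged illustration, not a supported procedure: the helper name `prefer_gcc_version` is invented here, and on a real system the file may be regenerated by gcc-config or env-update, which would undo a manual edit.

```shell
# Hypothetical helper: read library directories on stdin, drop blank and
# duplicate lines, and print directories of the preferred GCC version ($1)
# before all others, keeping the original relative order within each group.
prefer_gcc_version() {
    awk -v ver="/$1" '
        NF == 0 || seen[$0]++ { next }              # skip blanks and repeats
        index($0, ver) { first[++n] = $0; next }    # preferred version
        { rest[++m] = $0 }                          # everything else
        END {
            for (i = 1; i <= n; i++) print first[i]
            for (i = 1; i <= m; i++) print rest[i]
        }'
}
```

For example, `prefer_gcc_version 4.7.3 < /etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf` would print the four distinct directories with the 4.7.3 pair first; the output should be reviewed before it replaces anything.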
Comment 23 KK 2014-11-27 22:51:55 UTC
(In reply to Sergey Anufrienko from comment #22)
> After moving 4.7.3 to be before 4.8.3, doing env-update, and restarting xl,
> the segfaults are gone.
I guess you should be safe and that should work as long as you do not use gcc-4.8.3 to compile other stuff. Other software compiled with 4.8.3 might require libgcc_s.so.1 from 4.8.3 and then you might be in trouble with xen.

BTW: The same observation was already made by Yixun in comment #11 when he concluded as follows:
> it seems the problem lies in gcc-4.8.3: if I replace its libgcc_s.so.1 with
> the one from gcc-4.7.3[1], the segmentation fault is gone and the guest vm
> starts successfully.
>
> [1] overriding libgcc_s.so.1; note, please don't do this on a production
> system, I'm not sure whether it may break your system

So sticking to 4.7.3 (with the manual changes you had to make after emerging gcc-4.8.3) seems to be a safe workaround for this bug as long as 4.7.3 has not been unmerged and is still the default compiler. When gcc-4.7.3 is no longer installed this unfortunately won't work, as going back to an older compiler will most likely trigger other issues/breakages further down the line.

In essence, however, this suggests to me that the problem most likely lies with gcc/toolchain (and not with XEN), and I'd really appreciate it if any of the gcc/toolchain devs had a look into it and sorted out this major bug asap, as this bug is a _BLOCKER_ for anybody on gcc-4.8.3 using PCI passthrough with XEN. At the moment, though, it just looks as if nobody really cares about it.

I would also like to reiterate that the community has offered all possible support (i.e. an offer of root access to a system with XEN installed [comment #18] or a helping hand in setting up a kernel configuration [comment #19]) in order to ease moving this issue forward for those who do not have XEN installed.

Regards KK
Comment 24 Sergey Anufrienko 2014-11-28 07:06:02 UTC
(In reply to KK from comment #23)
I agree this seems to be a potentially major issue, not only because of the problem with Xen and PCI passthrough, but also because with the "default" 05gcc-x86_64-pc-linux-gnu.conf, libgcc_s.so from 4.8.3 might be loaded for *all* packages (not only xen-tools), regardless of the version of GCC they were compiled with; xen-tools merely helped to track it down because it segfaults (other packages could segfault too). I would think the issue is simply that the wrong version of libgcc_s.so is loaded, were it not that xl apparently still segfaults even when all packages are recompiled with 4.8.3. I haven't confirmed this myself, though, since 4.7.3 is still my default compiler and I didn't recompile world.
Comment 25 Yixun Lan archtester gentoo-dev 2014-11-28 14:58:13 UTC
there is a regression between gcc-4.8.0 (good) and gcc-4.8.1-r1 (bad)

I'm thinking of bisecting to find the root cause, but will most likely not work on this during the weekend, so this is just a status update.
Comment 26 Yixun Lan archtester gentoo-dev 2014-11-29 23:31:08 UTC
Created attachment 390616 [details, diff]
patch for gcc-4.8.3 to disable -stack-check

this issue is due to stack checking being enabled on hardened systems

the attached patch should apply to gcc-4.8.3[1]; it disables stack checking and resolves the segfault. we are still investigating, but testing and feedback are more than welcome.

[1] to test

a) mkdir -p /etc/portage/patches/sys-devel/gcc-4.8.3/
b) cp 02_gcc-4.8.3-disable_stack_check.patch /etc/portage/patches/sys-devel/gcc-4.8.3/
c) emerge =gcc-4.8.3
Comment 27 cyberbat 2014-11-30 00:13:09 UTC
(In reply to Yixun Lan from comment #26)
> Created attachment 390616 [details, diff]
> patch for gcc-4.8.3 to disable -stack-check
> 
> this issue is due to stack checking being enabled on hardened systems

I've tried the patch. The segfaults really do seem to go away. Thank you.
Comment 28 Jason Zaman gentoo-dev 2014-11-30 11:29:52 UTC
Can you re-emerge gcc-4.8.3 without any of the changes you've done (eg remove the patch from comment #26).

then add "-fno-stack-check" to your CFLAGS and emerge xen-tools and then test if the problem still exists?
Comment 29 Romain Bezut 2014-11-30 12:46:27 UTC
(In reply to Yixun Lan from comment #26)
> Created attachment 390616 [details, diff]
> patch for gcc-4.8.3 to disable -stack-check
> 
> this issue is due to stack checking being enabled on hardened systems
> 
> the attached patch should apply to gcc-4.8.3[1]; it disables stack checking
> and resolves the segfault. we are still investigating, but testing and
> feedback are more than welcome.

Applying this patch worked for me as well on GCC 4.8.3.
I ran strace on the "xl create" call a few times, but I'm not sure how much it helps.

Anyway, using strace I noticed that the page behind the segfault is mapped by the main xl process as a single read-write page with the flag MAP_STACK.
Right after the mapping I see an mprotect on this page with PROT_NONE.

The segfault is caused by an invalid access (SEGV_ACCERR) at offset 3760 of this page.


(In reply to Jason Zaman from comment #28)
> Can you re-emerge gcc-4.8.3 without any of the changes you've done (eg
> remove the patch from comment #26).
> 
> then add "-fno-stack-check" to your CFLAGS and emerge xen-tools and then
> test if the problem still exists?

Global CFLAGS do not seem to have any influence on the xen-tools (4.4.1-r4) build.
Anyway, I added a line to xen-4.4.1/Config.mk and was able to merge it.
The problem is still present after this rebuild.
Comment 30 cyberbat 2014-11-30 13:40:24 UTC
(In reply to Jason Zaman from comment #28)
> Can you re-emerge gcc-4.8.3 without any of the changes you've done (eg
> remove the patch from comment #26).
> 
> then add "-fno-stack-check" to your CFLAGS and emerge xen-tools and then
> test if the problem still exists?

It seems that the xen-tools ebuild totally ignores the CFLAGS I set in make.conf or via package.env.

So I need advice on how to set CFLAGS for xen-tools.
Comment 31 Sergey Anufrienko 2014-11-30 13:47:34 UTC
Hi,

The xen-tools ebuild has "custom-cflags" USE flag, which apparently needs to be set for any custom CFLAGS settings to be honored.
If I remember right, the CFLAGS="..." line should then be placed into /etc/portage/env/app-emulation/xen-tools file.
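
As a rough illustration of the above, the settings could be staged with a small script. This sketch uses the package.env mechanism rather than a file named after the package, assumes that package.env and package.use are plain files (they can also be directories), and relies on the `custom-cflags` USE flag behaving as described in this comment. The function name and the env file name `xen-tools-cflags.conf` are made up for the example.

```shell
# Sketch: write per-package Portage settings for xen-tools below "$1".
# Point it at a scratch directory first and inspect the result.
stage_xen_tools_cflags() {
    root="$1"
    flag="$2"    # extra compiler flag to append, e.g. -fstack-check=no
    mkdir -p "$root/etc/portage/env" || return 1
    # make.conf-style fragment; the file name is an arbitrary choice
    printf 'CFLAGS="${CFLAGS} %s"\n' "$flag" \
        > "$root/etc/portage/env/xen-tools-cflags.conf" || return 1
    # map the package to that fragment
    printf '%s\n' 'app-emulation/xen-tools xen-tools-cflags.conf' \
        >> "$root/etc/portage/package.env" || return 1
    # USE flag that, per the comment above, makes the ebuild honor custom CFLAGS
    printf '%s\n' 'app-emulation/xen-tools custom-cflags' \
        >> "$root/etc/portage/package.use"
}
```

Calling `stage_xen_tools_cflags /tmp/stage -fstack-check=no` leaves three small files under /tmp/stage/etc/portage that can be compared with the live configuration before anything is copied over.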
Comment 32 Magnus Granberg gentoo-dev 2014-11-30 14:29:12 UTC
Can someone test building xen on a non-hardened system with -fstack-check added?
Comment 33 cyberbat 2014-11-30 14:38:04 UTC
(In reply to Romain Bezut from comment #29)

> (In reply to Jason Zaman from comment #28)
> > Can you re-emerge gcc-4.8.3 without any of the changes you've done (eg
> > remove the patch from comment #26).
> > 
> > then add "-fno-stack-check" to your CFLAGS and emerge xen-tools and then
> > test if the problem still exists?
> 
> Global CFLAGS does not seem to have an influence on xen-tools (4.4.1-r4)
> build.
> Anyway I added a line in xen-4.4.1/Config.mk I was able to merge it.
> The problem is still present after this rebuild.

I confirm this. I've added 
CFLAGS += -fno-stack-check
to xen-tools-4.4.1-r4/work/xen-4.4.1/Config.mk

I checked that it's used in the compile output, but the segfaults still happen.
Comment 34 Magnus Granberg gentoo-dev 2014-11-30 14:49:15 UTC
(In reply to cyberbat from comment #33)
> (In reply to Romain Bezut from comment #29)
> 
> > (In reply to Jason Zaman from comment #28)
> > > Can you re-emerge gcc-4.8.3 without any of the changes you've done (eg
> > > remove the patch from comment #26).
> > > 
> > > then add "-fno-stack-check" to your CFLAGS and emerge xen-tools and then
> > > test if the problem still exists?
> > 
> > Global CFLAGS does not seem to have an influence on xen-tools (4.4.1-r4)
> > build.
> > Anyway I added a line in xen-4.4.1/Config.mk I was able to merge it.
> > The problem is still present after this rebuild.
> 
> I confirm this. I've added 
> CFLAGS += -fno-stack-check
> to xen-tools-4.4.1-r4/work/xen-4.4.1/Config.mk
> 
> checked that it's used in compiling output, but segfaults still happen.
-fstack-check=no
https://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Code-Gen-Options.html#Code-Gen-Options
Comment 35 cyberbat 2014-11-30 15:06:09 UTC
(In reply to Magnus Granberg from comment #34)

> -fstack-check=no
> https://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Code-Gen-Options.html#Code-Gen-Options

Still segfaulting. Compiled using ebuild compile, ebuild install, ebuild qmerge.
Comment 36 Magnus Granberg gentoo-dev 2014-11-30 19:47:45 UTC
Created attachment 390666 [details, diff]
build gcc libgcc without -fstack-check

Please test this patch for gcc. It builds libgcc without -fstack-check.
Comment 37 Eric Gisse 2014-11-30 20:04:41 UTC
I was hitting this issue as well on app-emulation/xen-tools-4.4.1-r3.

E.g., xl create will build the VM successfully and then segfault.

With attachment 390666 [details, diff] applied, the issue with xl was resolved once sys-devel/gcc-4.8.3 was rebuilt.

I can build a new VM via xl create, without any segfaults.

Note: **No changes** were needed for xen-tools itself to resolve the issue, just that patch.
Comment 38 cyberbat 2014-12-01 00:15:57 UTC
(In reply to Eric Gisse from comment #37)
> With attachment 390666 [details, diff] applied the issue with xl was
> resolved once sys-devel/gcc-4.8.3 was rebuilt.
> 
> I can build a new VM via xl create, without any segfaults.
> 

I confirm this.
Comment 39 KK 2014-12-01 21:10:03 UTC
Hi Yixun!
(In reply to Yixun Lan from comment #26)
> Created attachment 390616 [details, diff]
> patch for gcc-4.8.3 to disable -stack-check
> 
> this issue due to stack check enabled in hardended system
Apologies for my late reply, but you have been much quicker than your previous reply had suggested; you seem to have been working in turbo mode over the weekend. I can now, however, confirm that your patch also works for HVM domains with PCI passthrough: my system is finally up and running again.

I have also tested the second patch:
(In reply to Magnus Granberg from comment #36)
> Created attachment 390666 [details, diff]
> build gcc libgcc without -fstack-check
> 
> Test this patch for gcc. It build libgcc without -fstack-check.
This patch also works for HVM domains with PCI passthrough.

From my side there's just one question remaining: are any of those patches going to find their way into the standard Gentoo ebuilds for (hardened) gcc, or is there more behind the bug, and this is only a temporary solution until the real cause of the issue has been found?

Thanks again to everybody who has worked on this.

KK
Comment 40 Magnus Granberg gentoo-dev 2014-12-03 21:57:19 UTC
I have updated the piepatchset to 0.6.1.
Toolchain: do we bump, or just update the stable version with the
updated piepatchset?
Comment 41 Magnus Granberg gentoo-dev 2015-01-05 23:30:02 UTC
Fixed in gcc 4.8.4 and piepatchset 0.6.2 for gcc 4.9.2.
Thanks for the report.
Comment 42 cyberbat 2015-01-06 00:30:13 UTC
I have re-emerged xen-tools 4.4.1-r4 with gcc-4.8.4. There seem to be no segfaults on xl create.