compiling sys-libs/zlib-1.2.3-r1 with gcc-4.4 and -O3, specifically the -ftree-vectorize flag causes applications to crash. I've noticed this in both mozilla-firefox and mozilla-thunderbird.

Compiling with:
CFLAGS="-march=native -O2 -fomit-frame-pointer -pipe"
causes no problems. Additionally -O3 works fine on the stable gcc, so this appears to be a regression of some kind.

Reproducible: Always

Steps to Reproduce:
1. install gcc-4.4.0
2. set CFLAGS="-march=native -O3 -fomit-frame-pointer -pipe"
   or set CFLAGS="-march=native -O2 -ftree-vectorize -fomit-frame-pointer -pipe"
3. attempt to run firefox

Actual Results:
segmentation fault (appears to be a null pointer)

emerge --info
Portage 2.2_rc28 (default/linux/x86/2008.0/desktop, gcc-4.4.0, glibc-2.8_p20080602-r1, 2.6.28-gentoo-r5 i686)
=================================================================
System uname: Linux-2.6.28-gentoo-r5-i686-Intel-R-_Core-TM-2_Duo_CPU_T7700_@_2.40GHz-with-gentoo-1.12.11.1
Timestamp of tree: Sat, 16 May 2009 17:30:01 +0000
app-shells/bash: 3.2_p39
dev-java/java-config: 2.1.7
dev-lang/python: 2.6.2
dev-util/cmake: 2.6.2-r1
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox: 1.6-r2
sys-devel/autoconf: 2.13, 2.63
sys-devel/automake: 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.27-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=native -O2 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/kde/4.2/env /usr/kde/4.2/share/config /usr/kde/4.2/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=native -O2 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="collision-protect distlocks fixpackages parallel-fetch preserve-libs protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1"
LINGUAS="en_US en"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=" "
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac accessibility acl acpi alsa apache2 arts avahi bash-completion berkdb bluetooth boost branding bzip2 cairo captury cdr chroot cleartype cli cracklib crypt cups curl cvs dbus debugger dell dhcp divx doc dri dvd dvdr dvdread eds emboss encode esd evo examples expat fam fat ffmpeg firefox flac gdbm gif glibc-omitfp glitz gmp gnome gnutls google-gadgets gpm graphite graphviz gstreamer gtk hal htmlhandbook iconv imagemagick innodb inotify ipod ipv6 ipw3945 isdnlog jadetex java java6 jpeg jpeg2k kde kdeprefix kpathsea kqemu lame ldap lesstif libnotify libwww lm_sensors mad mdnsresponder-compat midi mikmod mjpeg mmap mmx mng mono mp3 mp4 mpeg mpeg2 mplayer mudflap mysql ncurses network-cron nls nptl nptlonly nsplugin ntfs nvidia ogg openal openexr opengl openmp openssl oss pam pango pcap pch pcre pdf perl phonon php physfs plasma pmu png posix ppds pppd python qt3 qt3support qt4 quicktime readline reflection rss rtc samba sdk sdl session smp sockets spell spl sqlite sqlite3 sse sse2 ssl ssse3 startup-notification subversion svg sysfs tcl tcpd templates theora threads thumbnail tiff tivo tk truetype unicode usb userlocales utempter v4l vcd vim-syntax vnc vorbis webkit wifi win32codecs wireshark wmf wmp wxwindows x86 xanim xcomposite xft xine xinerama xml xorg xpm xrandr xrender xscreensaver xulrunner xv xvid zeroconf zip zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse synaptics evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="en_US en"
USERLAND="GNU"
VIDEO_CARDS="nvidia"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
last time this came up was Bug 151394. but just like there, firefox isn't really a useful test case. we need something reduced or the bug report is going to sit around indefinitely.
Fair enough, I'll try to come up with a minimal program that triggers the bug when I have a chance.
seems to be x86 specific. i can't reproduce here.
(In reply to comment #3)
> seems to be x86 specific. i can't reproduce here.

Good. I want GCC 4.4 unmasked for amd64 today.
*** Bug 281758 has been marked as a duplicate of this bug. ***
(In reply to comment #4)
> (In reply to comment #3)
> > seems to be x86 specific. i can't reproduce here.
> Good. I want GCC 4.4 unmasked for amd64 today.

Not 4.4.0, IMHO. For 4.4.1 this bug still exists and there is a test for it (I have not tried the test, only the Mozilla applications). See Bug 281758 (it links to the tests too).
Created attachment 201741
gcc-4.4-pr40838-1.patch

This new patch appears to work for me (tested only on x86_32 with gcc 4.4.1, the vectorizer and SSE), at least as well as 4.3.3 (in principle it should be better than 4.3.*). Taken from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838 with fixed paths.
*** Bug 283487 has been marked as a duplicate of this bug. ***
*** Bug 265986 has been marked as a duplicate of this bug. ***
*** Bug 256677 has been marked as a duplicate of this bug. ***
*** Bug 283183 has been marked as a duplicate of this bug. ***
*** Bug 278798 has been marked as a duplicate of this bug. ***
perhaps we should add arch/x86/profile.bashrc like the amd64 one and have it whine/barf when someone is using -ftree-vectorize in CFLAGS. stack alignment is known to be screwed with x86 and optimization and sse instructions for pretty much all versions of gcc.
it works well enough in 4.3, if just because it disables itself when it sees anything scary, and hopefully 4.5 will be fixed. a big fat warning would probably help in the meantime though.
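The warning suggested in the two comments above could be sketched roughly as follows. This is only an illustration: the function names `uses_tree_vectorizer` and `warn_if_vectorizing` and the warning text are made up here and are not taken from any Gentoo profile. The flag handling is also simplified: it assumes that an explicit -ftree-vectorize or -fno-tree-vectorize overrides whatever the -O level implies, and that only -O3 and -Ofast turn the vectorizer on by default.

```shell
# Hypothetical helper, not part of any Gentoo profile.
# $1: a CFLAGS-style string. Succeeds when the flags enable the tree vectorizer.
uses_tree_vectorizer() (
    level_enables=no    # does the last -O level seen enable the vectorizer?
    explicit=           # last explicit -f[no-]tree-vectorize seen, if any
    for flag in $1; do
        case "$flag" in
            -O3|-Ofast)          level_enables=yes ;;
            -O*)                 level_enables=no ;;
            -ftree-vectorize)    explicit=yes ;;
            -fno-tree-vectorize) explicit=no ;;
        esac
    done
    # Assumption: an explicit flag wins over the -O level, whatever the order.
    [ "${explicit:-$level_enables}" = yes ]
)

# Hypothetical wrapper that only prints a warning; the wording is invented.
warn_if_vectorizing() {
    if uses_tree_vectorizer "$1"; then
        echo "WARNING: tree vectorizer enabled on x86; SSE stack alignment crashes are possible" >&2
    fi
}
```

A profile.bashrc-style hook would then call something like `warn_if_vectorizing "${CFLAGS}"`; whether it should merely warn or abort the build is left open, as in the discussion.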
*** Bug 282341 has been marked as a duplicate of this bug. ***
*** Bug 283220 has been marked as a duplicate of this bug. ***
Try using "-mstackrealign" in CFLAGS whenever -msse (-msse*, -march=pentium4, etc.) is enabled. I use this system-wide together with the published patch (the patch is good enough), but IMHO "-mstackrealign" should work without it. IMHO it would be a good idea to add -mstackrealign to the wiki pages about "safe cflags" for 32-bit SSE targets. I have read about bugs with -mstackrealign in old gcc versions, but those are too old to be interesting to me.

Additional link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41025 - gcc fails to compile itself with "-mstackrealign", but I have not found problems with other packages so far.
(In reply to comment #17)
> Additional link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41025 - gcc
> self-compiling failed with "-mstackrealign", but other packages problems not
> found while.

Sorry, link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156
yes, let's add more random CFLAGS.

you don't need to add anything to the Safe CFLAGs page. it recommends -O2 which will not trigger this.
(In reply to comment #19)
> yes, lets add more random CFLAGS.
>
> you don't need to add anything to the Safe CFLAGs page. it recommends -O2
> which will not trigger this.

CFLAGS within -O3 as well as within any -Ox should be considered safe, they are NOT random. If they are not safe, this is serious gcc bug.
(In reply to comment #19)
> yes, lets add more random CFLAGS.
>
> you don't need to add anything to the Safe CFLAGs page. it recommends -O2
> which will not trigger this.

This problem is not specific to -O3 or the vectorizer; it is just most easily visible there. Even with -O2, SSE code may be dangerous.
Created attachment 205198
sse & 32bit -> -mstackrealign

This is my experimental patch to enable stack realignment by default when SSE is enabled on 32-bit targets (gcc's own libraries are excluded for now because of "Stack alignment in unwind library is unsupported."; this may be fixable). IMHO it works.
when it goes in upstream, we'll add it.
*** Bug 286189 has been marked as a duplicate of this bug. ***
I found that this problem is absent at least on new AMD CPUs: the same "broken" 32-bit code runs on AMD without bugs. I found no documentation, only the Athlon 7550 and /proc/cpuinfo examples on the net. IMHO this is indicated by the CPU flag "misalignsse"; also IMHO sse4a accompanies this feature (and may serve as a name for it).

New patches with descriptions are posted here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156

PS These changes do not seem relevant for 64-bit CPUs, but 32-bit code is less RAM-hungry...
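For anyone who wants to check whether their CPU advertises the flag mentioned in this comment, a small sketch follows. The helper name `has_cpu_flag` is made up for this example, and whether "misalignsse" really predicts the absence of these crashes is only the assumption stated in the comment, not something the sketch verifies.

```shell
# Hypothetical helper: succeeds if the named flag appears on a "flags" line.
# $1: flag name, $2: cpuinfo-style file (defaults to /proc/cpuinfo)
has_cpu_flag() {
    # -w matches whole words only, so "sse" does not match "sse4a"
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}
```

Typical use would be `has_cpu_flag misalignsse && echo "misalignsse present"`.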
*** Bug 317603 has been marked as a duplicate of this bug. ***
*** Bug 323431 has been marked as a duplicate of this bug. ***
(In reply to comment #25)
> I found, at least on new AMD CPUs this problem absent.

Denis, on some cores aligned accesses are aliases to unaligned ones. Other cores would trigger an exception. For example, with AMD, this is one of the differences between Athlons and Opterons.

What makes me very unhappy is that I depend heavily on the vectorizer in some computational code. Most of the time I run it on amd64 but sometimes on x86. Enabling -ftree-vectorize for anything in sci-libs/ would be inelegant. In my opinion, having this kind of problem is enough reason to hard mask the package.
GCC people discuss the situation where the stack is unaligned on entry. The problem which I observe (see Bug #323431) happens even if the stack is aligned on entry to the function. Correct me if I miscounted:

inflate_table:
.LFB45:
        .file 1 "inftrees.c"
        .loc 1 39 0
.LVL0:
        pushl %ebp                      ; -4
.LCFI0:
        .loc 1 108 0
        pxor %xmm0, %xmm0
        .loc 1 39 0
        movl %esp, %ebp
.LCFI1:
        pushl %edi
.LCFI2:
        pushl %esi
.LCFI3:
        pushl %ebx
.LCFI4:
        call .L101
.L101:
        popl %ebx
        addl $_GLOBAL_OFFSET_TABLE_+[.-.L101], %ebx
        subl $188, %esp
.LCFI5:
        .loc 1 108 0
        movdqa %xmm0, -56(%ebp)         ; -56-4=-60, unaligned

A problem closer to the one I have (that is, when everything is compiled with gcc-4.4.3-r2 with -ftree-vectorize) is discussed here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156
Alexander, you're omitting to consider that call pushes the 4-byte return address (the saved EIP) onto the stack. Thus on entry the stack is already offset by -4 from a 16-byte boundary, so gcc is correct.

If the crash is happening in firefox, it's because some hand-rolled asm (usually in script languages' native interfaces, e.g. JS XPCOM, or Java JNI) is failing to preserve alignment. If you climb the stack, looking at %ebp, you should be able to find the bad code. The correct alignment of %ebp is -8, that is, 8 bytes below a 16-byte boundary.

I've found that a combination of fixing asm where possible, and applying -mstackrealign where not, suffices to prevent all crashes.
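The two ways of counting can be compared with a little shell arithmetic. This is only a sketch: the helper names `mod16` and `slot_mod16` are made up, and it models just the `pushl %ebp` / `movl %esp, %ebp` prologue and the `-56(%ebp)` slot from the listing, assuming the caller keeps %esp on a 16-byte boundary at the call instruction.

```shell
# Reduce an offset (possibly negative) to its remainder modulo 16, in 0..15.
mod16() {
    echo $(( (($1 % 16) + 16) % 16 ))
}

# $1: %esp on function entry, as an offset from a 16-byte boundary
# $2: displacement used with %ebp (-56 for "movdqa %xmm0, -56(%ebp)")
# Prints the slot address modulo 16; 0 means the movdqa target is aligned.
slot_mod16() (
    esp=$1
    esp=$(( esp - 4 ))      # pushl %ebp
    ebp=$esp                # movl %esp, %ebp
    mod16 $(( ebp + $2 ))
)

# Aligned at the call, then call pushes the 4-byte return address:
slot_mod16 -4 -56           # prints 0 (aligned)
# Counting as in the annotated listing, with entry treated as aligned:
slot_mod16 0 -56            # prints 4 (misaligned)
```

With the return address accounted for, %ebp sits at offset -8 (`mod16 -8` gives 8) and the slot at -64, which is a multiple of 16.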
Thank you, Ed. I missed that. Life on caffeine doesn't promote a sharp mind :)

Would you please explain why the problem is x86 specific: is it by chance, or is there some reason why the stack is never misaligned on amd64?
Stack alignment (especially userspace) appears to be a matter of historical consensus (and lack thereof) between compiler and OS vendors, rather than the result of any design or standard. The fact that vendors have settled on 16-byte alignment on x86-64 seems driven by the preexistence of sse2 and other 16-byte aligned instruction sets; I haven't found any reference to it in AMD material (admittedly, I haven't looked too hard).

Sadly, I expect that when some future super-vectorised instruction set extension requires 32- or 64-byte alignment, we're going to see this issue all over again :)
Upgraded to gcc-4.4.3-r2 & recompiled system. Seamonkey & Icecat, basically all Mozilla GUI browsers, segfault with little to no explanation. After recompiling sys-libs/zlib-1.2.3-r1 using debug cflags, my @#$@#$ unexplained segfaults were immediately resolved.

Is there a standard workaround for this?? Can a user upgrade to 1.2.4 or 1.2.5 to evade this GCC bug with zlib?
there is no "working" version of gcc, thus masking any package or version makes no sense. the workaround is to not use -ftree-vectorize.

complaining on this bug will also make no difference to the issue. we know the issue exists, we aren't experts (or even passingly knowledgeable) in the code in question to attempt changing anything, nor are there any real patches that can be considered for us to merge. so track it in the already linked upstream bug if you want to keep up-to-date.
aka "-fno-tree-vectorize"

# cat /etc/portage/env/sys-libs/zlib
CFLAGS="-march=pentium3 -O3 -pipe -fomit-frame-pointer -fno-tree-vectorize"
CXXFLAGS="${CFLAGS}"
LDFLAGS=""

However, granted as you stated, the bug still exists and this is only one package worked-around versus recompiling the entire system to use "-fno-tree-vectorize". Using 32 bit x86 (P3) here.

Watching http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156 here.
Roger: Please check that you have the patch from https://bugzilla.mozilla.org/show_bug.cgi?id=552627 - this should fix xulrunner (firefox etc.) crashes that are caused by JS reflection.
(In reply to comment #35)
> aka "-fno-tree-vectorize"

Do not forget that -fno-tree-vectorize is not a complete workaround. The tree vectorizer merely tends to produce misaligned SSE and similar code on x86 more often. There is no guarantee at all that misalignment will not occur under some other conditions. -mstackrealign is not an ultimate solution either, as described above. So the only solution is to pray for the gcc guys to fix this someday before the end of the world 8-).
(In reply to comment #37)
> -mstackrealign is not an ultimate solution also as described above. So the only
> solution is to pray for gcc guys fixing this someday before the end of the
> world 8-).

Waiting for changes in gcc is unlikely to achieve anything; the ABI now requires 16-byte stack alignment and packages are expected to work with it or use -mstackrealign (what are you referring to "above"?).

There's not much need for this bug to remain open now; what we need is a tracker bug to individually fix broken packages (mozilla, OOo, java, etc.).
*** Bug 326579 has been marked as a duplicate of this bug. ***
*** Bug 333307 has been marked as a duplicate of this bug. ***
*** Bug 308133 has been marked as a duplicate of this bug. ***
*** Bug 341725 has been marked as a duplicate of this bug. ***
I tested everything, and what is breaking it for me is -funroll-loops. The rest, including -ftree-vectorize, has no influence. I have also not observed the crashes reported and I've recompiled the whole installation with:

-O2 -march=i686 -mtune=athlon-xp -funroll-loops -ftree-vectorize

Please check if -funroll-loops is also breaking it for you.

Best regards,
Tiago
(In reply to comment #43)
> Please check if -funroll-loops is also breaking it for you.

I do not use -funroll-loops. As of gcc-4.5.1-r1 -ftree-vectorize is still broken, or, more precisely, SSE alignment is still broken. If you have not hit this you may be just lucky. However, -ftree-vectorize together with -mstackrealign (as described in the upstream bug) works fine for me on both Athlon-XP and Atom N270.
*** Bug 356159 has been marked as a duplicate of this bug. ***
hopefully should be addressed in gcc-4.5+, and since 4.5.3 is stable now, close this out
confirmed: sys-libs/zlib-1.2.5-r2 compiled fine here using CFLAGS -fno-tree-vectorize with gcc-4.5.3 (i686-pc-linux-gnu-4.5.3)
With gcc-4.5.3 -ftree-vectorize works for me only with -mstackrealign, otherwise random packages fail.
The bug is NOT FIXED in gcc-4.5. I am using ~x86 on P4 and still need a workaround. CFLAGS -fno-tree-vectorize or -mstackrealign have always been just workarounds. Also note that the upstream bug referred to in the URL is still open, but there has been no activity for a year now. Maybe closing this bug as WONTFIX would be appropriate, since it affects custom CFLAGS (-O3 or -ftree-vectorize) only.
the upstream bug is not relevant. that talks about assembly functions that misalign the stack in violation of the x86 ABI. -mstackrealign is a cudgel to work around those. the upstream reporter wants all functions to check their stack alignment instead of requiring hand-written asm to realign their stack.

if zlib is misaligning the stack in assembly code, then that's a bug in zlib. if other projects are misaligning the stack before calling zlib and then crashing, that's a bug in those other projects. if gcc is actually misaligning the stack, then that's a bug in gcc.

so at this point in time, we need to narrow down exactly what's crashing.
(In reply to comment #50)
> if zlib is misaligning the stack in assembly code, then that's a bug in
> zlib. if other projects are misaligning the stack before calling zlib and
> then crashing, that's a bug in those other projects. if gcc is actually
> misaligning the stack, then that's a bug in gcc.
>
> so at this point in time, we need to narrow down exactly what's crashing.

As mentioned in comment #30, the problem is other projects are misaligning the stack before calling zlib. There are several bugs open in the mozilla's bugzilla regarding stack misalignment and we continue to have issues because of it (eg. avx).
(In reply to comment #51)

great. so we agree zlib isn't broken (mozilla is), and the right thing to do is fix those projects. nothing for gcc to do here.

people who want to use full tree vectorize flags can turn on -mstackrealign in zlib via /etc/portage/env/. i don't think using that flag in firefox would help if the crash occurs when calling zlib since that flag realigns the stack upon function entry.

my understanding of the new AVX insns is that their alignment requirements are much less strict than those of SSE. so as people convert to AVX, the issue will get better. this doesn't help the poor saps on 32bit installs though :).
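A per-package override of the kind described here might look like the sketch below, modeled on the /etc/portage/env/sys-libs/zlib file shown earlier in this thread. The optimization flags are only an example (the original reporter's -O3 flags plus -mstackrealign), not a tested recommendation.

```shell
# /etc/portage/env/sys-libs/zlib
# Example only: keep the vectorizer (via -O3) for zlib but have its functions
# realign the stack on entry, so a misaligned caller does not trip SSE stores.
CFLAGS="-march=native -O3 -fomit-frame-pointer -pipe -mstackrealign"
CXXFLAGS="${CFLAGS}"
```

Because the file applies to zlib alone, the rest of the system keeps whatever CFLAGS are set in make.conf.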
(In reply to comment #52)
> (In reply to comment #51)
>
> great. so we agree zlib isn't broken (mozilla is), and the right thing to
> do is fix those projects. nothing for gcc to do here.
>
> people who want to use full tree vectorize flags can turn on -mstackrealign
> in zlib via /etc/portage/env/. i don't think using that flag in firefox
> would help if the crash occurs when calling zlib since that flag realigns
> stack upon function entry.
>
> my understanding of the new AVX insns is that their alignment requirements
> are much less than that of SSE. so as people convert to AVX, the issue will
> get better. this doesn't help the poor saps on 32bit installs though :).

Ok, seems you are right, since all the problematic packages (callers of zlib) afaik mess with asm. So we are supposed to file bugs for firefox and the like. Sorry for the noise.
-ftree-vectorize is badly broken on ~x86 again. Even together with -mstackrealign it miscompiles code. At least 3 cases found:

1) net-misc/rsync-3.0.9.r2
Daemon part hangs during data exchange over ssh.

2) dev-lang/python-3.2.3-r1
./python -E ./setup.py build loops forever during package build at 100% CPU usage.

3) www-client/chromium-20.0.1132.17
fails during linking.

All issues above were fixed by purging -ftree-vectorize -mstackrealign from *FLAGS.
... with gcc-4.6.3.
(In reply to comment #54)

file a new bug. this one has gone on long enough.