I would like to report a problem with the fftw3 library compiled on my laptop (Pentium M 1.6 GHz, Centrino). After emerging it with the 'sse' USE flag, one of my C++ applications (a communication system simulation) that is linked statically with libfftw3.a segfaults. After recompiling the fftw3 library with "USE=-sse" the problem no longer occurs.

Here is my 'emerge info' output:

Portage 2.0.51-r15 (default-linux/x86/2004.3, gcc-3.3.5, glibc-2.3.4.20040808-r1, 2.6.10-gentoo-r6 i686)
=================================================================
System uname: 2.6.10-gentoo-r6 i686 Intel(R) Pentium(R) M processor 1.60GHz
Gentoo Base System version 1.4.16
Python: dev-lang/python-2.3.4-r1 [2.3.4 (#1, Feb 13 2005, 11:22:34)]
dev-lang/python: 2.3.4-r1
sys-devel/autoconf: 2.59-r6, 2.13
sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4
sys-devel/binutils: 2.15.92.0.2-r1
sys-devel/libtool: 1.5.10-r4
virtual/os-headers: 2.4.21-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays -falign-functions=64"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays -falign-functions=64"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms"
GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo/ http://ftp.uni-erlangen.de/pub/mirrors/gentoo/ http://src.gentoo.pl http://gentoo.zie.pg.gda.pl"
LANG="pl_PL"
LC_ALL="pl_PL"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="x86 X acpi acpi4linux alsa apache2 auctex bidi bitmap-fonts cddb cdparanoia cups dhcp divx4linux dvd dvdread edl emacs extras f77 fbcon fftw flac fortran freetype gd gimpprint gpm gtk gtk2 i8x0 jabber java jpeg jpeg2k kadu-modules kadu-voice lcd leim live maildir mailwrapper md5sum mmx motif mozilla moznocompose moznoirc moznomail mpeg mplayer network nls objc oggvorbis opengl pdflib pic plotutils png ppds quicktime readline real rtc sasl sdl sis spell sse ssl tetex tiff truetype truetype-fonts type1 type1-fonts unicode usb wmf xfs xmms xosd xv xvid xvmc video_cards_i810 linguas_pl"
Unset: ASFLAGS, CBUILD, CTARGET, LDFLAGS

/ediap
Can you reproduce this with less aggressive CFLAGS (for example, -O2)?
I have been using the -O2 optimisation for my system. See the 'emerge info' output in my previous post ;-) CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays -falign-functions=64"
I noticed that. I meant "-O2 without all that -fwhatever" - I know the flags you are using are considered mostly harmless, but I want to be sure.
Strange... I followed your suggestion, and recompiled fftw with USE=sse and CFLAGS="-march=pentium3 -O2 -pipe" only. And, as you might expect, noticed no problems using it in my simulator. So I added more and more of my previous flags until I ended up with CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays -falign-functions=64". Unfortunately this time I haven't noticed any segfaults caused by fftw. Maybe it was a false alarm. Sorry for that. Should any segfaults occur in the future, I will let you know by reopening this bug report. However, this time we can close this with a resolution "WORKSFORME".
OK
I have run into the segfault problem again. Now I have:

    USE=sse CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays -falign-functions=64"

and my program segfaults, as shown here:

    Before FFT operation (N_FFT = 2048)
    After FFT operation

    Before FFT operation (N_FFT = 2048)
    Segmentation fault

The "Before FFT..." and "After FFT..." strings are written to std::cerr just before and after the FFT operation in my C++ simulator:

    std::cerr << "Before FFT operation (N_FFT = " << fft_length * upsampl << ")" << endl;
    cvec H_vec = fft(h_vec, fft_length * upsampl);
    std::cerr << "After FFT operation" << endl << endl;

I get the same effect with:

    USE=sse CFLAGS="-march=pentium3 -O2 -pipe"

But when I disable the sse flag:

    USE=-sse CFLAGS="-march=pentium3 -O2 -pipe"

the FFT operation does not segfault my program:

    Before FFT operation (N_FFT = 2048)
    After FFT operation

    Before FFT operation (N_FFT = 2048)
    After FFT operation

Finally, I have tried fftw with:

    USE=-sse CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -fforce-addr -frename-registers -fprefetch-loop-arrays -falign-functions=64"

and it also worked OK. So my proposal is to disable the 'sse' flag for the whole of fftw. As a reminder, my architecture is Pentium M (Centrino), but since gcc-3.3.5 has no dedicated optimisation flag for this processor, I use -march=pentium3.
One more comment. From the FFTW documentation:

    --enable-sse, --enable-sse2, --enable-k7, --enable-altivec: Enable the compilation of SIMD code for SSE (Pentium III+), SSE2 (Pentium IV+), 3dNow! (AMD K7 and others), or AltiVec (PowerPC G4+). SSE, 3dNow!, and AltiVec only work with --enable-float (above), while SSE2 only works in double precision (the default). The resulting code will still work on earlier CPUs lacking the SIMD extensions (SIMD is automatically disabled, although the FFTW library is still larger).

Which library is single precision and which is double precision in Gentoo? Maybe this is the reason for the segfaults... I will check it later with my program and comment on it.
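As a rough aid to the precision question above, the following sketch maps a list of CPU feature flags to the FFTW configure options named in the quoted documentation. The helper name simd_options is hypothetical, and reading the "flags" line of /proc/cpuinfo assumes a Linux x86 system; the output says only what the CPU advertises, not what any ebuild actually passes to configure.

```sh
#!/bin/sh
# Hypothetical helper (not part of FFTW or Portage): given a space-separated
# CPU feature list, print which SIMD option the quoted FFTW documentation
# associates with each precision.
simd_options() {
    flags=" $1 "
    single="none"
    double="none"
    # SSE is documented as single precision only (--enable-float builds).
    case "$flags" in *" sse "*) single="--enable-sse" ;; esac
    # SSE2 is documented as double precision only (the default build).
    case "$flags" in *" sse2 "*) double="--enable-sse2" ;; esac
    echo "single precision: $single"
    echo "double precision: $double"
}

# Assumption: Linux on x86, where /proc/cpuinfo has a "flags" line.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    simd_options "$cpu_flags"
fi
```

Calling simd_options "fpu mmx sse" by hand shows what a CPU without SSE2 would report, which is a quick way to compare against the options the ebuild enables.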
I think the problem is in the ebuild:

    if use sse; then
        myconfsingle="$myconfsingle --enable-sse"
        myconfdouble="$myconfdouble --enable-sse2"
    elif [...]

SSE is available on P3 and later processors, but SSE2 only on P4 and later. So I suggest adding a separate 'sse2' USE flag and rewriting this part of the ebuild as:

    if use sse; then
        myconfsingle="$myconfsingle --enable-sse"
    elif use sse2; then
        myconfsingle="$myconfsingle --enable-sse"
        myconfdouble="$myconfdouble --enable-sse2"
    elif [...]

What do you think?
Sounds reasonable... there are already two other packages which have this as a local USE flag. But I am afraid the way you proposed it, sse2 will not be evaluated if sse is in USE?
This should be fixed now - just flip the conditionals around and it evaluates everything fine :) Please test out sci-libs/fftw-3.0.1-r1 and let me know if that solves your problems. Only -O2 is allowed currently. I would like to test with less filtering when I get the chance, as the ebuild mentions GCC 3.2 as the reason for filtering when the sse flag is set.
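The ordering issue discussed above can be sketched outside Portage. This is one plausible reading of "flip the conditionals", not the committed ebuild: the `use` function below is a stand-in for Portage's helper, and USE_FLAGS, configure_proposed and configure_flipped are hypothetical names. For brevity the variables are assigned directly rather than appended to.

```sh
#!/bin/sh
# Stand-in for Portage's `use` helper so the sketch runs outside an ebuild.
# USE_FLAGS is a hypothetical variable holding the enabled flags.
USE_FLAGS="sse sse2"
use() {
    case " $USE_FLAGS " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# Order from the proposal: sse is tested first, so the sse2 branch is
# never reached when both flags are enabled.
configure_proposed() {
    myconfsingle=""
    myconfdouble=""
    if use sse; then
        myconfsingle="--enable-sse"
    elif use sse2; then
        myconfsingle="--enable-sse"
        myconfdouble="--enable-sse2"
    fi
    echo "single=[$myconfsingle] double=[$myconfdouble]"
}

# Flipped order: the more specific sse2 case is tested first.
configure_flipped() {
    myconfsingle=""
    myconfdouble=""
    if use sse2; then
        myconfsingle="--enable-sse"
        myconfdouble="--enable-sse2"
    elif use sse; then
        myconfsingle="--enable-sse"
    fi
    echo "single=[$myconfsingle] double=[$myconfdouble]"
}

configure_proposed   # prints: single=[--enable-sse] double=[]
configure_flipped    # prints: single=[--enable-sse] double=[--enable-sse2]
```

With only sse enabled (USE_FLAGS="sse"), both orderings give the same result, so the difference shows up only when both flags are set.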
Thanks for the ebuild. It seems to be OK now. After compiling with "USE=sse emerge fftw" on my Pentium M platform, there are no segfaults when I link this library into my C++ simulation program.

One question, by the way. I have set a global 'sse' flag in make.conf:

#v+
USE="-* X acpi acpi4linux alsa apache2 auctex avi bash-completion berkdb bidi bitmap-fonts blas cddb cdparanoia cscope cups dhcp divx4linux dvd dvdread edl emacs extras f77 fbcon fftw flac fortran freetype gcj gd gif gimpprint gpm gtk gtk2 i8x0 jabber java jpeg jpeg2k kadu-modules kadu-voice lcd leim live mad maildir mailwrapper md5sum mmx motif mozilla moznocompose moznoirc moznomail mp3 mpeg mplayer network nls objc oggvorbis opengl pdflib perl pic plotutils png ppds quicktime readline real rtc sasl sdl sis spell sse ssl tetex tiff truetype truetype-fonts type1 type1-fonts unicode usb userlocales wmf xfs xosd xv xvid xvmc zlib"
#v-

But after running the following command:

#v+
ediap@lespaul etc $ emerge -pv fftw

These are the packages that I would merge, in order:

Calculating dependencies ...done!
[ebuild R ] sci-libs/fftw-3.0.1-r1 -3dnow (-altivec) -debug -mpi -sse* -sse2 0 kB

Total size of downloads: 0 kB
#v-

the 'sse' flag does not seem to be set. Do you happen to know why that is?

/ediap