If I run the small program (the C source will be attached), it hangs at MPI_Init, hangs at MPI_Finalize, or crashes with a segfault:

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(26)$ mpirun -np 2 ex1a
argc= 1 argv=0x7fff264db5a8
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fff91bfc8b8
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
^C^\Quit

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(27)$ mpirun -np 2 ex1a
argc= 1 argv=0x7fff030d23e8
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fff2d2cfdc8
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
rc=0 MPI_SUCCESS=0
rc=0 MPI_SUCCESS=0
Hello, world. I am 1 of 2 on lynx
Hello, world. I am 0 of 2 on lynx
WE have 2 processes
Hello 1 Processor 1 at node lynx reporting for duty
rank= 1 numtask= 2 processor_name=lynx, before 'sleep(10)'
rank= 0 numtask= 2 processor_name=lynx, before 'sleep(10)'
rank= 1 numtask= 2 processor_name=lynx, after 'sleep(10)'
rank= 0 numtask= 2 processor_name=lynx, after 'sleep(10)'
rank= 1, before 'MPI_Finalize'
rank= 0, before 'MPI_Finalize'
^C

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(28)$ mpirun -V
mpirun (Open MPI) 1.5.3

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(29)$ emerge -pvD openmpi
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild R ] sys-cluster/openmpi-1.5.3-r1 USE="cxx fortran ipv6 mpi-threads romio threads -heterogeneous -infiniband -pbs -sctp -vt" 0 kB
Total: 1 package (1 reinstall), Size of downloads: 0 kB

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(4)$ mpirun -np 2 ex1a
argc= 1 argv=0x7fff8e92e088
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fff8e5aa098
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
rc=0 MPI_SUCCESS=0
rc=0 MPI_SUCCESS=0
Hello, world. I am 0 of 2 on orca
Hello, world. I am 1 of 2 on orca
WE have 2 processes
Hello 1 Processor 1 at node orca reporting for duty
rank= 1 numtask= 2 processor_name=orca, before 'sleep(10)'
rank= 0 numtask= 2 processor_name=orca, before 'sleep(10)'
rank= 1 numtask= 2 processor_name=orca, after 'sleep(10)'
rank= 0 numtask= 2 processor_name=orca, after 'sleep(10)'
rank= 1, before 'MPI_Finalize'
rank= 0, before 'MPI_Finalize'
[orca:06796] *** Process received signal ***
[orca:06796] Signal: Segmentation fault (11)
[orca:06796] Signal code: Address not mapped (1)
[orca:06796] Failing at address: 0x7fb61c84d460
Segmentation fault

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(5)$ mpirun -V
mpirun (Open MPI) 1.5.3

If I run the same program with openmpi-1.4.3, it works without any problems:

rose@caiman:/home/rose/Txt/src/Test/C/MPI/Ex1a(18)$ time mpirun -np 2 ex1a
argc= 1 argv=0x7fff2b732b28
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fffe6d403b8
i= 0[ 1] argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
rc=0 MPI_SUCCESS=0
Hello, world. I am 0 of 2 on caiman
WE have 2 processes
rc=0 MPI_SUCCESS=0
Hello, world. I am 1 of 2 on caiman
Hello 1 Processor 1 at node caiman reporting for duty
rank= 1 numtask= 2 processor_name=caiman, before 'sleep(10)'
rank= 0 numtask= 2 processor_name=caiman, before 'sleep(10)'
rank= 1 numtask= 2 processor_name=caiman, after 'sleep(10)'
rank= 0 numtask= 2 processor_name=caiman, after 'sleep(10)'
rank= 1, before 'MPI_Finalize'
rank= 0, before 'MPI_Finalize'
[3]+ Done emacs -i $GEOMETRY $NO_DOS_CONV -name "$BASENAME" "$*"

real 0m23.077s
user 0m0.087s
sys 0m0.277s

Reproducible: Always

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(31)$ emerge --info
Portage 2.1.10.3 (default/linux/amd64/10.0/desktop, gcc-4.5.2, glibc-2.13-r2, 2.6.39.2 x86_64)
=================================================================
System uname: Linux-2.6.39.2-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T8300_@_2.40GHz-with-gentoo-2.0.3
Timestamp of tree: Sun, 03 Jul 2011 07:00:01 +0000
app-shells/bash: 4.2_p10
dev-java/java-config: 2.1.11-r3
dev-lang/python: 2.7.2, 3.2
dev-util/cmake: 2.8.4-r1
dev-util/pkgconfig: 0.26
sys-apps/baselayout: 2.0.3
sys-apps/openrc: 0.8.3-r1
sys-apps/sandbox: 2.5
sys-devel/autoconf: 2.13, 2.68
sys-devel/automake: 1.9.6-r3, 1.10.3, 1.11.1-r1
sys-devel/binutils: 2.21.1
sys-devel/gcc: 4.5.2
sys-devel/gcc-config: 1.4.1-r1
sys-devel/libtool: 2.4-r1
sys-devel/make: 3.82-r1
sys-kernel/linux-headers: 2.6.38 (virtual/os-headers)
sys-libs/glibc: 2.13-r2
Repositories: gentoo lordvan x11 java-overlay sunrise arcon science qting-edge ibormuth bicatali local x-cpan g-octave
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA PUEL dlj-1.1 skype-eula googleearth AdobeFlash-10.1 cadsoft"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/maven-bin-3.0/conf /usr/share/openvpn/easy-rsa /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release
/etc/php/apache2-php5.2/ext-active/ /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.2/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.2/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests binpkg-logs distlocks ebuild-locks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo ftp://ftp.easynet.nl/mirror/gentoo/ "
LANG="de_DE.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="de fr"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/lordvan /var/lib/layman/x11 /var/lib/layman/java-overlay /var/lib/layman/sunrise /var/lib/layman/arcon /var/lib/layman/science /var/lib/layman/qting-edge /var/lib/layman/ibormuth /var/lib/layman/bicatali /usr/local/portage /var/lib/cpan /var/lib/g-octave"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="64bit R X Xaw3d a52 aac abiword acl acpi admin afs alsa amd64 ao apache2 applet archive arpack asf aspell assistant audacious audiofile automap automount bash-completion beagle berkdb blas blast bluetooth boo boost branding bzip2 cairo cardbus cdda cddb cdf cdio cdparanoia cdr cg cgi chm cli consolekit corba cracklib crypt css cuda cups curl cxx daap db dbase dbi dbm dbus declarative designer devhelp device-mapper dga dia djvu doc dri ds2490 ds9097 ds9097u dts dv dvb dvd dvdr dvi dynamicplugin eds elf emacs emboss emf encode epiphany evo examples exif expat extensions extra extras fam fame ffmpeg fftw firefox fits flac fltk fontconfig foomaticdb fortran fortran95 fpx fts3 fuse galago garmin gcj gd gdal gdbm gdu gedit geoip geolocation geos gfortran gif gimp ginac git glade glib gml gmp gmtsuppl gnome gnome-keyring gnome-print gnuplot gnutls gphoto2 gpm grammar graphics graphtft graphviz grass gs gsl gsm gstreamer gtk gudev guile harness hddtemp hdf hdf5 hdri http httpd hvm hwdb iconv icq icu id3 ide imagemagick imap innodb inotify ipod ipv6 irda ithreads jabber jadetex java java6 jbig john jpeg jpeg2k kdrive kerberos kpathsea kqemu kvm ladspa lame lapack laptop latex latex3 lcms ldap lensfun libffi libgda libnotify libsamplerate lirc lua lzo mad mail maildir mapnik math matroska mkl mmx mmxext mng modules mono moonlight motif mozilla mp3 mp4 mpeg mpi mpi-threads mplayer mtp mudflap multilib musicbrainz mysql mysqli nautilus ncurses neXt netcdf netpbm network networking networkmanager nfs nls nntp nptl nptlonly nsplugin ntfs ntp numpy obex objc ocaml octave odbc ogdi ogg ole openexr opengl openmp overview pae pam pango pcre pda pdf perl plotutils plugins png podcast policykit portaudio posix postgres postscript ppds pppd preview-latex proj projectx pstricks pulseaudio python python-bindings q16 q32 qemu qhull qt3support qt4 quicktime readline reiserfs reports rle romio rpc rrdcgi rrdtool samba sasl science sdk sdl secure-delete semantic-desktop server session sip slang slp smbclient smp sms sndfile snmp soup sox speex spell sql sqlite sse sse2 sse4 ssl ssse3 startup-notification stlport subtitles subversion suexec svg svm swig sysfs szip t1lib tcl tcpd tex tex4ht texmacs tgif theora thinkpad threads thunderbird tidy tiff tk tntc tools truetype udev unicode usb userlocales utempter v4l2 video virtualbox vorbis wav webdav webdav-serf webkit wifi wmf wxwidgets x264 xattr xcb xemacs xext xine xml xmlreader xmlrpc xorg xpm xulrunner xv xvid xvmc yaml zlib zvbi"
ALSA_CARDS="intel8x0"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgid dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias"
CALLIGRA_FEATURES="braindump flow karbon kexi kpresenter krita tables words"
CAMERAS="canon"
COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog"
DVB_CARDS="usb-wt220u"
ELIBC="glibc"
GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx"
INPUT_DEVICES="keyboard mouse evdev synaptics"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="de fr"
NETBEANS_MODULES="apisupport cnd dlight enterprise ergonomics groovy gsf harness ide identity j2ee java mobility nb php profiler ruby websvccommon xml"
PHP_TARGETS="php5-3"
QEMU_SOFTMMU_TARGETS="i386 ppc ppc64 x86_64"
QEMU_USER_TARGETS="arm i386 x86_64"
RUBY_TARGETS="ruby18"
USERLAND="GNU"
VIDEO_CARDS="nv nouveau vesa"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Created attachment 278935: ex1a.c
Created attachment 278937: Makefile to generate ex1a
The same happens with openmpi-1.5.3-r2.
Hello.

I tried to reproduce this error but did not succeed. Could you try running your example under a debugger (e.g. ddd) to see exactly where the problem comes from? Printf debugging is unreliable here, because stdout is buffered and messages may not appear in the order they were printed. ;-]

# compile with
mpicc -g ex1a.c -o ex1a
# then run with
mpirun -np 2 ddd ex1a

It could also be helpful to build Open MPI by hand with the --enable-debug configure option to locate the problem, in case it is not ebuild related.
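If print-style tracing is kept alongside the debugger, the buffering issue mentioned above can at least be reduced by flushing after every message. The following is only a minimal sketch; `format_trace` and `trace` are hypothetical helper names and are not part of the attached ex1a.c. Flushing makes sure a line has left the process before a later hang or crash, but it does not guarantee the order in which mpirun shows lines from different ranks.

```c
#include <stdio.h>

/* Hypothetical helper (not from ex1a.c): format one rank-tagged trace line.
 * Returns the number of characters written, excluding the terminator,
 * or -1 if the line did not fit into buf. */
int format_trace(char *buf, size_t size, int rank, const char *msg)
{
    int n = snprintf(buf, size, "[rank %d] %s\n", rank, msg);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: write the trace line to stream and flush at once,
 * so the text is not left sitting in the stdio buffer if the process
 * hangs or dies right afterwards. Returns 0 on success, -1 on error. */
int trace(FILE *stream, int rank, const char *msg)
{
    char line[256];

    if (format_trace(line, sizeof line, rank, msg) < 0)
        return -1;
    if (fputs(line, stream) == EOF)
        return -1;
    return fflush(stream) == 0 ? 0 : -1;
}
```

A call such as `trace(stderr, rank, "before MPI_Finalize")` could then replace a bare printf; stderr is usually unbuffered anyway, so the explicit flush matters mostly when the stream is stdout.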
Hello Christoph,

if I run the program under DDD:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(11)$ mpirun -np 2 ddd ex1a_debug

I see in the first DDD window:

(gdb) run
[Thread debugging using libthread_db enabled]
argc= 1 argv=0x7fffffffc628
i= 0[ 1] argv[i]=|/home_orca/rose/Txt/src/Test/C/MPI/Ex1a/ex1a_debug|
before 'MPI_Init(&argc,&argv)'
[New Thread 0x7ffff40cd700 (LWP 14357)]
[New Thread 0x7fffef6c6700 (LWP 14358)]
rc=0 MPI_SUCCESS=0
Hello, world. I am 0 of 2 on orca
WE have 2 processes
Hello 1 Processor 1 at node orca reporting for duty
rank= 0 numtask= 2 processor_name=orca, before 'sleep(10)'
rank= 0 numtask= 2 processor_name=orca, after 'sleep(10)'
rank= 0, before 'MPI_Finalize'
[Thread 0x7ffff40cd700 (LWP 14357) exited]
[Thread 0x7fffef6c6700 (LWP 14358) exited]
[New Thread 0x7fffef6c6700 (LWP 14365)]
[Thread 0x7fffef6c6700 (LWP 14365) exited]
[New Thread 0x7fffef6c6700 (LWP 14367)]
[Thread 0x7fffef6c6700 (LWP 14367) exited]
Program exited normally.
(gdb)

In the second DDD window I see:

(gdb) run
[Thread debugging using libthread_db enabled]
argc= 1 argv=0x7fffffffc628
i= 0[ 1] argv[i]=|/home_orca/rose/Txt/src/Test/C/MPI/Ex1a/ex1a_debug|
before 'MPI_Init(&argc,&argv)'
[New Thread 0x7ffff40cd700 (LWP 14362)]
[New Thread 0x7fffef6c6700 (LWP 14363)]
rc=0 MPI_SUCCESS=0
Hello, world. I am 1 of 2 on orca
rank= 1 numtask= 2 processor_name=orca, before 'sleep(10)'
rank= 1 numtask= 2 processor_name=orca, after 'sleep(10)'
rank= 1, before 'MPI_Finalize'
[Thread 0x7ffff40cd700 (LWP 14362) exited]
[Thread 0x7fffef6c6700 (LWP 14363) exited]
[New Thread 0x7fffef6c6700 (LWP 14364)]
[Thread 0x7fffef6c6700 (LWP 14364) exited]
[New Thread 0x7fffef6c6700 (LWP 14366)]
[Thread 0x7fffef6c6700 (LWP 14366) exited]
Program exited normally.
(gdb)

So everything seems to be fine under the debugger.

If I run the program compiled without "-g" directly, it hangs at the beginning. I see only:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(12)$ mpirun -np 2 ex1a
argc= 1 argv=0x7fff0e06d478
i= 0[ 1] argv[i]=|ex1a|
^C^\Quit

I can only kill it with ^\.

Which versions of gcc and glibc do you use? What else can I test?

Regards
Juergen
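Since the hang only appears when the program is started outside the debugger, one further option is to let each process announce its PID and wait until a debugger is attached to the already running process. The sketch below illustrates that general technique only; `wait_for_debugger` and its parameters are hypothetical and do not come from ex1a.c or from Open MPI.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print this process's PID, then poll until a debugger
 * sets *attached to a nonzero value or until max_polls sleeps have elapsed.
 * The flag is volatile so the compiler re-reads it on every iteration.
 * Returns the number of polls that were performed. */
int wait_for_debugger(volatile int *attached, unsigned poll_seconds, int max_polls)
{
    int polls = 0;

    fprintf(stderr, "PID %ld waiting for debugger attach\n", (long)getpid());
    fflush(stderr);

    while (!*attached && polls < max_polls) {
        sleep(poll_seconds);
        polls++;
    }
    return polls;
}
```

Assuming it is called just before MPI_Init with a flag that starts at 0, one could attach with `gdb -p <PID>`, set the flag from the debugger (for example `set var *attached = 1` in the helper's frame) and then continue, so that the process runs into MPI_Init under debugger control without having been launched through ddd.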
If I run the ex1a version compiled with the "-g" flag without the debugger, it hangs as well:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(20)$ mpirun -np 2 ex1a_debug
argc= 1 argv=0x7fff920e66e8
i= 0[ 1] argv[i]=|ex1a_debug|
^\Quit
Some additional information: if I run 'mpirun -np 2 ddd ex1a_debug' with openmpi-1.5.3-r2, mpirun does not finish. I do not get a prompt back after quitting ddd and have to kill mpirun with ^C:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(11)$ mpirun -np 2 ddd ex1a_debug
^C
rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(12)$

If I run it with openmpi-1.4.3, mpirun finishes after quitting ddd:

rose@caiman:/home/rose/Txt/src/Test/C/MPI/Ex1a(7)$ mpirun -np 2 ddd ex1a_debug
rose@caiman:/home/rose/Txt/src/Test/C/MPI/Ex1a(8)$

So I assume that the error is not in ex1a_debug but in the mpirun belonging to openmpi-1.5.3-r2.
Hello Juergen,

The problem seems to be the mpi-threads USE flag, which enables both MPI threads and progress threads at the same time. Maybe the Open MPI ebuild should get a separate USE flag for progress threads in the future, as they have nothing to do with mpi-threads. Disabling mpi-threads has solved the problem for me so far.

Regards
Christoph
Thanks Christoph, your hint works. After removing the mpi-threads flag from USE in /etc/make.conf, running 'emerge -uvND world' and recompiling my test program, it seems to work correctly.
No longer reproducible with openmpi-1.8.7 (with mpi-threads enabled).