Bug 373923 - sys-cluster/openmpi-1.5.3-r1: mpirun hangs or fails with Segmentation faults
Summary: sys-cluster/openmpi-1.5.3-r1: mpirun hangs or fails with Segmentation faults
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: All Linux
Importance: Normal normal
Assignee: Justin Bronder (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-03 15:47 UTC by Juergen Rose
Modified: 2015-08-05 12:44 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
ex1a.c (ex1a.c,1.79 KB, text/plain)
2011-07-03 15:49 UTC, Juergen Rose
Makefile to generate ex1a (Makefile,60 bytes, text/plain)
2011-07-03 15:49 UTC, Juergen Rose

Description Juergen Rose 2011-07-03 15:47:30 UTC
If I run this small program (the C sources will be attached), it hangs at MPI_Init or at MPI_Finalize, or it crashes with segmentation faults:

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(26)$ mpirun -np 2  ex1a
argc= 1 argv=0x7fff264db5a8
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fff91bfc8b8
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
^C^\Quit
rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(27)$ mpirun -np 2  ex1a
argc= 1 argv=0x7fff030d23e8
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fff2d2cfdc8
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
rc=0   MPI_SUCCESS=0
rc=0   MPI_SUCCESS=0
Hello, world.  I am 1 of 2 on lynx
Hello, world.  I am 0 of 2 on lynx
WE have 2 processes
Hello 1 Processor 1 at node lynx reporting for duty

rank= 1  numtask= 2   processor_name=lynx, before 'sleep(10)'
rank= 0  numtask= 2   processor_name=lynx, before 'sleep(10)'
rank= 1  numtask= 2   processor_name=lynx, after  'sleep(10)'
rank= 0  numtask= 2   processor_name=lynx, after  'sleep(10)'
rank= 1,  before 'MPI_Finalize'
rank= 0,  before 'MPI_Finalize'
^C
rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(28)$ mpirun -V
mpirun (Open MPI) 1.5.3

rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(29)$ emerge -pvD openmpi

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R    ] sys-cluster/openmpi-1.5.3-r1  USE="cxx fortran ipv6 mpi-threads romio threads -heterogeneous -infiniband -pbs -sctp -vt" 0 kB

Total: 1 package (1 reinstall), Size of downloads: 0 kB


rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(4)$ mpirun -np 2  ex1a
argc= 1 argv=0x7fff8e92e088
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fff8e5aa098
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
rc=0   MPI_SUCCESS=0
rc=0   MPI_SUCCESS=0
Hello, world.  I am 0 of 2 on orca
Hello, world.  I am 1 of 2 on orca
WE have 2 processes
Hello 1 Processor 1 at node orca reporting for duty

rank= 1  numtask= 2   processor_name=orca, before 'sleep(10)'
rank= 0  numtask= 2   processor_name=orca, before 'sleep(10)'
rank= 1  numtask= 2   processor_name=orca, after  'sleep(10)'
rank= 0  numtask= 2   processor_name=orca, after  'sleep(10)'
rank= 1,  before 'MPI_Finalize'
rank= 0,  before 'MPI_Finalize'
[orca:06796] *** Process received signal ***
[orca:06796] Signal: Segmentation fault (11)
[orca:06796] Signal code: Address not mapped (1)
[orca:06796] Failing at address: 0x7fb61c84d460
Segmentation fault
rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(5)$ mpirun -V
mpirun (Open MPI) 1.5.3


If I run the same program with openmpi-1.4.3, it works without any problems:

rose@caiman:/home/rose/Txt/src/Test/C/MPI/Ex1a(18)$ time mpirun -np 2  ex1a
argc= 1 argv=0x7fff2b732b28
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
argc= 1 argv=0x7fffe6d403b8
  i= 0[ 1]   argv[i]=|ex1a|
before 'MPI_Init(&argc,&argv)'
rc=0   MPI_SUCCESS=0
Hello, world.  I am 0 of 2 on caiman
WE have 2 processes
rc=0   MPI_SUCCESS=0
Hello, world.  I am 1 of 2 on caiman
Hello 1 Processor 1 at node caiman reporting for duty

rank= 1  numtask= 2   processor_name=caiman, before 'sleep(10)'
rank= 0  numtask= 2   processor_name=caiman, before 'sleep(10)'
rank= 1  numtask= 2   processor_name=caiman, after  'sleep(10)'
rank= 0  numtask= 2   processor_name=caiman, after  'sleep(10)'
rank= 1,  before 'MPI_Finalize'
rank= 0,  before 'MPI_Finalize'
[3]+  Done                    emacs -i $GEOMETRY $NO_DOS_CONV -name "$BASENAME" "$*"

real    0m23.077s
user    0m0.087s
sys     0m0.277s



Reproducible: Always




rose@lynx:/home/rose/Txt/src/Test/C/MPI/Ex1a(31)$ emerge --info
Portage 2.1.10.3 (default/linux/amd64/10.0/desktop, gcc-4.5.2, glibc-2.13-r2, 2.6.39.2 x86_64)
=================================================================
System uname: Linux-2.6.39.2-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T8300_@_2.40GHz-with-gentoo-2.0.3
Timestamp of tree: Sun, 03 Jul 2011 07:00:01 +0000
app-shells/bash:          4.2_p10
dev-java/java-config:     2.1.11-r3
dev-lang/python:          2.7.2, 3.2
dev-util/cmake:           2.8.4-r1
dev-util/pkgconfig:       0.26
sys-apps/baselayout:      2.0.3
sys-apps/openrc:          0.8.3-r1
sys-apps/sandbox:         2.5
sys-devel/autoconf:       2.13, 2.68
sys-devel/automake:       1.9.6-r3, 1.10.3, 1.11.1-r1
sys-devel/binutils:       2.21.1
sys-devel/gcc:            4.5.2
sys-devel/gcc-config:     1.4.1-r1
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82-r1
sys-kernel/linux-headers: 2.6.38 (virtual/os-headers)
sys-libs/glibc:           2.13-r2
Repositories: gentoo lordvan x11 java-overlay sunrise arcon science qting-edge ibormuth bicatali local x-cpan g-octave
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA PUEL dlj-1.1 skype-eula googleearth AdobeFlash-10.1 cadsoft"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/maven-bin-3.0/conf /usr/share/openvpn/easy-rsa /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.2/ext-active/ /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.2/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.2/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests binpkg-logs distlocks ebuild-locks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo ftp://ftp.easynet.nl/mirror/gentoo/ "
LANG="de_DE.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="de fr"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/lordvan /var/lib/layman/x11 /var/lib/layman/java-overlay /var/lib/layman/sunrise /var/lib/layman/arcon /var/lib/layman/science /var/lib/layman/qting-edge /var/lib/layman/ibormuth /var/lib/layman/bicatali /usr/local/portage /var/lib/cpan /var/lib/g-octave"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="64bit R X Xaw3d a52 aac abiword acl acpi admin afs alsa amd64 ao apache2 applet archive arpack asf aspell assistant audacious audiofile automap automount bash-completion beagle berkdb blas blast bluetooth boo boost branding bzip2 cairo cardbus cdda cddb cdf cdio cdparanoia cdr cg cgi chm cli consolekit corba cracklib crypt css cuda cups curl cxx daap db dbase dbi dbm dbus declarative designer devhelp device-mapper dga dia djvu doc dri ds2490 ds9097 ds9097u dts dv dvb dvd dvdr dvi dynamicplugin eds elf emacs emboss emf encode epiphany evo examples exif expat extensions extra extras fam fame ffmpeg fftw firefox fits flac fltk fontconfig foomaticdb fortran fortran95 fpx fts3 fuse galago garmin gcj gd gdal gdbm gdu gedit geoip geolocation geos gfortran gif gimp ginac git glade glib gml gmp gmtsuppl gnome gnome-keyring gnome-print gnuplot gnutls gphoto2 gpm grammar graphics graphtft graphviz grass gs gsl gsm gstreamer gtk gudev guile harness hddtemp hdf hdf5 hdri http httpd hvm hwdb iconv icq icu id3 ide imagemagick imap innodb inotify ipod ipv6 irda ithreads jabber jadetex java java6 jbig john jpeg jpeg2k kdrive kerberos kpathsea kqemu kvm ladspa lame lapack laptop latex latex3 lcms ldap lensfun libffi libgda libnotify libsamplerate lirc lua lzo mad mail maildir mapnik math matroska mkl mmx mmxext mng modules mono moonlight motif mozilla mp3 mp4 mpeg mpi mpi-threads mplayer mtp mudflap multilib musicbrainz mysql mysqli nautilus ncurses neXt netcdf netpbm network networking networkmanager nfs nls nntp nptl nptlonly nsplugin ntfs ntp numpy obex objc ocaml octave odbc ogdi ogg ole openexr opengl openmp overview pae pam pango pcre pda pdf perl plotutils plugins png podcast policykit portaudio posix postgres postscript ppds pppd preview-latex proj projectx pstricks pulseaudio python python-bindings q16 q32 qemu qhull qt3support qt4 quicktime readline reiserfs reports rle romio rpc rrdcgi rrdtool samba sasl science sdk sdl secure-delete semantic-desktop server session 
sip slang slp smbclient smp sms sndfile snmp soup sox speex spell sql sqlite sse sse2 sse4 ssl ssse3 startup-notification stlport subtitles subversion suexec svg svm swig sysfs szip t1lib tcl tcpd tex tex4ht texmacs tgif theora thinkpad threads thunderbird tidy tiff tk tntc tools truetype udev unicode usb userlocales utempter v4l2 video virtualbox vorbis wav webdav webdav-serf webkit wifi wmf wxwidgets x264 xattr xcb xemacs xext xine xml xmlreader xmlrpc xorg xpm xulrunner xv xvid xvmc yaml zlib zvbi" ALSA_CARDS="intel8x0" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgid dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="braindump flow karbon kexi kpresenter krita tables words" CAMERAS="canon" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" DVB_CARDS="usb-wt220u" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de fr" NETBEANS_MODULES="apisupport cnd dlight enterprise ergonomics groovy gsf harness ide identity j2ee java mobility nb php profiler ruby websvccommon xml" PHP_TARGETS="php5-3" QEMU_SOFTMMU_TARGETS="i386 
ppc ppc64 x86_64" QEMU_USER_TARGETS="arm i386 x86_64" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="nv nouveau vesa" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Juergen Rose 2011-07-03 15:49:05 UTC
Created attachment 278935
ex1a.c
Comment 2 Juergen Rose 2011-07-03 15:49:44 UTC
Created attachment 278937
Makefile to generate ex1a
Comment 3 Juergen Rose 2011-07-04 08:18:12 UTC
The same happens with openmpi-1.5.3-r2.
Comment 4 Christoph Niethammer 2011-07-04 16:24:15 UTC
Hello.

I tried to reproduce this error but did not succeed.

Could you try running your example under a debugger (e.g. ddd) to see exactly where the problem comes from? Printf debugging is always unreliable: stdout, for example, is buffered, so messages do not necessarily appear in the order in which they were printed. ;-]

# compile with 
mpicc -g ex1a.c -o ex1a
# then run with 
mpirun -np 2 ddd ex1a

To locate the problem, it could also help to build Open MPI by hand with the --enable-debug configure option, in case the problem turns out not to be ebuild-related.
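To make the buffering point concrete: when a program runs under mpirun, its stdout may not be a terminal, in which case the C library buffers output in blocks rather than lines, so text can show up late, out of order, or not at all if the process hangs or crashes. The C sketch below is one way to make printf-style tracing more trustworthy; it is not taken from ex1a.c or from Open MPI. The helper names `format_debug_line` and `debug_print` are hypothetical, the rank is passed in as a plain `int` (in a real MPI program it would come from `MPI_Comm_rank` after `MPI_Init`, or could be -1 before that), and every line is written to stderr, tagged with rank and process ID, and flushed immediately. Lines from different ranks can still interleave in any order, which is why each one carries its tag.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format "[rank R pid P] <message>" into buf (hypothetical helper).
 * Like vsnprintf, returns a value >= size if the output was truncated,
 * or a negative value on an encoding error. */
static int vformat_debug_line(char *buf, size_t size, int rank, long pid,
                              const char *fmt, va_list ap)
{
    assert(buf != NULL && size > 0 && fmt != NULL);
    int n = snprintf(buf, size, "[rank %d pid %ld] ", rank, pid);
    if (n < 0 || (size_t)n >= size)
        return n; /* the prefix alone already filled the buffer */
    int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    return m < 0 ? m : n + m;
}

/* printf-style front end to vformat_debug_line. */
int format_debug_line(char *buf, size_t size, int rank, long pid,
                      const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vformat_debug_line(buf, size, rank, pid, fmt, ap);
    va_end(ap);
    return len;
}

/* Write one tagged line to stderr and flush it at once, so the line is not
 * left sitting in a buffer if the process later hangs or crashes. */
void debug_print(int rank, const char *fmt, ...)
{
    char line[256];
    va_list ap;
    va_start(ap, fmt);
    int len = vformat_debug_line(line, sizeof line, rank, (long)getpid(),
                                 fmt, ap);
    va_end(ap);
    if (len < 0)
        return; /* encoding error: nothing sensible to print */
    if ((size_t)len >= sizeof line)
        memcpy(line + sizeof line - 4, "...", 4); /* mark a truncated line */
    fprintf(stderr, "%s\n", line);
    fflush(stderr); /* stderr is normally unbuffered; flushing is harmless */
}
```

For example, `debug_print(rank, "before '%s'", "MPI_Finalize");` would write a line of the form `[rank 1 pid <pid>] before 'MPI_Finalize'` to stderr.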
Comment 5 Juergen Rose 2011-07-04 17:34:49 UTC
Hello Christoph,

if I run the program under DDD:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(11)$ mpirun -np 2 ddd ex1a_debug

In the first DDD window I see:
(gdb) run
[Thread debugging using libthread_db enabled]
argc= 1 argv=0x7fffffffc628
  i= 0[ 1]   argv[i]=|/home_orca/rose/Txt/src/Test/C/MPI/Ex1a/ex1a_debug|
before 'MPI_Init(&argc,&argv)'
[New Thread 0x7ffff40cd700 (LWP 14357)]
[New Thread 0x7fffef6c6700 (LWP 14358)]
rc=0   MPI_SUCCESS=0
Hello, world.  I am 0 of 2 on orca
WE have 2 processes
Hello 1 Processor 1 at node orca reporting for duty

rank= 0  numtask= 2   processor_name=orca, before 'sleep(10)'
rank= 0  numtask= 2   processor_name=orca, after  'sleep(10)'
rank= 0,  before 'MPI_Finalize'
[Thread 0x7ffff40cd700 (LWP 14357) exited]
[Thread 0x7fffef6c6700 (LWP 14358) exited]
[New Thread 0x7fffef6c6700 (LWP 14365)]
[Thread 0x7fffef6c6700 (LWP 14365) exited]
[New Thread 0x7fffef6c6700 (LWP 14367)]
[Thread 0x7fffef6c6700 (LWP 14367) exited]

Program exited normally.
(gdb)

In the second DDD window I see:
(gdb) run
[Thread debugging using libthread_db enabled]
argc= 1 argv=0x7fffffffc628
  i= 0[ 1]   argv[i]=|/home_orca/rose/Txt/src/Test/C/MPI/Ex1a/ex1a_debug|
before 'MPI_Init(&argc,&argv)'
[New Thread 0x7ffff40cd700 (LWP 14362)]
[New Thread 0x7fffef6c6700 (LWP 14363)]
rc=0   MPI_SUCCESS=0
Hello, world.  I am 1 of 2 on orca
rank= 1  numtask= 2   processor_name=orca, before 'sleep(10)'
rank= 1  numtask= 2   processor_name=orca, after  'sleep(10)'
rank= 1,  before 'MPI_Finalize'
[Thread 0x7ffff40cd700 (LWP 14362) exited]
[Thread 0x7fffef6c6700 (LWP 14363) exited]
[New Thread 0x7fffef6c6700 (LWP 14364)]
[Thread 0x7fffef6c6700 (LWP 14364) exited]
[New Thread 0x7fffef6c6700 (LWP 14366)]
[Thread 0x7fffef6c6700 (LWP 14366) exited]

Program exited normally.
(gdb) 


So everything seems to be fine. If I run the program compiled without "-g" directly, however, it hangs at the beginning and I only see:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(12)$ mpirun -np 2  ex1a
argc= 1 argv=0x7fff0e06d478
  i= 0[ 1]   argv[i]=|ex1a|
^C^\Quit

I can only kill it with ^\.

Which versions of gcc and glibc do you use? What else can I test?

Regards Juergen
Comment 6 Juergen Rose 2011-07-04 17:45:33 UTC
If I run the ex1a version compiled with the "-g" flag without the debugger, it hangs as well:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(20)$ mpirun -np 2  ex1a_debug
argc= 1 argv=0x7fff920e66e8
  i= 0[ 1]   argv[i]=|ex1a_debug|
^\Quit
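Since the hang only appears when ex1a is not started under ddd, a debugger attached from the start does not catch it. One hedged alternative is a small watchdog, sketched below in C: the program records which phase it is in and arms `alarm()` before a call that might hang; if the timer expires, the SIGALRM handler reports the phase and calls `abort()`, which can leave a core file for post-mortem inspection with gdb, assuming core dumps are enabled (for example with `ulimit -c unlimited` in the shell that starts mpirun). The function names `watchdog_install`, `watchdog_enter` and `watchdog_cancel` and the phase numbering are hypothetical and not part of ex1a.c or Open MPI, and Open MPI's own signal handling might interfere in ways this sketch does not account for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Phase the program is in (hypothetical numbering, 0-9). It is read by the
 * signal handler, so it must be a volatile sig_atomic_t. */
static volatile sig_atomic_t current_phase = 0;

/* SIGALRM handler: uses only async-signal-safe calls (write, abort). */
static void watchdog_fired(int sig)
{
    char msg[] = "watchdog: timeout in phase ?\n";
    (void)sig;
    msg[sizeof msg - 3] = (char)('0' + current_phase); /* replace the '?' */
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;
    abort(); /* raises SIGABRT; may leave a core file for gdb */
}

/* Install the SIGALRM handler once at startup; returns 0 on success. */
int watchdog_install(void)
{
    struct sigaction sa;
    sa.sa_handler = watchdog_fired;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Record the phase and (re)arm the timer; seconds == 0 disarms it. */
void watchdog_enter(int phase, unsigned seconds)
{
    assert(phase >= 0 && phase <= 9); /* the handler prints one digit */
    current_phase = phase;
    alarm(seconds);
}

/* Disarm the timer; returns the seconds that were still left (0 if none). */
unsigned watchdog_cancel(void)
{
    return alarm(0);
}
```

A possible call sequence: `watchdog_install()` once at startup, `watchdog_enter(1, 60)` before `MPI_Init`, `watchdog_enter(2, 0)` once it returns, `watchdog_enter(3, 60)` before `MPI_Finalize`, and `watchdog_cancel()` afterwards.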
Comment 7 Juergen Rose 2011-07-04 23:09:48 UTC
Some additional information: if I run 'mpirun -np 2 ddd ex1a_debug' with openmpi-1.5.3-r2, mpirun does not finish. I do not get a prompt back after quitting ddd and have to kill mpirun with ^C:

rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(11)$ mpirun -np 2 ddd ex1a_debug
^C
rose@orca:/home/rose/Txt/src/Test/C/MPI/Ex1a(12)$

If I run with openmpi-1.4.3, mpirun finishes after quitting ddd:

rose@caiman:/home/rose/Txt/src/Test/C/MPI/Ex1a(7)$ mpirun -np 2 ddd ex1a_debug
rose@caiman:/home/rose/Txt/src/Test/C/MPI/Ex1a(8)$ 

So I assume that the error is not in ex1a_debug but in the mpirun that comes with openmpi-1.5.3-r2.
Comment 8 Christoph Niethammer 2011-07-05 03:03:11 UTC
Hello Juergen,

The problem seems to be the mpi-threads USE flag, which enables MPI threads and progress threads at the same time.
Maybe the Open MPI ebuild should have a separate USE flag for progress threads, as in the future they will have nothing to do with mpi-threads.

So far, disabling mpi-threads has solved the problem for me.

Regards Christoph
Comment 9 Juergen Rose 2011-07-05 08:49:19 UTC
Thanks, Christoph.

Your hint works. After removing the mpi-threads flag from USE in /etc/make.conf, running 'emerge -uvND world' and recompiling my test program, it seems to work correctly.
Comment 10 Justin Bronder (RETIRED) gentoo-dev 2015-08-05 12:44:43 UTC
No longer reproducible with openmpi-1.8.7 (with mpi-threads enabled).