Bug 291707 - sys-cluster/osc-mpiexec and NCBI blast: the former mishandles command line arguments unlike mpich2-1.0.8/mpiexec
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
Importance: High normal (vote)
Assignee: Gentoo Cluster Team
Reported: 2009-11-03 13:59 UTC by Martin Mokrejš
Modified: 2010-09-10 19:01 UTC (History)
Description Martin Mokrejš 2009-11-03 13:59:21 UTC
Hi,
  this used to work fine with the mpiexec distributed with mpich2. With the transition to osc-mpiexec, users have to explicitly call osc-mpiexec in the batch file.

$ cat testcase.cmd
#PBS -S /bin/sh
#PBS -l nodes=1:ppn=4
#PBS -M me@foo.bar
#PBS -m ea
source ~/.bashrc
cd $HOME
p="sh -c \"blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt -a 4\""
osc-mpiexec $p
$


I would expect the last line to work as well when written like this:

osc-mpiexec blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt -a 4


I have an ~amd64 server and amd64 (stable) nodes with: sys-cluster/osc-mpiexec-0.83, sys-cluster/mpich2-1.0.8-r1, sys-cluster/torque-2.4.1b1.
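
To illustrate what the shell itself does with the unquoted $p in testcase.cmd before osc-mpiexec ever sees it, here is a minimal sketch assuming a POSIX shell. show_args is a hypothetical helper introduced only for this illustration, and the command string is shortened. What osc-mpiexec then does with the words it receives is not shown here.

```shell
#!/bin/sh
# show_args is a hypothetical helper, not part of any package:
# it prints every argument it receives on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Shortened stand-in for the command string from testcase.cmd.
p="sh -c \"blastpgp -v 3 -a 4\""

# Unquoted expansion: the shell splits $p on whitespace, and the
# embedded double quotes stay as literal characters inside the words.
show_args $p
```

The helper receives seven separate words, and the third and seventh still carry the literal quote characters (["blastpgp] and [4"]), so the quotes stored inside the variable do not group anything at this stage.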
Comment 1 Justin Bronder (RETIRED) gentoo-dev 2009-11-20 15:35:56 UTC
Sorry for the late reply, I never got an email from bugzilla.

(In reply to comment #0)
> Hi,
>   this used to work fine with mpiexec distributed with mpich2. With the
> transition to osc-mpiexec users have to explicitly call osc-mpiexec in the
> batch file.

Given that sys-cluster/osc-mpiexec and sys-cluster/mpiexec are not the same as the mpiexec that is distributed with mpich2, what exactly are you saying is broken?
Comment 2 Martin Mokrejš 2009-11-24 21:10:51 UTC
(In reply to comment #1)
> Sorry for the late reply, I never got an email from bugzilla.
> 
> (In reply to comment #0)
> > Hi,
> >   this used to work fine with mpiexec distributed with mpich2. With the
> > transition to osc-mpiexec users have to explicitly call osc-mpiexec in the
> > batch file.
> 
> Given that sys-cluster/osc-mpiexec and sys-cluster/mpiexec are not the same as
> the mpiexec that is distributed with mpich2, what exactly are you saying is broken?

With MPICH2's mpiexec it was enough and easy to use:

$ cat testcase.cmd
#PBS -S /bin/sh
#PBS -l nodes=1:ppn=4
#PBS -M me@foo.bar
#PBS -m ea
source ~/.bashrc
cd $HOME
blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt -a 4
$

I admit that prepending osc-mpiexec to that line is fine, but it seems argument passing is different. Please note that NCBI blast accepts multiple target database files, but those have to be wrapped in double quotes in the shell so that they all appear after the `-d' switch.
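
As a sketch of the grouping described above, assuming a POSIX shell: count_args and run_demo are hypothetical helpers, and the file names db1.fa, db2.fa, db3.fa and query.fas are placeholders. It only shows how the shell hands over a quoted database list. It makes no claim about what any particular mpiexec does with the arguments afterwards.

```shell
#!/bin/sh
# count_args and run_demo are hypothetical helpers for illustration only.
count_args() { echo "$#"; }

run_demo() {
    # Build the command line in the positional parameters; the quoted
    # database list is stored as one single word.
    set -- blastpgp -d "db1.fa db2.fa db3.fa" -i query.fas -a 4

    # Quoted "$@" passes every word on unchanged.
    count_args "$@"
    # Unquoted $* re-splits the database list into three separate words.
    count_args $*
}

run_demo
```

With "$@" the callee sees 7 arguments, with the three database names as a single argument after -d. With unquoted $* it sees 9, and only the first name would be taken as the value of -d.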
Comment 3 Justin Bronder (RETIRED) gentoo-dev 2009-11-24 21:58:44 UTC
(In reply to comment #2)
> With MPICH2's mpiexec it was enough and easy to use:
> 
> $ cat testcase.cmd
> #PBS -S /bin/sh
> #PBS -l nodes=1:ppn=4
> #PBS -M me@foo.bar
> #PBS -m ea
> source ~/.bashrc
> cd $HOME
> blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i
> /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt
> -a 4
> $
> 
> I admit prepending osc-mpiexec to that line is fine but it seems argument
> passing is different. Please note NCBI blast takes multiple target database
> files but those have to be wrapped by double-quotes in the shell so they appear
> after the `-d' switch.
> 

Ah, I see.  NCBI blast is hardcoded to use mpiexec.  Have you checked to see if there is some way to configure NCBI blast to use a different wrapper?  Perhaps there is an environment variable or command line switch?
Comment 4 Martin Mokrejš 2009-11-24 22:38:42 UTC
(In reply to comment #3)
> (In reply to comment #2)

> 
> Ah, I see.  NCBI blast is hardcoded to use mpiexec.  Have you checked to see 

Actually, I did not think of that; strace(1) would probably tell. I thought that it had to do with qsub(1) from the torque package handling the commands in the script file ... Sorry, I have no time to study all the docs and get more familiar with all that. :( I just wanted to bring this to your attention.

> there is some way to configure NCBI blast to use a different wrapper?  Perhaps
> there is an environment variable or command line switch?

To be honest, no, I did not think of that; again, strace(1) should reveal what is going on. At the moment I cannot test this, though.
Comment 5 Justin Bronder (RETIRED) gentoo-dev 2009-11-24 22:50:02 UTC
You're correct in saying that qsub does not call mpiexec.  Let me know what you find out with NCBI Blast.  Then we can either look into fixing that package or add a warning to osc-mpiexec.
Comment 6 Martin Mokrejš 2009-12-03 20:38:59 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > With MPICH2's mpiexec it was enough and easy to use:
> > 
> > $ cat testcase.cmd
> > #PBS -S /bin/sh
> > #PBS -l nodes=1:ppn=4
> > #PBS -M me@foo.bar
> > #PBS -m ea
> > source ~/.bashrc
> > cd $HOME
> > blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i
> > /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt
> > -a 4
> > $
> > 
> > I admit prepending osc-mpiexec to that line is fine but it seems argument
> > passing is different. Please note NCBI blast takes multiple target database
> > files but those have to be wrapped by double-quotes in the shell so they
> > appear after the `-d' switch.
> > 
> 
> Ah, I see. NCBI blast is hardcoded to use mpiexec. Have you checked to see if
> there is some way to configure NCBI blast to use a different wrapper?  Perhaps
> there is an environment variable or command line switch?

I think you misunderstood me here. The double quotes are used to group the words together so that they can be treated as a list of several files.

Anyway, using strace(1) on "blastall -a 4 ..." execution I see it looks for:

.ncbirc, $HOME/.ncbirc, /etc/ncbi/ncbirc, /etc/ncbi/.ncbirc
It tries to connect to the socket, however:

open("/tmp/mpiexec-sock", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=5636122, st_mode=S_IFDIR|S_ISVTX|0777, st_nlink=3, st_uid=1013, st_gid=100, st_blksize=4096, st_blocks=8, st_size=4096, st_atime=2009/12/03-20:42:39, st_mtime=2009/12/03-20:42:39, st_ctime=2009/12/03-20:42:39}) = 0
fcntl(3, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
close(3)                                = 0
open("/tmp/mpiexec-sock/mmokrejs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=5636123, st_mode=S_IFDIR|0700, st_nlink=2, st_uid=1013, st_gid=100, st_blksize=4096, st_blocks=8, st_size=4096, st_atime=2009/12/03-20:42:39, st_mtime=2009/12/03-21:22:36, st_ctime=2009/12/03-21:22:36}) = 0
close(3)                                = 0
uname({sysname="Linux", nodename="node006", release="2.6.28-gentoo-r5-default", version="#4 SMP Wed May 27 23:14:36 MEST 2009", machine="x86_64"}) = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 3
bind(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = -1 EADDRINUSE (Address already in use)
connect(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 300000000}, NULL)         = 0
connect(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 300000000}, NULL)         = 0
unlink("/tmp/mpiexec-sock/mmokrejs/5720.node006") = 0
bind(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = 0


Back to my problem, here is how it appears to blastall when called directly from the shell, albeit via strace. Please note the three filenames under the "-d" switch (that is what gets screwed up when fetched from a *.cmd file through qsub(1), and where some extra quoting and escaping is necessary).

$ strace -v -f -s 128 osc-mpiexec --comm=pmi blastall -p blastp -a 4 -v 3 -b 1 -d "protein.sequences.fa protein.sequences.fa protein.sequences.fa" -i Parameciumtetraurelia.fas -o /tmp/blast.txt
execve("/usr/bin/osc-mpiexec", ["osc-mpiexec", "--comm=pmi", "blastall", "-p", "blastp", "-a", "4", "-v", "3", "-b", "1", "-d", "protein.sequences.fa protein.sequences.fa protein.sequences.fa", "-i", "Parameciumtetraurelia.fas", "-o", "/tmp/blast.txt"], ["MANPATH=/etc/java-config-2/current-system-vm/man:/usr/local/share/man:/usr/share/man:/usr/share/binutils-data/x86_64-pc-linux-gn"..., "NCBI=/etc/ncbi", "MPD_CONF_FILE=@MPD_CONF_FILE_DIR@/mpd.conf", "PBS_VERSION=TORQUE-2.4.1b1", "TERM=xterm", "SHELL=/bin/bash", "PBS_JOBNAME=STDIN", "PBS_ENVIRONMENT=PBS_INTERACTIVE", "PBS_O_WORKDIR=/nfslarge/home/mmokrejs", "PBS_TASKNUM=1", "USER=mmokrejs", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:su=37;41"..., "GDK_USE_XFT=1", "PBS_O_HOME=/nfslarge/home/mmokrejs", "PBSLOGLEVEL=7", "PBSCOREDUMP=1", "PBS_MOMPORT=15003", "PLPLOT_LIB=/usr/share/EMBOSS/", "PAGER=/usr/bin/less", "CONFIG_PROTECT_MASK=/etc/sandbox.d /etc/env.d/java/ /etc/udev/rules.d /etc/fonts/fonts.conf /etc/terminfo /etc/ca-certificates.c"..., "XDG_CONFIG_DIRS=/etc/xdg", "FLTK_DOCDIR=/usr/share/doc/fltk-1.1.9/html", "PBS_O_QUEUE=batch", "PBS_O_LOGNAME=mmokrejs", "PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.2:/var/qmail/bin", "PBS_JOBCOOKIE=05BE1257ECE258C384611EA227E4ABCB", "DISTCC_LOG=", "PWD=/nfslarge/home/mmokrejs", "JAVA_HOME=/etc/java-config-2/current-system-vm", "JAVAC=/etc/java-config-2/current-system-vm/bin/javac", "EDITOR=/usr/bin/vim", "PBS_NODENUM=0", "BLASTDB=/usr/share/ncbi/formatdb", "PBS_O_SHELL=/bin/bash", "PBS_SERVER_HOME=/var/spool/torque", "DISTCC_VERBOSE=0", "DCCC_PATH=/usr/lib64/distcc/bin", "PBS_SERVER=nfssrv.cluster.local", "PBS_JOBID=5720.nfssrv.cluster.local", "PBSDEBUG=0", "EMBOSS_ACDROOT=/usr/share/EMBOSS/acd", "JDK_HOME=/etc/java-config-2/current-system-vm", "SHLVL=1", "HOME=/nfslarge/home/mmokrejs", "QRNADB=/usr/share/qrna/data", "PBS_O_HOST=nfssrv.cluster.local", 
"PBS_VNODENUM=0", "QMAIL_CONTROLDIR=/var/qmail/control", "LESS=-R -M --shift 5", "LOGNAME=mmokrejs", "GCC_SPECS=", "CVS_RSH=ssh", "TORQUEKEEPCOMPLETED=1", "XDG_DATA_DIRS=/usr/local/share:/usr/share", "CLASSPATH=.", "PBS_QUEUE=batch", "LESSOPEN=|lesspipe.sh %s", "R_HOME=/usr/lib64/R", "BLASTMAT=/usr/share/ncbi/data", "INFOPATH=/usr/share/info:/usr/share/binutils-data/x86_64-pc-linux-gnu/2.18/info:/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.2/in"..., "OPENGL_PROFILE=xorg-x11", "CONFIG_PROTECT=/var/spool/torque /var/qmail/control /var/qmail/alias", "PBS_NODEFILE=/var/spool/torque/aux//5720.nfssrv.cluster.local", "PBS_O_PATH=/nfslarge/x86_64_linux26/usr/bin:/nfslarge/i386_linux26/usr/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-"..., "_=/usr/bin/strace"]) = 0


BTW: the content of the MPD_CONF_FILE=@MPD_CONF_FILE_DIR@/mpd.conf environment variable reveals that configure broke on the system when mpich2-1.0.8 was compiled, and the m4-macro code remained in the generated config.status file. I am not going to inspect this at the moment, sorry, no time. :(
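
For spotting leftovers like the @MPD_CONF_FILE_DIR@ value above, a rough check can be sketched as follows, assuming a POSIX shell. has_placeholder is a hypothetical helper, and the pattern is only a heuristic: an @ followed by an uppercase letter or underscore, with another @ later in the value.

```shell
#!/bin/sh
# has_placeholder is a hypothetical helper: it succeeds when its argument
# contains something shaped like an unsubstituted @NAME@ token.
has_placeholder() {
    case "$1" in
        *@[A-Z_]*@*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report every environment variable whose value still carries such a token.
env | while IFS='=' read -r name value; do
    if has_placeholder "$value"; then
        printf 'suspicious: %s=%s\n' "$name" "$value"
    fi
done
```

A value such as @MPD_CONF_FILE_DIR@/mpd.conf is flagged, while an ordinary path or an e-mail address with a single @ is not. Being a heuristic, it can still produce false positives on unusual values.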
Comment 7 Justin Bronder (RETIRED) gentoo-dev 2009-12-03 20:51:51 UTC
So, you don't like the difference in the way that the mpiexec packaged with mpich2 and osc-mpiexec handle command line arguments?
Comment 8 Martin Mokrejš 2009-12-03 21:05:02 UTC
(In reply to comment #7)
> So, you don't like the difference in the way that the mpiexec packaged with
> mpich2 and osc-mpiexec handle command line arguments?

Exactly.
Comment 9 Justin Bronder (RETIRED) gentoo-dev 2009-12-03 21:10:48 UTC
Alright, this is an upstream issue then.