| Summary: | sys-cluster/osc-mpiexec and NCBI blast: the former mishandles command line arguments unlike mpich2-1.0.8/mpiexec | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Martin Mokrejš <mmokrejs> |
| Component: | Current packages | Assignee: | Gentoo Cluster Team <cluster> |
| Status: | RESOLVED UPSTREAM | | |
| Severity: | normal | | |
| Priority: | High | | |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
Description
Martin Mokrejš
2009-11-03 13:59:21 UTC
Sorry for the late reply, I never got an email from bugzilla.

(In reply to comment #0)
> Hi,
> this used to work fine with mpiexec distributed with mpich2. With the
> transition to osc-mpiexec users have to explicitly call osc-mpiexec in the
> batch file.

Given that sys-cluster/osc-mpiexec and sys-cluster/mpiexec are not the same as the mpiexec that is distributed with mpich2, what exactly are you saying is broken?

(In reply to comment #1)
> Sorry for the late reply, I never got an email from bugzilla.
>
> (In reply to comment #0)
> > Hi,
> > this used to work fine with mpiexec distributed with mpich2. With the
> > transition to osc-mpiexec users have to explicitly call osc-mpiexec in the
> > batch file.
>
> Given that sys-cluster/osc-mpiexec and sys-cluster/mpiexec are not the same as
> the mpiexec that is distributed with mpich2, what exactly are you saying is broken?

With MPICH2's mpiexec it was enough and easy to use:

$ cat testcase.cmd
#PBS -S /bin/sh
#PBS -l nodes=1:ppn=4
#PBS -M me@foo.bar
#PBS -m ea
source ~/.bashrc
cd $HOME
blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt -a 4
$

I admit prepending osc-mpiexec to that line is fine but it seems argument passing is different. Please note NCBI blast takes multiple target database files but those have to be wrapped by double-quotes in the shell so they appear after the `-d' switch.

(In reply to comment #2)
> With MPICH2's mpiexec it was enough and easy to use:
>
> $ cat testcase.cmd
> #PBS -S /bin/sh
> #PBS -l nodes=1:ppn=4
> #PBS -M me@foo.bar
> #PBS -m ea
> source ~/.bashrc
> cd $HOME
> blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i
> /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt
> -a 4
> $
>
> I admit prepending osc-mpiexec to that line is fine but it seems argument
> passing is different. Please note NCBI blast takes multiple target database
> files but those have to be wrapped by double-quotes in the shell so they appear
> after the `-d' switch.
>

Ah, I see. NCBI blast is hardcoded to use mpiexec. Have you checked to see if there is some way to configure NCBI blast to use a different wrapper? Perhaps there is an environment variable or command line switch?

(In reply to comment #3)
> (In reply to comment #2)
>
> Ah, I see. NCBI blast is hardcoded to use mpiexec. Have you checked to see

Actually, I did not think of that; strace(1) would probably tell. I thought that it had to do with qsub(1) from the torque package handling the commands in the script file ... Sorry, I have no time to study all the docs and get more familiar with all that. :( I just wanted to bring this to your attention.

> there is some way to configure NCBI blast to use a different wrapper? Perhaps
> there is an environment variable or command line switch?

To be honest, no, I did not think of that. Again, strace(1) should reveal what is going on. At the moment I cannot test this, though.

You're correct in saying that qsub does not call mpiexec. Let me know what you find out with NCBI Blast. Then we can either look into fixing that package or add a warning to osc-mpiexec.

(In reply to comment #3)
> (In reply to comment #2)
> > With MPICH2's mpiexec it was enough and easy to use:
> >
> > $ cat testcase.cmd
> > #PBS -S /bin/sh
> > #PBS -l nodes=1:ppn=4
> > #PBS -M me@foo.bar
> > #PBS -m ea
> > source ~/.bashrc
> > cd $HOME
> > blastpgp -v 3 -b 1 -d /home/me/protein_sequences.fa -i
> > /home/me/Parameciumtetraurelia.fas -o /home/me/Parameciumtetraurelia-blast.txt
> > -a 4
> > $
> >
> > I admit prepending osc-mpiexec to that line is fine but it seems argument
> > passing is different. Please note NCBI blast takes multiple target database
> > files but those have to be wrapped by double-quotes in the shell so they
> > appear after the `-d' switch.
> >
>
> Ah, I see. NCBI blast is hardcoded to use mpiexec. Have you checked to see if
> there is some way to configure NCBI blast to use a different wrapper? Perhaps
> there is an environment variable or command line switch?

I think you misunderstood me here. The double quotes are used to group the words together so they can be treated as a list of several files.

Anyway, using strace(1) on a "blastall -a 4 ..." execution I see it looks for: .ncbirc, $HOME/.ncbirc, /etc/ncbi/ncbirc, /etc/ncbi/.ncbirc

It tries to connect to the socket, however:

open("/tmp/mpiexec-sock", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=5636122, st_mode=S_IFDIR|S_ISVTX|0777, st_nlink=3, st_uid=1013, st_gid=100, st_blksize=4096, st_blocks=8, st_size=4096, st_atime=2009/12/03-20:42:39, st_mtime=2009/12/03-20:42:39, st_ctime=2009/12/03-20:42:39}) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
open("/tmp/mpiexec-sock/mmokrejs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=5636123, st_mode=S_IFDIR|0700, st_nlink=2, st_uid=1013, st_gid=100, st_blksize=4096, st_blocks=8, st_size=4096, st_atime=2009/12/03-20:42:39, st_mtime=2009/12/03-21:22:36, st_ctime=2009/12/03-21:22:36}) = 0
close(3) = 0
uname({sysname="Linux", nodename="node006", release="2.6.28-gentoo-r5-default", version="#4 SMP Wed May 27 23:14:36 MEST 2009", machine="x86_64"}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
bind(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = -1 EADDRINUSE (Address already in use)
connect(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 300000000}, NULL) = 0
connect(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 300000000}, NULL) = 0
unlink("/tmp/mpiexec-sock/mmokrejs/5720.node006") = 0
bind(3, {sa_family=AF_FILE, path="/tmp/mpiexec-sock/mmokrejs/5720.node006"...}, 110) = 0

Back to my problem, here is how it appears to blastall when called directly from the shell, albeit via strace. Please note the three filenames under the "-d" switch (that is what gets mangled when fetched from a *.cmd file through qsub(1), and where some extra quoting and escaping is necessary).

$ strace -v -f -s 128 osc-mpiexec --comm=pmi blastall -p blastp -a 4 -v 3 -b 1 -d "protein.sequences.fa protein.sequences.fa protein.sequences.fa" -i Parameciumtetraurelia.fas -o /tmp/blast.txt
execve("/usr/bin/osc-mpiexec", ["osc-mpiexec", "--comm=pmi", "blastall", "-p", "blastp", "-a", "4", "-v", "3", "-b", "1", "-d", "protein.sequences.fa protein.sequences.fa protein.sequences.fa", "-i", "Parameciumtetraurelia.fas", "-o", "/tmp/blast.txt"], ["MANPATH=/etc/java-config-2/current-system-vm/man:/usr/local/share/man:/usr/share/man:/usr/share/binutils-data/x86_64-pc-linux-gn"..., "NCBI=/etc/ncbi", "MPD_CONF_FILE=@MPD_CONF_FILE_DIR@/mpd.conf", "PBS_VERSION=TORQUE-2.4.1b1", "TERM=xterm", "SHELL=/bin/bash", "PBS_JOBNAME=STDIN", "PBS_ENVIRONMENT=PBS_INTERACTIVE", "PBS_O_WORKDIR=/nfslarge/home/mmokrejs", "PBS_TASKNUM=1", "USER=mmokrejs", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:su=37;41"..., "GDK_USE_XFT=1", "PBS_O_HOME=/nfslarge/home/mmokrejs", "PBSLOGLEVEL=7", "PBSCOREDUMP=1", "PBS_MOMPORT=15003", "PLPLOT_LIB=/usr/share/EMBOSS/", "PAGER=/usr/bin/less", "CONFIG_PROTECT_MASK=/etc/sandbox.d /etc/env.d/java/ /etc/udev/rules.d /etc/fonts/fonts.conf /etc/terminfo /etc/ca-certificates.c"..., "XDG_CONFIG_DIRS=/etc/xdg", "FLTK_DOCDIR=/usr/share/doc/fltk-1.1.9/html", "PBS_O_QUEUE=batch", "PBS_O_LOGNAME=mmokrejs", "PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.2:/var/qmail/bin", "PBS_JOBCOOKIE=05BE1257ECE258C384611EA227E4ABCB", "DISTCC_LOG=", "PWD=/nfslarge/home/mmokrejs", "JAVA_HOME=/etc/java-config-2/current-system-vm", "JAVAC=/etc/java-config-2/current-system-vm/bin/javac", "EDITOR=/usr/bin/vim", "PBS_NODENUM=0", "BLASTDB=/usr/share/ncbi/formatdb", "PBS_O_SHELL=/bin/bash", "PBS_SERVER_HOME=/var/spool/torque", "DISTCC_VERBOSE=0", "DCCC_PATH=/usr/lib64/distcc/bin", "PBS_SERVER=nfssrv.cluster.local", "PBS_JOBID=5720.nfssrv.cluster.local", "PBSDEBUG=0", "EMBOSS_ACDROOT=/usr/share/EMBOSS/acd", "JDK_HOME=/etc/java-config-2/current-system-vm", "SHLVL=1", "HOME=/nfslarge/home/mmokrejs", "QRNADB=/usr/share/qrna/data", "PBS_O_HOST=nfssrv.cluster.local", "PBS_VNODENUM=0", "QMAIL_CONTROLDIR=/var/qmail/control", "LESS=-R -M --shift 5", "LOGNAME=mmokrejs", "GCC_SPECS=", "CVS_RSH=ssh", "TORQUEKEEPCOMPLETED=1", "XDG_DATA_DIRS=/usr/local/share:/usr/share", "CLASSPATH=.", "PBS_QUEUE=batch", "LESSOPEN=|lesspipe.sh %s", "R_HOME=/usr/lib64/R", "BLASTMAT=/usr/share/ncbi/data", "INFOPATH=/usr/share/info:/usr/share/binutils-data/x86_64-pc-linux-gnu/2.18/info:/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.2/in"..., "OPENGL_PROFILE=xorg-x11", "CONFIG_PROTECT=/var/spool/torque /var/qmail/control /var/qmail/alias", "PBS_NODEFILE=/var/spool/torque/aux//5720.nfssrv.cluster.local", "PBS_O_PATH=/nfslarge/x86_64_linux26/usr/bin:/nfslarge/i386_linux26/usr/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-"..., "_=/usr/bin/strace"]) = 0

BTW: the content of the MPD_CONF_FILE=@MPD_CONF_FILE_DIR@/mpd.conf environment variable reveals that configure broke on the system when mpich2-1.0.8 was compiled and the m4-macro code remained in the generated config.status file. I am not going to inspect this at the moment, sorry, no time. :(

So, you don't like the difference in the way that the mpiexec packaged with mpich2 and osc-mpiexec handle command line arguments?

(In reply to comment #7)
> So, you don't like the difference in the way that the mpiexec packaged with
> mpich2 and osc-mpiexec handle command line arguments?

Exactly.

Alright, this is an upstream issue then.
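
For readers following the quoting question in this thread, here is a minimal sketch of how shell double quotes group several database names into the single argument that follows `-d', matching the argv shown in the execve() line of the strace output. The show_args helper is hypothetical and exists only for this illustration; it is not part of NCBI blast, osc-mpiexec or torque, and the sketch does not reproduce whatever osc-mpiexec or qsub do with the arguments afterwards.

```shell
#!/bin/sh
# Hypothetical helper for illustration only: prints the number of
# arguments it received, then each argument in brackets on its own line.
show_args() {
    printf '%d\n' "$#"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the three database names reach the program as ONE argument
# after -d, so the helper reports 4 arguments in total.
show_args -d "protein.sequences.fa protein.sequences.fa protein.sequences.fa" -i Parameciumtetraurelia.fas

# Unquoted: the shell splits on whitespace, so -d is followed by three
# separate arguments and the helper reports 6 arguments in total.
show_args -d protein.sequences.fa protein.sequences.fa protein.sequences.fa -i Parameciumtetraurelia.fas
```

If any layer between the job script and the final program joins the arguments into one string and lets a shell split them again, the grouping from the first call is lost and the program sees the second form. Whether that happens in a given setup is something the strace approach used above can show.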