
Bug 717796

Summary: sys-cluster/openmpi Use of eselect-ldso USE flag does not select correct BLAS/LAPACK version when run with OpenMPI
Product: Gentoo Linux Reporter: hfk22 <gentoo>
Component: Current packages    Assignee: Gentoo Science Related Packages <sci>
Status: UNCONFIRMED ---    
Severity: normal CC: cluster, jstein
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 747136    

Description hfk22 2020-04-17 04:53:59 UTC
It appears that the eselect-ldso USE flag, together with eselect blas and eselect lapack, does not work properly when combined with OpenMPI. The issue is that mpirun (orterun) automatically prepends PREFIX/lib:PREFIX/lib64, where PREFIX is normally /usr, to LD_LIBRARY_PATH in the environment in which mpirun executes the given program. This circumvents the mechanism that the eselect-ldso USE flag relies on to select the chosen BLAS/LAPACK implementation. Currently, the provider paths are listed in /etc/ld.so.conf.d/, and they are included before /usr/lib. For example, in my ld.so.conf:

/usr/lib64/opengl/xorg-x11/lib
include ld.so.conf.d/*.conf
/lib64
/usr/lib64
/usr/local/lib64
/lib32
/usr/lib32
/usr/local/lib32
/lib
/usr/lib

In any case, when mpirun prepends /usr/lib:/usr/lib64 to LD_LIBRARY_PATH, executables that depend on liblapack.so or libblas.so pick up the copies in /usr/lib first; for LAPACK that is /usr/lib/liblapack.so, the reference implementation. This is not the effect one would expect from eselect lapack.
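The ordering problem can be reproduced without MPI using a rough shell model of the loader's first-match lookup: each directory in a colon-separated list is tried in order, and the first one containing the library wins. This is only a sketch under stated assumptions: find_first_lib is a hypothetical helper, the real dynamic loader also consults RPATH/RUNPATH and /etc/ld.so.cache (ignored here), and the directory layout is simulated under a temporary directory.

```bash
#!/usr/bin/env bash
# Hypothetical first-match model of library lookup (not ld.so itself).
# $1 = library file name, $2 = colon-separated directories, searched in order.
find_first_lib() {
    local lib=$1 dir
    local IFS=:
    for dir in $2; do
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            return 0
        fi
    done
    return 1
}

# Simulated layout: reference LAPACK in lib/, MKL in lib/lapack/mkl-rt/.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/lib/lapack/mkl-rt" "$root/lib64"
touch "$root/lib/liblapack.so" "$root/lib/lapack/mkl-rt/liblapack.so"

# Order intended by ld.so.conf: provider directory before lib64/ and lib/.
intended="$root/lib/lapack/mkl-rt:$root/lib64:$root/lib"
# Order seen by the job once PREFIX/lib:PREFIX/lib64 has been prepended.
under_mpirun="$root/lib:$root/lib64:$root/lib/lapack/mkl-rt"

find_first_lib liblapack.so "$intended"      # -> $root/lib/lapack/mkl-rt/liblapack.so
find_first_lib liblapack.so "$under_mpirun"  # -> $root/lib/liblapack.so (reference)
```

The MKL directory is still present in the second list; it simply never gets consulted because an earlier entry already satisfies the lookup.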

Now, this can be fixed on the command line by giving mpirun the --noprefix flag.  I'll contend this is not really a desirable solution as it requires the user to know about this problem.

In order to see this, we can simply use the command:

$ mpirun -n 1 --xterm 0 /bin/bash

to open a shell in the MPI environment.  Then, we see:

$ printenv | grep LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/lib:/usr/lib64:/usr/lib/lapack/mkl-rt/

In this case, my desired LAPACK implementation is MKL, but the dynamic loader will not pick it up because it finds the reference LAPACK in /usr/lib first. To verify that the prefix is the cause, run:

$ mpirun --prefix /foo -n 1 --xterm 0 /bin/bash

Followed by

$ printenv | grep LD_LIBRARY_PATH
LD_LIBRARY_PATH=/foo/lib:/foo/lib64:/usr/lib/lapack/mkl-rt/

Here, we can see that lib and lib64 are now prefixed with /foo. We can fix this with:

$ mpirun --noprefix -n 1 --xterm 0 /bin/bash

followed by:

$ printenv | grep LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/lib/lapack/mkl-rt/

At this point, we'd find the correct library.
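The three printenv results above fit one simple rule: unless --noprefix is given, PREFIX/lib:PREFIX/lib64 is placed in front of the existing LD_LIBRARY_PATH. The function below only models that observed behavior; job_ld_library_path is a hypothetical name, it is not Open MPI's actual code, and the handling of an initially empty LD_LIBRARY_PATH is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical model of the LD_LIBRARY_PATH seen inside the MPI job.
# $1 = prefix ("" models --noprefix), $2 = LD_LIBRARY_PATH before mpirun.
job_ld_library_path() {
    local prefix=$1 old=$2
    if [ -z "$prefix" ]; then
        # --noprefix: nothing is prepended.
        printf '%s\n' "$old"
    else
        # Assumed: no trailing colon is added when the old value is empty.
        printf '%s\n' "$prefix/lib:$prefix/lib64${old:+:$old}"
    fi
}

mkl=/usr/lib/lapack/mkl-rt/
job_ld_library_path /usr "$mkl"   # default:       /usr/lib:/usr/lib64:/usr/lib/lapack/mkl-rt/
job_ld_library_path /foo "$mkl"   # --prefix /foo: /foo/lib:/foo/lib64:/usr/lib/lapack/mkl-rt/
job_ld_library_path ""   "$mkl"   # --noprefix:    /usr/lib/lapack/mkl-rt/
```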

Someone else would have to check, but I thought the old app-eselect/eselect-lapack package did not have this problem. If I recall correctly, the old scheme created symbolic links in /usr/lib to the chosen LAPACK implementation. The new scheme under eselect-ldso changes the dynamic library search order rather than modifying a symbolic link. Either that, or my memory is incorrect.
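If the recollection above is right, the difference matters because a retargeted symlink survives a reordered search path, while a selection made purely by directory order does not. A sketch on a simulated layout, assuming the old scheme pointed the generic name in lib/ at the chosen provider and the new scheme leaves lib/ resolving to the reference LAPACK; find_first_lib is the same hypothetical first-match helper as a model of the loader, and all paths are temporary.

```bash
#!/usr/bin/env bash
# Hypothetical first-match model of library lookup (ignores RPATH/RUNPATH, ld.so.cache).
find_first_lib() {
    local lib=$1 dir
    local IFS=:
    for dir in $2; do
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            return 0
        fi
    done
    return 1
}

# Physical path of a fresh temporary directory, so readlink -f output compares cleanly.
root=$(cd "$(mktemp -d)" && pwd -P)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/reference" "$root/mkl-rt" "$root/old/lib" "$root/new/lib"
touch "$root/reference/liblapack.so" "$root/mkl-rt/liblapack.so"

# Assumed symlink scheme: the generic name in lib/ is retargeted at MKL.
ln -s "$root/mkl-rt/liblapack.so" "$root/old/lib/liblapack.so"
# Assumed search-order scheme: lib/ resolves to the reference LAPACK.
ln -s "$root/reference/liblapack.so" "$root/new/lib/liblapack.so"

# In both cases lib/ is searched first, as with mpirun's prepended paths.
old_hit=$(find_first_lib liblapack.so "$root/old/lib:$root/mkl-rt")
new_hit=$(find_first_lib liblapack.so "$root/new/lib:$root/mkl-rt")

readlink -f "$old_hit"   # resolves into mkl-rt/: the selection survives
readlink -f "$new_hit"   # resolves into reference/: the selection is bypassed
```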

Anyway, I'm not sure whether this should be filed under sys-cluster/openmpi or under one of the packages that support the eselect-ldso flag. It's really the combination that's the problem. As a side note, it's quite common for MPI programs to also use BLAS/LAPACK, and I found this problem while running the LINPACK benchmark.
Comment 1 Aisha Tammy 2020-12-01 14:29:21 UTC
haha, this is amazing...
I have no clue what a nice solution that satisfies everybody would be (or whether one even exists)...

> Someone else would have to check, but I thought the old 
> app-eselect/eselect-lapack package did not have the same problem.
> If I recall, the old scheme created symbolic links in /usr/lib
> to the chosen LAPACK version.  The new scheme  under eselect-ldso
> changes the dynamic library search order rather than modifying a 
> symbolic link.  That, or my memory is incorrect.

The problems with this approach are:
(1) You can no longer use Intel MKL as a provider.
(2) All the BLAS/LAPACK providers need non-trivial patches to create extra libraries such as libblas.so. These patches have to be refreshed for every release, and they are not validated by upstream. There is no way to check whether the patches are correct (other than by testing with some package that uses BLAS/LAPACK, like numpy).


I do not have enough experience with MPI yet, but prefixing LD_LIBRARY_PATH with /usr/lib and /usr/lib64 seems a bit pointless. Maybe there are deeper underlying reasons, so I am willing to be enlightened. :D

> Now, this can be fixed on the command line by giving mpirun the --noprefix
> flag.  I'll contend this is not really a desirable solution as it
> requires the user to know about this problem.

Agreed. Would anything break if we asked upstream to make this the default behavior? Maybe if we ask nicely...

Not sure about what to do.

PS: Curious what happens with sci-libs/scalapack...
Comment 2 Aisha Tammy 2020-12-01 14:57:50 UTC
Curious how Debian does this. Does anyone know?