too long lines were shrinked:

                 from /var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/opal/datatype/opal_convertor.h:35,
                 from /var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/datatype/ompi_datatype.h:38,
                 from /var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/mca/coll/base/coll_base_alltoall.c:31:
/var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/mca/coll/base/coll_base_alltoall.c: In function ‘mca_coll_base_alltoall_intra_basic_inplace’:
/var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/mca/coll/base/coll_base_alltoall.c:81:85: error: invalid use of undefined type ‘struct opal_convertor_master_t’
   81 |         if( OPAL_UNLIKELY(opal_local_arch != ompi_proc->super.proc_convertor->master->remote_arch)) {
      |                                                                                     ^~
/var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/opal/include/opal/prefetch.h:44:55: note: in definition of macro ‘OPAL_UNLIKELY’
   44 | #define OPAL_UNLIKELY(expression) __builtin_expect(!!(expression), 0)

  -------------------------------------------------------------------

  This is an unstable amd64 chroot image at a tinderbox (==build bot)
  name: 17.1_desktop-j4-20211202-110142

  -------------------------------------------------------------------

gcc-config -l:
 [1] x86_64-pc-linux-gnu-11.2.1 *
clang version 13.0.0
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/13/bin
/usr/lib/llvm/13
13.0.0
Python 3.9.9
Available Ruby profiles:
  [1]   ruby26 (with Rubygems)
  [2]   ruby27 (with Rubygems)
  [3]   ruby30 (with Rubygems) *
Available Rust versions:
  [1]   rust-bin-1.56.1 *
The following VMs are available for generation-2:
1)      OpenJDK 8.312_p07 [openjdk-8]
*)      AdoptOpenJDK 8.312_p07 [openjdk-bin-8]
Available Java Virtual Machines:
  [1]   openjdk-8
  [2]   openjdk-bin-8  system-vm

The Glorious Glasgow Haskell Compilation System, version 8.10.4
php cli:
  [1]   php7.3
  [2]   php7.4
  [3]   php8.1 *

  HEAD of ::gentoo
commit c8e2fb3130882c282cf7027199f0c447cde2d5f0
Author: Repository mirror & CI <repomirrorci@gentoo.org>
Date:   Fri Dec 3 23:51:40 2021 +0000

    2021-12-03 23:51:39 UTC

emerge -qpvO sys-cluster/openmpi
[ebuild  N    ] sys-cluster/openmpi-4.1.2  USE="fortran heterogeneous ipv6 -cma (-cuda) -cxx -java -libompitrace -peruse -romio" ABI_X86="(64) -32 (-x32)" OPENMPI_FABRICS="-knem -ofed -psm" OPENMPI_OFED_FEATURES="-control-hdr-padding -dynamic-sl -rdmacm -udcm" OPENMPI_RM="-pbs -slurm"
Created attachment 757391 [details] emerge-info.txt
Created attachment 757392 [details] emerge-history.txt
Created attachment 757393 [details] environment
Created attachment 757394 [details] etc.portage.tar.bz2
Created attachment 757395 [details] logs.tar.bz2
Created attachment 757396 [details] sys-cluster:openmpi-4.1.2:20211204-031308.log.bz2
Created attachment 757397 [details] temp.tar.bz2
(In reply to Toralf Förster from comment #0)

Same here. I already filed a bug for this, https://bugs.gentoo.org/827810, but no one has helped us so far.

Iade
(In reply to Iade Gesso from comment #8) You didn't reply to a request for more information?
It looks like upstream has fixed this by adding

#include "opal/datatype/opal_convertor_internal.h"

to line 32 of
ompi/ompi/mca/coll/base/coll_base_alltoall.c
ompi/ompi/mca/coll/base/coll_base_alltoallv.c
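For readers unfamiliar with this class of error, here is a minimal sketch of why a missing include produces "invalid use of undefined type". The names master_t, convertor_t and arch_differs are hypothetical stand-ins, not Open MPI's real definitions; the assumption is only that the public header forward-declares the struct while the internal header defines it.

```c
/* Sketch only: hypothetical types standing in for the Open MPI ones. */

/* What a public header might expose: a forward declaration. */
struct master_t;                      /* incomplete type at this point */

struct convertor_t {
    struct master_t *master;          /* a pointer to an incomplete type is legal */
};

/* What an "internal" header would add: the full definition.
 * Without it, any access to a member of *master fails to compile with
 * "invalid use of undefined type 'struct master_t'". */
struct master_t {
    unsigned int remote_arch;
};

/* Returns 1 if the peer's architecture differs from the local one, else 0.
 * The member access below is what requires the complete type. */
int arch_differs(const struct convertor_t *conv, unsigned int local_arch)
{
    return conv->master->remote_arch != local_arch;
}
```

Deleting the struct master_t definition from this sketch reproduces the same kind of diagnostic, which is consistent with the fix being a single added #include.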
(In reply to Jeremy Stent from comment #10)
> It looks like upstream has fixed this by adding
>
> #include "opal/datatype/opal_convertor_internal.h"
>
> to line 32 of
> ompi/ompi/mca/coll/base/coll_base_alltoall.c
> ompi/ompi/mca/coll/base/coll_base_alltoallv.c

Was there a bug / commit you can link to?
(In reply to Sam James from comment #11)
> Was there a bug / commit you can link to?

It's https://github.com/open-mpi/ompi/commit/927e9aa97373dac652f9cba4813e6ee609ca2830 but it's not been backported.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fab20bbcbf246d0868b8d70b02ced33972f7c137

commit fab20bbcbf246d0868b8d70b02ced33972f7c137
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-01-02 03:02:35 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-01-02 03:02:35 +0000

    sys-cluster/openmpi: add upstream patch for build failure

    Closes: https://bugs.gentoo.org/828123
    Signed-off-by: Sam James <sam@gentoo.org>

 .../files/openmpi-4.1.2-missing-includes.patch     | 32 ++++++++++++++++++++++
 sys-cluster/openmpi/openmpi-4.1.2.ebuild           |  6 +++-
 2 files changed, 37 insertions(+), 1 deletion(-)
I am one of the Open MPI maintainers.

You should not enable the heterogeneous functionality in Open MPI v4.0.x or v4.1.x -- it is currently known to be broken. Specifically, you should *not* include --enable-heterogeneous when building the Open MPI package.

Unfortunately, it looks like we had a minor glitch in our README such that the "Do not use this functionality!" warning was accidentally located in the wrong section, so you had no realistic way of knowing this. :-(

I just posted this upstream at https://github.com/open-mpi/ompi/issues/9697#issuecomment-1003746357, but wanted to make sure it was known downstream here, too.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1e46e06ae70156fb4d4db508c727b1812e6a7aa4

commit 1e46e06ae70156fb4d4db508c727b1812e6a7aa4
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-01-03 00:20:38 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-01-03 00:21:48 +0000

    sys-cluster/openmpi: disable heterogeneous (unsupported, broken)

    Upstream have let us know (thank you!) that heterogeneous should _not_
    be used for anything before 5.0.x (which is not out yet).

    We can look at restoring support in the future once it is ready upstream.

    Upstream documentation has been fixed to reflect this too.

    Closes: https://bugs.gentoo.org/828123
    Thanks-to: Jeff Squyres <jsquyres@cisco.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 .../files/openmpi-4.1.2-missing-includes.patch     | 32 ----------------------
 sys-cluster/openmpi/openmpi-4.0.2-r1.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.0.3-r1.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.0.4-r1.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.0.5-r2.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.0.5-r3.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.0.6-r1.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.0.7.ebuild           |  6 ++--
 sys-cluster/openmpi/openmpi-4.1.1-r1.ebuild        |  6 ++--
 sys-cluster/openmpi/openmpi-4.1.2.ebuild           | 12 ++++----
 10 files changed, 30 insertions(+), 62 deletions(-)