too long lines were shrunk:

from /var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/opal/datatype/opal_convertor.h:35,
from /var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/datatype/ompi_datatype.h:38,
from /var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/mca/coll/base/coll_base_alltoall.c:31:
/var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/mca/coll/base/coll_base_alltoall.c: In function ‘mca_coll_base_alltoall_intra_basic_inplace’:
/var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/ompi/mca/coll/base/coll_base_alltoall.c:81:85: error: invalid use of undefined type ‘struct opal_convertor_master_t’
   81 | if( OPAL_UNLIKELY(opal_local_arch != ompi_proc->super.proc_convertor->master->remote_arch)) {
      | ^~
/var/tmp/portage/sys-cluster/openmpi-4.1.2/work/openmpi-4.1.2/opal/include/opal/prefetch.h:44:55: note: in definition of macro ‘OPAL_UNLIKELY’
   44 | #define OPAL_UNLIKELY(expression) __builtin_expect(!!(expression), 0)

-------------------------------------------------------------------

This is an unstable amd64 chroot image at a tinderbox (==build bot)
name: 17.1_desktop-j4-20211202-110142

-------------------------------------------------------------------

gcc-config -l:
[1] x86_64-pc-linux-gnu-11.2.1 *
clang version 13.0.0
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/13/bin
/usr/lib/llvm/13
13.0.0
Python 3.9.9
Available Ruby profiles:
[1] ruby26 (with Rubygems)
[2] ruby27 (with Rubygems)
[3] ruby30 (with Rubygems) *
Available Rust versions:
[1] rust-bin-1.56.1 *
The following VMs are available for generation-2:
1) OpenJDK 8.312_p07 [openjdk-8]
*) AdoptOpenJDK 8.312_p07 [openjdk-bin-8]
Available Java Virtual Machines:
[1] openjdk-8
[2] openjdk-bin-8 system-vm

The Glorious Glasgow Haskell Compilation System, version 8.10.4
php cli:
[1] php7.3
[2] php7.4
[3] php8.1 *

HEAD of ::gentoo
commit c8e2fb3130882c282cf7027199f0c447cde2d5f0
Author: Repository mirror & CI <repomirrorci@gentoo.org>
Date: Fri Dec 3 23:51:40 2021 +0000

2021-12-03 23:51:39 UTC

emerge -qpvO sys-cluster/openmpi
[ebuild N ] sys-cluster/openmpi-4.1.2 USE="fortran heterogeneous ipv6 -cma (-cuda) -cxx -java -libompitrace -peruse -romio" ABI_X86="(64) -32 (-x32)" OPENMPI_FABRICS="-knem -ofed -psm" OPENMPI_OFED_FEATURES="-control-hdr-padding -dynamic-sl -rdmacm -udcm" OPENMPI_RM="-pbs -slurm"
Created attachment 757391 emerge-info.txt
Created attachment 757392 emerge-history.txt
Created attachment 757393 environment
Created attachment 757394 etc.portage.tar.bz2
Created attachment 757395 logs.tar.bz2
Created attachment 757396 sys-cluster:openmpi-4.1.2:20211204-031308.log.bz2
Created attachment 757397 temp.tar.bz2
(In reply to Toralf Förster from comment #0)
> error: invalid use of undefined type ‘struct opal_convertor_master_t’

Same here. I had already filed a bug for this, https://bugs.gentoo.org/827810, but no one has helped us so far.

Iade
(In reply to Iade Gesso from comment #8) You didn't reply to a request for more information?
It looks like upstream has fixed this by adding

#include "opal/datatype/opal_convertor_internal.h"

to line 32 of
ompi/ompi/mca/coll/base/coll_base_alltoall.c
ompi/ompi/mca/coll/base/coll_base_alltoallv.c
(In reply to Jeremy Stent from comment #10)
> It looks like upstream has fixed this by adding
>
> #include "opal/datatype/opal_convertor_internal.h"
>
> to line 32 of
> ompi/ompi/mca/coll/base/coll_base_alltoall.c
> ompi/ompi/mca/coll/base/coll_base_alltoallv.c

Was there a bug / commit you can link to?
(In reply to Sam James from comment #11)
> (In reply to Jeremy Stent from comment #10)
> > It looks like upstream has fixed this by adding
> >
> > #include "opal/datatype/opal_convertor_internal.h"
> >
> > to line 32 of
> > ompi/ompi/mca/coll/base/coll_base_alltoall.c
> > ompi/ompi/mca/coll/base/coll_base_alltoallv.c
>
> Was there a bug / commit you can link to?

It's https://github.com/open-mpi/ompi/commit/927e9aa97373dac652f9cba4813e6ee609ca2830 but it's not been backported.
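For readers wondering why one extra #include fixes an "invalid use of undefined type" error, here is a minimal sketch in C. It assumes the usual pattern in which a public header only forward-declares a struct and a separate internal header carries the full definition; the thread suggests, but does not spell out, that this is how opal_convertor.h and opal_convertor_internal.h relate. All names beginning with demo_ are hypothetical and are not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stands in for a public header: only a forward declaration, so
 * struct demo_master is an incomplete type at this point.
 * (Hypothetical names, not the Open MPI sources.) */
struct demo_master;

struct demo_convertor {
    struct demo_master *master;   /* a pointer to an incomplete type is allowed */
};

/* Stands in for the internal header: the full definition.
 * If this definition were not visible before demo_arch_differs(),
 * the member access below would fail to compile with an error
 * like the one in the original report. */
struct demo_master {
    uint32_t remote_arch;
};

/* Reads a member through the pointer, so it needs the complete type. */
static int demo_arch_differs(const struct demo_convertor *conv,
                             uint32_t local_arch)
{
    assert(conv != NULL && conv->master != NULL);
    return local_arch != conv->master->remote_arch;
}
```

Calling demo_arch_differs() with a convertor whose master records a different architecture value than local_arch returns 1, and 0 when they match. The relevant point is only that the struct definition must be visible before the member access.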
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fab20bbcbf246d0868b8d70b02ced33972f7c137

commit fab20bbcbf246d0868b8d70b02ced33972f7c137
Author: Sam James <sam@gentoo.org>
AuthorDate: 2022-01-02 03:02:35 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2022-01-02 03:02:35 +0000

    sys-cluster/openmpi: add upstream patch for build failure

    Closes: https://bugs.gentoo.org/828123
    Signed-off-by: Sam James <sam@gentoo.org>

 .../files/openmpi-4.1.2-missing-includes.patch | 32 ++++++++++++++++++++++
 sys-cluster/openmpi/openmpi-4.1.2.ebuild | 6 +++-
 2 files changed, 37 insertions(+), 1 deletion(-)
I am one of the Open MPI maintainers.

You should not enable the heterogeneous functionality in Open MPI v4.0.x or v4.1.x -- it is currently known to be broken. Specifically, you should *not* include --enable-heterogeneous when building the Open MPI package.

Unfortunately, it looks like we had a minor glitch in our README such that the "Do not use this functionality!" warning was accidentally located in the wrong section, so you had no realistic way of knowing this. :-(

I just posted this upstream at https://github.com/open-mpi/ompi/issues/9697#issuecomment-1003746357, but wanted to make sure it was known downstream here, too.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1e46e06ae70156fb4d4db508c727b1812e6a7aa4

commit 1e46e06ae70156fb4d4db508c727b1812e6a7aa4
Author: Sam James <sam@gentoo.org>
AuthorDate: 2022-01-03 00:20:38 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2022-01-03 00:21:48 +0000

    sys-cluster/openmpi: disable heterogeneous (unsupported, broken)

    Upstream have let us know (thank you!) that heterogeneous should _not_
    be used for anything before 5.0.x (which is not out yet).

    We can look at restoring support in the future once it is ready upstream.

    Upstream documentation has been fixed to reflect this too.

    Closes: https://bugs.gentoo.org/828123
    Thanks-to: Jeff Squyres <jsquyres@cisco.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 .../files/openmpi-4.1.2-missing-includes.patch | 32 ----------------------
 sys-cluster/openmpi/openmpi-4.0.2-r1.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.0.3-r1.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.0.4-r1.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.0.5-r2.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.0.5-r3.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.0.6-r1.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.0.7.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.1.1-r1.ebuild | 6 ++--
 sys-cluster/openmpi/openmpi-4.1.2.ebuild | 12 ++++----
 10 files changed, 30 insertions(+), 62 deletions(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=496a4f0ce86f43da3fe77ffd6c9bef2e41cf3852

commit 496a4f0ce86f43da3fe77ffd6c9bef2e41cf3852
Author: Eli Schwartz <eschwartz93@gmail.com>
AuthorDate: 2024-06-10 04:05:03 +0000
Commit: Eli Schwartz <eschwartz@gentoo.org>
CommitDate: 2024-07-12 05:54:15 +0000

    sys-cluster/openmpi: add 5.0.3

    A bunch of upstream changes occurred. In particular:

    - openmpi drops ALL support for 32-bit, and errors out in ./configure
      if you try. This follows pmix. Rip out all the multilib-minimal
      scaffolding.

    - libompitrace "was incomplete and unmaintained" and is now removed
      from the sources

    - upstream now defaults to --disable-dlopen, and configuring with
      libltdl enabled externally returns errors saying a non libltdl
      header doesn't exist. Unclear if it actually supports this

    - a couple dependencies can now be configured --with-*=external
      instead of passing paths

    - libibverbs handling is gone upstream and no longer makes sense to
      configure via USE flags (or at all):
      https://github.com/open-mpi/ompi/commit/59c8ab6da4276ff398453a54910c6c0fb67a153c

    Delayed:

    - heterogeneous was broken in older versions, and its USE flag is
      supposed to be restored. But the upstream docs still suggest it is
      broken.

    Independent of upstream rework of pmix, we take the opportunity of a
    version bump to build against the system pmix, resolving a
    longstanding bug due to openmpi publicly shipping its own pmix
    installation that stomps all over the global system namespace.

    Temporarily drop keywords which the pmix package lacks.

    Bug: https://bugs.gentoo.org/828123
    Closes: https://bugs.gentoo.org/652432
    Closes: https://bugs.gentoo.org/927828
    Closes: https://bugs.gentoo.org/930362
    Signed-off-by: Eli Schwartz <eschwartz93@gmail.com>
    Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>

 sys-cluster/openmpi/Manifest | 1 +
 sys-cluster/openmpi/openmpi-5.0.3.ebuild | 141 +++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+)