Bug 946133 - sys-cluster/openmpi-5.0.6 fails with USE=cuda and dev-util/nvidia-cuda-toolkit-12.6.1
Summary: sys-cluster/openmpi-5.0.6 fails with USE=cuda and dev-util/nvidia-cuda-toolki...
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Cluster Team
URL:
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2024-12-09 06:58 UTC by Benjamin Schulz
Modified: 2025-03-01 16:45 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log as tar.gz (build.tar.gz,72.24 KB, application/octet-stream)
2024-12-09 07:00 UTC, Benjamin Schulz

Description Benjamin Schulz 2024-12-09 06:58:37 UTC
sys-cluster/openmpi-5.0.6 fails with USE=cuda
and dev-util/nvidia-cuda-toolkit-12.6.1

Reproducible: Always




emerge --info
Portage 3.0.66.1 (python 3.12.7-final-0, default/linux/amd64/23.0/desktop/plasma, gcc-14, glibc-2.40-r7, 6.12.3-gentoo-dist x86_64)
=================================================================
System uname: Linux-6.12.3-gentoo-dist-x86_64-AMD_Ryzen_9_3900X_12-Core_Processor-with-glibc2.40
KiB Mem:    32774864 total,  22740244 free
KiB Swap:   31249404 total,  31249404 free
Timestamp of repository gentoo: Mon, 09 Dec 2024 04:30:00 +0000
Head commit of repository gentoo: 502d9fddee36a8591daba07cbcdca75650209a4a
Timestamp of repository escpr2: Thu, 31 Oct 2024 18:34:26 +0000
Head commit of repository escpr2: f2c923b8c651f1d14744975f879c2c78f6e5f8f9

sh bash 5.2_p37
ld GNU ld (Gentoo 2.43 p3) 2.43.1
app-misc/pax-utils:        1.3.8::gentoo
app-shells/bash:           5.2_p37::gentoo
dev-build/autoconf:        2.71-r7::gentoo, 2.72-r1::gentoo
dev-build/automake:        1.17-r1::gentoo
dev-build/cmake:           3.31.2::gentoo
dev-build/libtool:         2.5.4::gentoo
dev-build/make:            4.4.1-r100::gentoo
dev-build/meson:           1.6.0-r1::gentoo
dev-java/java-config:      2.3.4::gentoo
dev-lang/perl:             5.40.0-r1::gentoo
dev-lang/python:           3.12.7_p1::gentoo, 3.13.0::gentoo
dev-lang/rust-bin:         1.83.0::gentoo
sys-apps/baselayout:       2.17::gentoo
sys-apps/openrc:           0.55.1::gentoo
sys-apps/sandbox:          2.40::gentoo
sys-devel/binutils:        2.43-r2::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/clang:           18.1.8-r6::gentoo, 19.1.5::gentoo, 20.0.0.9999::gentoo
sys-devel/gcc:             13.3.1_p20241115::gentoo, 14.2.1_p20241116::gentoo
sys-devel/gcc-config:      2.12.1::gentoo
sys-devel/llvm:            18.1.8-r6::gentoo, 19.1.5::gentoo, 20.0.0.9999::gentoo
sys-kernel/linux-headers:  6.12::gentoo (virtual/os-headers)
sys-libs/glibc:            2.40-r7::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    volatile: False
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-max-age: 3
    sync-rsync-extra-opts: 

crossdev
    location: /var/db/repos/crossdev
    masters: gentoo
    volatile: False

escpr2
    location: /var/db/repos/escpr2
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/escpr2.git
    masters: gentoo
    volatile: False

Binary Repositories:

gentoobinhost
    priority: 9999
    sync-uri: https://distfiles.gentoo.org/releases/amd64/binpackages/23.0/x86-64

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="@FREE @BINARY-REDISTRIBUTABLE NVIDIA-CUDA NVIDIA EPSON-EULA NVIDIA-cuDNN google-chrome all-rights-reserved android"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O3 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O3 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-march=native -O3 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance binpkg-request-signature buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles getbinpkg ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O3 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="de_DE.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs"
LEX="flex"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="X a52 aac acl acpi activities alsa amd64 blas bluetooth branding bzip2 cairo cdda cdr cet contrib crypt cuda cudnn cups d dbus declarative dist-kernel dracut dri dts dvd dvdr dvi elogind encode eps eselect-ldso exif fits flac fortran gdbm gdbui gif gnutls go gphoto2 gpm graphite grub gtk gui iconv icu ipv6 jit jpeg kde kf6compat kwallet lapack lapacke lcms libnotify libtirpc lm-sensors lto mad mng modula2 modules-sign mp3 mp4 mpeg multilib ncurses networkmanager nls nvenc objc objc++ offload ogg ompt opencl opencv opengl openmp pam pango pcre pdf pipewire plasma png policykit postscript ppds pulseaudio qml qt5 qt6 raw readline rust screencast sdl seccomp semantic-desktop sound spell ssl startup-notification subversion svg systemtap test-rust threads tiff tpm truetype udev udisks uefi unicode upower usb v4l vorbis vtv vulkan wayland wcs widgets wxwidgets x264 xattr xcb xft xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gcc_12" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 aes avx avx2 f16c fma3 pclmul popcnt rdrand sha sse3 sse4_1 sse4_2 sse4a ssse3" CURL_QUIC="ngtcp2" CURL_SSL="gnutls" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GUILE_SINGLE_TARGET="3-0" GUILE_TARGETS="3-0" INPUT_DEVICES="libinput" KERNEL="linux" L10N="de" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" 
LLVM_TARGETS="X86 NVPTX WEBASSEMBLY" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-2" POSTGRES_TARGETS="postgres16" PYTHON_SINGLE_TARGET="python3_12" PYTHON_TARGETS="python3_12" RUBY_TARGETS="ruby32" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, MAKEOPTS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PYTHONPATH, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 1 Benjamin Schulz 2024-12-09 07:00:30 UTC
Created attachment 913611 [details]
build.log as tar.gz
Comment 2 Benjamin Schulz 2024-12-09 07:00:53 UTC
It compiles with USE=-cuda, at least.
Comment 3 Benjamin Schulz 2024-12-09 08:57:49 UTC
aliasing -finline-functions -c coll_cuda_module.c  -fPIC -DPIC -o .libs/coll_cuda_module.o
coll_cuda_module.c: In function ‘mca_coll_cuda_comm_query’:
coll_cuda_module.c:107:42: error: assignment to ‘mca_coll_base_module_reduce_local_fn_t’ {aka ‘int (*)(const void *, void *, int,  struct ompi_datatype_t *, struct ompi_op_t *, struct mca_coll_base_module_2_4_0_t *)’} from incompatible pointer type ‘int (*)(const void *, void *, size_t,  struct ompi_datatype_t *, struct ompi_op_t *, mca_coll_base_module_t *)’ {aka ‘int (*)(const void *, void *, long unsigned int,  struct ompi_datatype_t *, struct ompi_op_t *, struct mca_coll_base_module_2_4_0_t *)’} [-Wincompatible-pointer-types]
  107 |     cuda_module->super.coll_reduce_local = mca_coll_cuda_reduce_local;
      |                                          ^
make[2]: *** [Makefile:1556: coll_cuda_module.lo] Error 1
make[2]: Leaving directory '/var/tmp/portage/sys-cluster/openmpi-5.0.6/work/openmpi-5.0.6/ompi/mca/coll/cuda'
make[2]: *** Waiting for unfinished jobs....
make[2]: Entering directory '/var/tmp/portage/sys-cluster/openmpi-5.0.6/work/openmpi-5.0.6/ompi/mca/coll/cuda'
/bin/bash ../../../../libtool  --tag=CC   --mode=compile x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c   -iquote../../../..  -I/usr/include/pmix  -DNDEBUG -march=native -O3 -pipe -fno-strict-aliasing -finline-fu
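The error above comes down to a function-pointer slot whose count parameter is an int being assigned a function whose count parameter is a size_t. GCC 14, the compiler in use per the emerge --info output, treats -Wincompatible-pointer-types as an error by default, whereas earlier GCC releases only warned. The following is a minimal sketch of the same mismatch, not Open MPI code: the type and function names are hypothetical stand-ins, the parameter list is shortened, and the adapter shown is only one way to make the types agree (it is not claimed to be what the upstream patch does).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the callback slot: the count is an int. */
typedef int (*reduce_local_fn_t)(const void *in, void *inout, int count);

/* Hypothetical stand-in for the implementation that triggers the error:
 * the count is a size_t, so its pointer type differs from the slot's. */
int reduce_local_size_t(const void *in, void *inout, size_t count)
{
    const int *src = in;
    int *dst = inout;
    for (size_t i = 0; i < count; i++) {
        dst[i] += src[i];   /* element-wise sum into inout */
    }
    return 0;
}

/* A function whose signature matches the slot exactly. It forwards to the
 * size_t version after checking that the count is not negative. */
int reduce_local_int(const void *in, void *inout, int count)
{
    assert(count >= 0);
    return reduce_local_size_t(in, inout, (size_t)count);
}

/* Fills the slot and calls through it. */
int run_reduce(const void *in, void *inout, int count)
{
    reduce_local_fn_t fn;

    /* fn = reduce_local_size_t;
     * Uncommenting the line above reproduces the class of error in the log:
     * GCC 14 rejects it, older GCC versions only warn. */
    fn = reduce_local_int;
    return fn(in, inout, count);
}
```

Calling run_reduce with two int arrays of the same length adds the first array into the second. Uncommenting the marked assignment and compiling with GCC 14 should reproduce an "incompatible pointer type" diagnostic similar to the one reported.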
Comment 4 foufou33 2024-12-21 18:48:37 UTC
It looks more like 5.0.6's fault than 12.6.1's. I went back to CUDA 12.4.x and still hit the same problem.
I tried my luck on their bug tracker and found this:
 https://github.com/open-mpi/ompi/issues/12924

and here's their patch (untested):
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/12934.patch
Comment 5 Benjamin Schulz 2024-12-24 11:56:51 UTC
Creating the directory /etc/portage/patches/sys-cluster/openmpi/ and putting that patch in there works for me. It then compiles with the recent CUDA. So please include it and stabilize.
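The workaround described above can be sketched as a small shell helper. This is an illustration only: the function name stage_user_patch and the /tmp download location are assumptions, and the optional third argument exists just so the helper can be tried against a scratch directory without root. It relies on Portage applying files ending in .patch or .diff from the per-package user patch directory.

```shell
# Copy a patch file into Portage's user patch directory for a package.
#   $1: path to an already downloaded patch file
#   $2: category/package, e.g. sys-cluster/openmpi
#   $3: patch root (optional; defaults to /etc/portage/patches)
# Prints the path of the staged patch on success.
stage_user_patch() {
    patch_file=$1
    package=$2
    patch_root=${3:-/etc/portage/patches}
    target_dir="${patch_root}/${package}"

    mkdir -p "${target_dir}" || return 1
    cp -- "${patch_file}" "${target_dir}/" || return 1
    printf '%s\n' "${target_dir}/$(basename -- "${patch_file}")"
}

# Example usage (as root; the download step needs network access):
#   wget -O /tmp/12934.patch https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/12934.patch
#   stage_user_patch /tmp/12934.patch sys-cluster/openmpi
#   emerge --oneshot sys-cluster/openmpi
```

A directory named sys-cluster/openmpi applies the patch to every version of the package; using sys-cluster/openmpi-5.0.6 instead should limit it to that version, which avoids the patch failing to apply once a fixed release arrives.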