Bug 289041 - [gentoo-science] mlx4 there is a mismatch between the kernel and the userspace libraries
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Cluster Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-14 13:10 UTC by Vittorio
Modified: 2010-10-01 17:42 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Description Vittorio 2009-10-14 13:10:57 UTC
When running a high-level application on the InfiniBand layer, such as a device query with ibv_devinfo or an application launched with Open MPI, the following error is reported: "mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting."
Other messages may be "Failed to open device" or "CMA: unable to open RDMA device".

Reproducible: Always

Steps to Reproduce:
1. Run ibv_devinfo

Actual Results:  
mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting.


Expected Results:  
Depends on the application, but ibv_devinfo should report information about the installed InfiniBand cards.
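
The reproduction step above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the list of error strings contains just the messages quoted in this report, and it assumes ibv_devinfo is on the PATH.

```python
import subprocess

# Error strings quoted in this report; illustrative, not an exhaustive list.
KNOWN_ERRORS = (
    "There is a mismatch between the kernel and the userspace libraries",
    "Failed to open device",
    "CMA: unable to open RDMA device",
)


def find_known_errors(output):
    """Return the known error strings that occur in the given tool output."""
    return [msg for msg in KNOWN_ERRORS if msg in output]


def check_ibv_devinfo(command=("ibv_devinfo",)):
    """Run the command and return (exit_code, matched_errors).

    Returns (None, []) if the command is not installed.
    """
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None, []
    # The message may land on stdout or stderr, so scan both.
    return result.returncode, find_known_errors(result.stdout + result.stderr)
```

Calling check_ibv_devinfo() on an affected node would be expected to return the mismatch message in the second element; an empty list only means none of the strings above appeared.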


Portage 2.1.6.13 (default/linux/amd64/2008.0, gcc-4.3.2, glibc-2.9_p20081201-r2, 2.6.30-gentoo-r5.0 x86_64)
=================================================================
System uname: Linux-2.6.30-gentoo-r5.0-x86_64-Intel-R-_Xeon-R-_CPU_E5420_@_2.50GHz-with-gentoo-1.12.11.1
Timestamp of tree: Wed, 14 Oct 2009 01:45:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     4.0_p28
dev-java/java-config: 2.1.8-r1
dev-lang/python:     2.6.2-r1
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.63-r1
sys-devel/automake:  1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.27-r2
ABI="amd64"
ACCEPT_KEYWORDS="amd64"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ARCH="amd64"
ASFLAGS_x86="--32"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CCACHE_SIZE="2G"
CDEFINE_amd64="__x86_64__"
CDEFINE_x86="__i386__"
CFLAGS="-O2 -march=native -pipe -fomit-frame-pointer"
CFLAGS_x86="-m32"
CHOST="x86_64-pc-linux-gnu"
CHOST_amd64="x86_64-pc-linux-gnu"
CHOST_x86="i686-pc-linux-gnu"
CLEAN_DELAY="5"
COLLISION_IGNORE="/lib/modules"
CONFIG_PROTECT="/etc /var/spool/torque"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CVS_RSH="ssh"
CXXFLAGS="-O2 -march=native -pipe -fomit-frame-pointer"
DEFAULT_ABI="amd64"
DISTDIR="/usr/portage/distfiles"
EDITOR="/usr/bin/vim"
ELIBC="glibc"
EMERGE_DEFAULT_OPTS="--ask --verbose"
EMERGE_WARNING_DELAY="10"
ESELECT_MPI_IMP="mpi-ompi"
FEATURES="ccache distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
FETCHCOMMAND="/usr/bin/wget -t 5 -T 60 --passive-ftp -O "${DISTDIR}/${FILE}" "${URI}""
GCC_SPECS=""
GDK_USE_XFT="1"
GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo/ http://files.gentoo.gr/ ftp://de-mirror.org/distro/gentoo/ http://gentoo.tups.lv/source/ http://gentoo.mirrors.tds.net/gentoo/ ftp://ftp.unina.it/pub/linux/distributions/gentoo"
HOME="/root"
IBPATH="/usr/bin"
INFOPATH="/usr/share/info:/usr/share/binutils-data/x86_64-pc-linux-gnu/2.18/info:/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.2/info"
INPUT_DEVICES="evdev keyboard mouse"
JAVAC="/etc/java-config-2/current-system-vm/bin/javac"
JAVA_HOME="/etc/java-config-2/current-system-vm"
JDK_HOME="/etc/java-config-2/current-system-vm"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LDFLAGS="-Wl,-O1"
LDFLAGS_x86="-m elf_i386"
LD_LIBRARY_PATH="/usr/lib64/mpi/mpi-ompi/usr/lib64:"
LESS="-R -M --shift 5"
LESSOPEN="|lesspipe.sh %s"
LIBDIR_amd64="lib64"
LIBDIR_amd64_fbsd="lib64"
LIBDIR_ppc="lib32"
LIBDIR_ppc64="lib64"
LIBDIR_sparc32="lib32"
LIBDIR_sparc64="lib64"
LIBDIR_x86="lib32"
LIBDIR_x86_fbsd="lib32"
LINGUAS="it en"
LOGNAME="root"
LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.pdf=00;32:*.ps=00;32:*.txt=00;32:*.patch=00;32:*.diff=00;32:*.log=00;32:*.tex=00;32:*.doc=00;32:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:"
MAIL="/var/mail/root"
MAKEOPTS="-j9"
MANPATH="/usr/lib64/mpi/mpi-ompi/usr/share/man:/etc/java-config-2/current-system-vm/man:/usr/local/share/man:/usr/share/man:/usr/share/binutils-data/x86_64-pc-linux-gnu/2.18/man:/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.2/man:/etc/java-config/system-vm/man/"
MULTILIB_ABIS="amd64 x86"
MULTILIB_STRICT_DENY="64-bit.*shared object"
MULTILIB_STRICT_DIRS="/lib32 /lib /usr/lib32 /usr/lib /usr/kde/*/lib32 /usr/kde/*/lib /usr/qt/*/lib32 /usr/qt/*/lib /usr/X11R6/lib32 /usr/X11R6/lib"
MULTILIB_STRICT_EXEMPT="(perl5|gcc|gcc-lib|binutils|eclipse-3|debug|portage)"
NETBEANS="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml"
OPENGL_PROFILE="xorg-x11"
PAGER="/usr/bin/less"
PATH="/usr/lib64/mpi/mpi-ompi/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.2"
PBS_SERVER_HOME="/var/spool/torque"
PKGDIR="/usr/portage/packages"
PORTAGE_ARCHLIST="ppc x86-openbsd ppc-openbsd ppc64 x86-winnt x86-fbsd ppc-aix alpha arm x86-freebsd s390 amd64 x86-macos x64-openbsd ia64-hpux hppa x86-netbsd amd64-linux ia64-linux x86 sparc-solaris x64-freebsd sparc64-solaris x86-linux x64-macos sparc m68k-mint ia64 mips ppc-macos x86-interix hppa-hpux amd64-fbsd x64-solaris mips-irix m68k sh x86-solaris sparc-fbsd"
PORTAGE_BINHOST_CHUNKSIZE="3000"
PORTAGE_BIN_PATH="/usr/lib64/portage/bin"
PORTAGE_COMPRESS_EXCLUDE_SUFFIXES="css gif htm[l]? jp[e]?g js pdf png"
PORTAGE_CONFIGROOT="/"
PORTAGE_DEBUG="0"
PORTAGE_DEPCACHEDIR="/var/cache/edb/dep"
PORTAGE_ELOG_CLASSES="log warn error"
PORTAGE_ELOG_MAILFROM="portage@localhost"
PORTAGE_ELOG_MAILSUBJECT="[portage] ebuild log for ${PACKAGE} on ${HOST}"
PORTAGE_ELOG_MAILURI="root"
PORTAGE_ELOG_SYSTEM="save_summary echo"
PORTAGE_FETCH_CHECKSUM_TRY_MIRRORS="5"
PORTAGE_FETCH_RESUME_MIN_SIZE="350K"
PORTAGE_GID="250"
PORTAGE_INST_GID="0"
PORTAGE_INST_UID="0"
PORTAGE_PYM_PATH="/usr/lib64/portage/pym"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_RSYNC_RETRIES="3"
PORTAGE_TMPDIR="/var/tmp"
PORTAGE_VERBOSE="1"
PORTAGE_WORKDIR_MODE="0700"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage/layman/science"
PROFILE_ONLY_VARIABLES="ARCH ELIBC KERNEL USERLAND"
PWD="/root"
RESUMECOMMAND="/usr/bin/wget -c -t 5 -T 60 --passive-ftp -O "${DISTDIR}/${FILE}" "${URI}""
ROOT="/"
ROOTPATH="/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.2"
RPMDIR="/usr/portage/rpm"
SHELL="/bin/bash"
SHLVL="1"
SSH_CLIENT="93.38.71.74 58944 22"
SSH_CONNECTION="93.38.71.74 58944 130.192.5.61 22"
SSH_TTY="/dev/pts/2"
STAGE1_USE="multilib nptl nptlonly unicode"
SYMLINK_LIB="yes"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
TERM="xterm-color"
USE="X acl acpi amd64 apache2 bash-completion bzip2 cairo ccache cli cracklib crypt cscope dbus diags dri fortran gd gif git gnome gtk hal heterogeneous iconv ipath isdnlog iser java jpeg kdeenablefinal kdehiddenvisibility libffi metis mlx4 mmx modules mpi-threads mthca mudflap multilib mysql ncurses nls nptl nptlonly nsplugin nvidia openal opengl openmp opensm openssl pbs pcre perl png pppd python rdma rds readline reflection reiserfs romio session spl sqlite srp sse sse2 ssh ssl subversion svg symlink sysfs tcpd threads tiff vim vnic vt xfs xorg xulrunner zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="it en" USERLAND="GNU" VIDEO_CARDS="vesa nvidia nv radeon"
USER="root"
USERLAND="GNU"
USE_EXPAND="ALSA_CARDS ALSA_PCM_PLUGINS APACHE2_MODULES APACHE2_MPMS CAMERAS CROSSCOMPILE_OPTS DVB_CARDS ELIBC FCDSL_CARDS FOO2ZJS_DEVICES FRITZCAPI_CARDS INPUT_DEVICES KERNEL LCD_DEVICES LINGUAS LIRC_DEVICES MISDN_CARDS NETBEANS_MODULES QEMU_SOFTMMU_TARGETS QEMU_USER_TARGETS SANE_BACKENDS USERLAND VIDEO_CARDS"
USE_EXPAND_HIDDEN="CROSSCOMPILE_OPTS ELIBC KERNEL USERLAND"
USE_ORDER="env:pkg:conf:defaults:pkginternal:env.d"
VIDEO_CARDS="vesa nvidia nv radeon"
_="/usr/bin/emerge"
Comment 1 Vittorio 2009-10-14 13:11:19 UTC
Output of an mpirun launch:

mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting.
CMA: unable to open RDMA device
[randori:15932] *** Process received signal ***
[randori:15932] Signal: Segmentation fault (11)
[randori:15932] Signal code: Address not mapped (1)
[randori:15932] Failing at address: 0x10c
[randori:15932] [ 0] /lib/libpthread.so.0 [0x7f78b6b78a10]
[randori:15932] [ 1] /usr/lib/libibverbs.so.1(ibv_close_device+0x21) [0x7f78b2d4b8c1]
[randori:15932] [ 2] /usr/lib/librdmacm.so.1 [0x7f78b2f53d2e]
[randori:15932] [ 3] /usr/lib/librdmacm.so.1 [0x7f78b2f53ee1]
[randori:15932] [ 4] /usr/lib/librdmacm.so.1(rdma_create_event_channel+0x12) [0x7f78b2f55e42]
[randori:15932] [ 5] /usr/lib64/mpi/mpi-ompi/usr/lib64/openmpi/mca_btl_openib.so [0x7f78b2b216a2]
[randori:15932] [ 6] /usr/lib64/mpi/mpi-ompi/usr/lib64/openmpi/mca_btl_openib.so [0x7f78b2b265aa]
[randori:15932] [ 7] /usr/lib64/mpi/mpi-ompi/usr/lib64/openmpi/mca_btl_openib.so [0x7f78b2b22014]
[randori:15932] [ 8] /usr/lib64/mpi/mpi-ompi/usr/lib64/openmpi/mca_btl_openib.so [0x7f78b2b10e8d]
[randori:15932] [ 9] /usr/lib64/mpi/mpi-ompi/usr/lib64/libmpi.so.0(mca_btl_base_select+0x1ba) [0x7f78b7b6237a]
[randori:15932] [10] /usr/lib64/mpi/mpi-ompi/usr/lib64/openmpi/mca_bml_r2.so [0x7f78b3362911]
[randori:15932] [11] /usr/lib64/mpi/mpi-ompi/usr/lib64/libmpi.so.0(mca_bml_base_init+0x9f) [0x7f78b7b61b0f]
[randori:15932] [12] /usr/lib64/mpi/mpi-ompi/usr/lib64/openmpi/mca_pml_ob1.so [0x7f78b376d01f]
[randori:15932] [13] /usr/lib64/mpi/mpi-ompi/usr/lib64/libmpi.so.0(mca_pml_base_select+0x20e) [0x7f78b7b6ce7e]
[randori:15932] [14] /usr/lib64/mpi/mpi-ompi/usr/lib64/libmpi.so.0 [0x7f78b7b23ff8]
[randori:15932] [15] /usr/lib64/mpi/mpi-ompi/usr/lib64/libmpi.so.0(PMPI_Init+0x179) [0x7f78b7b46679]
[randori:15932] [16] mpitest(main+0x2b) [0x400b47]
[randori:15932] [17] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f78b68335c6]
[randori:15932] [18] mpitest [0x400a59]
[randori:15932] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 15932 on node randori exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[randori:15929] *** Process received signal ***
[randori:15929] Signal: Segmentation fault (11)
[randori:15929] Signal code: Address not mapped (1)
[randori:15929] Failing at address: 0x7f7fdcec2ae0
Segmentation fault
Comment 2 Justin Lecher (RETIRED) gentoo-dev 2009-10-14 20:58:16 UTC
Hopefully you are the right addresses, otherwise send it back to me.
Comment 3 Vittorio 2009-10-14 21:20:56 UTC
What do you mean by "right addresses"? If you mean access rights, yes, I'm running as root. If by address you mean the other nodes of the cluster, yes, they are mapped correctly through opensm.
Comment 4 Justin Lecher (RETIRED) gentoo-dev 2009-10-15 06:40:11 UTC
(In reply to comment #3)
> what do you mean by "right addresses"? 
The people I assigned the bug to. I didn't really understand which package was giving you trouble, so I chose the maintainers somewhat blindly. Perhaps I missed someone.

Comment 5 Alexey Shvetsov archtester gentoo-dev 2009-10-15 11:36:00 UTC
(In reply to comment #3)
> what do you mean by "right addresses"? if you mean access rights, yes as i'm
> running by root, otherwise if by address you mean the other nodes of the
> clusters, yes they are mapped correctly through opensm
> 

Yep, the right addressee is me. But I can't test the mlx4 driver, since I only have mthca RDMA devices. What I can recommend is to try another kernel version. Also, which version of libmlx4 did you actually try?
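
One way to answer the version question is to look the packages up in Portage's installed-package database. The following is a rough sketch, assuming the database lives at the usual /var/db/pkg and that entries are directories named category/name-version; the function name is hypothetical.

```python
from pathlib import Path


def installed_versions(names=("libmlx4", "libibverbs"), vdb="/var/db/pkg"):
    """Return 'category/name-version' entries found in the Portage database."""
    found = []
    root = Path(vdb)
    if not root.is_dir():
        return found  # not a Gentoo system, or a non-default database path
    for category in sorted(root.iterdir()):
        if not category.is_dir():
            continue
        for pkg in sorted(category.iterdir()):
            for name in names:
                prefix = name + "-"
                rest = pkg.name[len(prefix):]
                # Require a digit after the dash so that a package such as
                # "libmlx4-utils" is not mistaken for a version of "libmlx4".
                if pkg.name.startswith(prefix) and rest[:1].isdigit():
                    found.append(f"{category.name}/{pkg.name}")
    return found
```

Running print(installed_versions()) on the affected node would list whichever versions of the two libraries Portage has recorded; libraries installed by hand from source will not show up here.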
Comment 6 Vittorio 2009-11-02 14:07:55 UTC
This bug also occurs with the unstable kernel 2.6.31-r4.
Comment 7 Dawid Węgliński (RETIRED) gentoo-dev 2010-09-29 12:02:13 UTC
Is this still a problem for you? If so, could you share some information about your hardware specification?
Comment 8 Matthias Schoepfer 2010-10-01 13:14:07 UTC
(In reply to comment #7)
> Is this still a problem for you? If so, could you share some information about
> your hardware specification?

I was having the same issue. If I recall correctly, I resolved it by updating the firmware of the InfiniBand cards (we have some Dell OEM cards). But I also recall that I installed libibverbs and libmlx4 from source. There was some issue with the Gentoo ebuild...

All I can actually recall is that this issue had absolutely nothing to do with the kernel, as it was configured correctly. It was rather an issue with the userspace libs...

Hope that helps... 

Comment 9 Dawid Węgliński (RETIRED) gentoo-dev 2010-10-01 17:42:12 UTC
Googling a bit about this issue gave me the same information: a firmware upgrade resolved everyone's problem.
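
Since the fix reported here is a firmware upgrade, it may help to record the firmware version before and after. A minimal sketch, assuming the kernel exposes a fw_ver attribute for each device under /sys/class/infiniband (the function name is hypothetical, and the path is an assumption about the sysfs layout on the affected system):

```python
from pathlib import Path


def read_fw_versions(sysfs_root="/sys/class/infiniband"):
    """Map each device name under sysfs_root to its fw_ver string."""
    versions = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return versions  # class directory missing, e.g. drivers not loaded
    for dev in sorted(root.iterdir()):
        fw_file = dev / "fw_ver"
        if fw_file.is_file():
            versions[dev.name] = fw_file.read_text().strip()
    return versions
```

print(read_fw_versions()) gives a dictionary keyed by device name (for example mlx4_0); an empty dictionary means no device exposing that attribute was found.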