Bug 895286 - dev-libs/rocm-opencl-runtime:5.3: clinfo: ROCclr-rocm-5.3.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: PPC64 Linux
Importance: Normal normal
Assignee: Craig Andrews

Reported: 2023-02-18 17:27 UTC by darkbasic
Modified: 2024-03-01 13:23 UTC
CC List: 3 users

Attachments
0001-Revert-Update-counters-for-gfx11.patch (6.64 KB, patch), 2023-02-20 11:55 UTC, darkbasic
rocm-opencl-runtime-tests-ppc64.patch (497 bytes, patch), 2023-02-21 09:30 UTC, darkbasic

Description darkbasic 2023-02-18 17:27:43 UTC
On my Raptor CS Talos 2 (ppc64le, 4K page size), dev-libs/rocm-opencl-runtime-5.3.3-r1 surprisingly compiled without any issues.

Unfortunately, as soon as I run "clinfo", it crashes:

niko@talos2 ~ $ clinfo 
clinfo: /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.3.3-r1/work/ROCclr-rocm-5.3.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
Aborted (core dumped)
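
For reference, the failed assertion requires the current stack pointer to lie in the half-open range [base - size, base), that is, below the reported stack base and within the reported stack size. A minimal shell sketch of that condition follows; the function name `in_stack_range` and all addresses are made-up illustrations and are not taken from ROCclr or from this crash:

```shell
# Sketch of the range check expressed by the assertion:
#   sp >= base - size && sp < base
# All names and addresses are hypothetical; they do not come from ROCclr.
in_stack_range() {
    # $1 = stack pointer, $2 = stack base (highest address), $3 = stack size
    # Shell arithmetic yields 1 when the condition holds, 0 otherwise.
    [ "$(( $1 >= $2 - $3 && $1 < $2 ))" -eq 1 ]
}

base=$(( 0x7fff00000000 ))     # hypothetical top of the stack
size=$(( 8 * 1024 * 1024 ))    # hypothetical 8 MiB stack

in_stack_range $(( base - 4096 )) "$base" "$size" && echo "inside"
in_stack_range $(( base + 4096 )) "$base" "$size" || echo "outside: the assertion would fire"
```

If the base or size reported for the thread is wrong on a given platform, a perfectly valid stack pointer ends up in the second case.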

The core dump is not helpful; gdb shows only a single unresolved frame in libc.so.6 for every thread:

talos2 ~ # coredumpctl gdb 507511
           PID: 507511 (clinfo)
           UID: 1000 (niko)
           GID: 1000 (niko)
        Signal: 6 (ABRT)
     Timestamp: Sat 2023-02-18 18:22:15 CET (19s ago)
  Command Line: clinfo
    Executable: /usr/bin/clinfo
 Control Group: /user.slice/user-1000.slice/user@1000.service/session.slice/vte-spawn-0d9d182f-19f2-4412-ac9e-1077645eb7bd.scope
          Unit: user@1000.service
     User Unit: vte-spawn-0d9d182f-19f2-4412-ac9e-1077645eb7bd.scope
         Slice: user-1000.slice
     Owner UID: 1000 (niko)
       Boot ID: b0658e05c889413aa8a4a97e82082f61
    Machine ID: b3e834569b8ff461391f5ac061feb773
      Hostname: talos2
       Storage: /var/lib/systemd/coredump/core.clinfo.1000.b0658e05c889413aa8a4a97e82082f61.507511.1676740935000000.zst (present)
  Size on Disk: 1.5M
       Message: Process 507511 (clinfo) of user 1000 dumped core.

GNU gdb (Gentoo 12.1 vanilla) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/clinfo...
(No debugging symbols found in /usr/bin/clinfo)
[New LWP 507511]
[New LWP 507515]
[New LWP 507512]
[New LWP 507513]
[New LWP 507514]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `clinfo '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fff8487603c in ?? () from /usr/lib64/libc.so.6
[Current thread is 1 (Thread 0x3fff84ae6020 (LWP 507511))]
(gdb) info threads
  Id   Target Id                          Frame 
* 1    Thread 0x3fff84ae6020 (LWP 507511) 0x00003fff8487603c in ?? () from /usr/lib64/libc.so.6
  2    Thread 0x3fff72c3b120 (LWP 507515) 0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
  3    Thread 0x3fff7457f120 (LWP 507512) 0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
  4    Thread 0x3fff73c3d120 (LWP 507513) 0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
  5    Thread 0x3fff7343c120 (LWP 507514) 0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x3fff84ae6020 (LWP 507511))]
#0  0x00003fff8487603c in ?? () from /usr/lib64/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x3fff72c3b120 (LWP 507515))]
#0  0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x3fff7457f120 (LWP 507512))]
#0  0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x3fff73c3d120 (LWP 507513))]
#0  0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x3fff7343c120 (LWP 507514))]
#0  0x00003fff8486df84 in ?? () from /usr/lib64/libc.so.6
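
The session above only prints the innermost frame of each thread. A hedged sketch for collecting full backtraces of every thread non-interactively is shown here; the helper name `dump_backtraces` and the default core path are assumptions for illustration, while `coredumpctl dump --output` and `gdb -batch -ex` are existing options:

```shell
# Sketch: extract a core with coredumpctl and print backtraces for all threads.
# dump_backtraces is a hypothetical helper, not part of systemd or gdb.
dump_backtraces() {
    pid=$1                      # PID as shown by "coredumpctl list"
    exe=$2                      # executable that crashed, e.g. /usr/bin/clinfo
    core=${3:-/tmp/core.$pid}   # where to write the uncompressed core (assumed path)

    coredumpctl dump "$pid" --output="$core" || return 1
    # -batch exits after running the commands; "thread apply all bt full"
    # prints every frame (with locals) for every thread.
    gdb -batch -ex "thread apply all bt full" "$exe" "$core"
}

# Example (run as a user allowed to read the stored core dumps):
#   dump_backtraces 507511 /usr/bin/clinfo
```

Whether this yields more than one frame depends on unwind and debug information being available for libc.so.6 and the ROCm libraries.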
Comment 1 darkbasic 2023-02-18 17:27:59 UTC
talos2 ~ # emerge --info
Portage 3.0.44 (python 3.10.9-final-0, default/linux/ppc64le/17.0/desktop/gnome/systemd/merged-usr, gcc-12, glibc-2.36-r5, 6.1.12-gentoo-dist ppc64le)
=================================================================
System uname: Linux-6.1.12-gentoo-dist-ppc64le-POWER9,_altivec_supported-with-glibc2.36
KiB Mem:    65402560 total,  41483872 free
KiB Swap:   16777212 total,  16777212 free
Timestamp of repository bobwya: Wed, 15 Feb 2023 16:01:57 +0000
Head commit of repository bobwya: 8acb2fe1c2cc89a204a643f449a4cd10809553c8

Timestamp of repository gentoo: Sat, 18 Feb 2023 16:16:55 +0000
Head commit of repository gentoo: 2d977cbbb8384e1f6cc0f1207b9331e36e72afaf

Timestamp of repository guru: Fri, 17 Feb 2023 14:02:19 +0000
Head commit of repository guru: dd95ab3abc229844c1ad5c9bead61f593550a920

Timestamp of repository pf4public: Sat, 18 Feb 2023 14:01:59 +0000
Head commit of repository pf4public: 50bd9f4a7cc2c337c25e9f137c98f96b2d0f9257

sh bash 5.1_p16-r2
ld GNU ld (Gentoo 2.39 p5) 2.39.0
ccache version 4.7.4 [disabled]
app-misc/pax-utils:        1.3.5::gentoo
app-shells/bash:           5.1_p16-r2::gentoo
dev-java/java-config:      2.3.1::gentoo
dev-lang/perl:             5.36.0-r1::gentoo
dev-lang/python:           3.10.9-r1::gentoo, 3.11.1-r1::gentoo
dev-lang/rust:             1.66.1::gentoo
dev-util/ccache:           4.7.4::gentoo
dev-util/cmake:            3.25.2::gentoo
dev-util/meson:            1.0.0::gentoo
sys-apps/baselayout:       2.9::gentoo
sys-apps/sandbox:          2.29::gentoo
sys-apps/systemd:          252.4-r1::gentoo
sys-devel/autoconf:        2.13-r7::gentoo, 2.71-r5::gentoo
sys-devel/automake:        1.16.5::gentoo
sys-devel/binutils:        2.39-r4::gentoo
sys-devel/binutils-config: 5.4.1::gentoo
sys-devel/clang:           15.0.7-r1::gentoo
sys-devel/gcc:             12.2.1_p20230121-r1::gentoo
sys-devel/gcc-config:      2.8::gentoo
sys-devel/libtool:         2.4.7::gentoo
sys-devel/lld:             15.0.7::gentoo
sys-devel/llvm:            15.0.7::gentoo
sys-devel/make:            4.3::gentoo
sys-kernel/linux-headers:  5.15-r3::gentoo (virtual/os-headers)
sys-libs/glibc:            2.36-r5::gentoo
Repositories:

bobwya
    location: /var/db/repos/bobwya
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/bobwya.git
    masters: gentoo
    priority: -1100
    volatile: True

gentoo
    location: /var/db/repos/gentoo
    sync-type: git
    sync-uri: https://anongit.gentoo.org/git/repo/sync/gentoo.git
    priority: -1000
    volatile: True
    sync-git-verify-commit-signature: yes

darkbasic
    location: /var/db/repos/darkbasic
    masters: gentoo
    volatile: True

guru
    location: /var/db/repos/guru
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/guru.git
    masters: gentoo
    volatile: True

pf4public
    location: /var/db/repos/pf4public
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/pf4public.git
    masters: gentoo
    volatile: True

ACCEPT_KEYWORDS="ppc64"
ACCEPT_LICENSE="@FREE @FREE unRAR fping freedist Microsoft-vscode"
CBUILD="powerpc64le-unknown-linux-gnu"
CFLAGS="-O2 -pipe -mcpu=power9 -mtune=power9"
CHOST="powerpc64le-unknown-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.1/ext-active/ /etc/php/cgi-php8.1/ext-active/ /etc/php/cli-php8.1/ext-active/ /etc/php/fpm-php8.1/ext-active/ /etc/php/phpdbg-php8.1/ext-active/ /etc/revdep-rebuild /etc/sandbox.d"
CXXFLAGS="-O2 -pipe -mcpu=power9 -mtune=power9"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe -mcpu=power9 -mtune=power9"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe -mcpu=power9 -mtune=power9"
GENTOO_MIRRORS="https://gentoo.mirror.garr.it/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
MAKEOPTS="-j32"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="X a52 aac acl alsa bluetooth branding bzip2 cairo cdda cdr cli colord crypt cups dbus dri dts dvd dvdr eds encode evo exif flac fortran gdbm gif gnome gnome-keyring gnome-online-accounts gpm gstreamer gtk gui iconv icu introspection ipv6 jpeg lcms libglvnd libnotify libsecret mad mng mp3 mp4 mpeg nautilus ncurses networkmanager nls nptl ogg opencl opengl openmp pam pango pcre pdf pipewire png policykit ppc64 ppds pulseaudio qt5 readline screencast sdl seccomp sound spell ssl startup-notification svg systemd test-rust tiff tracker truetype udev udisks unicode upower usb vaapi vorbis vpx vulkan wayland wxwidgets x264 xattr xcb xft xml xv xvid zeroconf zlib" ADA_TARGET="gnat_2021" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_PPC="altivec vsx vsx2 vsx3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-4 php8-0" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_10" PYTHON_TARGETS="python3_10" QEMU_SOFTMMU_TARGETS="ppc ppc64 i386 x86_64 arm aarch64" QEMU_USER_TARGETS="ppc ppc64 i386 x86_64 arm aarch64" RUBY_TARGETS="ruby27 ruby30" USERLAND="GNU" VIDEO_CARDS="amdgpu radeon" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 2 Yiyang Wu 2023-02-20 04:09:17 UTC
I suggest opening an issue upstream at https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/ to see whether they can debug this issue and fix ppc64le support. You can also enable a debug build of rocm-opencl-runtime (USE=debug and some debugging CXXFLAGS) to get a backtrace with more information.
Comment 3 darkbasic 2023-02-20 08:29:37 UTC
I **already** have debug symbols enabled and as you can see they didn't help at all because the only line in the coredump concerns libc.so.6.

I've added the following to both dev-libs/rocm-opencl-runtime and dev-libs/rocr-runtime as well as the debug use:

FEATURES="${FEATURES} nostrip"
CFLAGS="${CFLAGS} -ggdb3 -Wall"
CXXFLAGS="${CFLAGS}"
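
One way to scope these settings to the two packages is Portage's per-package environment mechanism (files under /etc/portage/env referenced from /etc/portage/package.env). The sketch below writes into a scratch directory so the result can be inspected first. The variable `PORTAGE_ETC` and the file name `rocm-debug.conf` are assumptions, package.env and package.use are assumed to be flat files (Portage also accepts directories), and the exact layout used on the reporter's machine is not stated here:

```shell
# Sketch: per-package debug settings for Portage.
# PORTAGE_ETC and rocm-debug.conf are hypothetical names; set
# PORTAGE_ETC=/etc/portage (as root) to apply this for real.
PORTAGE_ETC=${PORTAGE_ETC:-$(mktemp -d)}
mkdir -p "$PORTAGE_ETC/env"

# Environment file holding the settings quoted above.
cat > "$PORTAGE_ETC/env/rocm-debug.conf" <<'EOF'
FEATURES="${FEATURES} nostrip"
CFLAGS="${CFLAGS} -ggdb3 -Wall"
CXXFLAGS="${CFLAGS}"
EOF

# Attach the environment file and the debug USE flag to both packages only.
for pkg in dev-libs/rocm-opencl-runtime dev-libs/rocr-runtime; do
    echo "$pkg rocm-debug.conf" >> "$PORTAGE_ETC/package.env"
    echo "$pkg debug"           >> "$PORTAGE_ETC/package.use"
done
echo "wrote Portage snippets under $PORTAGE_ETC"
```

Both packages need to be rebuilt afterwards for the settings to take effect.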

Is there an ETA for 5.4.3 being packaged in Gentoo? I would like to avoid filing a bug report against an old version.
Comment 4 Sam James (archtester, Gentoo Infrastructure, gentoo-dev, Security) 2023-02-20 09:31:50 UTC
(In reply to darkbasic from comment #3)
> I **already** have debug symbols enabled and as you can see they didn't help
> at all because the only line in the coredump concerns libc.so.6.
> 

Their request was not for debugging symbols but for assertions. USE=debug is usually orthogonal to debugging symbols.
Comment 5 darkbasic 2023-02-20 10:02:20 UTC
> as well as the debug use

I have that too, but I didn't notice any additional logs.
Comment 6 Sam James (archtester, Gentoo Infrastructure, gentoo-dev, Security) 2023-02-20 10:06:06 UTC
(In reply to darkbasic from comment #5)
> > as well as the debug use
> 
> I have that too, but I didn't notice any additional logs.

OK (please don't enable it globally, only selectively when debugging something - it's not safe to keep on, unlike debugging symbols). Anyway, the bump is in tree now.
Comment 7 darkbasic 2023-02-20 11:07:31 UTC
> please don't enable it globally

It's enabled only for dev-libs/rocm-opencl-runtime and dev-libs/rocr-runtime.

> Anyway, the bump is in tree now.

Thanks!
Shouldn't we also bump the dependencies of dev-libs/rocm-opencl-runtime, like dev-libs/rocr-runtime, dev-libs/rocm-comgr, dev-libs/rocm-device-libs, dev-util/rocm-cmake and dev-libs/roct-thunk-interface?

Btw same issue with 5.4.3:

niko@talos2 ~ $ clinfo 
clinfo: /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCclr-rocm-5.4.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
Aborted (core dumped)

And same useless dump as well:

talos2 ~ # coredumpctl gdb 119859
           PID: 119859 (clinfo)
           UID: 1000 (niko)
           GID: 1000 (niko)
        Signal: 6 (ABRT)
     Timestamp: Mon 2023-02-20 12:02:59 CET (26s ago)
  Command Line: clinfo
    Executable: /usr/bin/clinfo
 Control Group: /user.slice/user-1000.slice/user@1000.service/session.slice/vte-spawn-5625e4ad-1505-4188-be9e-6f9ed5549c09.scope
          Unit: user@1000.service
     User Unit: vte-spawn-5625e4ad-1505-4188-be9e-6f9ed5549c09.scope
         Slice: user-1000.slice
     Owner UID: 1000 (niko)
       Boot ID: 0dca6c1f75ea46d7b02761482c0ec1d6
    Machine ID: b3e834569b8ff461391f5ac061feb773
      Hostname: talos2
       Storage: /var/lib/systemd/coredump/core.clinfo.1000.0dca6c1f75ea46d7b02761482c0ec1d6.119859.1676890979000000.zst (present)
  Size on Disk: 1.5M
       Message: Process 119859 (clinfo) of user 1000 dumped core.

Reading symbols from /usr/bin/clinfo...
(No debugging symbols found in /usr/bin/clinfo)
[New LWP 119859]
[New LWP 119860]
[New LWP 119862]
[New LWP 119861]
[New LWP 119863]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `clinfo '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fff9de5603c in ?? () from /usr/lib64/libc.so.6
[Current thread is 1 (Thread 0x3fff9e0ca020 (LWP 119859))]
Comment 8 darkbasic 2023-02-20 11:24:43 UTC
I've tried to bump everything but rocm-device-libs 5.4.3 won't compile:

[514/656] cd /var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/work/ROCm-Device-Libs-rocm-5.4.3_build/ockl && /usr/lib/llvm/15/bin/clang-15 -I/var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/work/ROCm-Device-Libs-rocm-5.4.3/ockl/../irif/inc -I/var/tmp/portage/dev-libs/rocm-device-li>
FAILED: ockl/mtime.bc /var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/work/ROCm-Device-Libs-rocm-5.4.3_build/ockl/mtime.bc
cd /var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/work/ROCm-Device-Libs-rocm-5.4.3_build/ockl && /usr/lib/llvm/15/bin/clang-15 -I/var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/work/ROCm-Device-Libs-rocm-5.4.3/ockl/../irif/inc -I/var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/w>
/var/tmp/portage/dev-libs/rocm-device-libs-5.4.3/work/ROCm-Device-Libs-rocm-5.4.3/ockl/src/mtime.cl:20:12: error: use of undeclared identifier '__builtin_amdgcn_s_sendmsg_rtnl'
    return __builtin_amdgcn_s_sendmsg_rtnl(0x83);
           ^
1 error generated.
Comment 9 darkbasic 2023-02-20 11:55:19 UTC
Created attachment 853184
0001-Revert-Update-counters-for-gfx11.patch

This change requires LLVM 16; reverting it fixes the build:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/85f95b94960c6f7ff4ff0242a399deb4a204fb6a
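
One way to carry such a revert locally is Portage's user patch directory, which ebuilds apply through `eapply_user`. A minimal sketch, assuming the attachment has been downloaded to the current directory; `PORTAGE_ETC` and the helper `install_user_patch` are hypothetical names, and this comment does not say how the patch was actually applied:

```shell
# Sketch: install a local patch so Portage applies it during src_prepare.
# install_user_patch and PORTAGE_ETC are hypothetical names for illustration.
install_user_patch() {
    # $1 = category/package-version, $2 = path to the patch file
    dest="${PORTAGE_ETC:-/etc/portage}/patches/$1"
    mkdir -p "$dest" && cp "$2" "$dest/"
}

# Example (as root), followed by a rebuild of the package:
#   install_user_patch dev-libs/rocm-device-libs-5.4.3 0001-Revert-Update-counters-for-gfx11.patch
#   emerge -v --oneshot dev-libs/rocm-device-libs
```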
Comment 10 darkbasic 2023-02-20 12:15:38 UTC
Unfortunately, I still get the same issue despite upgrading the whole ROCm stack to 5.4.3. I will open a PR to upgrade to 5.4.3 later, but I'll mark it as a draft since I can test neither OpenCL (doesn't work) nor HIP (not supported on my RX 570). Feel free to continue from there if you want.
Comment 11 darkbasic 2023-02-20 13:55:14 UTC
Draft: https://github.com/gentoo/gentoo/pull/29684
Comment 12 darkbasic 2023-02-20 14:12:46 UTC
Upstream issue: https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/issues/158
Comment 13 darkbasic 2023-02-20 14:48:11 UTC
I didn't notice this before, but compiling with +test fails on ppc64le:

[122/224] /usr/bin/powerpc64le-unknown-linux-gnu-g++ -DCL_TARGET_OPENCL_VERSION=220 -DEMU_ENV=1 -DUSE_OPENGL=1 -Doclperf_EXPORTS -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/khronos/headers/opencl2.2 -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/common -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl  -O2 -pipe -mcpu=power9 -mtune=power9 -ggdb3 -Wall -fPIC -std=c++14 -MD -MT tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -MF tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o.d -o tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -c /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp
FAILED: tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o 
/usr/bin/powerpc64le-unknown-linux-gnu-g++ -DCL_TARGET_OPENCL_VERSION=220 -DEMU_ENV=1 -DUSE_OPENGL=1 -Doclperf_EXPORTS -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/khronos/headers/opencl2.2 -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/common -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl  -O2 -pipe -mcpu=power9 -mtune=power9 -ggdb3 -Wall -fPIC -std=c++14 -MD -MT tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -MF tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o.d -o tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -c /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp
In file included from /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp:21:
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:16: error: typedef ‘CPUKernel’ is initialized (use ‘decltype’ instead)
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                ^~~~~~~~~
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:27: error: ‘__m128’ was not declared in this scope; did you mean ‘__ibm128’?
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                           ^~~~~~
      |                           __ibm128
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:35: error: expected primary-expression before ‘,’ token
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                   ^
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:37: error: ‘__m128’ was not declared in this scope; did you mean ‘__ibm128’?
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                     ^~~~~~
      |                                     __ibm128
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:45: error: expected primary-expression before ‘,’ token
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                             ^
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:47: error: expected primary-expression before ‘unsigned’
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                               ^~~~~~~~
Comment 14 Yiyang Wu 2023-02-21 06:26:57 UTC
Maybe changing __m128 to __ibm128 would resolve this compilation issue [1].

[1] https://gcc.gnu.org/wiki/Ieee128PowerPC
Comment 15 darkbasic 2023-02-21 09:30:57 UTC
Created attachment 853452
rocm-opencl-runtime-tests-ppc64.patch

The patch fixes the compilation of the tests, but I wonder if it will be needed with the upcoming IEEE long double migration and/or if it will do more harm than good.

Unfortunately I didn't manage to run the tests:

OCLGL_DISPLAY=${DISPLAY} OCLGL_XAUTHORITY=${XAUTHORITY} FEATURES="test -ipc-sandbox -mount-sandbox -usersandbox -sandbox -network-sandbox -pid-sandbox" USE=test emerge -v --oneshot rocm-opencl-runtime

>>> Test phase: dev-libs/rocm-opencl-runtime-5.4.3
 * Running oclgl test under DISPLAY :0 ...
Error: unable to open display 
 * Please start an X server using amdgpu driver (not Xvfb!),
 * and export OCLGL_DISPLAY=${DISPLAY} OCLGL_XAUTHORITY=${XAUTHORITY} before reruning the test.
 * ERROR: dev-libs/rocm-opencl-runtime-5.4.3::gentoo failed (test phase):
 *   This display does not have AMD OpenGL vendor!

It fails to run a simple glxinfo | grep "OpenGL vendor string: AMD" because it doesn't get access to the display.

Since you managed to pass the tests, can you please tell me how to do so?
Comment 16 darkbasic 2023-02-21 09:34:49 UTC
Sorry, the real log is the following; the previous one was wrong because I had been messing with environment variables inside the ebuild:

>>> Test phase: dev-libs/rocm-opencl-runtime-5.4.3
 * Running oclgl test under DISPLAY :0 ...
Authorization required, but no authorization protocol specified

Error: unable to open display :0
 * Please start an X server using amdgpu driver (not Xvfb!),
 * and export OCLGL_DISPLAY=${DISPLAY} OCLGL_XAUTHORITY=${XAUTHORITY} before reruning the test.
 * ERROR: dev-libs/rocm-opencl-runtime-5.4.3::gentoo failed (test phase):
 *   This display does not have AMD OpenGL vendor!
Comment 17 Yiyang Wu 2023-02-21 11:19:35 UTC
> Error: unable to open display 
>  * Please start an X server using amdgpu driver (not Xvfb!),
>  * and export OCLGL_DISPLAY=${DISPLAY} OCLGL_XAUTHORITY=${XAUTHORITY} before
> reruning the test.
>  * ERROR: dev-libs/rocm-opencl-runtime-5.4.3::gentoo failed (test phase):
>  *   This display does not have AMD OpenGL vendor!
> 
> It fails to run a simple glxinfo | grep "OpenGL vendor string: AMD" because
> it doesn't get access to the display.
> 
> Since you managed to pass the tests, can you please tell me how to do so?

As the instructions suggest, you have to start an X server that renders on the GPU under test. For example, I use `xinit xterm -- :0` (you may have to connect the GPU to a monitor); :0 is then a display with the AMD OpenGL vendor. You also need an OpenGL implementation, such as media-libs/mesa[+video_cards_radeon].

If you don't care about the oclgl tests, you can delete this code block from the ebuild:

        if [[ -n ${OCLGL_DISPLAY+x} ]]; then
                export DISPLAY=${OCLGL_DISPLAY}
                export XAUTHORITY=${OCLGL_XAUTHORITY}
                ebegin "Running oclgl test under DISPLAY ${OCLGL_DISPLAY}"
                if ! glxinfo | grep "OpenGL vendor string: AMD"; then
                        ewarn "${instruction1}"
                        ewarn "${instruction2}"
                        die "This display does not have AMD OpenGL vendor!"
                fi
                ./ocltst -m $(realpath liboclgl.so) -A ogl.exclude
                eend $? || die "oclgl test failed"
        else
                ewarn "${instruction1}"
                ewarn "${instruction2}"
                die "\${OCLGL_DISPLAY} not set."
        fi
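
The vendor check in that block can be reproduced outside the ebuild. A small sketch that reads `glxinfo`-style text on standard input, so it can be tried without a display; the function name `has_amd_gl_vendor` is an assumption for illustration:

```shell
# Sketch: the same grep the ebuild runs, wrapped in a hypothetical helper.
has_amd_gl_vendor() {
    # Succeeds if the input contains the line the ebuild greps for.
    grep -q "OpenGL vendor string: AMD"
}

# Example against a live display, as the ebuild does:
#   glxinfo | has_amd_gl_vendor && echo "AMD OpenGL vendor found"
printf 'OpenGL vendor string: AMD\n' | has_amd_gl_vendor && echo "AMD OpenGL vendor found"
```

Running the live variant as the same user that emerge uses for the test phase shows whether that user can reach the display at all.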
Comment 18 darkbasic 2023-02-21 11:22:53 UTC
Sorry, I didn't make clear that running that very same command outside of the ebuild works flawlessly:

glxinfo | grep "OpenGL vendor string: AMD"
OpenGL vendor string: AMD

I am running an XWayland server from my Gnome session.

The same glxinfo command fails from the ebuild with Error: unable to open display :0
Comment 19 Yiyang Wu 2023-02-21 12:47:05 UTC
(In reply to darkbasic from comment #18)
> Sorry, I didn't make clear that running that very same command outside of
> the ebuild works flawlessly:
> 
> glxinfo | grep "OpenGL vendor string: AMD"
> OpenGL vendor string: AMD
> 
> I am running an XWayland server from my Gnome session.
> 
> The same glxinfo command fails from the ebuild with Error: unable to open
> display :0

I believe that's because commands run during emerge are executed as the portage user, which does not have permission to access your :0 display.

BTW Xwayland should be OK
Comment 20 darkbasic 2023-02-21 13:49:38 UTC
Do you know how I could give it access to the Xorg server?
Comment 21 Yiyang Wu 2023-02-21 13:55:41 UTC
(In reply to darkbasic from comment #20)
> Do you know how I could give it access to the Xorg server?

I think a more convenient way is to set PORTAGE_USERNAME and PORTAGE_GRPNAME to the user and group that started the X server (via an environment variable, package.env, or make.conf).
Comment 22 darkbasic 2023-02-21 14:22:16 UTC
PORTAGE_USERNAME=niko PORTAGE_GRPNAME=niko OCLGL_DISPLAY=${DISPLAY} OCLGL_XAUTHORITY=${XAUTHORITY} FEATURES="test" USE=test emerge -v --oneshot rocm-opencl-runtime

>>> Test phase: dev-libs/rocm-opencl-runtime-5.4.3
 * Running oclgl test under DISPLAY :0 ...
OpenGL vendor string: AMD
Built for Emulation Environment
ocltst: /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCclr-rocm-5.4.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/temp/environment: line 2213:    38 Aborted                 (core dumped) ./ocltst -m $(realpath liboclgl.so) -A ogl.exclude

Tests do indeed fail with the same error as clinfo.
Comment 23 darkbasic 2024-03-01 13:23:24 UTC
Here is the new upstream issue with ROCm 6.0.2 and a better backtrace: https://github.com/ROCm/clr/issues/61