First of all, thanks that dev-libs/rocm-opencl-runtime finally found its way into the Gentoo tree!

But I have the issue of dev-libs/rocm-opencl-runtime-2.6.0-r1::gentoo not finding my card, whereas dev-libs/rocm-opencl-runtime-2.6.0::rocm did, running on kernel 5.2.8. I scrapped the ::rocm versions and emerged the ::gentoo version. I did notice it had different deps, however.

# clinfo
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP.internal (2924.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_object_metadata cl_amd_event_callback
  Platform Max metadata object keys (AMD)         8
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  AMD Accelerated Parallel Processing
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)
  clCreateContext(NULL, ...) [default]            No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2

# rocminfo
ROCm initialization failed
hsa api call failure at: /var/tmp/portage/dev-util/rocminfo-2.6.0/work/rocminfo-roc-2.6.0/rocminfo.cc:1068
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

I understand that rocm-opencl-runtime-2.6.0-r1::gentoo no longer installs the 'rocm' OpenCL provider to choose with eselect opencl, but uses ocl-icd now. But the transition from the overlay is not without glitches, it seems...
(In reply to ernsteiswuerfel from comment #0)
> First of all, thanks that dev-libs/rocm-opencl-runtime finally found it's
> way into the Gentoo tree!
>
> But I have the issue of dev-libs/rocm-opencl-runtime-2.6.0-r1::gentoo not
> finding my card, whereas dev-libs/rocm-opencl-runtime-2.6.0::rocm did,

I cannot find that "rocm" overlay on https://overlays.gentoo.org/ so to help everyone else find it, please mention that URL in this bug report. The [URL] field should be a good place.
Gentoo doesn't support that overlay; if you can reproduce this issue using only packages from Gentoo (I cannot), then I can help - but with a mix of Gentoo and overlay packages, I'm sorry to say that you'll need to get help elsewhere.
Hi, I have a similar problem to ernsteiswuerfel's - my system should theoretically be compatible with OpenCL on an AMD GPU, however the rocminfo output is:

==/
$ rocminfo
ROCk module is loaded
johnny is member of video group
hsa api call failure at: /var/tmp/portage/dev-util/rocminfo-2.7.0/work/rocminfo-roc-2.7.0/rocminfo.cc:1102
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
==\

I'll attach clinfo next
Created attachment 588854 [details] clinfo out
Do you have these kernel configuration options set?

HSA_AMD
HMM_MIRROR
ZONE_DEVICE

If not, you must set them - please do that then try again. dev-libs/roct-thunk-interface would have warned you if these were not set.
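A minimal sketch for checking those three options in one go. It assumes the running kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the function name check_kernel_opts is made up for illustration. The pattern is anchored to the full option name, so CONFIG_ARCH_HAS_HMM_MIRROR is not mistaken for CONFIG_HMM_MIRROR.

```shell
# Read a kernel config (plain text) on stdin and report each named option.
# Returns non-zero if at least one option is neither =y nor =m.
# check_kernel_opts is a hypothetical helper name, not part of any package.
check_kernel_opts() {
    config=$(cat)   # slurp stdin once so every option is checked against the same text
    rc=0
    for opt in "$@"; do
        if printf '%s\n' "$config" | grep -Eq "^CONFIG_${opt}=[ym]\$"; then
            echo "CONFIG_${opt} is set"
        else
            echo "CONFIG_${opt} is NOT set"
            rc=1
        fi
    done
    return "$rc"
}
```

Possible usage, under the /proc/config.gz assumption above: zcat /proc/config.gz | check_kernel_opts HSA_AMD HMM_MIRROR ZONE_DEVICE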
(In reply to Craig Andrews from comment #5)
> Do you have these kernel configuration options set?
>
> HSA_AMD
> HMM_MIRROR
> ZONE_DEVICE
>
> If not, you must set them - please do that then try again.
> dev-libs/roct-thunk-interface would have warned you if these were not set.

I do have those :) They weren't easy to find:

==/
# zgrep "HSA_AMD" /proc/config.gz
CONFIG_HSA_AMD=y
# zgrep "HMM_MIRROR" /proc/config.gz
CONFIG_ARCH_HAS_HMM_MIRROR=y
CONFIG_HMM_MIRROR=y
# zgrep "ZONE_DEVICE" /proc/config.gz
CONFIG_ARCH_HAS_ZONE_DEVICE=y
CONFIG_ZONE_DEVICE=y
# uname -a
Linux inspiron17 5.2.9-gentoo #1 SMP Sun Aug 25 17:07:22 CEST 2019 x86_64 Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz GenuineIntel GNU/Linux
==\

Should it work? The AMD GPU is the secondary unit, but the module loads and I don't see errors in dmesg...
Are you using all packages from Gentoo? I'm not sure if this could be caused by mixing old rocm overlay packages with Gentoo ones or not. Also, some quick web searching for that error (HSA_STATUS_ERROR_OUT_OF_RESOURCES) indicates that you may need to reboot to resolve it. I'm curious to learn if you've tried that and if it changed anything.
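One way to answer the "all packages from Gentoo?" question without guessing, sketched under the assumption that Portage's installed-package database lives in /var/db/pkg and records each package's source repository in a file named "repository". The function name list_non_gentoo_pkgs is hypothetical.

```shell
# Print every installed package whose recorded repository is not "gentoo",
# in the form category/package-version::repo.
# Assumed layout: <vdb>/<category>/<package-version>/repository
# list_non_gentoo_pkgs is a hypothetical helper name.
list_non_gentoo_pkgs() {
    vdb=${1:-/var/db/pkg}
    for f in "$vdb"/*/*/repository; do
        [ -f "$f" ] || continue          # skip the unexpanded glob if nothing matches
        repo=$(cat "$f")
        if [ "$repo" != "gentoo" ]; then
            pkgdir=${f%/repository}
            echo "${pkgdir#"$vdb"/}::${repo}"
        fi
    done
}
```

Calling list_non_gentoo_pkgs with no argument inspects /var/db/pkg; empty output would mean no installed package is recorded as coming from another repository.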
(In reply to Craig Andrews from comment #7)
> Are you using all packages from Gentoo? I'm not sure if this could be caused
> by mixing old rocm overlay packages with Gentoo ones or not.
>
> Also, some quick web searching for that error
> (HSA_STATUS_ERROR_OUT_OF_RESOURCES) indicates that you may need to reboot to
> resolve it. I'm curious to learn if you've tried that and if it changed
> anything.

Rebooted frequently, no change in behaviour. Also - all packages from gentoo, no overlays on this system ever.
> I understand that rocm-opencl-runtime-2.6.0-r1::gentoo does not install the
> 'rocm' OpenCL provider any longer to choose with eselect opencl but uses
> ocl-icd now. But the transition from the overlay is not without glitches it
> seems...

Did it work before with the OpenCL libs from the old ebuild?
(In reply to justXi from comment #9)
> Did it work before with the OpenCL libs from the old ebuild?

Yes, with the libs from your rocm overlay it works. It also works with dev-libs/amdgpu-pro-opencl.
Created attachment 589484 [details]
kernel .config (5.2.13)

Also, the kernel options roct-thunk-interface requests are correctly set.

# rocminfo
ROCm initialization failed
hsa api call failure at: /var/tmp/portage/dev-util/rocminfo-2.6.0/work/rocminfo-roc-2.6.0/rocminfo.cc:1068
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

rocminfo shows this output ^^ But I also got that output from it back when OpenCL worked with the rocm overlay.
(In reply to ernsteiswuerfel from comment #11)
> Created attachment 589484 [details]
> kernel .config (5.2.13)
>
> Also the kernel options roct-thunk-interface requests are correctly set.
>
> # rocminfo
> ROCm initialization failed
> hsa api call failure at:
> /var/tmp/portage/dev-util/rocminfo-2.6.0/work/rocminfo-roc-2.6.0/rocminfo.cc:
> 1068
> Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to
> allocate the necessary resources. This error may also occur when the core
> runtime library needs to spawn threads or create internal OS-specific events.
>
> rocminfo shows this output ^^ But I also got that output from it the time
> where OpenCL worked on rocm overlay.

Can you please try with dev-libs/roct-thunk-interface-2.8.0?

If rocminfo doesn't show any devices with that version, can you please report the issue upstream at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/issues and link to that issue report here? (I suspect they'll need to ask you some questions and to gather additional information)
See:

https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/issues/44

That thread reports:

A change to ROCT 2.8.0 now requires the kernel config "CONFIG_NUMA=y" to be set, even for non-NUMA systems.

Also, the kernel config "CONFIG_CPU_SUP_* =y" (as appropriate for your CPU) should be set. For example:

"CONFIG_CPU_SUP_INTEL=y"
"CONFIG_CPU_SUP_AMD=y"

Also note from that thread:

ascollard commented Oct 1, 2019 (Contributor)

"In the past ROCT uses cpuid instruction to get CPU cache information. This was causing problems when new CPUs were introduced to the market with new cpuid operations required. Using sysfs removes this limitation.

For now please make your kernel with CONFIG_NUMA=y. In the future ROCT release we can add the fallback when NUMA is not enabled in the system."
(In reply to Martin from comment #13)
> See:
>
> https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/issues/44
>
> That thread reports:
>
>
> A change to ROCT 2.8.0 now requires the kernel config "CONFIG_NUMA=y" to be
> set, even for non-NUMA systems.
>
> Also, the kernel config "CONFIG_CPU_SUP_* =y" (as appropriate for your CPU)
> should be set. For example:
>
> "CONFIG_CPU_SUP_INTEL=y"
> "CONFIG_CPU_SUP_AMD=y"
>
>
> Also note from that thread:
>
> ascollard commented Oct 1, 2019 (Contributor)
>
> "In the past ROCT uses cpuid instruction to get CPU cache information. This
> was causing problems when new CPUs were introduced to the market with new
> cpuid operations required. Using sysfs removes this limitation.
>
> For now please make your kernel with CONFIG_NUMA=y. In the future ROCT
> release we can add the fallback when NUMA is not enabled in the system."

All of that information is relevant when dev-libs/roct-thunk-interface-2.8.0 is installed; however, that version wasn't added to Gentoo until September 24 (commit 60b6c127957561b46198428eb401c8b04e5644ea), which is well after this bug report was created. So this bug report describes a different problem.
Thanks for that, and indeed so.

I've just recompiled my kernel with "CONFIG_NUMA=y", rebooted into that kernel, and... no change seen. For example, I still get:

clinfo:
[...]
  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0
[...]

Checking for NUMA, there is now correctly a "/sys/devices/system/node/node0" on my system, showing that the NUMA code is in place...

Sorry for the noise.
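The sysfs check mentioned above can be sketched as a small shell function. It assumes the kernel exposes NUMA nodes as nodeN directories under /sys/devices/system/node; the function name count_numa_nodes is made up for illustration, and a non-zero count only shows that NUMA support is built in, not that ROCT is satisfied with it.

```shell
# Count the NUMA node directories (node0, node1, ...) under a sysfs node root.
# count_numa_nodes is a hypothetical helper name.
count_numa_nodes() {
    noderoot=${1:-/sys/devices/system/node}
    n=0
    for d in "$noderoot"/node[0-9]*; do
        # An unmatched glob stays literal and fails the -d test, so it is not counted.
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running count_numa_nodes with no argument inspects the live system; a result of 0 would suggest the kernel was built without CONFIG_NUMA.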
Can you please try again with ROC 2.8 (currently in Gentoo)? And also, what card do you have?
(In reply to Craig Andrews from comment #16)
> Can you please try again with ROC 2.8 (currently in Gentoo)? And also, what
> card do you have?

Card is a Radeon RX 590.

# inxi -b
System:    Host: supah Kernel: 5.3.5-gentoo x86_64 bits: 64 Desktop: MATE 1.22.0 Distro: Gentoo Base System release 2.6
Machine:   Type: Server Mobo: Supermicro model: H8SGL v: 1234567890 serial: OM1BS70566 BIOS: American Megatrends v: 3.5b date: 03/18/2016
CPU:       8-Core: AMD Opteron 6380 type: MT MCP speed: 1399 MHz min/max: 1400/2500 MHz
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] driver: amdgpu v: kernel
           Device-2: Matrox Systems MGA G200eW WPCM450 driver: N/A
           Display: x11 server: X.Org 1.20.5 driver: amdgpu,ati unloaded: modesetting,radeon resolution: 1920x1080~60Hz
           OpenGL: renderer: Radeon RX 590 Series (POLARIS10 DRM 3.33.0 5.3.5-gentoo LLVM 8.0.1) v: 4.5 Mesa 19.1.7

There is definitely a change with kernel 5.3.x and ROCm 2.8. Both rocminfo and clinfo now give me a kernel crash when I invoke them:

[...]
[ 277.314209] BUG: kernel NULL pointer dereference, address: 00000000000001ec
[ 277.314214] #PF: supervisor write access in kernel mode
[ 277.314216] #PF: error_code(0x0002) - not-present page
[ 277.314218] PGD 0 P4D 0
[ 277.314222] Oops: 0002 [#1] SMP NOPTI
[ 277.314226] CPU: 11 PID: 1664 Comm: rocminfo Not tainted 5.3.5-gentoo #2
[ 277.314228] Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5b 03/18/2016
[ 277.314231] RIP: 0010:0xffffffffc0cc8aa0
[...]

I have yet to report this crash upstream. Funnily enough, amdgpu-pro-opencl still works without any problem.
Finally found some time for an upstream report: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/issues/45

Sorry it took me that long! I did not mention the crash from comment #17, as it no longer happens with ROC 2.9 and kernel 5.4-rc4.
2.6.0-r1 is no longer in the tree.