This version depends on libhsa-ext-finalize64.so.1, which is in https://github.com/HSAFoundation/HSA-Runtime-AMD. The library does not appear in the distfiles of either dev-libs/hsa-ext-rocr or dev-libs/amdgpu-pro-opencl.

This version's FAHClient executable is no longer linked to the openssl-compat libraries, so that dependency might be removable.

Reproducible: Always

Steps to Reproduce:
Upgrade from 7.5.1, where an existing workunit was running on an AMD GPU.
cd /opt/foldingathome
sudo -u foldingathome ./FAHClient

Actual Results:
16:54:55:INFO(1):Read GPUs.txt
LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory

Starting from a clean slate with no existing work units, CPU folding runs fine. Logs show:
16:56:17:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
16:56:17:ERROR:No compute devices matched GPU #0 {
16:56:17:ERROR: "vendor": 4098,
16:56:17:ERROR: "device": 26751,
16:56:17:ERROR: "type": 1,
16:56:17:ERROR: "species": 5,
16:56:17:ERROR: "description": "Vega 10 XL/XT [Radeon RX Vega 56/64]"
16:56:17:ERROR:}. You may need to update your graphics drivers.
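For anyone cross-checking the GPU entry in that error against lspci -nn output: the client prints the PCI vendor and device IDs in decimal, while lspci shows them in hex. A minimal sketch of the conversion follows; the helper name pci_hex is made up for illustration and is not part of FAHClient.

```python
def pci_hex(value: int) -> str:
    """Format a decimal PCI ID as the 4-digit hex form shown by lspci -nn."""
    return f"{value:04x}"


# Values copied from the "No compute devices matched GPU #0" error above.
gpu = {"vendor": 4098, "device": 26751}
print(f"{pci_hex(gpu['vendor'])}:{pci_hex(gpu['device'])}")  # prints 1002:687f
```

1002 is the AMD/ATI vendor ID, so the client is at least identifying the card correctly; the failure is in matching it to a compute device.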
Would you be able to confirm something for me? I have only been able to do a little bit of research on this, but is the lack of this library preventing this version from operating on your GPU? If so, does your GPU work properly with any other OpenCL-related software?

I am going to change the focus of this bug to just deal with HSA; verifying whether openssl-compat is still needed is easy enough to do separately.
As the original comment says, GPU folding works fine without that library in 7.5.1. After upgrade, 7.6.9 attempts to load that library to resume the active GPU workunit and fails. If I wipe out the work directory, it runs on the CPU fine. I'm not sure why it crashes attempting to load the hsa library on an existing unit but not when "detecting" the GPU.
(In reply to Gordon Pettey from comment #2)
> I'm not sure why it crashes attempting to load the hsa library
> on an existing unit but not when "detecting" the GPU.

I guess this depends on the workunit's requirements. I can confirm this behaviour on my machine.
I can confirm this behaviour on my machine as well (RX 480), where libhsa-ext-image64.so.1 also fails to load. This issue has also made Blender with the opencl flag unusable for the exact same reason.
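A quick way to check, independently of FAHClient or Blender, whether the dynamic loader can find these libraries at all is to try opening them yourself. This is only a rough sketch using Python's ctypes, which goes through the same dlopen path on Linux; the helper name can_dlopen is made up for illustration, and the library names are the ones from the messages in this bug.

```python
import ctypes


def can_dlopen(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the file is not found or cannot be loaded.
        return False
    return True


for lib in ("libhsa-ext-finalize64.so.1", "libhsa-ext-image64.so.1"):
    print(lib, "loadable" if can_dlopen(lib) else "missing")
```

If both report missing, the LoadLib messages are expected on that system regardless of which application triggers them.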
Update: It turns out that libhsa-ext-finalize64.so, at least, is one of the deprecated HSA extensions that are no longer shipped, as confirmed by https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/89#issuecomment-613788944. I believe that the newer version of F@H (7.6.13) no longer has these dependencies, and it Should™ run. Take this with a grain of salt: I just got the .deb from F@H's website, unpacked it with ar and tar, and then tried running 'FAHClient --help', which causes an abort/segfault with the version installed by the package available in the Gentoo repo (7.6.9).
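For reference, a .deb is a plain ar archive whose members are tarballs, which is why ar followed by tar is enough to unpack it. The sketch below lists the members of such an archive without needing binutils. It assumes the common ar layout (an 8-byte global magic followed by 60-byte member headers) and does not handle GNU long-name tables; the function name list_ar_members and the file name in the usage comment are hypothetical.

```python
import io

AR_MAGIC = b"!<arch>\n"


def list_ar_members(stream):
    """Return (name, size) for each member of an ar archive such as a .deb."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(60)
        if len(header) < 60:
            break  # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Name is 16 bytes, space padded; some writers append a trailing "/".
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even length.
        stream.seek(size + (size % 2), io.SEEK_CUR)
    return members


# Usage (hypothetical file name):
# with open("fahclient.deb", "rb") as f:
#     for name, size in list_ar_members(f):
#         print(name, size)
```

On a typical .deb this should show debian-binary, a control tarball, and a data tarball; the data tarball is the one that contains the FAHClient binary.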
Using FAHClient and FAHCoreWrapper from 7.6.13 I no longer see errors, whether initiating a new GPU unit or resuming an old one. Bump it again :)
The problem still remains with the bumped version from https://bugs.gentoo.org/721980.

LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory
Segmentation fault
You're right, it just presented the same issue for me after I rebooted. I now get something like:

08:39:12:Removing old file 'logs/log-20200323-063800.txt'
08:39:12:Trying to access database...
08:39:12:Successfully acquired database lock
08:39:12:Read GPUs.txt
08:39:12:Enabled folding slot 00: READY cpu:3
LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory
LoadLib(libhsa-ext-image64.so.1) failed: libhsa-ext-image64.so.1: cannot open shared object file: No such file or directory
Segmentation fault
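Since different machines in this bug report different missing libraries, it may help to pull the names out of the client output mechanically when comparing logs. A small sketch, assuming only the "LoadLib(<name>) failed" message format seen in the outputs pasted here; the function name failed_libraries is made up for illustration.

```python
import re

# Matches the library name inside "LoadLib(<name>) failed".
LOADLIB_FAILURE = re.compile(r"LoadLib\(([^)]+)\) failed")


def failed_libraries(log_text: str) -> list:
    """Return the libraries that failed to load, in order, without duplicates."""
    seen = []
    for name in LOADLIB_FAILURE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "08:39:12:Read GPUs.txt\n"
    "08:39:12:Enabled folding slot 00: READY cpu:3\n"
    "LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: "
    "cannot open shared object file: No such file or directory\n"
    "LoadLib(libhsa-ext-image64.so.1) failed: libhsa-ext-image64.so.1: "
    "cannot open shared object file: No such file or directory\n"
    "Segmentation fault\n"
)
print(failed_libraries(sample))
# prints ['libhsa-ext-finalize64.so.1', 'libhsa-ext-image64.so.1']
```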
There is some difference between the deb and rpm packages. Gentoo's ebuild is using the RPM package. Using FAHClient from the upstream RPM, it crashes. Using the deb, I still see the LoadLib failure message, but it does not crash, and continues folding successfully.
(In reply to Gordon Pettey from comment #9)
> There is some difference between the deb and rpm packages. Gentoo's ebuild
> is using the RPM package. Using FAHClient from the upstream RPM, it crashes.
> Using the deb, I still see the LoadLib failure message, but it does not
> crash, and continues folding successfully.

I can confirm this. I copied FAHClient (and the wrapper) from the .deb over into the /opt/foldingathome/ dir. Now it starts with:

LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory
LoadLib(libhsa-amd-aqlprofile64.so) failed: libhsa-amd-aqlprofile64.so: cannot open shared object file: No such file or directory
22:24:00:Enabled folding slot 02: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
22:24:00:****************************** FAHClient ******************************
22:24:00: Version: 7.6.13

and folds happily :).
I can also confirm this problem, and I can confirm it works with the Debian package. Starting the failing FAHClient manually actually shows a "free(): invalid pointer" error just after the LoadLib failures, so the crash probably has nothing to do with the LoadLib failures.

I am attaching an updated ebuild that uses the Debian version and adds a USE flag for the rocm dependencies. Note that the CentOS package used in the old ebuild and the Debian package used in this one have different URLs, but they have the same file name. So if any of you try to use this ebuild, you need to download the package manually and force the new checksum with:

ebuild --force foldingathome-7.6.13-r1.ebuild manifest
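Because the two upstream packages share a file name, it is easy to end up with the wrong one in the distfiles directory without noticing. One way to tell them apart is to compute the digests yourself and compare them with the DIST line that ebuild writes into the Manifest. The sketch below assumes the Manifest records the file size plus BLAKE2B and SHA512 digests, which is my understanding of current Gentoo Manifests; the function name distfile_digests and the path in the usage comment are hypothetical.

```python
import hashlib


def distfile_digests(path: str) -> dict:
    """Compute the size and the BLAKE2B and SHA512 hex digests of a file."""
    blake2b = hashlib.blake2b()
    sha512 = hashlib.sha512()
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large distfiles are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            blake2b.update(chunk)
            sha512.update(chunk)
            size += len(chunk)
    return {
        "size": size,
        "BLAKE2B": blake2b.hexdigest(),
        "SHA512": sha512.hexdigest(),
    }


# Usage (hypothetical path; substitute the file you actually downloaded):
# print(distfile_digests("/var/cache/distfiles/<package file name>"))
```

If the values differ from those in the Manifest, the file in distfiles is not the one the Manifest was generated from.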
Created attachment 640934: Suggested new ebuild