
Bug 719676

Summary: =sci-biology/foldingathome-7.6.9 wants missing libhsa-ext-finalize64.so.1 (fix included)
Product: Gentoo Linux
Component: Current packages
Status: CONFIRMED
Severity: normal
Priority: Normal
Version: unspecified
Hardware: All
OS: Linux
Reporter: Gordon Pettey <petteyg359>
Assignee: Gentoo Science Biology related packages <sci-biology>
CC: gentoo-bugs, imp, jstein, sci-biology
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Suggested new ebuild

Description Gordon Pettey 2020-04-26 16:57:58 UTC
This version depends on libhsa-ext-finalize64.so.1, which is in https://github.com/HSAFoundation/HSA-Runtime-AMD. The library does not appear in distfiles of dev-libs/hsa-ext-rocr nor dev-libs/amdgpu-pro-opencl.

This version's FAHClient executable is no longer linked against the openssl-compat libraries, so that dependency might be removable.

Reproducible: Always

Steps to Reproduce:
Upgrade from 7.5.1, where an existing workunit was running on an AMD GPU.

cd /opt/foldingathome
sudo -u foldingathome ./FAHClient
Actual Results:  
16:54:55:INFO(1):Read GPUs.txt
LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory


Starting from a clean slate with no existing work units, CPU folding runs fine. The logs show:
16:56:17:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
16:56:17:ERROR:No compute devices matched GPU #0 {
16:56:17:ERROR:  "vendor": 4098,
16:56:17:ERROR:  "device": 26751,
16:56:17:ERROR:  "type": 1,
16:56:17:ERROR:  "species": 5,
16:56:17:ERROR:  "description": "Vega 10 XL/XT [Radeon RX Vega 56/64]"
16:56:17:ERROR:}.  You may need to update your graphics drivers.
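
The "vendor" and "device" values in the error above are printed in decimal, while PCI IDs are usually written in hexadecimal. As a small sketch (the function name pci_id is only illustrative and not part of FAHClient), the conversion can be done with printf:

```shell
# Convert decimal vendor/device numbers into the usual vendor:device hex form.
pci_id() {
    printf '%04x:%04x\n' "$1" "$2"
}

pci_id 4098 26751   # prints 1002:687f
```

1002 is the PCI vendor ID for AMD/ATI, so the client is looking for an AMD device here. The resulting ID can be compared against the numeric IDs shown by `lspci -nn`.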
Comment 1 Ian Stakenvicius (RETIRED) gentoo-dev 2020-04-27 15:09:17 UTC
I have only been able to do a little bit of research on this, but could you confirm for me whether the lack of this library is preventing this version from operating on your GPU? If so, does your GPU work properly with any other OpenCL-related software?

I am going to change the focus of this bug to just deal with HSA; verifying whether openssl-compat is still needed is easy enough to do separately.
Comment 2 Gordon Pettey 2020-04-27 17:34:46 UTC
As the original comment says, GPU folding works fine without that library in 7.5.1. After upgrade, 7.6.9 attempts to load that library to resume the active GPU workunit and fails. If I wipe out the work directory, it runs on the CPU fine. I'm not sure why it crashes attempting to load the hsa library on an existing unit but not when "detecting" the GPU.
Comment 3 me 2020-04-30 07:17:48 UTC
(In reply to Gordon Pettey from comment #2)
> I'm not sure why it crashes attempting to load the hsa library
> on an existing unit but not when "detecting" the GPU.
I guess this depends on the work unit's requirements. I can confirm this behaviour on my machine.
Comment 4 primalucegd 2020-05-08 20:27:04 UTC
Can confirm this behaviour on my machine as well (RX 480), with libhsa-ext-image64.so.1 also reported as missing. This issue has also made Blender with the opencl flag unusable for the exact same reason.
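
One way to check whether the libraries named in this report are known to the dynamic linker is to search the linker cache. This is only a sketch: the helper name lib_in_cache is hypothetical, ldconfig may live in /sbin and not be on a regular user's PATH, and a library found only through LD_LIBRARY_PATH or an rpath will not appear in the cache.

```shell
# Succeed if the dynamic linker cache lists an entry containing the given name.
lib_in_cache() {
    ldconfig -p 2>/dev/null | grep -qF -- "$1"
}

for lib in libhsa-ext-finalize64.so.1 libhsa-ext-image64.so.1; do
    if lib_in_cache "$lib"; then
        echo "$lib: found in linker cache"
    else
        echo "$lib: not in linker cache"
    fi
done
```

The match is a plain substring search, so a longer soname that merely contains the given name would also count as found.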
Comment 5 primalucegd 2020-05-10 04:57:57 UTC
Update:

It turns out that libhsa-ext-finalize64.so, at least, is a deprecated HSA extension that is no longer shipped, as confirmed by https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/89#issuecomment-613788944. I believe that the newer version of F@H (7.6.13) no longer has these dependencies, and it Should™ run. Take this with a grain of salt: I just got the .deb from F@H's website, unpacked it with ar and tar, and then tried running 'FAHClient --help', which causes an abort/segfault with the version installed by the package available in the Gentoo repo (7.6.9).
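
The unpacking step mentioned above (ar, then tar) could look roughly like the following sketch. The function name unpack_deb and both paths in the usage line are hypothetical. It assumes the usual .deb layout, where the payload is an ar member named data.tar plus a compression suffix, and a tar that detects the compression on its own when reading from a file.

```shell
# Extract the payload of a .deb into a directory without installing it.
unpack_deb() {
    deb=$1
    dest=$2
    [ -f "$deb" ] || { echo "no such file: $deb" >&2; return 1; }
    # Make the path absolute because the extraction runs inside $dest.
    case $deb in
        /*) ;;
        *) deb=$PWD/$deb ;;
    esac
    mkdir -p "$dest" || return 1
    # The payload member is data.tar plus a compression suffix (.gz, .xz, ...).
    member=$(ar t "$deb" | grep '^data\.tar' | head -n 1)
    [ -n "$member" ] || { echo "no data.tar member in $deb" >&2; return 1; }
    # ar x writes the member into the current directory; tar then unpacks it.
    (cd "$dest" && ar x "$deb" "$member" && tar -xf "$member" && rm -f "$member")
}

# Hypothetical usage: unpack_deb ./fahclient.deb ./fah-unpacked
```

Extracting into a scratch directory keeps the test separate from the files installed under /opt/foldingathome.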
Comment 6 Gordon Pettey 2020-05-10 13:19:47 UTC
Using FAHClient and FAHCoreWrapper from 7.6.13 I no longer see errors, whether initiating a new GPU unit or resuming an old one. Bump it again :)
Comment 7 me 2020-05-14 18:06:18 UTC
The problem still remains with the bumped version from https://bugs.gentoo.org/721980 .

LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory
Segmentation fault
Comment 8 primalucegd 2020-05-20 08:39:42 UTC
You're right, it just presented the same issue for me after I rebooted. I now get something like

08:39:12:Removing old file 'logs/log-20200323-063800.txt'
08:39:12:Trying to access database...
08:39:12:Successfully acquired database lock
08:39:12:Read GPUs.txt
08:39:12:Enabled folding slot 00: READY cpu:3
LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory
LoadLib(libhsa-ext-image64.so.1) failed: libhsa-ext-image64.so.1: cannot open shared object file: No such file or directory
Segmentation fault
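
To collect the names of all libraries the client failed to load, the LoadLib lines can be filtered out of saved output. This is a minimal sketch that assumes the message format shown in the output above; missing_libs is a hypothetical helper name.

```shell
# Print each library named in a "LoadLib(...) failed" line, one per line, deduplicated.
missing_libs() {
    sed -n 's/^LoadLib(\([^)]*\)) failed:.*/\1/p' | sort -u
}

# Sample input copied from the output quoted above.
printf '%s\n' \
    '08:39:12:Enabled folding slot 00: READY cpu:3' \
    'LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory' \
    'LoadLib(libhsa-ext-image64.so.1) failed: libhsa-ext-image64.so.1: cannot open shared object file: No such file or directory' \
    'Segmentation fault' | missing_libs
```

For the sample input this prints libhsa-ext-finalize64.so.1 and libhsa-ext-image64.so.1. With real output, save it to a file first and run missing_libs with that file on standard input.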
Comment 9 Gordon Pettey 2020-05-20 13:19:25 UTC
There is some difference between the deb and rpm packages. Gentoo's ebuild is using the RPM package. Using FAHClient from the upstream RPM, it crashes. Using the deb, I still see the LoadLib failure message, but it does not crash, and continues folding successfully.
Comment 10 me 2020-05-22 09:20:22 UTC
(In reply to Gordon Pettey from comment #9)
> There is some difference between the deb and rpm packages. Gentoo's ebuild
> is using the RPM package. Using FAHClient from the upstream RPM, it crashes.
> Using the deb, I still see the LoadLib failure message, but it does not
> crash, and continues folding successfully.

Can confirm. I copied FAHClient (and the wrapper) from the .deb into the /opt/foldingathome/ directory.

Now it starts with:
LoadLib(libhsa-ext-finalize64.so.1) failed: libhsa-ext-finalize64.so.1: cannot open shared object file: No such file or directory
LoadLib(libhsa-amd-aqlprofile64.so) failed: libhsa-amd-aqlprofile64.so: cannot open shared object file: No such file or directory
22:24:00:Enabled folding slot 02: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
22:24:00:****************************** FAHClient ******************************
22:24:00:        Version: 7.6.13

and folds happily :).
Comment 11 Jesper Saxtorph 2020-05-22 20:39:38 UTC
I can also confirm this problem.
And I can confirm it works with the debian package.

Starting the failing FAHClient manually actually shows a "free(): invalid pointer" error just after the LoadLib failure, so the crash probably has nothing to do with the LoadLib failure itself.

I am attaching an updated ebuild using the debian version and adding a use flag for rocm dependencies.

Note that the CentOS package used in the old ebuild and the Debian package used in this one have different URLs but the same file name.
So if any of you try to use this ebuild, you need to download the package manually and force the new checksum with ebuild --force foldingathome-7.6.13-r1.ebuild manifest.
Comment 12 Jesper Saxtorph 2020-05-22 20:40:34 UTC
Created attachment 640934 [details]
Suggested new ebuild