Bug 670012

Summary: sci-biology/foldingathome-7.5.1: FAHClient: segmentation fault at startup with media-libs/mesa libOpenCL.so
Product: Gentoo Linux
Reporter: Alexander Miller <alex.miller>
Component: Current packages
Assignee: Ian Stakenvicius <axs>
Status: UNCONFIRMED ---
Severity: normal
CC: sci-biology
Priority: Normal
Version: unspecified
Hardware: AMD64
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: foldingathome-7.5.1-r1.ebuild

Description Alexander Miller 2018-10-31 02:07:50 UTC
When the OpenCL provider is set to mesa, FAHClient crashes with a segfault every time I try to start it, even if GPU folding is off or when I just run FAHClient --help. The problem seems to be that FAHClient exports a few symbols from a statically linked libexpat which are API-incompatible with the system's libexpat.so:

Symbol table '.dynsym' contains 1236 entries:
   604: 00000000007c26d0    79 FUNC    GLOBAL DEFAULT   14 XmlParseXmlDecl
   651: 00000000007c1d20   176 FUNC    GLOBAL DEFAULT   14 XmlInitEncoding
   895: 00000000007c2720    79 FUNC    GLOBAL DEFAULT   14 XmlParseXmlDeclNS
   924: 00000000007c1c10    26 FUNC    GLOBAL DEFAULT   14 XmlInitUnknownEncodingNS
   947: 00000000007c1dd0   176 FUNC    GLOBAL DEFAULT   14 XmlInitEncodingNS
  1210: 00000000007c0f60  2213 FUNC    GLOBAL DEFAULT   14 XmlInitUnknownEncoding
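
A minimal sketch of how the listing above could be reproduced mechanically. It assumes the rows come from something like `readelf -W --dyn-syms FAHClient` (the report does not say which command was used), and the helper name `exported_functions` and the `Xml` prefix filter are made up for illustration:

```python
from typing import List, NamedTuple


class DynSym(NamedTuple):
    name: str
    address: int
    size: int


def exported_functions(readelf_text: str, prefix: str = "Xml") -> List[DynSym]:
    """Collect defined GLOBAL FUNC symbols whose name starts with prefix."""
    found = []
    for line in readelf_text.splitlines():
        fields = line.split()
        # Symbol rows have 8 columns: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) != 8 or not fields[0].rstrip(":").isdigit():
            continue
        _num, value, size, sym_type, bind, _vis, ndx, name = fields
        if sym_type != "FUNC" or bind != "GLOBAL" or ndx == "UND":
            continue
        name = name.split("@")[0]  # drop any symbol version suffix
        if name.startswith(prefix):
            found.append(DynSym(name, int(value, 16), int(size, 0)))
    return found


# Rows copied from the listing in this report.
SAMPLE = """\
Symbol table '.dynsym' contains 1236 entries:
   604: 00000000007c26d0    79 FUNC    GLOBAL DEFAULT   14 XmlParseXmlDecl
   651: 00000000007c1d20   176 FUNC    GLOBAL DEFAULT   14 XmlInitEncoding
   895: 00000000007c2720    79 FUNC    GLOBAL DEFAULT   14 XmlParseXmlDeclNS
   924: 00000000007c1c10    26 FUNC    GLOBAL DEFAULT   14 XmlInitUnknownEncodingNS
   947: 00000000007c1dd0   176 FUNC    GLOBAL DEFAULT   14 XmlInitEncodingNS
  1210: 00000000007c0f60  2213 FUNC    GLOBAL DEFAULT   14 XmlInitUnknownEncoding
"""

symbols = exported_functions(SAMPLE)
for sym in symbols:
    print(f"{sym.name} at {sym.address:#x} ({sym.size} bytes)")
```

The same function could be fed the full readelf output of any other binary to check whether it leaks expat internals in the same way.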

When a loaded library links against the system libexpat.so and tries to use it, this leads to a crash. Mesa's libOpenCL.so does just that when dlopen()ed, as you can see in the following backtrace:

Thread 1 "FAHClient" received signal SIGSEGV, Segmentation fault.

#0  0x00000000007b76e0 in ?? ()
#1  0x000000365e00d869 in storeAtts (parser=parser@entry=0xcb3950, 
    enc=enc@entry=0xba1380, attStr=<optimized out>, 
    tagNamePtr=tagNamePtr@entry=0xcbc348, 
    bindingsPtr=bindingsPtr@entry=0xcbc380)
    at /usr/src/debug/dev-libs/expat-2.2.5/expat-2.2.5/lib/xmlparse.c:3219
#2  0x000000365e011744 in doContent (parser=parser@entry=0xcb3950, 
    startTagLevel=startTagLevel@entry=0, enc=<optimized out>, 
    s=s@entry=0xcb4320 "<driinfo>\n<section>\n<description lang=\"en\" text=\"Performance\"/>\n<description lang=\"ca\" text=\"Rendiment\"/>\n<description lang=\"de\" text=\"Leistung\"/>\n<description lang=\"es\" text=\"Rendimiento\"/>\n<descript"..., end=end@entry=0xcb9993 "", nextPtr=nextPtr@entry=0xcb3980, 
    haveMore=0 '\000')
    at /usr/src/debug/dev-libs/expat-2.2.5/expat-2.2.5/lib/xmlparse.c:2865
#3  0x000000365e012474 in contentProcessor (parser=parser@entry=0xcb3950, 
    start=start@entry=0xcb4320 "<driinfo>\n<section>\n<description lang=\"en\" text=\"Performance\"/>\n<description lang=\"ca\" text=\"Rendiment\"/>\n<description lang=\"de\" text=\"Leistung\"/>\n<description lang=\"es\" text=\"Rendimiento\"/>\n<descript"..., end=end@entry=0xcb9993 "", endPtr=endPtr@entry=0xcb3980)
    at /usr/src/debug/dev-libs/expat-2.2.5/expat-2.2.5/lib/xmlparse.c:2531
#4  0x000000365e0101d3 in doProlog (parser=parser@entry=0xcb3950, 
    enc=<optimized out>, 
    s=s@entry=0xcb4320 "<driinfo>\n<section>\n<description lang=\"en\" text=\"Performance\"/>\n<description lang=\"ca\" text=\"Rendiment\"/>\n<description lang=\"de\" text=\"Leistung\"/>\n<description lang=\"es\" text=\"Rendimiento\"/>\n<descript"..., end=end@entry=0xcb9993 "", tok=29, next=<optimized out>, 
    nextPtr=0xcb3980, haveMore=0 '\000')
    at /usr/src/debug/dev-libs/expat-2.2.5/expat-2.2.5/lib/xmlparse.c:4556
#5  0x000000365e010b45 in prologProcessor (parser=0xcb3950, 
    s=0xcb4320 "<driinfo>\n<section>\n<description lang=\"en\" text=\"Performance\"/>\n<description lang=\"ca\" text=\"Rendiment\"/>\n<description lang=\"de\" text=\"Leistung\"/>\n<description lang=\"es\" text=\"Rendimiento\"/>\n<descript"..., end=0xcb9993 "", nextPtr=0xcb3980)
    at /usr/src/debug/dev-libs/expat-2.2.5/expat-2.2.5/lib/xmlparse.c:4270
#6  0x000000365e00c250 in XML_ParseBuffer (parser=<optimized out>, 
    len=<optimized out>, isFinal=<optimized out>, isFinal=<optimized out>, 
    len=<optimized out>, parser=<optimized out>)
    at /usr/src/debug/dev-libs/expat-2.2.5/expat-2.2.5/lib/xmlparse.c:1983
#7  0x00007ffff65ff9e3 in driParseOptionInfo (
    configOptions=0x7fffe3c04590 "<driinfo>\n<section>\n<description lang=\"en\" text=\"Performance\"/>\n<description lang=\"ca\" text=\"Rendiment\"/>\n<description lang=\"de\" text=\"Leistung\"/>\n<description lang=\"es\" text=\"Rendimiento\"/>\n<descript"..., info=0xca0718)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/util/xmlconfig.c:750
#8  pipe_loader_load_options (dev=0xca06e0)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/auxiliary/pipe-loader/pipe_loader.c:109
#9  pipe_loader_create_screen (dev=0xca06e0)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/auxiliary/pipe-loader/pipe_loader.c:134
#10 __base_ctor  (platform=..., ldev=0xca06e0, this=0xca9190)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/state_trackers/clover/core/device.cpp:47
#11 _ZN6clover6createINS_6deviceEJRNS_8platformERP18pipe_loader_deviceEEENS_13intrusive_refIT_EEDpOT0_.isra.3 (as#0=...)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/state_trackers/clover/util/pointer.hpp:230
#12 __base_ctor  (
    this=0x7ffff694b9e0 <(anonymous namespace)::_clover_platform>)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/state_trackers/clover/core/platform.cpp:36
#13 __static_initialization_and_destruction_0 (__initialize_p=1, 
    __priority=65535)
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/state_trackers/clover/api/platform.cpp:31
#14 _GLOBAL__sub_I_platform.cpp.lto_priv.1160 ()
    at /usr/src/debug/media-libs/mesa-18.1.9/mesa-18.1.9/src/gallium/state_trackers/clover/api/platform.cpp:145
#15 0x00007ffff6600568 in global constructors keyed to 65535_0_libpipe_loader_dynamic.a_0x522.248669 () from /usr/lib64/libOpenCL.so
#16 0x00007ffff7de5c0a in ?? () from /lib64/ld-linux-x86-64.so.2
#17 0x00007ffff7de5d04 in ?? () from /lib64/ld-linux-x86-64.so.2
#18 0x00007ffff7de9c53 in ?? () from /lib64/ld-linux-x86-64.so.2
#19 0x00007ffff727ff0f in _dl_catch_exception () from /lib64/libc.so.6
#20 0x00007ffff7de953a in ?? () from /lib64/ld-linux-x86-64.so.2
#21 0x00007ffff79b3f8a in ?? () from /lib64/libdl.so.2
#22 0x00007ffff727ff0f in _dl_catch_exception () from /lib64/libc.so.6
#23 0x00007ffff727ffa7 in _dl_catch_error () from /lib64/libc.so.6
#24 0x00007ffff79b4775 in ?? () from /lib64/libdl.so.2
#25 0x00007ffff79b4049 in dlopen () from /lib64/libdl.so.2
#26 0x00000000006279fe in cb::DynamicLibrary::DynamicLibrary(std::string const&) ()
#27 0x000000000069c242 in FAH::OpenCLLibrary::OpenCLLibrary(cb::Inaccessible) ()
#28 0x000000000046ac1f in cb::Singleton<FAH::OpenCLLibrary>::instance() ()
#29 0x0000000000695315 in FAH::FAHSystemInfo::add(cb::Info&, bool) ()
#30 0x0000000000687831 in FAH::FAHApplication::FAHApplication(std::string const&, bool (*)(int)) ()
#31 0x0000000000426aa1 in FAH::ClientApp::ClientApp() ()
#32 0x0000000000420703 in _start ()

So the function storeAtts from libexpat.so tries to call XmlNameLength(enc, currAtt->name), which expands to ((enc)->nameLength)(enc, currAtt->name), and lands somewhere inside the FAHClient binary. The target function then starts by dereferencing rcx, but that's not where the parameters are passed under the standard ABI (the two arguments arrive in rdi and rsi), so rcx contains garbage and we get a segfault.

=> 0x00000000007b76e0:  movzbl (%rcx),%eax

A closer inspection shows that the enc structure contains callback pointers into the FAHClient binary, and that's likely the case because libexpat.so's GOT entry for XmlInitEncoding() indeed points to FAHClient's function at 0x7c1d20.
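
To illustrate why the GOT entry ends up pointing into FAHClient, here is a toy model of default ELF symbol lookup, where the executable is searched before the shared libraries and the first definition wins. It is a deliberately simplified sketch, not how ld.so is implemented. The function `resolve` and the symbol sets are hypothetical, and it assumes that the system libexpat.so also exports XmlInitEncoding:

```python
from typing import List, Optional, Set, Tuple

# One loaded object: (name, set of symbols it defines and exports).
LoadedObject = Tuple[str, Set[str]]


def resolve(symbol: str, search_order: List[LoadedObject]) -> Optional[str]:
    """Return the name of the first object in search_order defining symbol."""
    for obj_name, defined in search_order:
        if symbol in defined:
            return obj_name
    return None


# Hypothetical, heavily trimmed symbol sets for illustration only.
fahclient = ("FAHClient", {"XmlInitEncoding", "XmlParseXmlDecl"})
libexpat = ("libexpat.so", {"XmlInitEncoding", "XML_ParseBuffer"})

# The executable comes first in the global lookup scope.
scope = [fahclient, libexpat]

for name in ("XmlInitEncoding", "XML_ParseBuffer"):
    print(f"{name} -> {resolve(name, scope)}")
```

In this model, libexpat.so's own reference to XmlInitEncoding binds to the copy inside FAHClient, while symbols that only the library defines still resolve to the library. That matches the mix of system expat frames and a FAHClient address in the backtrace.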

Obviously, the binary shouldn't export those symbols, and there's no easy way to fix it. Moreover, there are other issues with the binary (it requires incompatible OpenSSL libraries), with only partial workarounds in the ebuild (there are still warnings about missing symbol version information). The rpath patching could even be considered a license violation. In summary, the binary used by the current ebuild is totally inadequate for a Gentoo system. Fortunately, there's another build available in the Debian directory that seems to have none of these problems.

The package should therefore switch to the Debian-derived build:
https://download.foldingathome.org/releases/public/release/fahclient/debian-stable-64bit/v7.5/fahclient_7.5.1-64bit-release.tar.bz2

Comment 1 Alexander Miller 2018-10-31 02:26:21 UTC
Created attachment 553730
foldingathome-7.5.1-r1.ebuild

Here is an updated ebuild using the other tarball (and a few other small changes). Feel free to use it. Tested here, works fine for me.

(For some weird reason I can't attach the file. I hope it doesn't get messed up when I paste its contents in the text box.)