Bug 950234 - sys-cluster/ceph-19.2.1 ceph-osd crash after upgrade, missing /usr/lib64/ceph/erasure-code/libec_isa.so
Summary: sys-cluster/ceph-19.2.1 ceph-osd crash after upgrade, missing /usr/lib64/ceph...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Patrick McLean
URL:
Whiteboard:
Keywords:
Depends on: 942680 950294
Blocks:
Reported: 2025-02-24 16:11 UTC by ev
Modified: 2025-02-28 10:55 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info sys-cluster/ceph (emerge-info-ceph-19.2.1.txt,6.76 KB, text/plain)
2025-02-24 16:11 UTC, ev

Description ev 2025-02-24 16:11:15 UTC
Created attachment 919808
emerge --info sys-cluster/ceph

I tried to test the new ceph-19.2.1 that was recently added to Gentoo.
I cannot start any OSD daemons because they crash during startup.
My system is fully upgraded.


 ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28) squid (stable)
 1: /usr/lib64/libc.so.6(+0x3c650) [0x7febac450650]
 2: /usr/lib64/libc.so.6(+0x91bcc) [0x7febac4a5bcc]
 3: gsignal()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17d) [0x55ce3563bf1e]
 6: /usr/bin/ceph-osd(+0x4180a6) [0x55ce3563c0a6]
 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, ceph::common::CephContext*)+0x7a7) [0x55ce3596ef27]
 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, spg_t)+0x1d7) [0x55ce358e87f7]
 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0xc7f) [0x55ce3572eaaf]
 10: (OSD::load_pgs()+0x5e4) [0x55ce3574c424]
 11: (OSD::init()+0x279e) [0x55ce3577dd3e]
 12: main()
 13: /usr/lib64/libc.so.6(+0x2616e) [0x7febac43a16e]
 14: __libc_start_main()
 15: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


    -1> 2025-02-24T14:18:18.224+0000 7febaceff680 -1 /var/tmp/portage/sys-cluster/ceph-19.2.1/work/ceph-19.2.1/src/osd/PGBackend.cc: In function 'static PGBackend* PGBackend::build_pg_backend(const pg_pool_t&, const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&, Listener*, coll_t, ObjectStore::CollectionHandle&, ObjectStore*, ceph::common::CephContext*)' thread 7febaceff680 time 2025-02-24T14:18:18.221468+0000
/var/tmp/portage/sys-cluster/ceph-19.2.1/work/ceph-19.2.1/src/osd/PGBackend.cc: 593: FAILED ceph_assert(ec_impl)

 ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28) squid (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x123) [0x55ce3563bec4]
 2: /usr/bin/ceph-osd(+0x4180a6) [0x55ce3563c0a6]
 3: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, ceph::common::CephContext*)+0x7a7) [0x55ce3596ef27]
 4: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, spg_t)+0x1d7) [0x55ce358e87f7]
 5: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0xc7f) [0x55ce3572eaaf]
 6: (OSD::load_pgs()+0x5e4) [0x55ce3574c424]
 7: (OSD::init()+0x279e) [0x55ce3577dd3e]
 8: main()
 9: /usr/lib64/libc.so.6(+0x2616e) [0x7febac43a16e]
 10: __libc_start_main()
 11: _start()

     0> 2025-02-24T14:18:18.230+0000 7febaceff680 -1 *** Caught signal (Aborted) **
 in thread 7febaceff680 thread_name:ceph-osd

 ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28) squid (stable)
 1: /usr/lib64/libc.so.6(+0x3c650) [0x7febac450650]
 2: /usr/lib64/libc.so.6(+0x91bcc) [0x7febac4a5bcc]
 3: gsignal()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17d) [0x55ce3563bf1e]
 6: /usr/bin/ceph-osd(+0x4180a6) [0x55ce3563c0a6]
 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, ceph::common::CephContext*)+0x7a7) [0x55ce3596ef27]
 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, spg_t)+0x1d7) [0x55ce358e87f7]
 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0xc7f) [0x55ce3572eaaf]
 10: (OSD::load_pgs()+0x5e4) [0x55ce3574c424]
 11: (OSD::init()+0x279e) [0x55ce3577dd3e]
 12: main()
 13: /usr/lib64/libc.so.6(+0x2616e) [0x7febac43a16e]
 14: __libc_start_main()
 15: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Comment 1 ev 2025-02-24 21:41:41 UTC
Error points to https://github.com/ceph/ceph/blob/main/src/osd/PGBackend.cc#L593

I'm using an EC pool with the ISA plugin.

The OSDs cannot start because this plugin is missing.

Error EIO: load dlopen(/usr/lib64/ceph/erasure-code/libec_isa.so): /usr/lib64/ceph/erasure-code/libec_isa.so: cannot open shared object file: No such file or directory
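
The failing dlopen path suggests a simple check that can be run before restarting OSDs after an upgrade. The following is a minimal sketch in Python, assuming that erasure-code plugins are installed as `libec_<name>.so` under the directory named in the error above. The function name `missing_ec_plugins` and the list of plugin names passed to it are illustrative and not part of Ceph.

```python
from pathlib import Path

# Directory taken from the dlopen error above; adjust for other layouts.
DEFAULT_PLUGIN_DIR = Path("/usr/lib64/ceph/erasure-code")


def missing_ec_plugins(names, plugin_dir=DEFAULT_PLUGIN_DIR):
    """Return the plugin names whose libec_<name>.so is absent from plugin_dir."""
    plugin_dir = Path(plugin_dir)
    return [n for n in names if not (plugin_dir / f"libec_{n}.so").is_file()]


if __name__ == "__main__":
    # "isa" is the plugin from this report; "jerasure" is listed as an assumption.
    for name in missing_ec_plugins(["isa", "jerasure"]):
        print(f"missing: {DEFAULT_PLUGIN_DIR / ('libec_' + name + '.so')}")
```

If the check reports a plugin that an existing pool depends on, the OSDs serving that pool would be expected to hit the same `ceph_assert(ec_impl)` failure shown in the description.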
Comment 2 ev 2025-02-24 22:10:17 UTC
In my tests, clusters without an ISA erasure-code pool are not affected.
If you are using an ISA EC pool, do not upgrade to 19.2.1 on Gentoo.
(This is quite a serious issue, since you cannot change a pool's plugin without recreating the pool.)
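
To judge whether a cluster is exposed before upgrading, one could look at which erasure-code profiles name the ISA plugin. The sketch below assumes the profile text consists of `key=value` lines, as printed by `ceph osd erasure-code-profile get <name>`. The helper names and the sample profiles are made up for illustration.

```python
def parse_profile(text):
    """Parse key=value lines into a dict, ignoring blank or malformed lines."""
    profile = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            profile[key] = value
    return profile


def profiles_using_plugin(profiles, plugin="isa"):
    """Return the sorted names of profiles whose 'plugin' field equals plugin."""
    return sorted(
        name
        for name, text in profiles.items()
        if parse_profile(text).get("plugin") == plugin
    )


if __name__ == "__main__":
    # Hypothetical profile dumps; real ones would come from the ceph CLI.
    sample = {
        "ecprofile-a": "k=4\nm=2\nplugin=isa\ntechnique=reed_sol_van\n",
        "ecprofile-b": "k=2\nm=1\nplugin=jerasure\n",
    }
    print(profiles_using_plugin(sample))
```

Any profile this returns for `plugin="isa"` marks pools that would need `libec_isa.so` to be present after the upgrade.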
Comment 3 ev 2025-02-28 10:47:37 UTC
This issue was fixed in sys-cluster/ceph-19.2.1-r2.

My test cluster works fine with an EC ISA pool on this version.
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-28 10:55:09 UTC
Thanks.

FTR, the changes between -r0 and -r2 were:

commit 99d68c0890aa6e633663406368e003a60f13e6f2
Author: Sam James <sam@gentoo.org>
Date:   Tue Feb 25 06:34:01 2025 +0000

    sys-cluster/ceph: don't break installed bundled libraries

    The cmake.eclass `-DBUILD_SHARED_LIBS=ON` default breaks bundled libraries
    when they're forced to be shared. Flip it to off given ceph clearly isn't
    supposed to be built with it. Fixes loading libcpp_redis.so at least
    but possibly others.

    Closes: https://bugs.gentoo.org/942680
    Signed-off-by: Sam James <sam@gentoo.org>

commit 325186ffa178b797ea8cf2d31ac08b067e1b024b
Author: Sam James <sam@gentoo.org>
Date:   Wed Feb 26 11:22:29 2025 +0000

    sys-cluster/ceph: fix building bundled isa-l

    It gets confused by D exported from the ebuild environment.

    Closes: https://bugs.gentoo.org/950294
    Signed-off-by: Sam James <sam@gentoo.org>

commit 97e5262b48cac39e434609ffa6df0e8d2a1bc30b
Author: Patrick Lauer <patrick@gentoo.org>
Date:   Wed Feb 26 12:16:28 2025 +0000

    sys-cluster/ceph: More cleanups

    Remove nasm hack for isa-l,
    and add patch to quiet compiler some warnings.

    Signed-off-by: Patrick Lauer <patrick@gentoo.org>