
Bug 520218

Summary: sys-cluster/ceph-0.80.5 - init script breaks standard ceph conventions; fails to start daemons properly
Product: Gentoo Linux Reporter: Aaron Ten Clay <gentoo-bugzilla>
Component: [OLD] ServerAssignee: Patrick McLean <chutzpah>
Status: RESOLVED OBSOLETE    
Severity: normal CC: cluster, dlan, mgorny
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Aaron Ten Clay 2014-08-19 00:06:04 UTC
After updating from 0.79, the Gentoo-provided init script for Ceph is unable to start daemons without superfluous symlinks such as /etc/init.d/ceph-osd.0 -> /etc/init.d/ceph. Even with such symlinks in place, the init script does not start daemons with the proper command-line arguments, does not mount filesystems appropriately for OSDs, etc. I believe the previous version of Gentoo's Ceph init script called the actual Ceph init script at /usr/lib/ceph/ceph_init.sh, which handled these tasks properly.

Reproducible: Always

Steps to Reproduce:
1. Install Ceph according to Ceph documentation
1a. Ensure /etc/ceph/ceph.conf correctly lists all daemons for the current host
2. Attempt to start Ceph (/etc/init.d/ceph start)
- or -
2a. Symlink init script (ln -s /etc/init.d/ceph /etc/init.d/ceph-osd.0)
2b. Attempt to start Ceph (/etc/init.d/ceph-osd.0 start)

Actual Results:  
Ceph does not start; the init script expects superfluous symlinks and/or does not verify that the OSD filesystem contains the correct data structures or mount the filesystem as needed.

Expected Results:  
Behave as /usr/lib/ceph/ceph_init.sh does with regard to starting daemons based on /etc/ceph/ceph.conf.

Portage 2.2.8-r1 (default/linux/amd64/13.0, gcc-4.7.3, glibc-2.15-r3, 3.10.7-gentoo-r1 x86_64)
=================================================================
System uname: Linux-3.10.7-gentoo-r1-x86_64-Intel-R-_Core-TM-_i7-4770_CPU_@_3.40GHz-with-gentoo-2.2
KiB Mem:    32858008 total,  32641600 free
KiB Swap:    8388604 total,   8388604 free
Timestamp of tree: Mon, 18 Aug 2014 05:15:01 +0000
ld GNU ld (GNU Binutils) 2.23.1
app-shells/bash:          4.2_p45
dev-lang/python:          2.7.5-r3, 3.2.5-r3, 3.3.3
dev-util/pkgconfig:       0.28
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.13.4
sys-devel/binutils:       2.23.1
sys-devel/gcc:            4.7.3-r1
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4.2
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.9 (virtual/os-headers)
sys-libs/glibc:           2.15-r3
Repositories: gentoo
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://gentoo.closest.myvwan.com/gentoo"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j6"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://gentoo.closest.myvwan.com/gentoo-portage"
USE="acl agent amd64 berkdb bflsc bindist bitforce bzip2 cli cracklib crypt cryptsetup cxx dri fortran gdbm iconv ipv6 mmx modules multilib ncurses nls nptl openmp pam pcre readline session sse sse2 ssl tcpd unicode xfs zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_3" RUBY_TARGETS="ruby19 ruby20" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Comment 1 Aaron Ten Clay 2014-08-19 00:08:02 UTC
Workaround for now is:

echo "/usr/lib/ceph/ceph_init.sh start" >> /etc/local.d/ceph.start
echo "/usr/lib/ceph/ceph_init.sh stop" >> /etc/local.d/ceph.stop
chmod +x /etc/local.d/ceph.{start,stop}
Comment 2 Yixun Lan archtester gentoo-dev 2014-08-20 04:20:56 UTC
Yes, we expect the user to create the additional symbolic links.
Also, I'd prefer to leave OSD mounting out of the ceph script; can you handle it with a proper /etc/fstab?

Still, I haven't seen the specific error. Is it just the OSD mount failing?
Comment 3 Yixun Lan archtester gentoo-dev 2014-09-12 13:25:26 UTC
Closed as invalid; we'll leave filesystem mounting out of the ceph init script.
Comment 4 Aaron Ten Clay 2014-09-12 15:57:25 UTC
(In reply to Yixun Lan from comment #2)
> yes, we expect user to create additional symbol link.
> Also, I'd leave "mounting OSDs" out of ceph script, can you handle it with
> proper /etc/fstab? 
> 
> still, I've not seen what is the specific error? just mounting OSDs failure

Sorry, Bugzilla didn't email me when you replied this time, only the subsequent time, so I couldn't reply promptly.

Ceph's approach to managing OSDs (and all daemons) is a bit more comprehensive than most init scripts because the Ceph ecosystem is complex. Having to update fstab, and make a bunch of symlinks, whenever one piece changes, is quite cumbersome and inelegant compared to what the Ceph init script already supports. Ceph is responsible for knowing the correct mount-time options, filesystem type, etc. for getting an OSD online. Adding that detail to fstab would be error-prone, at best, and could have data-loss consequences at worst.

As long as the Ceph init script is still shipped with the Ceph ebuild at /usr/lib/ceph/ceph_init.sh, there is no problem for me - the workaround is simple enough. But the changes you've made to the Gentoo init script will make it much more difficult for people new to Ceph to get up and running with Ceph on Gentoo.

Also, there is still a bug because the "--cluster" argument is not provided to daemons when they are started by the Gentoo-provided init script.
Comment 5 Yixun Lan archtester gentoo-dev 2014-09-13 14:10:56 UTC
On second thought, I have reopened this bug.

(In reply to Aaron Ten Clay from comment #4)
> Ceph's approach to managing OSDs (and all daemons) is a bit more
> comprehensive than most init scripts because the Ceph ecosystem is complex.
Sigh... the ceph init script is quite large and comprehensive, and upstream's philosophy is "put all functions in one script (mon, mds, osd) and make it work", while my initial motivation was to convert it to a Gentoo-style init script: do one thing per script, and keep it clean and neat.
Now, I can't say I've succeeded at that.

> Having to update fstab, and make a bunch of symlinks, whenever one piece
> changes, is quite cumbersome and inelegant compared to what the Ceph init
> script already supports. Ceph is responsible for knowing the correct
> mount-time options, filesystem type, etc. for getting an OSD online. Adding
> that detail to fstab would be error-prone, at best, and could have data-loss
> consequences at worst.
Your arguments are reasonable here, so let's leave the ceph script's design philosophy alone. Following upstream is always good: they've already tested the script, so it should work out of the box.

> 
> As long as the Ceph init script is still shipped with the Ceph ebuild at
> /usr/lib/ceph/ceph_init.sh, there is no problem for me - the workaround is
> simple enough. 
Your workaround is exactly what the old Gentoo init script did (before I converted it).

> But the changes you've made to the Gentoo init script will
> make it much more difficult for people new to Ceph to get up and running
> with Ceph on Gentoo.
I can restore the old ceph init script logic (which is upstream's version) until we fully convert it to Gentoo style (a huge amount of work, and potentially out of sync with upstream's version).

Summary of my plan:
1) restore the ceph upstream init script (/usr/lib/ceph/ceph_init.sh) and make it the default.
2) keep the current Gentoo init script logic if possible.
3) try to keep it compatible with the current init script, e.g. a ceph-osd.0 symlink still works.
Comment 6 Aaron Ten Clay 2014-09-15 03:00:46 UTC
(In reply to Yixun Lan from comment #5)
> I can restore the old ceph init script logic (which is ceph upstream's
> version), before we fully convert it to Gentoo style (huge work, potential
> out of sync with upstream's version).
> 
> sum of my plan
> 1) restore ceph upstream init script(/usr/lib/ceph/ceph_init.sh), make it
> default
> 2) keep current Gentoo init script logic if possible.
> 3) try to make it compatible with current init script, eg. link to
> ceph-osd.0 still works.

I would love to see a more "Gentoo" approach, personally, and I applaud the efforts - unfortunately, I can't think of any way to simplify the process. I think Ceph will improve the init process over time, and maybe the Gentoo architecture can help guide that.

If you're interested, I would suggest having the Gentoo init script call the Ceph-distributed init script, and if there is a specific symlink (e.g. ceph-osd.0), then call the Ceph init script as '/usr/lib/ceph/ceph_init.sh <verb> osd.0' when the Gentoo script is invoked as e.g. '/etc/init.d/ceph-osd.0 <verb>'.

To clarify:
(Gentoo) '/etc/init.d/ceph <verb>' calls '/usr/lib/ceph/ceph_init.sh <verb>',
(Gentoo) '/etc/init.d/ceph-osd.0 <verb>' calls '/usr/lib/ceph/ceph_init.sh <verb> osd.0'
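A minimal sketch of this dispatch, written as a plain POSIX sh function so it can be tried outside OpenRC. It assumes the Gentoo script passes OpenRC's RC_SVCNAME (the name the script was invoked as, e.g. ceph-osd.0) as the service name; the function name ceph_init_args is hypothetical and not part of the ebuild.

```shell
#!/bin/sh
# Hypothetical helper: translate the Gentoo service name plus a verb into
# the arguments for /usr/lib/ceph/ceph_init.sh, per the mapping above.
ceph_init_args() {
    verb="$1"
    svcname="$2"
    case "$svcname" in
        ceph)
            # /etc/init.d/ceph <verb>: act on every daemon in ceph.conf
            printf '%s\n' "$verb"
            ;;
        ceph-*)
            # /etc/init.d/ceph-osd.0 <verb>: act on daemon "osd.0" only
            printf '%s %s\n' "$verb" "${svcname#ceph-}"
            ;;
        *)
            echo "unrecognised service name: $svcname" >&2
            return 1
            ;;
    esac
}
```

Inside the OpenRC script this could be called as /usr/lib/ceph/ceph_init.sh $(ceph_init_args start "$RC_SVCNAME"), relying on word splitting of the unquoted substitution. Whether ceph_init.sh accepts a single daemon ID such as osd.0 still needs to be confirmed.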

I believe this would satisfy both the upstream use case of one init script doing everything based on ceph.conf, as well as the Gentoo style of service-specific symlinks.

What I'm not sure about is how to incorporate the potential "--cluster" parameter, since that could be very important for some users. Maybe if there is a dot in the symlink before any dashes, that is the cluster name? e.g. 'ceph.<cluster>-osd.0' or 'ceph.<cluster>'? That follows the OpenVPN init symlink naming convention.
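A sketch of how the proposed ceph[.<cluster>][-<type>.<id>] naming could be parsed with POSIX parameter expansion. It assumes the cluster name itself contains no '-' and falls back to "ceph", Ceph's default cluster name, when none is given; parse_ceph_svcname is a hypothetical name, and how the result is handed to ceph_init.sh is left open.

```shell
#!/bin/sh
# Hypothetical parser for the proposed naming scheme:
#   ceph[.<cluster>][-<type>.<id>]
# Prints "<cluster> <daemon>"; <daemon> is empty for whole-host services.
# Assumes the cluster name contains no '-'.
parse_ceph_svcname() {
    name="$1"
    case "$name" in
        ceph|ceph.*|ceph-*) ;;
        *) echo "not a ceph service name: $name" >&2; return 1 ;;
    esac
    rest="${name#ceph}"   # "", ".<cluster>[-<daemon>]" or "-<daemon>"
    cluster=ceph          # Ceph's default cluster name
    daemon=
    case "$rest" in
        .*)
            rest="${rest#.}"
            case "$rest" in
                # A dot before the first dash marks the cluster name
                *-*) cluster="${rest%%-*}"; daemon="${rest#*-}" ;;
                *)   cluster="$rest" ;;
            esac
            ;;
        -*)
            daemon="${rest#-}"
            ;;
    esac
    printf '%s %s\n' "$cluster" "$daemon"
}
```

Splitting at the first '-' is what makes the no-dash assumption necessary: a cluster name containing a dash would be misread as the start of a daemon ID.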

I'm happy to discuss further if you'd like. I frequent the Ceph IRC channel and mailing list, perhaps I can assist with the efforts. Just let me know.
Comment 7 Yixun Lan archtester gentoo-dev 2014-09-15 07:12:35 UTC
(In reply to Aaron Ten Clay from comment #6)
> 
> To clarify:
> (Gentoo) '/etc/init.d/ceph <verb>' calls '/usr/lib/ceph/ceph_init.sh <verb>',
> (Gentoo) '/etc/init.d/ceph-osd.0 <verb>' calls '/usr/lib/ceph/ceph_init.sh
> <verb> osd.0'
Yeah, this is exactly what I planned; I'm just not sure whether the ceph upstream init script already supports the following:
'/usr/lib/ceph/ceph_init.sh <verb> osd.0' -> one specific OSD daemon

From my reading of the code, it should support the use case of '/usr/lib/ceph/ceph_init.sh <verb> [mon|mds|osd]': it parses /etc/ceph/ and gets all IDs of one specific type.

> 
> I believe this would satisfy both the upstream use case of one init script
> doing everything based on ceph.conf, as well as the Gentoo style of
> service-specific symlinks.
> 
> What I'm not sure about is how to incorporate the potential "--cluster"
> parameter, since that could be very important for some users. Maybe if there
> is a dot in the symlink before any dashes, that is the cluster name? e.g.
> 'ceph.<cluster>-osd.0' or 'ceph.<cluster>'? That follows the OpenVPN init
> symlink naming convention.
> 
I haven't looked into this, but it sounds good to me.
Is "--cluster" an option that can be switched on/off? Could we control it via /etc/conf.d/ceph or something similar?
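One way to make "--cluster" optional, sketched under the assumption of a hypothetical CEPH_CLUSTER variable set in /etc/conf.d/ceph (OpenRC sources /etc/conf.d/ceph for the ceph service before running it). The argument is emitted only when the variable is non-empty, so existing single-cluster setups keep their current behaviour.

```shell
#!/bin/sh
# Example /etc/conf.d/ceph entry (hypothetical variable name):
#   CEPH_CLUSTER="backup"

# Print the extra daemon arguments for the configured cluster, if any.
# With CEPH_CLUSTER unset or empty nothing is printed, so the daemons
# fall back to their default cluster name.
cluster_args() {
    if [ -n "${CEPH_CLUSTER:-}" ]; then
        printf '%s %s\n' --cluster "$CEPH_CLUSTER"
    fi
}
```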

> I'm happy to discuss further if you'd like. I frequent the Ceph IRC channel
> and mailing list, perhaps I can assist with the efforts. Just let me know.
That's good, help is always welcome!
If you're willing to push this forward, I'd just say "go ahead"; I'd be more than happy to review and test it. Thanks very much.
Comment 8 martha simons 2019-08-27 13:23:38 UTC
Created attachment 588264
554