New Ceph major version 10.2.0 codename Jewel has been released. CephFS is now considered stable and production ready. Reproducible: Always
Created attachment 432202 [details] testing ebuild for ceph 10.2.0
I am testing this ebuild out now... The memory requirement isn't a hard requirement, so I've disabled it. I've found that somewhere between 1-2 GB of RAM is needed if you limit the compilation to one job at a time (-j1)...
(The ebuild is a straight copy from 10.0.2, with the memory requirement disabled) FYI
Created attachment 432204 [details] build err
I'm building 10.2.0 on my local machine, but I get a compile error, something like this (not sure if it is related to gcc-5.x):
gw/ceph_dencoder-rgw_dencoder.o: In function `RGWZoneGroup::generate_test_instances(std::list<RGWZoneGroup*, std::allocator<RGWZoneGroup*> >&)':
/var/notmpfs/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/rgw/rgw_rados.h:1096: undefined reference to `vtable for RGWZoneGroup'
rgw/ceph_dencoder-rgw_dencoder.o: In function `RGWZoneParams::generate_test_instances(std::list<RGWZoneParams*, std::allocator<RGWZoneParams*> >&)':
/var/notmpfs/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/rgw/rgw_rados.h:871: undefined reference to `vtable for RGWZoneParams'
rgw/ceph_dencoder-rgw_json_enc.o: In function `decode_zonegroups(std::map<std::string, RGWZoneGroup, std::less<std::string>, std::allocator<std::pair<std::string const, RGWZoneGroup> > >&, JSONObj*)':
/var/notmpfs/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/rgw/rgw_rados.h:1096: undefined reference to `vtable for RGWZoneGroup'
rgw/ceph_dencoder-rgw_json_enc.o: In function `RGWSystemMetaObj::~RGWSystemMetaObj()':
As for the memory/disk restriction, I'm already aware of that ... but unfortunately we only have a *hard* restriction (the user's only recourse is to copy and modify the ebuild in their own overlay). I would like to make it *soft*: enable the restriction by default, but let users override it with a variable (if they know what they are doing). Something like: NO_MEMORY_DISK_CHECK=1 MAKEOPTS="-j1" USE="blah blah" emerge =ceph-10.2.0
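The soft restriction proposed above could be sketched roughly as follows. This is only an illustration: NO_MEMORY_DISK_CHECK is the variable name suggested in the comment, not an existing Portage feature, and the function name and the MiB arguments are made up for the example.

```shell
# Hypothetical sketch of a "soft" resource check.
# NO_MEMORY_DISK_CHECK is the variable proposed in this thread; it is an
# assumption, not something Portage provides.

# check_build_resources <available_mib> <required_mib>
# Returns 0 when the build may proceed, 1 when it should be refused.
check_build_resources() {
    local available=$1
    local required=$2

    # Enough memory: nothing to do.
    if [ "$available" -ge "$required" ]; then
        return 0
    fi

    # Soft mode: the user explicitly opted out, so warn and carry on.
    if [ "${NO_MEMORY_DISK_CHECK:-0}" = "1" ]; then
        echo "warning: ${available} MiB available, ${required} MiB recommended; continuing" >&2
        return 0
    fi

    # Default: keep the restriction, but tell the user how to override it.
    echo "error: ${available} MiB available, ${required} MiB required" >&2
    echo "set NO_MEMORY_DISK_CHECK=1 to override" >&2
    return 1
}
```

With the variable unset the check behaves like today's hard restriction; setting it to 1 downgrades the failure to a warning.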
That error is not new... See this thread, where it happens with 4.9.2: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/29957 I haven't hit the rgw code in my compile yet... I don't have the radosgw flag turned on, do you? I have jemalloc, libaio, nss, python_targets_python2_7.
(In reply to Yixun Lan from comment #4)
> for the memory/disk restriction, I'm already aware that ...
> but unfortunately, we only have kind of *hard* restriction.
> (user only have to resort to copy-and-modify the ebuild on his own overlay)
>
> I would like to make it kind of *soft*, default enable the restriction, but
> can be overrided by user variables (if they know what they are doing)?
>
> something like: NO_MEMORY_DISK_CHECK=1 MAKEOPTS="-j1" USE="blah blah" emerge
> =ceph-10.2.0
You can force compiling to be single job (-j1)... https://devmanual.gentoo.org/eclass-reference/ebuild/#lbAP "emake -j1" I bet you could wrap that with an if statement...
(In reply to Yixun Lan from comment #3)
> Created attachment 432204 [details]
> build err
>
> I'm building 10.2.0 at my local machine, but have compiling err.
>
> something like this (not sure if it related to gcc-5.x ..) :
>
> gw/ceph_dencoder-rgw_dencoder.o: In function
> `RGWZoneGroup::generate_test_instances(std::list<RGWZoneGroup*,
> std::allocator<RGWZoneGroup*> >&)':
> /var/notmpfs/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/rgw/
> rgw_rados.h:1096: undefined reference to `vtable for RGWZoneGroup'
> rgw/ceph_dencoder-rgw_dencoder.o: In function
> `RGWZoneParams::generate_test_instances(std::list<RGWZoneParams*,
> std::allocator<RGWZoneParams*> >&)':
> /var/notmpfs/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/rgw/
> rgw_rados.h:871: undefined reference to `vtable for RGWZoneParams'
> rgw/ceph_dencoder-rgw_json_enc.o: In function
> `decode_zonegroups(std::map<std::string, RGWZoneGroup,
> std::less<std::string>, std::allocator<std::pair<std::string const, RGWZ
> oneGroup> > >&, JSONObj*)':
> /var/notmpfs/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/rgw/
> rgw_rados.h:1096: undefined reference to `vtable for RGWZoneGroup'
> rgw/ceph_dencoder-rgw_json_enc.o: In function
> `RGWSystemMetaObj::~RGWSystemMetaObj()':
I just hit this same error (20 hours after starting emerge, lol) with gcc 4.9.3, while building src/.libs/ceph-dencoder.
Testing the theory that the radosgw flag is needed (without discarding the failed compilation results):
1. Manually verify that the dependencies for radosgw are installed (fcgi, expat, and curl)
2. # cd /var/tmp/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0
3. Run the original configure string (found in config.log), after replacing '--without-radosgw' with '--with-radosgw'
4. # make
This got me further, until undefined references to various ldap calls... That could be down to how I was testing, so I'm restarting from the beginning with the radosgw flag enabled.
Dyweni
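The flag swap in step 3 can be done mechanically rather than by hand. A minimal sketch, assuming the configure command line has already been copied out of config.log into a string; the function name and the sample command line in the usage comment are hypothetical:

```shell
# swap_radosgw_flag <configure command line>
# Prints the same command line with --without-radosgw replaced by
# --with-radosgw. Other arguments are passed through untouched.
swap_radosgw_flag() {
    printf '%s\n' "$1" | sed 's/--without-radosgw/--with-radosgw/'
}

# Example (the configure arguments here are made up for illustration):
#   swap_radosgw_flag './configure --prefix=/usr --without-radosgw'
```

The result still has to be run from the work directory named in step 2, followed by make as in step 4.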
(In reply to Dyweni from comment #7)
>
> This got me further until an undefined reference to various ldap calls...
> could be how to was testing, so I'm restarting from the beginning with the
> radowgw flag enabled.
>
Okay, next error from a fresh 'emerge -qv1 =sys-cluster/ceph-10.2.0' with flags: jemalloc, libaio, nss, python_targets_python2_7, and radosgw
/bin/sh ../libtool --tag=CXX --mode=link i686-pc-linux-gnu-g++ -I/usr/include/nss -I/usr/include/nspr -Wall -Wtype-limits -Wignored-qualifiers -Winit-self -Wpointer-arith -Werror=format-security -fno-strict-aliasing -fsigned-char -rdynamic -ftemplate-depth-1024 -Wnon-virtual-dtor -Wno-invalid-offsetof -O2 -g -pipe -Wall -Wp,-U_FORTIFY_SOURCE -Wp,-D_FORTIFY_SOURCE=2 -fexceptions --param=ssp-buffer-size=4 -fPIE -fstack-protector-strong -Wstrict-null-sentinel -march=native -O2 -pipe -std=gnu++11 -Wl,--as-needed -pie -Wl,-z,relro -Wl,-z,now -Wl,-O1 -Wl,--as-needed -o radosgw rgw/rgw_fcgi_process.o rgw/rgw_loadgen_process.o rgw/rgw_civetweb.o rgw/rgw_civetweb_frontend.o rgw/rgw_civetweb_log.o civetweb/src/radosgw-civetweb.o rgw/rgw_main.o librgw.la -ljemalloc libcivetweb.la -lssl -lcrypto librados.la libcls_rgw_client.la libcls_log_client.la libcls_statelog_client.la libcls_timeindex_client.la libcls_user_client.la libcls_replica_log_client.la libcls_lock_client.la libcls_refcount_client.la libcls_version_client.la -lcurl -lexpat -lm -lfcgi -ldl -lresolv libglobal.la libcommon.la -luuid -lpthread -lm -lssl3 -lsmime3 -lnss3 -lnssutil3 -lplds4 -lplc4 -lnspr4 -lm -lrt -lboost_iostreams-mt -lboost_system-mt
libtool: link: i686-pc-linux-gnu-g++ -I/usr/include/nss -I/usr/include/nspr -Wall -Wtype-limits -Wignored-qualifiers -Winit-self -Wpointer-arith -Werror=format-security -fno-strict-aliasing -fsigned-char -rdynamic -ftemplate-depth-1024 -Wnon-virtual-dtor -Wno-invalid-offsetof -O2 -g -pipe -Wall -Wp,-U_FORTIFY_SOURCE -Wp,-D_FORTIFY_SOURCE=2 -fexceptions --param=ssp-buffer-size=4 -fPIE -fstack-protector-strong -Wstrict-null-sentinel -march=native -O2 -pipe -std=gnu++11 -pie -Wl,-z -Wl,relro -Wl,-z -Wl,now -Wl,-O1 -o .libs/radosgw rgw/rgw_fcgi_process.o rgw/rgw_loadgen_process.o rgw/rgw_civetweb.o rgw/rgw_civetweb_frontend.o rgw/rgw_civetweb_log.o civetweb/src/radosgw-civetweb.o rgw/rgw_main.o -Wl,--as-needed ./.libs/librgw.so /var/tmp/portage/sys-cluster/ceph-10.2.0/work/ceph-10.2.0/src/.libs/librados.so -ljemalloc ./.libs/libcivetweb.a -lssl -lcrypto ./.libs/librados.so ./.libs/libcls_rgw_client.a ./.libs/libcls_log_client.a ./.libs/libcls_statelog_client.a ./.libs/libcls_timeindex_client.a ./.libs/libcls_user_client.a ./.libs/libcls_replica_log_client.a ./.libs/libcls_lock_client.a ./.libs/libcls_refcount_client.a ./.libs/libcls_version_client.a -lcurl -lexpat /usr/lib/libfcgi.so -lnsl -lresolv ./.libs/libglobal.a ./.libs/libcommon.a -ldl -lboost_thread-mt -lboost_random-mt -lblkid -luuid -lpthread -lssl3 -lsmime3 -lnss3 -lnssutil3 -lplds4 -lplc4 -lnspr4 -lm -lrt -lboost_iostreams-mt -lboost_system-mt -pthread
./.libs/librgw.so: undefined reference to `ldap_get_dn'
./.libs/librgw.so: undefined reference to `ldap_search_s'
./.libs/librgw.so: undefined reference to `ldap_memfree'
./.libs/librgw.so: undefined reference to `ldap_initialize'
./.libs/librgw.so: undefined reference to `ldap_msgfree'
./.libs/librgw.so: undefined reference to `ldap_first_entry'
./.libs/librgw.so: undefined reference to `ldap_unbind'
./.libs/librgw.so: undefined reference to `ldap_simple_bind_s'
collect2: error: ld returned 1 exit status
Makefile:16818: recipe for target 'radosgw' failed
(In reply to Dyweni from comment #8) > ./.libs/librgw.so: undefined reference to `ldap_get_dn' > ./.libs/librgw.so: undefined reference to `ldap_search_s' > ./.libs/librgw.so: undefined reference to `ldap_memfree' > ./.libs/librgw.so: undefined reference to `ldap_initialize' > ./.libs/librgw.so: undefined reference to `ldap_msgfree' > ./.libs/librgw.so: undefined reference to `ldap_first_entry' > ./.libs/librgw.so: undefined reference to `ldap_unbind' > ./.libs/librgw.so: undefined reference to `ldap_simple_bind_s' > These errors don't recur when I build with -j3 as opposed to -j1... This will need to be investigated.
Towards the end, I get a lot of sandbox violations from pip... The directory '/var/tmp/portage/sys-cluster/ceph-10.2.0/homedir/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. The directory '/var/tmp/portage/sys-cluster/ceph-10.2.0/homedir/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. The directory '/var/tmp/portage/sys-cluster/ceph-10.2.0/homedir/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. The directory '/var/tmp/portage/sys-cluster/ceph-10.2.0/homedir/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. 
Collecting pip>=6.1 Collecting pip>=6.1 Downloading pip-8.1.1-py2.py3-none-any.whl (1.2MB) 1% | | 12kB 27.2MB/s eta 0:00:01 Downloading pip-8.1.1-py2.py3-none-any.whl (1.2MB) 100% |################################| 1.2MB 410kB/s 100% |################################| 1.2MB 413kB/s Installing collected packages: pip Found existing installation: pip 7.1.2 Installing collected packages: pip Found existing installation: pip 7.1.2 Uninstalling pip-7.1.2: Uninstalling pip-7.1.2: * ACCESS DENIED: rename: /usr/bin/pip * ACCESS DENIED: rename: /usr/bin/pip * ACCESS DENIED: unlink: /usr/bin/pip * ACCESS DENIED: unlink: /usr/bin/pip Exception: Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/pip/basecommand.py", line 211, in main status = self.run(options, args) File "/usr/lib64/python2.7/site-packages/pip/commands/install.py", line 311, in run root=options.root_path, File "/usr/lib64/python2.7/site-packages/pip/req/req_set.py", line 640, in install requirement.uninstall(auto_confirm=True) File "/usr/lib64/python2.7/site-packages/pip/req/req_install.py", line 716, in uninstall paths_to_remove.remove(auto_confirm) File "/usr/lib64/python2.7/site-packages/pip/req/req_uninstall.py", line 125, in remove renames(path, new_path) File "/usr/lib64/python2.7/site-packages/pip/utils/__init__.py", line 315, in renames shutil.move(old, new) File "/usr/lib64/python2.7/shutil.py", line 303, in move os.unlink(src) OSError: [Errno 13] Permission denied: '/usr/bin/pip' Exception: Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/pip/basecommand.py", line 211, in main status = self.run(options, args) File "/usr/lib64/python2.7/site-packages/pip/commands/install.py", line 311, in run root=options.root_path, File "/usr/lib64/python2.7/site-packages/pip/req/req_set.py", line 640, in install requirement.uninstall(auto_confirm=True) File "/usr/lib64/python2.7/site-packages/pip/req/req_install.py", line 716, in uninstall 
paths_to_remove.remove(auto_confirm) File "/usr/lib64/python2.7/site-packages/pip/req/req_uninstall.py", line 125, in remove renames(path, new_path) File "/usr/lib64/python2.7/site-packages/pip/utils/__init__.py", line 315, in renames shutil.move(old, new) File "/usr/lib64/python2.7/shutil.py", line 303, in move os.unlink(src) OSError: [Errno 13] Permission denied: '/usr/bin/pip'
I also saw this right before the pip errors... cd ./ceph-detect-init ; ../tools/setup-virtualenv.sh /var/tmp/portage/sys-cluster/ceph-10.2.0/temp/ceph-detect-init-virtualenv ; test -d wheelhouse && export NO_INDEX=--no-index ; /var/tmp/portage/sys-cluster/ceph-10.2.0/temp/ceph-detect-init-virtualenv/bin/pip install $NO_INDEX --use-wheel --find-links=file://$(pwd)/wheelhouse -e . ../tools/setup-virtualenv.sh: line 21: virtualenv: command not found ../tools/setup-virtualenv.sh: line 22: /var/tmp/portage/sys-cluster/ceph-10.2.0/temp/ceph-detect-init-virtualenv/bin/activate: No such file or directory cd ./ceph-disk ; ../tools/setup-virtualenv.sh /var/tmp/portage/sys-cluster/ceph-10.2.0/temp/ceph-disk-virtualenv ; test -d wheelhouse && export NO_INDEX=--no-index ; /var/tmp/portage/sys-cluster/ceph-10.2.0/temp/ceph-disk-virtualenv/bin/pip install $NO_INDEX --use-wheel --find-links=file://$(pwd)/wheelhouse -e . ../tools/setup-virtualenv.sh: line 21: virtualenv: command not found ../tools/setup-virtualenv.sh: line 22: /var/tmp/portage/sys-cluster/ceph-10.2.0/temp/ceph-disk-virtualenv/bin/activate: No such file or directory
Created attachment 432504 [details] updated ebuild with emake -j1
This ebuild tests using emake -j3 (to limit memory usage...) I found that for the major components, cc1plus can take up to 2.5 GB of RAM per process!
NOTE: RocksDB will spawn -j<max cpu count> regardless of any -j3 parameter given to emake. This has been reported on the ceph users mailing list.
(In reply to Dyweni from comment #9) > (In reply to Dyweni from comment #8) > > ./.libs/librgw.so: undefined reference to `ldap_get_dn' > > ./.libs/librgw.so: undefined reference to `ldap_search_s' > > ./.libs/librgw.so: undefined reference to `ldap_memfree' > > ./.libs/librgw.so: undefined reference to `ldap_initialize' > > ./.libs/librgw.so: undefined reference to `ldap_msgfree' > > ./.libs/librgw.so: undefined reference to `ldap_first_entry' > > ./.libs/librgw.so: undefined reference to `ldap_unbind' > > ./.libs/librgw.so: undefined reference to `ldap_simple_bind_s' > > > > > These errors don't recur when I build with -j3 as opposed to -j1... This > will need to be investigated. I have confirmed. If MAKEOPTS="-j1" this occurs. If MAKEOPTS="-j2" (or higher), this does not. I have reported this on the Ceph-Users mailing list. Dyweni
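Because the failure depends on the job count in MAKEOPTS, any guard in an ebuild or wrapper script would first have to work out that count. A minimal sketch, assuming only the attached `-jN` form is used (it does not handle `-j N`, `--jobs=N`, or a bare `-j`); the function name is made up for illustration:

```shell
# jobs_from_makeopts <makeopts string>
# Prints the N from the last well-formed -jN option, or 1 if none is
# present (1 being make's default when no -j is given).
jobs_from_makeopts() {
    local jobs=1
    local opt n
    # Intentionally unquoted: split the option string into words.
    for opt in $1; do
        case $opt in
            -j[0-9]*)
                n=${opt#-j}
                # Accept the value only if it is purely numeric.
                case $n in
                    *[!0-9]*) ;;
                    *) jobs=$n ;;
                esac
                ;;
        esac
    done
    echo "$jobs"
}
```

A caller could compare the result against 1 and print a warning before starting a build that is known to fail in that configuration.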
Created attachment 432608 [details] added dev-python/virtualenv dependency I am re-testing with the virtualenv dependency added and installed. The pip upgrade to 8.1.1 was triggered by Ceph's build system trying to self-install the virtualenv python package. Also, I removed the src_compile function, as -j1 triggers the ldap dependency error.
After fixing the virtualenv dependency, tonight I ended up with two ldap errors (because I was compiling with -j2)...
./.libs/librgw.so: undefined reference to `ldap_get_dn'
./.libs/librgw.so: undefined reference to `ldap_search_s'
./.libs/librgw.so: undefined reference to `ldap_memfree'
./.libs/librgw.so: undefined reference to `ldap_initialize'
./.libs/librgw.so: undefined reference to `ldap_msgfree'
./.libs/librgw.so: undefined reference to `ldap_first_entry'
./.libs/librgw.so: undefined reference to `ldap_unbind'
./.libs/librgw.so: undefined reference to `ldap_simple_bind_s'
collect2: error: ld returned 1 exit status
Makefile:16826: recipe for target 'radosgw-admin' failed
make[3]: *** [radosgw-admin] Error 1
make[3]: *** Waiting for unfinished jobs....
./.libs/librgw.so: undefined reference to `ldap_get_dn'
./.libs/librgw.so: undefined reference to `ldap_search_s'
./.libs/librgw.so: undefined reference to `ldap_memfree'
./.libs/librgw.so: undefined reference to `ldap_initialize'
./.libs/librgw.so: undefined reference to `ldap_msgfree'
./.libs/librgw.so: undefined reference to `ldap_first_entry'
./.libs/librgw.so: undefined reference to `ldap_unbind'
./.libs/librgw.so: undefined reference to `ldap_simple_bind_s'
collect2: error: ld returned 1 exit status
Makefile:16818: recipe for target 'radosgw' failed
make[3]: *** [radosgw] Error 1
Created attachment 432990 [details] updated ebuild This ebuild version fixes all outstanding issues. It compiles successfully (-j2) on 64bit amd. I am testing (-j1) now on 32bit amd. We should still add a test for memory requirements based on the # of jobs (-j) that portage assigns. Test could be <TotalRAM+SWAP-1G> >= <-jN>*2.5G. What do you think? If the test fails, we issue a warning about compiling with too many parallel jobs.
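The proposed test, <TotalRAM+SWAP-1G> >= <-jN>*2.5G, can be written with integer arithmetic in KiB (the unit /proc/meminfo reports). The following is only a sketch of that formula: the function names are made up, the 1 GiB reserve and 2.5 GiB per job come from the comment above, and the /proc/meminfo helper assumes Linux.

```shell
# enough_memory_for_jobs <mem_total_kib> <swap_total_kib> <jobs>
# Succeeds (exit status 0) when RAM + swap - 1 GiB >= jobs * 2.5 GiB.
enough_memory_for_jobs() {
    local mem_kib=$1
    local swap_kib=$2
    local jobs=$3
    local reserve_kib=1048576   # 1 GiB kept back for the rest of the system
    local per_job_kib=2621440   # 2.5 GiB per compile job, as observed for cc1plus

    [ $(( mem_kib + swap_kib - reserve_kib )) -ge $(( jobs * per_job_kib )) ]
}

# meminfo_kib <field>
# Linux-only helper: prints the value of a /proc/meminfo field in KiB,
# e.g. "meminfo_kib MemTotal" or "meminfo_kib SwapTotal".
meminfo_kib() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# Possible use, warning rather than failing as suggested in the comment:
#   enough_memory_for_jobs "$(meminfo_kib MemTotal)" "$(meminfo_kib SwapTotal)" 3 \
#       || echo "warning: too many parallel jobs for available memory" >&2
```

Keeping the formula in a function that takes plain numbers makes it easy to check against known machine sizes without touching the build host.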
(In reply to Dyweni from comment #16) > Created attachment 432990 [details] > updated ebuild > > This ebuild version fixes all outstanding issues. It compiles successfully > (-j2) on 64bit amd. I am testing (-j1) now on 32bit amd. > > We should still add a test for memory requirements based on the # of jobs > (-j) that portage assigns. Test could be <TotalRAM+SWAP-1G> >= <-jN>*2.5G. > What do you think? If the test fails, we issue a warning about compiling > with too many parallel jobs. This compiles fine (-j1) on 32bit amd. I've upgraded my monitor to 10.2.0. Now upgrading my OSDs.
(In reply to Dyweni from comment #17) > This compiles fine (-j1) on 32bit amd. I've upgraded my monitor to 10.2.0. > Now upgrading my OSDs. I am also testing the compilation on 32bit ARM. (Odroid XU4).
Done https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=02e633365fd994d9601fc84ddd742a6b20742e1e