https://blogs.gentoo.org/ago/2020/07/04/gentoo-tinderbox/

Issue: dev-python/pytesseract-0.3.12 fails tests.
Discovered on: amd64 (internal ref: clang-lld_tinderbox)
System: CLANG-LLD (https://wiki.gentoo.org/wiki/Project:Tinderbox/Common_Issues_Helper#CLANG-LLD)

Info about the issue:
https://wiki.gentoo.org/wiki/Project:Tinderbox/Common_Issues_Helper#CF0015
Created attachment 884829: build.log (build log and emerge --info)
Error(s) that match a known pattern:

E pytesseract.pytesseract.TesseractError: (127, "read_params_file: Can't open tessedit_create_boxfile=1 read_params_file: Can't open tessedit_create_hocr=1 tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num")
E pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304 tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')
E pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 333 tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')
E pytesseract.pytesseract.TesseractError: (127, 'Page 0 : ./tests/data/test.jpg tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')
E pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')
E pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')

FAILED tests/pytesseract_test.py::test_image_to_alto_xml - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_boxes - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_data_common_output[bytes] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_data_common_output[dict] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_data_common_output[string] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_pdf_or_hocr[hocr] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_pdf_or_hocr[pdf] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_batch - pytesseract.pytesseract.TesseractError: (127, 'Page 0 : ./tests/data/test.j...
FAILED tests/pytesseract_test.py::test_image_to_string_european - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_multiprocessing - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_with_args_type[image_object] - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304...
FAILED tests/pytesseract_test.py::test_image_to_string_with_args_type[path_str] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[gif] - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[jpeg2000] - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[jpg] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[pgm] - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[png] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[ppm] - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[tiff] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...
FAILED tests/pytesseract_test.py::test_image_to_string_with_image_type[webp] - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 304...
FAILED tests/pytesseract_test.py::test_la_image_to_string - pytesseract.pytesseract.TesseractError: (127, 'Estimating resolution as 333...
FAILED tests/pytesseract_test.py::test_run_and_get_multiple_output[extensions0] - pytesseract.pytesseract.TesseractError: (127, "read_params_file: Can't open...
FAILED tests/pytesseract_test.py::test_run_and_get_multiple_output[extensions1] - pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup err...

pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')
E pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num')
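Every failure above carries the same dynamic-loader message: exit status 127 plus `symbol lookup error: <library>: undefined symbol: <symbol>`. pytesseract relays the exit status and stderr of the tesseract process in `TesseractError`, so the failure is in loading tesseract's own library rather than in the Python code. Some messages show tesseract output ("Estimating resolution as 304") before the error, which is consistent with the symbol being resolved lazily at its first call. A minimal Python sketch (the function name is hypothetical) that reduces such a log to its distinct library/symbol pairs:

```python
import re

# Message printed by the dynamic loader when a loaded object references a
# symbol that none of the loaded objects defines.
_LOOKUP_ERROR = re.compile(
    r"symbol lookup error: (\S+): undefined symbol: ([\w.@$]+)"
)


def parse_symbol_lookup_errors(log_text: str) -> list[tuple[str, str]]:
    """Return unique (library, symbol) pairs in order of first appearance."""
    pairs = _LOOKUP_ERROR.findall(log_text)
    # dict.fromkeys removes duplicates while preserving the original order.
    return list(dict.fromkeys(pairs))


if __name__ == "__main__":
    log = (
        "E pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol "
        "lookup error: /usr/lib64/libtesseract.so.5: undefined symbol: "
        "__kmpc_global_thread_num')"
    )
    for library, symbol in parse_symbol_lookup_errors(log):
        print(f"{library} references undefined {symbol}")
```

Applied to the whole log above, this would collapse all the failures into the single pair (`/usr/lib64/libtesseract.so.5`, `__kmpc_global_thread_num`).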
(In reply to Agostino Sarubbo from comment #2)
> Error(s) that match a known pattern:
>
> E pytesseract.pytesseract.TesseractError: (127,
> "read_params_file: Can't open tessedit_create_boxfile=1 read_params_file:
> Can't open tessedit_create_hocr=1 tesseract: symbol lookup error:
> /usr/lib64/libtesseract.so.5: undefined symbol: __kmpc_global_thread_num")

The errors seem to be related to app-text/tesseract. I'm not able to reproduce them. Maybe pytesseract needs to add some library?
It looks like the standard clang/lld weirdness with OpenMP. Not tesseract-specific.
(In reply to Sam James from comment #4)
> It looks like the standard clang/lld weirdness with OpenMP. Not
> tesseract-specific.

Or, well, it might be, but if it is, it's app-text/tesseract that needs fixing.
I rebuilt tesseract:

# CC="clang" CXX="clang++" LDFLAGS="${LDFLAGS} -fuse-ld=lld" emerge -av1 tesseract
[ebuild R ] app-text/tesseract-5.3.4:0/5::gentoo USE="float32 jpeg openmp png tiff webp -doc -opencl -static-libs -training" ABI_X86="32 (64) (-x32)"

# strings /usr/lib64/libtesseract.so.5|grep __kmpc_global_thread_num
__kmpc_global_thread_num

But I don't get the OpenMP undefined symbol error, and the pytesseract tests pass.
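Note that `strings | grep` only shows that the name `__kmpc_global_thread_num` appears somewhere in the file; it does not tell whether libtesseract defines the symbol or merely references it. In `nm -D` output an undefined dynamic symbol has no address and is marked `U`. A minimal Python sketch, assuming binutils' `nm` is installed (both helper names are hypothetical), that lists the undefined OpenMP runtime references of a library:

```python
import subprocess

# Name prefixes of OpenMP runtime symbols: __kmpc_* (LLVM/Intel runtime entry
# points), GOMP_* (GCC libgomp entry points), omp_* (user-facing OpenMP API).
OPENMP_PREFIXES = ("__kmpc_", "GOMP_", "omp_")


def undefined_openmp_symbols(nm_output: str) -> list[str]:
    """List undefined OpenMP symbols from `nm -D <library>` output.

    Undefined entries look like "                 U __kmpc_global_thread_num",
    optionally with a version suffix such as "@VERSION".
    """
    found = []
    for line in nm_output.splitlines():
        fields = line.split()
        # The symbol type is the second-to-last field; "U" means undefined.
        if len(fields) >= 2 and fields[-2] == "U":
            name = fields[-1].split("@", 1)[0]  # drop any version suffix
            if name.startswith(OPENMP_PREFIXES):
                found.append(name)
    return found


def read_dynamic_symbols(path: str) -> str:
    """Capture `nm -D` output for a shared library."""
    return subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    ).stdout
```

For example, `undefined_openmp_symbols(read_dynamic_symbols("/usr/lib64/libtesseract.so.5"))` on the failing clang build would be expected to include `__kmpc_global_thread_num`. A `U` entry by itself is normal for an OpenMP-enabled library; whether it resolves at run time depends on an OpenMP runtime that defines it being loaded into the process.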
I'm getting the same problem with tesseract-5.0.4 compiled with clang; it has nothing to do with Python in my case. I see that this issue ties tesseract to the system-wide clang bug 408963. Compiling tesseract with GCC cures the problem.
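GCC lowers OpenMP constructs to `GOMP_*` calls into libgomp, while clang emits `__kmpc_*` calls into LLVM's libomp, which fits with a GCC build not needing `__kmpc_global_thread_num` at all. One way to compare a clang build with a GCC build is to check which OpenMP runtime the library records as a direct `DT_NEEDED` dependency in `readelf -d`. A minimal Python sketch, assuming binutils' `readelf` is installed; the helper names and the prefix table are illustrative assumptions. An empty result is only a hint, since the symbols can still resolve if another loaded object (for example the tesseract executable) already pulled in the runtime:

```python
import re
import subprocess

# A dependency line in `readelf -d` output looks like:
#  0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")

# Heuristic mapping from library-name prefix to the OpenMP runtime it provides.
_OPENMP_RUNTIMES = {
    "libgomp.so": "GCC libgomp (GOMP_* entry points)",
    "libomp.so": "LLVM libomp (__kmpc_* entry points)",
    "libiomp5.so": "Intel libiomp5 (__kmpc_* entry points)",
}


def needed_libraries(readelf_output: str) -> list[str]:
    """Return the DT_NEEDED entries listed in `readelf -d` output."""
    return _NEEDED.findall(readelf_output)


def openmp_runtimes(needed: list[str]) -> list[str]:
    """Name the OpenMP runtime(s) among a library's direct dependencies."""
    return [
        runtime
        for lib in needed
        for prefix, runtime in _OPENMP_RUNTIMES.items()
        if lib.startswith(prefix)
    ]


def read_dynamic_section(path: str) -> str:
    """Capture `readelf -d` output for a shared library."""
    return subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    ).stdout
```

For example, `openmp_runtimes(needed_libraries(read_dynamic_section("/usr/lib64/libtesseract.so.5")))` on a clang build that references `__kmpc_*` symbols but returns an empty list would point at the library having been linked without libomp.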