OCRopus is a state-of-the-art document analysis and OCR system. Reproducible: Always
Created attachment 125286 [details] ebuild
Created attachment 125294 [details] ebuild
You should update this bug report a bit to reflect that it is already in Sunrise
Accepted into Sunrise app-text/ocropus-svn
By the way, there's now an alpha version (0.1.0) available. http://ocropus.googlecode.com/files/ocropus-0.1.0.tar.gz
Ebuild should depend on dev-util/ftjam instead of dev-util/jam, which was masked on 16 Mar 2008.
*** Bug 223023 has been marked as a duplicate of this bug. ***
Created attachment 163958 [details] ocropus-0.2.ebuild integrates Bug 223023 stuff cleanly
Created attachment 163959 [details, diff] files/ocropus-0.2-build.patch
Created attachment 170502 [details] ocropus-0.3.1.ebuild seems to have parallel build problems, but I haven't really investigated it
I installed ocropus using the attached ocropus-0.3.1.ebuild with the default USE flags (sdl, spell).

First observation: src_test() does not work; it should be replaced with "make check || die".

Second observation: I'm unable to recognize anything:

av@snork$ ocroscript recognize text.png
ocroscript: [string "require(arg_script_name)"]:1: module 'recognize' not found:
    no field package.preload['recognize']
    no file '/usr/local/share/ocropus/scripts//recognize.lua'
    no file './recognize.so'
    no file '/usr/local/lib/lua/5.1/recognize.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'

I found recognize.lua at /usr/share/ocropus/scripts/recognize.lua, but ocroscript did not look there. Should the ebuild pass --prefix to configure?
It has nothing to do with configure and --prefix. I found /usr/local/ paths hardcoded in ocroscript/ocrotoplevel.cc and in several other places. Exporting OCROSCRIPTS=/usr/share/ocropus/scripts/ helps ocroscript to find required files.
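The workaround described above can be sketched as a short shell session. The scripts path and the ocroscript invocation are the ones quoted in this bug; the guard around the call is only an illustrative addition so the snippet does nothing harmful on a machine without ocropus.

```shell
# Point ocroscript at the directory where the ebuild installed the Lua scripts
# (path taken from the comment above; adjust if your install differs).
export OCROSCRIPTS=/usr/share/ocropus/scripts/

# Illustrative guard: only call ocroscript if it is actually installed.
if command -v ocroscript >/dev/null 2>&1; then
    ocroscript recognize text.png
else
    echo "ocroscript not found; OCROSCRIPTS is set to $OCROSCRIPTS"
fi
```

Setting the variable in the environment avoids patching the hardcoded /usr/local/ paths, at the cost of every user having to export it.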
I had installed ocropus-svn from the sunrise overlay, but it only installed /usr/bin/ocropus. So I decided to try out this ebuild so that the scripts (lua, python) would work.

I'm trying to install ocropus with openfst. There is a sci-misc/openfst from the science overlay which I've installed. There is also a simpler media-libs/openFST ebuild which I found by googling. Both ebuilds are for the same program from http://www.openfst.org.

Trying to emerge ocropus with the following USE flags:

[ebuild N ] app-text/ocropus-0.3.1 USE="interactive lua openfst sdl spell" 0 kB [1]

I get the following error:

checking fst/lib/fst.h usability... no
checking fst/lib/fst.h presence... no
checking for fst/lib/fst.h... no
checking for main in -lfst... yes
configure: error: Could not find openFST! Choose --without-fst if you do not want to use it.

!!! Please attach the following file when seeking support:
!!! /var/tmp/portage/app-text/ocropus-0.3.1/work/ocropus-0.3/config.log
 *
 * ERROR: app-text/ocropus-0.3.1 failed.
 * Call stack:
 * ebuild.sh, line 49: Called src_compile
 * environment, line 2138: Called econf '--with-tesseract=/usr' '--with-iulib=/usr' '--with-fst' '--with-aspell' '--with-SDL' '--with-leptonica'
 * ebuild.sh, line 534: Called die
(In reply to comment #13)
> I had installed ocropus-svn from the sunrise overlay, but it only installed
> /usr/bin/ocropus
> So I decided to try out this ebuild to that scripts (lua, python) would work.
> I'm trying to install ocropus with openfst.
> There is a sci-misc/openfst from the science overlay which I've installed.
> There is also a simpler media-libs/openFST ebuild which I found by googling.
> Both ebuilds are for the same program from http://www.openfst.org.
>
> Trying to emerge ocropus with following USE flags:
> [ebuild N ] app-text/ocropus-0.3.1 USE="interactive lua openfst sdl spell"
> 0 kB [1]
>
> I get the following error:
> checking fst/lib/fst.h usability... no
> checking fst/lib/fst.h presence... no
> checking for fst/lib/fst.h... no
> checking for main in -lfst... yes
> configure: error: Could not find openFST! Choose --without-fst if you do not
> want to use it.
>
> !!! Please attach the following file when seeking support:
> !!! /var/tmp/portage/app-text/ocropus-0.3.1/work/ocropus-0.3/config.log
> *
> * ERROR: app-text/ocropus-0.3.1 failed.
> * Call stack:
> * ebuild.sh, line 49: Called src_compile
> * environment, line 2138: Called econf '--with-tesseract=/usr'
> '--with-iulib=/usr' '--with-fst' '--with-aspell' '--with-SDL'
> '--with-leptonica'
> * ebuild.sh, line 534: Called die
>

This is what openfst installs:

# epm -ql openfst|grep \/lib
/usr/bin/libfstmain.so
/usr/include/fst/lib/fst.h
/usr/include/fst/lib/arc.h
/usr/include/fst/lib/compat.h
/usr/include/fst/lib/properties.h
/usr/include/fst/lib/register.h
/usr/include/fst/lib/symbol-table.h
/usr/include/fst/lib/util.h
/usr/lib/libfst.so
/usr/lib/libfstmain.so
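Since the file list above shows /usr/include/fst/lib/fst.h installed, the "usability... no" result suggests that configure's test program failed for some other reason than a missing file. The config.log named in the error message normally records the compiler output for each check. A minimal sketch for pulling out that section follows; the helper name show_header_check is hypothetical, and the number of context lines is an arbitrary choice.

```shell
# Hypothetical helper: print the config.log lines that follow a header check,
# so the compiler or preprocessor error behind "usability... no" is visible.
#   $1: path to config.log
#   $2: header name exactly as configure printed it
show_header_check() {
    grep -n -A 8 "checking $2 usability" "$1"
}

# Example call, using the path from the error message above:
#   show_header_check /var/tmp/portage/app-text/ocropus-0.3.1/work/ocropus-0.3/config.log fst/lib/fst.h
```

Attaching that excerpt to the bug is usually more useful than the summary lines printed by configure.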
ocropus 0.4 alpha 4 has been released.
Created attachment 201538 [details] ebuild building http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz I have successfully compiled OCRopus 0.4 on my gentoo system. I will also attach the ebuild for =media-libs/iulib-0.4 (emerge sync is still stuck to =media-libs/iulib-0.3) and two patch files necessary for these ebuilds.
Created attachment 201540 [details] ebuild to build version 0.4 of media-libs/iulib, whose sources are bundled with the source of OCRopus in http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz =app-text/ocropus-0.4.ebuild needs >=media-libs/iulib-0.4, and emerge sync only has <=media-libs/iulib-0.3, so I made a special ebuild.
Created attachment 201545 [details, diff] patch for http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz/ocropus-0.4/ocropus/genAM.py This patch suppresses ocr-autoclean, according to http://www.mail-archive.com/ocropus@googlegroups.com/msg00541.html as a suggestion to answer to the error "make[1]: *** No rule to make target `ocr-autoclean/ocr-orientation.cc', needed by `ocr-orientation.o'. Stop."
Created attachment 201547 [details] patch used by =media-libs/iulib-0.4 for http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz/ocropus-0.4/iulib/utils/dgraphics_nosdl.cc This patch allows ocropus and other users of iulib to operate when the SDL support of iulib is disabled (by the user, or because configure concluded that it did not work). Otherwise, if the SDL check fails, libiulibs.a lacks dsection_set and dactive, breaking ocropus-0.4.ebuild and every other compilation that includes /usr/include/iulib/iulib.h
The iulib build fails for me at automake:

aclocal.m4:16: warning: this file was generated for autoconf 2.61.
You have another version of autoconf. It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically `autoreconf'.
configure.ac:8: version mismatch. This is Automake 1.10.2,
configure.ac:8: but the definition used by this AM_INIT_AUTOMAKE
configure.ac:8: comes from Automake 1.10.1. You should recreate
configure.ac:8: aclocal.m4 with aclocal and run automake again.
Replacing "eautomake" in the iulib ebuild with "eautoreconf" seems to do the trick, it builds fine. Now ocropus fails to build: x86_64-pc-linux-gnu-g++ -DPACKAGE_NAME=\"ocropus\" -DPACKAGE_TARNAME=\"ocropus\" -DPACKAGE_VERSION=\"0.3\" -DPACKAGE_STRING=\"ocropus\ 0.3\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"ocropus\" -DVERSION=\"0.3\" -DSTDC_HEADERS=1 -DHAVE_SYS_WAIT_H=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_FLOAT_H=1 -DHAVE_MALLOC_H=1 -DHAVE_STDINT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_WCHAR_H=1 -DHAVE_LIBZ=1 -DHAVE_LIBPNG=1 -DHAVE_LIBJPEG=1 -DHAVE_LIBTIFF=1 -DHAVE_LIBIULIB=1 -DHAVE_LIBPTHREAD=1 -DHAVE_LIBTESSERACT_FULL=1 -DHAVE_LIBGSLCBLAS=1 -DHAVE_LIBGSL=1 -DHAVE_LIBSDL=1 -DHAVE_LIBGOMP=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_DUP2=1 -DHAVE_MEMSET=1 -DHAVE_SQRT=1 -DHAVE_STRCHR=1 -DHAVE_STRDUP=1 -DHAVE_STRRCHR=1 -I. -I./include -I./ocr-utils -I/usr/include -I/usr/include/tesseract -I/usr/include/tesseract -DHAVE_TESSERACT -march=native -O2 -pipe -ggdb -fopenmp -Wall -Wno-sign-compare -Wno-write-strings -Wno-deprecated -march=native -O2 -pipe -ggdb -fopenmp -MT xml-entities.o -MD -MP -MF .deps/xml-entities.Tpo -c -o xml-entities.o `test -f './ocr-utils/xml-entities.cc' || echo './'`./ocr-utils/xml-entities.cc ./ocr-utils/xml-entities.cc: In function 'void ocropus::xml_unescape(colib::nustring&, const char*)': ./ocr-utils/xml-entities.cc:119: error: invalid conversion from 'const char*' to 'char*' make[1]: *** [xml-entities.o] Error 1 make[1]: Leaving directory `/var/tmp/paludis/app-text-ocropus-0.4/work/ocropus-0.4' make: *** [all-recursive] Error 1 /usr/libexec/paludis/utils/emake: emake returned error 2 !!! ERROR in app-text/ocropus-0.4: I will attach the build log.
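For anyone who wants to apply the eautomake-to-eautoreconf change from the start of this comment to a local copy of the iulib ebuild, a one-line sed does it. This is only a sketch: the helper name use_eautoreconf is hypothetical, it assumes GNU sed (for -i), and it assumes the ebuild calls eautomake literally as described above.

```shell
# Hypothetical helper: replace every eautomake call in an ebuild with
# eautoreconf, editing the file in place (GNU sed).
#   $1: path to a local copy of the iulib ebuild
use_eautoreconf() {
    sed -i 's/eautomake/eautoreconf/g' "$1"
}

# Example (file name assumed):
#   use_eautoreconf /usr/local/portage/media-libs/iulib/iulib-0.4.ebuild
```

After editing an ebuild in an overlay, its Manifest generally has to be regenerated before emerge will accept it.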
Created attachment 201821 [details] Build log from failed ocropus-0.4.0 build Portage 2.2_rc38 (default/linux/amd64/2008.0/desktop, gcc-4.4.1, glibc-2.10.1-r0, 2.6.30.4 x86_64) ================================================================= System uname: Linux-2.6.30.4-x86_64-Intel-R-_Core-TM-2_Quad_CPU_Q9650_@_3.00GHz-with-gentoo-2.0.1 Timestamp of tree: Wed, 19 Aug 2009 21:45:01 +0000 ccache version 2.4 [enabled] app-shells/bash: 4.0_p28 dev-java/java-config: 2.1.8-r1 dev-lang/python: 2.5.4-r2, 2.6.2-r1, 3.1.1 dev-util/ccache: 2.4-r8 dev-util/cmake: 2.6.4-r2 sys-apps/baselayout: 2.0.1 sys-apps/openrc: 0.4.3-r3 sys-apps/sandbox: 2.0 sys-devel/autoconf: 2.13, 2.63-r1 sys-devel/automake: 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2, 1.11 sys-devel/binutils: 2.19.51.0.14 sys-devel/gcc-config: 1.4.1 sys-devel/libtool: 2.2.6a virtual/os-headers: 2.6.30-r1 ACCEPT_KEYWORDS="amd64 ~amd64" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=native -O2 -pipe -ggdb" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo /etc/udev/rules.d" CXXFLAGS="-march=native -O2 -pipe -ggdb" DISTDIR="/usr/portage/distfiles" FEATURES="assume-digests ccache distlocks fixpackages parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unmerge-logs unmerge-orphans userfetch" GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/" LANG="en_US.UTF-8" LDFLAGS="-Wl,-O1" LINGUAS="en" MAKEOPTS="-j8" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/local/overlays/zugaina 
/usr/local/overlays/THE /usr/local/overlays/sunrise /usr/local/overlays/gnome /usr/local/overlays/desktop-effects /usr/local/overlays/vmware /usr/local/overlays/mozilla /usr/local/overlays/ikelos /usr/local/overlays/java-overlay /usr/local/overlays/berkano /usr/local/overlays/gcc-porting /usr/local/overlays/myoverlay" SYNC="rsync://rsync.us.gentoo.org/gentoo-portage" USE="X a52 aac acl acpi alsa amd64 amr avahi bash-completion berkdb bluetooth branding bzip2 cairo cdda cdr cli cracklib crypt css cups dbus dri dts dvd dvdr eds emboss encode esd evo expat fam ffmpeg flac fortran gdbm gif gnome gnome-keyring gpm gstreamer gtk hal iconv ipv6 isdnlog java jpeg lcms ldap libnotify mad mikmod mmx mono mp3 mp4 mpeg mudflap multilib nautilus ncurses networkmanager nls nptl nptlonly nsplugin ogg opengl openmp paludis pam pcre pdf perl pic png policykit ppds pppd pulseaudio python quicktime readline reflection ruby samba sdl session spell spl sqlite sse sse2 ssl startup-notification svg sysfs tcpd theora tiff tracker truetype unicode usb userlocales vcd vorbis x264 xcb xml xorg xulrunner xv xvid zlib" ALSA_CARDS="cmipci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" USERLAND="GNU" VIDEO_CARDS="nvidia" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, 
PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Created attachment 201823 [details] Output of "paludis --info ocropus"
I believe this would be one of the glibc 2.10 failures tracked in bug 270353 and explained in this blog posting from flameeyes: http://blog.flameeyes.eu/2009/05/24/c-libraries-galore
Created attachment 203042 [details] empty attachment created by mistake. I have added some more dependencies. It now compiles on amd64 and x86, with USE="interactive sdl spell -lua -openfst" and runs smoothly. Results are poor (90%) for foreign languages such as French. I am not sure that it is actually calling tesseract, as tesseract has better recognition rates. I am now looking in the source code. app-text/cuneiform/cuneiform-0.7.ebuild has recognition rates that are either 99% (good) or 0% (it fails and produces a useless hOCR HTML file).
Created attachment 203043 [details] media-libs/iulib-0.4-r1.ebuild ebuild working on x86 and amd64 compiling http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz I have added some more dependencies. It now compiles on amd64 and x86, with USE="sdl" (but SDL support fails on my Gentoo; iulib continues anyway and keeps ocropus happy, albeit with messages like "no image display, since dgraphics disabled in iulib").
Created attachment 203045 [details] app-text/ocropus-0.4-r1.ebuild ebuild working on x86 and amd64 compiling http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz [this is a dupe, previous attachment was empty by mistake] I have added some more dependencies. It now compiles on amd64 and x86, with USE="interactive sdl spell -lua -openfst" and runs smoothly. Results are poor (90%) for foreign languages such as French. I am not sure that it is actually calling tesseract, as tesseract has better recognition rates. I am now looking in the source code. app-text/cuneiform/cuneiform-0.7.ebuild has recognition rates that are either 99% (good) or 0% (it fails and produces a useless hOCR HTML file).
#26 : this ebuild is for media-libs/iulib-0.4-r1.ebuild, not app-text/ocropus-0.4-r1.ebuild. The sources of iulib are nonetheless in http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz.
Created attachment 203050 [details] /var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log I did not have the problem, and I think indeed that #24 is right. I recompiled it with `ebuild /usr/local/portage/app-text/ocropus/ocropus-0.4-r1.ebuild install`, and provide my /var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log I do not have paludis. See emerge --info on next attachment. (I wrongly replaced ocropus-0.4-r1 in this attachment)
Created attachment 203051 [details] /var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log emerge --info. (I wrongly replaced ocropus-0.4-r1 by ocropus-0.4 in this attachment)
I do not know whether (and how) my app-text/ocropus-0.4-r1.ebuild should take bug 270353 into account.
Created attachment 203053 [details] source file of utility to help ocropus-0.4-r1 produce pdf files. This is a small file so I put it right where it will be useful.
Created attachment 203059 [details] app-text/hocrtopdf-0.0.ebuild to create PDF using `ocropus buildhtml dir`

This is an undocumented converter to the .pdf format from the .hocr format produced by app-text/ocropus-0.4-r1 on any image.

Usage:

hocrtopdf inputHocrFile fontHelveticaORCourier outputPdfFile imagedpix imagedpiy imagepixelwidth imagepixelheight oneifisotropic

Example: converting an image to a PDF file, with optical character recognition:

size=`file b.png | sed -e 's:^[^0-9]*::' -e 's:,.*::'`
sudo emerge -nav '=app-text/ocropus-0.4-r1' '=sys-apps/file-4.23'
rm -rf dir.ocropus
ocropus book2pages dir.ocropus b.png
ocropus pages2lines dir.ocropus
ocropus lines2fsts dir.ocropus
ocropus fsts2text dir.ocropus
ocropus buildhtml dir.ocropus | tr 'A\012' '\012A' | sed 's% Transitional//ENA http://www%Transitional//EN"A "http://www%' | tr 'A\012' '\012A' > b.hocr
hocrtopdf b.hocr Helvetica b.pdf 300 300 ${size%% x*} ${size##*x } 1

Bugs: The built-in help and comments in hocrtopdf are completely outdated. This is a patched version of http://xplus3.net/downloads/HocrConverter.gz -- I still have to ask for permission.
Output of this example:

~ $ size=`file b.png | sed -e 's:^[^0-9]*::' -e 's:,.*::'`
~ $ ocropus book2pages dir.ocropus b.png
[info] page 1
~ $ ocropus pages2lines dir.ocropus
no image display, since dgraphics disabled in iulib
[info] page 1
[info] #lines = 25
~ $ ocropus lines2fsts dir.ocropus
no image display, since dgraphics disabled in iulib
[info] rate nan errs 0 ntrue 0 npred 0 lines 0 nogt 24
~ $ ocropus fsts2text dir.ocropus
[info] dir.ocropus/0001/0001.fst (0/24)
[info] dir.ocropus/0001/0007.fst 99999996802856924650656260769173209088.000000
[info] dir.ocropus/0001/0018.fst 99999996802856924650656260769173209088.000000
~ $ ocropus buildhtml dir.ocropus | tr 'A\012' '\012A' | sed 's% Transitional//ENA http://www%Transitional//EN"A "http://www%' | tr 'A\012' '\012A' > b.hocr
~ $ hocrtopdf b.hocr Helvetica b.pdf 300 300 ${size%% x*} ${size##*x } 1
/usr/lib/python2.6/site-packages/reportlab/pdfgen/canvas.py:17: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
~ $ ls -l b.pdf
-rw-r--r-- 1 me me 3399 Sep 3 15:55 b.pdf
~ $ file b.pdf
b.pdf: PDF document, version 1.3
~ $
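The size variable in the example is what feeds the pixel width and height to hocrtopdf, and the sed plus parameter-expansion combination is easy to misread. The sketch below walks through it on a sample string. The sample is an assumption about what file(1) prints for a PNG (the exact wording varies between file versions), and the 2550 x 3300 dimensions are only illustrative.

```shell
# Sample line in the style printed by `file b.png` (assumed format).
sample='b.png: PNG image data, 2550 x 3300, 8-bit/color RGB, non-interlaced'

# Same two sed expressions as in the example:
#   1. drop everything before the first digit
#   2. drop everything from the first comma onwards
size=$(printf '%s\n' "$sample" | sed -e 's:^[^0-9]*::' -e 's:,.*::')

# ${size%% x*}  removes the longest suffix starting at " x"  -> width
# ${size##*x }  removes the longest prefix ending in "x "    -> height
width=${size%% x*}
height=${size##*x }

echo "$width $height"
```

Note that the first sed expression stops at the first digit anywhere in the line, so an image whose file name contains a digit would need a different pattern.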
I just asked for permission to patch on http://xplus3.net/2009/04/02/convert-hocr-to-pdf/#comment-663
Created attachment 203108 [details] app-text/hocrtopdf-0.1.tgz to help app-text/ocropus-0.4-r1 produce pdf files Ok, licence cleared (see http://xplus3.net/2009/04/02/convert-hocr-to-pdf/#comment-664 :-) and command-line help updated.
Created attachment 203111 [details] app-text/hocrtopdf-0.1.ebuild to make PDF from the html output of OCRopus version bump, with correct LICENCE and SRC_URI. hocrtopdf is alpha, so use GENTOO_MIRRORS="" emerge to use this ebuild.
(In reply to comment #29)
> I did not have the problem, and I think indeed that #24 is right.

I have the same issue with 0.4-r1. Per the links I provided, this is an issue with glibc 2.10 only; your emerge --info shows glibc 2.5, so you will not encounter this issue.
Other than ocropus not building with lua enabled (another completely separate topic), everything built fine. However, when I run hocrtopdf, built from hocrtopdf-0.1.ebuild, I get this:

$ hocrtopdf b.hocr Helvetica b.pdf 300 300 2550 3300 1
/usr/lib/python2.6/site-packages/reportlab/pdfgen/canvas.py:17: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
Traceback (most recent call last):
  File "/usr/local/bin/hocrtopdf", line 188, in <module>
    hocr = HocrConverter(sys.argv[1])
  File "/usr/local/bin/hocrtopdf", line 58, in __init__
    self.parse_hocr(hocrFileName)
  File "/usr/local/bin/hocrtopdf", line 103, in parse_hocr
    self.hocr.parse(hocrFileName)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
    parser.feed(data)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: syntax error: line 3, column 5

I'm not sure what this is: xml.parsers.expat.ExpatError
I should have mentioned... hocrtopdf shows two dependencies. Here are the versions I'm using... dev-python/reportlab-2.1 dev-python/pyxml-0.8.4-r2
Your error message is "xml.parsers.expat.ExpatError: syntax error: line 3, column 5". Can you show me the beginning of the 3rd line of the file provided in first argument to hocrtopdf ?
Sorry, it was actually "column 59": xml.parsers.expat.ExpatError: syntax error: line 3, column 59 $ head -3 b.hocr <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
#41: Try this:

sed -e '2s:$:":' -e '3s:h:"h:' -i b.hocr

before using hocrtopdf. The patched b.hocr should then look like this:

$ head -3 b.hocr
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This is a bug in ocropus that I correct each time with sed (see my comment #33). You may want to submit a patch to ocropus, around line 199 of /var/tmp/portage/app-text/ocropus-0.4/work/ocropus-0.4/ocropus/ocr-commands/ocr-commands.cc
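To see what the sed fix does without running ocropus, it can be tried on a small stand-in file. This is a sketch under one assumption: that the DOCTYPE is spread over the first three lines of b.hocr as laid out below (the line breaks are lost in this bug report, so the split is a guess consistent with the use of `head -3` and the line numbers 2 and 3 in the sed expressions). The file name sample.hocr is hypothetical, and GNU sed is assumed for -i.

```shell
# Stand-in for the first three lines of the broken b.hocr (assumed layout):
# the public identifier is missing its closing quote and the system
# identifier is missing its opening quote.
hocr=sample.hocr
printf '%s\n' \
    '<!DOCTYPE html' \
    'PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN' \
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' > "$hocr"

# Same two expressions as in the comment above:
#   2s:$:":    append a double quote at the end of line 2
#   3s:h:"h:   put a double quote before the first "h" on line 3
sed -i -e '2s:$:":' -e '3s:h:"h:' "$hocr"

head -3 "$hocr"
```

Because both expressions are tied to fixed line numbers, the fix only works while ocropus keeps emitting the DOCTYPE in exactly this position.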
Created attachment 208505 [details, diff] Patch for building with glibc-2.10
Created attachment 208506 [details] Updated ebuild for building with glibc-2.10
Created attachment 231227 [details] build.log error Today it asked me to remerge ocropus and it gets this error in ebuilding it.
Created attachment 232215 [details] output of emerge -p --update --newuse --tree --deep ocropus

@Silvio, from #45: "Today it asked me to remerge ocropus and it gets this error in ebuilding it."

The last lines of your build.log show two undefined identifiers, TIFFHeader and TIFF_VERSION. The search http://www.google.fr/search?q=TIFFHeader+%22tiff_version%22 gives me http://code.google.com/p/ocropus/issues/detail?id=112#c1 which contains a patch about TIFFHeaderClassic and TIFF_VERSION_CLASSIC. The search http://www.google.fr/search?q=TIFFHeaderClassic+%22TIFF_VERSION_CLASSIC%22 shows that this patch reflects a possible change in libtiff and media-libs/tiff.

Please:
(1) write here to say whether this patch helped you;
(2) write here which version of media-libs/tiff made TIFFHeader and TIFF_VERSION obsolete.

If (1) did not work, you may execute emerge -p --update --newuse --tree --deep ocropus and compare your output to mine, attached here, looking for tiff and for lines close to the ocropus lines. It may be better, though, to compare /var/log/emerge.log, but I do not know how to do that. You may also post here your /var/log/emerge.log and the output of emerge --info
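One quick way to tell whether an installed libtiff is affected is to look at which identifier its tiff.h defines. The sketch below assumes the rename discussed in the linked ocropus issue (TIFFHeader to TIFFHeaderClassic); the helper name tiff_header_style and the header path in the example are hypothetical.

```shell
# Hypothetical helper: report which TIFF header struct name a tiff.h provides.
#   $1: path to tiff.h
# "classic" is checked first because the string TIFFHeader is a prefix of
# TIFFHeaderClassic and would otherwise match both.
tiff_header_style() {
    if grep -q 'TIFFHeaderClassic' "$1"; then
        echo classic
    elif grep -q 'TIFFHeader' "$1"; then
        echo legacy
    else
        echo unknown
    fi
}

# Example (header location assumed):
#   tiff_header_style /usr/include/tiff.h
```

If this prints "classic", the libtiff patch is likely needed; if it prints "legacy", the unpatched sources should still compile against that header.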
The patch solved the problem with changed libtiff. Thank you very much, Daa Jaa. (In reply to comment #46) > The search http://www.google.fr/search?q=TIFFHeader+%22tiff_version%22 gives me > http://code.google.com/p/ocropus/issues/detail?id=112#c1 which contains a patch > about > TIFFHeaderClassic and TIFF_VERSION_CLASSIC. >
Created attachment 232343 [details] Updated iulib ebuild using mercurial and scons
Created attachment 232345 [details] Updated ocropus ebuild using mercurial and scons
Created attachment 232347 [details, diff] Patch for changed libtiff
In response to Pavel https://bugs.gentoo.org/show_bug.cgi?id=185810#c49 : Could you specify the exact Mercurial revisions you tested your ebuilds against? The current Mercurial revisions of ocropus and iulib are 65011c70b3 and e5f183e0bf, but I guess you tested different ones. Alternatively, would it be hard to produce ebuilds based on a version of ocropus and iulib that will still be downloadable in a few months?
Sorry to everyone for the duplicate comment. Shame on me.
I have 65011c70b3d7 and e5f183e0bf9e, so I guess I tested against the same versions. It should not be hard to modify these ebuilds to build a specific Mercurial tag, as described at http://code.google.com/p/ocropus/wiki/InstallTranscript
Thank you very much Pavel!
Created attachment 238081 [details] Update to include sdl-image Installation will fail without media-libs/sdl-image enabled as well.
Note: I had to remove the libtiff patch to get this to work. Thanks everyone for these ebuilds!!
*** Bug 329063 has been marked as a duplicate of this bug. ***
We just learned that another ebuild was made by gpo-overlay. http://gpo.zugaina.org/app-text/ocropus Who is the maintainer of this ebuild ?
If we would have, this would have been in portage.
applying no-as-needed fixed both openfst-1.2.5 and ocropus-0.4-r1 re bug 329063 (undefined SDL_MapRGB).
The ebuild has been removed from Sunrise.
Hi all, I failed to build ocropus. I tried several attachments. Can someone please post a tarball with working ocropus, hocrtopdf and iulib ebuilds and files (patches)? Just something to untar in one's own overlay and emerge.
Hello all Gentoo users interested in ocropus, This bug seems to have gone silent in 2011. Is no one using ocropus under Gentoo any more? I wonder if it's because of the linking error which seems to have persisted. Here are some comments documenting what I've done; please correct me wherever I've made a mistake! TIA. In emerging app-text/ocropus-0.4-r1, I cannot get past the libiulib.a(dgraphics.o) problems, thusly: ======== /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dclear(int)': (.text+0x22b): undefined reference to `SDL_MapRGB' /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dclear(int)': (.text+0x23c): undefined reference to `SDL_FillRect' /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dclear(int)': (.text+0x25c): undefined reference to `SDL_UpdateRect' /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dwait()': (.text+0x294): undefined reference to `SDL_WaitEvent' /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dinit(int, int, bool)': (.text+0x5a1): undefined reference to `SDL_Init' /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dinit(int, int, bool)': (.text+0x5b8): undefined reference to `SDL_SetVideoMode' /usr/lib/libiulib.a(dgraphics.o): In function `iulib::dflush()': (.text+0x2d8): undefined reference to `SDL_Flip' ======== I've tried Daa Jaa's utils.dgraphics_nosdl.cc.patch for iulib, updating its version to media-libs/iulib-0.4-r1 via his corresponding ebuild; please see Comment 26. However, the above error persists. It seems to me that this must be a problem with iulib ver. 0.4 being out of date in relation to what ocropus requires(?). I suppose my next step is to try the live mercurial builds posted by Pavel Denisov; please see Comment 48 and Comment 49, and also Comment 50. My overall goal is to build app-text/gscan2pdf (see http://gpo.zugaina.org/Search?search=gscan2pdf) with support for ocropus under Gentoo. Building it with support for cuneiform and tesseract seems to be working. Note that ocropus ver. 
0.4 won't compile with tesseract ver. 3.0x. If I downgrade tesseract to the stable version, app-text/tesseract-2.04-r1, at least the ocropus source compiles, but then it fails to resolve references as above. Also note that tesseract ver. 3.0x depends on leptonica. However, media-libs/leptonica conflicts with app-text/tesseract-2.04-r1. When leptonica is installed (as far back as media-libs/leptonica-1.62) compilation of tesseract-2.04-r1 fails like so: ======== leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetHalftoneMask(Pix*, Pix**, Boxa**, Pixa**, bool)': leptonica_pageseg.cpp:69: error: 'int32' was not declared in this scope leptonica_pageseg.cpp:69: error: expected ';' before 'debug' leptonica_pageseg.cpp:73: error: 'debug' was not declared in this scope leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetTextlineMask(Pix*, Pix**, Pix**, Boxa**, Pixa**, bool)': leptonica_pageseg.cpp:139: error: 'int32' was not declared in this scope leptonica_pageseg.cpp:139: error: expected ';' before 'debug' leptonica_pageseg.cpp:143: error: 'debug' was not declared in this scope leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetTextblockMask(Pix*, Pix**, Boxa**, Pixa**, bool)': leptonica_pageseg.cpp:211: error: 'int32' was not declared in this scope leptonica_pageseg.cpp:211: error: expected ';' before 'debug' leptonica_pageseg.cpp:220: error: 'debug' was not declared in this scope leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetAllRegions(Pix*, Pix**, Pix**, Pix**, bool)': leptonica_pageseg.cpp:273: error: 'int32' was not declared in this scope leptonica_pageseg.cpp:273: error: expected ';' before 'w' leptonica_pageseg.cpp:274: error: 'w' was not declared in this scope leptonica_pageseg.cpp:274: error: 'h' was not declared in this scope leptonica_pageseg.cpp:275: error: expected ';' before 'debug' leptonica_pageseg.cpp:288: error: 'debug' was 
not declared in this scope leptonica_pageseg.cpp:293: error: 'debug' was not declared in this scope leptonica_pageseg.cpp:298: error: 'debug' was not declared in this scope leptonica_pageseg.cpp:302: error: 'debug' was not declared in this scope leptonica_pageseg.cpp:311: error: 'debug' was not declared in this scope leptonica_pageseg.cpp:320: error: 'debug' was not declared in this scope /usr/include/liblept/leptprotos.h:553: error: too few arguments to function 'PIX* pixRenderRandomCmapPtaa(PIX*, PTAA*, l_int32, l_int32, l_int32)' leptonica_pageseg.cpp:322: error: at this point in file leptonica_pageseg.cpp:332: error: 'debug' was not declared in this scope ======== These dependency conflicts need to be fixed in the various ebuilds. As possible, I'll try to make the needed fixes and post patches (opening new bugs as appropriate). (In reply to comment #61) > applying no-as-needed fixed both openfst-1.2.5 and ocropus-0.4-r1 > re bug 329063 (undefined SDL_MapRGB). I found this comment intriguing. I did a couple of quick searches but never was able to find a patch called something like "no-as-needed" to fix the undefined SDL_MapRGB problem. If you're reading this, Leho Kraav, could you please provide us a pointer so that we can better follow your Comment 61? TIA. Clemmitt
Created attachment 293377 [details] ocropus-0.4-r1.ebuild, with no-as-needed

> (In reply to comment #61)
> > applying no-as-needed fixed both openfst-1.2.5 and ocropus-0.4-r1
> > re bug 329063 (undefined SDL_MapRGB).
>
> I found this comment intriguing. I did a couple of quick searches but never
> was able to find a patch called something like "no-as-needed" to fix the
> undefined SDL_MapRGB problem. If you're reading this, Leho Kraav, could you
> please provide us a pointer so that we can better follow your Comment 61? TIA.

while i've put in zero play with ocropus in the past year and therefore don't really know about the current state of affairs, "no-as-needed" means adding this to your ebuild:

    pkg_setup() {
        append-ldflags $(no-as-needed)
    }

i'm attaching the ebuild i have in my overlay sitting from last year.
(In reply to comment #66)
> Created attachment 293377 [details]
> ocropus-0.4-r1.ebuild, with no-as-needed
>
> "no-as-needed" means adding this to your ebuild:
>
> pkg_setup() {
>     append-ldflags $(no-as-needed)
> }
>
> i'm attaching the ebuild i have in my overlay sitting from last year.

Ah, the wonders of Gentoo! On what other system could you fix a package build problem so simply and cleanly?!

Using this ebuild with the no-as-needed setting, ocropus now builds. I can run 'ocropus page sample.png' (sample.tif, etc.), but the accuracy of conversion to text is quite inferior to the output directly produced by cuneiform-1.1.0, and tesseract-2.04-r1 or tesseract-3.01. (Caveat: I don't fully understand ocropus. It appears to be a powerful and somewhat complex beast. I believe one needs to employ a workflow of the various ocropus commands to do the job properly; it may also need training, of course.)

As I'm an over-perfectionist, I've made a couple of tiny changes to the ebuild Leho just posted. I'll attach them as a patch against his ebuild next.

With app-text/cuneiform-1.1.0, app-text/tesseract-2.04-r1, and app-text/ocropus-0.4-r1 installed, I've emerged app-text/gscan2pdf-1.0.0-r1 with USE flags adf, unpaper, xdg, cuneiform, tesseract and ocropus. I've run gscan2pdf; it uses cuneiform and tesseract for OCR, but even with the proper USE flag turned on it doesn't offer ocropus as an OCR back-end option. (Also, the display within gscan2pdf of the output from cuneiform and tesseract leaves something to be desired, but that may be due to the fact that I don't know what I'm doing.)

Thanks to all for input on this ebuild. As I'm able, I'll post more info on other ebuilds I've patched and open new bugs on them as appropriate.

Clemmitt
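For readers building outside Portage, the effect of that pkg_setup() can be approximated by hand. To my understanding, with GNU ld the no-as-needed helper from flag-o-matic expands to -Wl,--no-as-needed, and append-ldflags adds it to LDFLAGS; treat both statements as assumptions rather than a description of the eclass internals. The function name add_no_as_needed below is hypothetical, and the starting LDFLAGS value is the one shown in the emerge --info output earlier in this bug.

```shell
# Hypothetical stand-in for `append-ldflags $(no-as-needed)` outside Portage:
# add -Wl,--no-as-needed to LDFLAGS unless it is already present.
add_no_as_needed() {
    case " $LDFLAGS " in
        *" -Wl,--no-as-needed "*) ;;  # already there, nothing to do
        *) LDFLAGS="${LDFLAGS:+$LDFLAGS }-Wl,--no-as-needed" ;;
    esac
}

LDFLAGS="-Wl,-O1"
add_no_as_needed
export LDFLAGS
echo "$LDFLAGS"
```

This only works around the underlying problem: the link step lists libraries in an order that breaks under --as-needed, so a proper fix would reorder the link line upstream.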
Created attachment 293413 [details, diff] Small patch to Leho Kraav's ebuild posted in Comment 66.
you might want to read up on as-needed [1]. i think it was kind of a two way street re who exactly broke who when, but i guess as of today at the very latest it's safe to say that the package is more broken, since mostly everything else now builds with as-needed. other than that, always good to see someone have interest in an obscure but cool and useful package and be willing to put in some work to figure stuff out. [1]: http://www.gentoo.org/proj/en/qa/asneeded.xml
Created attachment 294049 [details] source files for app-text/image2text-pdf-0.0.1.ebuild Hello, congratulations, Clemmitt, on all your efforts. I spent time correcting the recognition rate problems of ocropus. As you seem to be putting resources into this, I'm showing you the result. It is NOT EASY to read, but I find it useful very often. The ebuild will follow.
Created attachment 294051 [details] source files for app-text/image2text-pdf-0.0.1.ebuild CORRECT TAR FILE. Sorry, anonymizing .tar format is not straightforward.
Created attachment 294053 [details] app-text/image2text-pdf-0.0.1.ebuild makes OCRopus, tesseract, and cuneiform vote; Automated in a particular multi-image setting. Ebuild correcting the recognition rate problems of ocropus by collecting votes from OCRopus, tesseract, and cuneiform for every line framed by OCRopus. I find it useful very often. Coding style is awful: the language is /bin/bash, comments are scarce and in French, ... Warning: only tested with app-text/ocropus-0.4 (as stated in the .ebuild); please report any adaptations needed for newer versions of ocropus here. Put this file in /usr/local/portage/app-text/image2text-pdf/image2text-pdf-0.0.1.ebuild and use it with `PORTDIR_OVERLAY=/usr/local/portage emerge -av app-text/image2text-pdf`. You may need to run `echo '>=app-text/image2text-pdf-0.0 ~amd64' >> /etc/portage/package.keywords` once first, and also download my /usr/local/portage/app-text/image2text-pdf/Manifest, provided as the next attachment.
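A minimal sketch of the per-line voting idea described above, not the code from the attachment. It assumes each engine's text for one line is already available as a string, picks the text that at least two engines agree on, and otherwise falls back to the first candidate. The function name vote_line and the fallback rule are assumptions made for illustration; the attached scripts may vote quite differently (for example per character rather than per line).

```shell
#!/bin/sh
# Hypothetical helper: majority vote over three OCR results for one text line.
# $1, $2, $3 are the candidate strings (e.g. from OCRopus, tesseract, cuneiform).
vote_line() {
    if [ "$1" = "$2" ] || [ "$1" = "$3" ]; then
        printf '%s\n' "$1"      # first candidate is confirmed by another engine
    elif [ "$2" = "$3" ]; then
        printf '%s\n' "$2"      # the other two engines outvote the first
    else
        printf '%s\n' "$1"      # no agreement: assumed fallback is the first candidate
    fi
}
```

It would be called once per line image, e.g. vote_line "$ocropus_text" "$tesseract_text" "$cuneiform_text", where the three variables are placeholders for whatever each engine returned.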
Created attachment 294055 [details] /usr/local/portage/app-text/image2text-pdf/Manifest contains checksums for two previous attachments Checksums you may want to put in /usr/local/portage/app-text/image2text-pdf/Manifest
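The installation steps from the comments above, collected as a shell sketch. The function name stage_overlay and its arguments are made up for illustration; the overlay root, keyword line and emerge invocation are the ones quoted above, and the files are placed in the app-text/image2text-pdf directory that the Manifest comment names. It assumes the ebuild and Manifest attachments have already been downloaded.

```shell
#!/bin/sh
# stage_overlay OVERLAY EBUILD MANIFEST   (hypothetical helper name)
# Copies the downloaded ebuild and Manifest into
# OVERLAY/app-text/image2text-pdf/ under the file names Portage expects.
stage_overlay() {
    pkgdir="$1/app-text/image2text-pdf"
    mkdir -p "$pkgdir" || return 1
    cp "$2" "$pkgdir/image2text-pdf-0.0.1.ebuild" || return 1
    cp "$3" "$pkgdir/Manifest" || return 1
}

# Typical use, as root (commands taken from the comments above):
#   stage_overlay /usr/local/portage image2text-pdf-0.0.1.ebuild Manifest
#   echo '>=app-text/image2text-pdf-0.0 ~amd64' >> /etc/portage/package.keywords
#   PORTDIR_OVERLAY=/usr/local/portage emerge -av app-text/image2text-pdf
```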
Created attachment 294057 [details] app-text/scan4image2text-pdf-0.0.0.tgz, automation of the multi-image setting according to what is typed on a USB-linked HP LaserJet 3020. Needs app-text/image2text-pdf installed on another computer. I also chose to release the shell code that reads the numeric keypad of the HP LaserJet 3020 and interprets it, running automated scans and then feeding app-text/image2text-pdf. app-text/image2text-pdf may be installed on another, more powerful computer. CAREFUL: coding style is awful; I made no ebuild for that. I find it useful very often. This may help in understanding the usage of /usr/local/bin/.post.scan, which is packaged in app-text/image2text-pdf.
I forgot to mention that there is a bounty ($1000) for making tesseract report the pixel position of every recognized character. Contact me for more information.
ocropus-0.5.4 has been released in the meantime. Anyone here still interested in this?
I'm playing with it. It seems to bring a custom version of media-libs/iulib, but I haven't had time yet to figure out the differences.
Hi all,

OCRopus 0.6 is available. From upstream: "It features much simpler installation, fewer dependencies, and improved character recognition rates. This is the first all-Python release."

Installation procedure:

$ hg clone -r ocropus-0.6 https://code.google.com/p/ocropus
$ cd ocropus/ocropy
$ sudo apt-get install $(cat PACKAGES)
$ python setup.py download_models
$ sudo python setup.py install
$ ./run-test

I tried to adapt the 0.4 ebuild. To the best of my limited knowledge, I used:

EHG_REPO_URI="https://code.google.com/p/ocropus/"
SRC_URI=""

I can create the manifest, but when I emerge, I get this error:

 * ERROR: app-text/ocropus-0.6 failed (unpack phase):
 *   Nothing passed to the 'unpack' command
 *
 * Call stack:
 *   ebuild.sh, line 85: Called src_unpack
 *   environment, line 2860: Called unpack
 *   phase-helpers.sh, line 261: Called die
 * The specific snippet of code:
 *   [ -z "$*" ] && die "Nothing passed to the 'unpack' command"

I'm sure it would be a piece of cake for someone with ebuild knowledge, especially considering that it is now a pure Python app. If someone could do it...
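The error above is what happens when unpack is called with no arguments. Portage fills ${A} from SRC_URI, so with SRC_URI="" a src_unpack that runs unpack ${A} passes nothing. The sketch below reproduces that outside Portage with stand-in die and unpack functions (simplified stubs, not Portage's real helpers) and shows a guarded variant. That the adapted ebuild still carries such a src_unpack from the 0.4 version is an assumption; for a Mercurial checkout the usual route, as far as I know, is to inherit the mercurial eclass and let it do the fetching instead of calling unpack.

```shell
#!/bin/sh
# Stand-ins for Portage helpers, simplified for illustration only.
die() { echo "ERROR: $*" >&2; return 1; }
unpack() {
    # Same check as in the error output above.
    if [ -z "$*" ]; then
        die "Nothing passed to the 'unpack' command" || return 1
    fi
    echo "unpacking: $*"
}

A=""    # empty, because SRC_URI="" lists no distfiles

# Assumed shape of the src_unpack left over from the 0.4 ebuild.
src_unpack_old() { unpack ${A}; }

# Guarded variant: only call unpack when there is something to unpack.
src_unpack_guarded() {
    if [ -n "${A}" ]; then
        unpack ${A}
    fi
}
```

With A empty, src_unpack_old fails with the same message as in the log, while src_unpack_guarded simply does nothing and leaves the actual fetch to whatever the VCS eclass provides.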
Created attachment 380022 [details] ebuild for ocropus 0.7 with source git ebuild for ocropus 0.7, built from the git source. this is my first python/git ebuild. i'm open to any optimization ;)