Bug 185810 - app-text/ocropus (new package)
Summary: app-text/ocropus (new package)
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages (show other bugs)
Hardware: All Linux
Importance: High enhancement with 1 vote
Assignee: Default Assignee for New Packages
URL: http://code.google.com/p/ocropus/
Whiteboard:
Keywords: EBUILD
Duplicates: 223023 329063
Depends on:
Blocks:
 
Reported: 2007-07-18 21:26 UTC by Bradford Folkens
Modified: 2014-07-01 01:04 UTC (History)
CC List: 12 users

See Also:
Package list:
Runtime testing required: ---


Attachments
ebuild (ocropus-svn-9999.ebuild,742 bytes, text/plain)
2007-07-18 21:29 UTC, Bradford Folkens
Details
ebuild (ocropus-svn-9999.ebuild,808 bytes, text/plain)
2007-07-18 22:15 UTC, Bradford Folkens
Details
ocropus-0.2.ebuild (ocropus-0.2.ebuild,1.35 KB, text/plain)
2008-08-28 04:14 UTC, SpanKY
Details
files/ocropus-0.2-build.patch (ocropus-0.2-build.patch,1.56 KB, patch)
2008-08-28 04:18 UTC, SpanKY
Details | Diff
ocropus-0.3.1.ebuild (ocropus-0.3.1.ebuild,1.28 KB, text/plain)
2008-11-02 06:59 UTC, SpanKY
Details
ebuild building http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz (ocropus-0.4.ebuild,1.50 KB, text/plain)
2009-08-17 15:08 UTC, Daa Jaa
Details
ebuild to build version 0.4 of media-libs/iulib, whose sources are bundled with the OCRopus sources in http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz (iulib-0.4.ebuild,1.40 KB, text/plain)
2009-08-17 15:12 UTC, Daa Jaa
Details
patch for http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz/ocropus-0.4/ocropus/genAM.py (genAM.py.patch,487 bytes, patch)
2009-08-17 15:36 UTC, Daa Jaa
Details | Diff
patch used by =media-libs/iulib-0.4 for http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz/ocropus-0.4/iulib/utils/dgraphics_nosdl.cc (utils.dgraphics_nosdl.cc.patch,496 bytes, text/plain)
2009-08-17 15:40 UTC, Daa Jaa
Details
Build log from failed ocropus-0.4.0 build (1250807166-install-app-text_ocropus-0.4:0::myoverlay.out,47.79 KB, text/plain)
2009-08-20 22:28 UTC, Jose daLuz
Details
Output of "paludis --info ocropus" (paludis_info_ocropus.txt,22.37 KB, text/plain)
2009-08-20 22:29 UTC, Jose daLuz
Details
empty attachment created by mistake. (useless file,21 bytes, text/plain)
2009-09-03 12:49 UTC, Daa Jaa
Details
media-libs/iulib-0.4-r1.ebuild ebuild working on x86 and amd64 compiling http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz (iulib-0.4-r1.ebuild,1.65 KB, text/plain)
2009-09-03 12:56 UTC, Daa Jaa
Details
app-text/ocropus-0.4-r1.ebuild ebuild working on x86 and amd64 compiling http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz (ocropus-0.4-r1.ebuild,2.43 KB, text/plain)
2009-09-03 13:00 UTC, Daa Jaa
Details
/var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log (build.log,164.46 KB, text/plain)
2009-09-03 13:26 UTC, Daa Jaa
Details
/var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log (emerge.info,4.45 KB, text/plain)
2009-09-03 13:28 UTC, Daa Jaa
Details
source file of utility to help ocropus-0.4-r1 produce pdf files. (hocrtopdf-0.0.tgz,2.20 KB, application/octet-stream)
2009-09-03 13:39 UTC, Daa Jaa
Details
app-text/hocrtopdf-0.0.ebuild to create PDF using `ocropus buildhtml dir` (hocrtopdf-0.0.ebuild,582 bytes, text/plain)
2009-09-03 13:59 UTC, Daa Jaa
Details
app-text/hocrtopdf-0.1.tgz to help app-text/ocropus-0.4-r1 produce pdf files (hocrtopdf-0.1.tgz,2.92 KB, text/plain)
2009-09-04 09:42 UTC, Daa Jaa
Details
app-text/hocrtopdf-0.1.ebuild to make PDF from the html output of OCRopus (hocrtopdf-0.1.ebuild,639 bytes, text/plain)
2009-09-04 11:19 UTC, Daa Jaa
Details
Patch for building with glibc-2.10 (ocropus-0.4-glibc-2.10.patch,1.11 KB, patch)
2009-10-28 12:23 UTC, Pavel Denisov
Details | Diff
Updated ebuild for building with glibc-2.10 (ocropus-0.4-r1.ebuild,2.48 KB, text/plain)
2009-10-28 12:24 UTC, Pavel Denisov
Details
build.log error (build.log,101.47 KB, text/plain)
2010-05-12 14:59 UTC, Silvio
Details
output of emerge -p --update --newuse --tree --deep ocropus (b,26.35 KB, text/plain)
2010-05-20 09:33 UTC, Daa Jaa
Details
Updated iulib ebuild using mercurial and scons (iulib-9999.ebuild,875 bytes, text/plain)
2010-05-21 12:01 UTC, Pavel Denisov
Details
Updated ocropus ebuild using mercurial and scons (ocropus-9999.ebuild,851 bytes, text/plain)
2010-05-21 12:03 UTC, Pavel Denisov
Details
Patch for changed libtiff (ocr-voronoi_tiff_version.patch,618 bytes, patch)
2010-05-21 12:04 UTC, Pavel Denisov
Details | Diff
Update to include sdl-image (iulib-9999.ebuild,898 bytes, text/plain)
2010-07-08 22:08 UTC, gentoo@danielquinn.org
Details
ocropus-0.4-r1.ebuild, with no-as-needed (ocropus-0.4-r1.ebuild,2.54 KB, text/plain)
2011-11-22 06:02 UTC, Leho Kraav (:macmaN @lkraav)
Details
Small patch to Leho Kraav's ebuild posted in Comment 66. (ocropus-kraav-ebuild.patch,1.22 KB, patch)
2011-11-22 15:38 UTC, Clemmitt M. Sigler
Details | Diff
source files for app-text/image2text-pdf-0.0.1.ebuild (image2text-pdf-0.0.1.tgz,80.00 KB, application/octet-stream)
2011-11-28 11:36 UTC, Daa Jaa
Details
source files for app-text/image2text-pdf-0.0.1.ebuild (image2text-pdf-0.0.1.tgz,80.00 KB, application/octet-stream)
2011-11-28 11:48 UTC, Daa Jaa
Details
app-text/image2text-pdf-0.0.1.ebuild makes OCRopus, tesseract, and cuneiform vote; Automated in a particular multi-image setting. (image2text-pdf-0.0.1.ebuild,1.55 KB, text/plain)
2011-11-28 12:19 UTC, Daa Jaa
Details
/usr/local/portage/app-text/image2text-pdf/Manifest contains checksums for two previous attachments (Manifest,411 bytes, text/plain)
2011-11-28 12:22 UTC, Daa Jaa
Details
app-text/scan4image2text-pdf-0.0.0.tgz, automation of multi-image setting according to what is typed on USB-linked to HP LaserJet 3020. Needs app-text/image2text-pdf installed on another computer. (scan4image2text-pdf-0.0.0.tgz,30.00 KB, text/plain)
2011-11-28 12:29 UTC, Daa Jaa
Details
ebuild for ocropus 0.7 with source git (ocropus-0.7.ebuild,2.19 KB, text/plain)
2014-07-01 01:04 UTC, Michael Klapproth
Details

Description Bradford Folkens 2007-07-18 21:26:53 UTC
OCRopus is a state-of-the-art document analysis and OCR system.

Reproducible: Always
Comment 1 Bradford Folkens 2007-07-18 21:29:16 UTC
Created attachment 125286 [details]
ebuild
Comment 2 Bradford Folkens 2007-07-18 22:15:36 UTC
Created attachment 125294 [details]
ebuild
Comment 3 Christian Faulhammer (RETIRED) gentoo-dev 2007-10-24 15:46:09 UTC
You should update this bug report a bit to reflect that it is already in Sunrise
Comment 4 Bradford Folkens 2007-10-24 19:54:28 UTC
Accepted into Sunrise
app-text/ocropus-svn
Comment 5 Andrey Melentyev 2007-10-25 22:54:32 UTC
By the way, there's now an alpha version (0.1.0) available.
http://ocropus.googlecode.com/files/ocropus-0.1.0.tar.gz
Comment 6 Pavel Denisov 2008-04-22 15:58:00 UTC
The ebuild should depend on dev-util/ftjam instead of dev-util/jam, which was masked on 16 Mar 2008.
Comment 7 Hanno Böck gentoo-dev 2008-07-20 18:18:23 UTC
*** Bug 223023 has been marked as a duplicate of this bug. ***
Comment 8 SpanKY gentoo-dev 2008-08-28 04:14:32 UTC
Created attachment 163958 [details]
ocropus-0.2.ebuild

integrates Bug 223023 stuff cleanly
Comment 9 SpanKY gentoo-dev 2008-08-28 04:18:15 UTC
Created attachment 163959 [details, diff]
files/ocropus-0.2-build.patch
Comment 10 SpanKY gentoo-dev 2008-11-02 06:59:45 UTC
Created attachment 170502 [details]
ocropus-0.3.1.ebuild

Seems to have parallel build problems, but I haven't really investigated it.
Comment 11 Alexey Vladykin 2009-03-28 13:26:52 UTC
I installed ocropus using attached ocropus-0.3.1.ebuild with default use-flags (sdl, spell).
First observation: src_test() does not work, should be replaced with "make check || die"
Second observation: I'm unable to recognize anything:

av@snork$ ocroscript recognize text.png 
ocroscript: [string "require(arg_script_name)"]:1: module 'recognize' not found:
	no field package.preload['recognize']
	no file '/usr/local/share/ocropus/scripts//recognize.lua'
	no file './recognize.so'
	no file '/usr/local/lib/lua/5.1/recognize.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'

I found recognize.lua at /usr/share/ocropus/scripts/recognize.lua, but ocroscript did not look there. Should the ebuild pass --prefix to configure?
Comment 12 Alexey Vladykin 2009-03-28 14:19:27 UTC
It has nothing to do with configure and --prefix. I found /usr/local/ paths hardcoded in ocroscript/ocrotoplevel.cc and in several other places.
Exporting OCROSCRIPTS=/usr/share/ocropus/scripts/ helps ocroscript to find required files.
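
The workaround above can be sketched as a short shell snippet. It assumes the scripts live in /usr/share/ocropus/scripts/, as reported in comment 11; the guard around the ocroscript call is only an illustrative addition, not part of any ebuild attached here.

```shell
# Tell ocroscript where the Lua scripts were installed (path from comment 11).
export OCROSCRIPTS=/usr/share/ocropus/scripts/

# Only attempt recognition when both the script and the binary are present.
if [ -f "${OCROSCRIPTS}recognize.lua" ] && command -v ocroscript >/dev/null 2>&1; then
    ocroscript recognize text.png
else
    echo "recognize.lua or ocroscript not found; check OCROSCRIPTS=${OCROSCRIPTS}" >&2
fi
```

To make the setting permanent for all users, the same assignment could presumably go into a file under /etc/env.d, followed by env-update.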
Comment 13 Jeremy Johnson 2009-05-22 15:24:25 UTC
I had installed ocropus-svn from the sunrise overlay, but it only installed
/usr/bin/ocropus
So I decided to try out this ebuild so that the scripts (lua, python) would work.
I'm trying to install ocropus with openfst.
There is a sci-misc/openfst from the science overlay which I've installed.
There is also a simpler media-libs/openFST ebuild which I found by googling.
Both ebuilds are for the same program from http://www.openfst.org.

Trying to emerge ocropus with following USE flags:
[ebuild  N    ] app-text/ocropus-0.3.1  USE="interactive lua openfst sdl spell" 0 kB [1]

I get the following error:
checking fst/lib/fst.h usability... no                                                                                       
checking fst/lib/fst.h presence... no                                                                                        
checking for fst/lib/fst.h... no                                                                                             
checking for main in -lfst... yes                                                                                            
configure: error: Could not find openFST! Choose --without-fst if you do not want to use it.                                 

!!! Please attach the following file when seeking support:
!!! /var/tmp/portage/app-text/ocropus-0.3.1/work/ocropus-0.3/config.log
 *                                                                     
 * ERROR: app-text/ocropus-0.3.1 failed.                               
 * Call stack:                                                         
 *               ebuild.sh, line   49:  Called src_compile             
 *             environment, line 2138:  Called econf '--with-tesseract=/usr' '--with-iulib=/usr' '--with-fst' '--with-aspell' '--with-SDL' '--with-leptonica'                                                                                             
 *               ebuild.sh, line  534:  Called die

Comment 14 Jeremy Johnson 2009-05-22 15:28:19 UTC
(In reply to comment #13)
> I had installed ocropus-svn from the sunrise overlay, but it only installed
> /usr/bin/ocropus
> So I decided to try out this ebuild to that scripts (lua, python) would work.
> I'm trying to install ocropus with openfst.
> There is a sci-misc/openfst from the science overlay which I've installed.
> There is also a simpler media-libs/openFST ebuild which I found by googling.
> Both ebuilds are for the same program from http://www.openfst.org.
> 
> Trying to emerge ocropus with following USE flags:
> [ebuild  N    ] app-text/ocropus-0.3.1  USE="interactive lua openfst sdl spell"
> 0 kB [1]
> 
> I get the following error:
> checking fst/lib/fst.h usability... no                                          
> checking fst/lib/fst.h presence... no                                           
> checking for fst/lib/fst.h... no                                                
> checking for main in -lfst... yes                                               
> configure: error: Could not find openFST! Choose --without-fst if you do not
> want to use it.                                 
> 
> !!! Please attach the following file when seeking support:
> !!! /var/tmp/portage/app-text/ocropus-0.3.1/work/ocropus-0.3/config.log
>  *                                                                     
>  * ERROR: app-text/ocropus-0.3.1 failed.                               
>  * Call stack:                                                         
>  *               ebuild.sh, line   49:  Called src_compile             
>  *             environment, line 2138:  Called econf '--with-tesseract=/usr'
> '--with-iulib=/usr' '--with-fst' '--with-aspell' '--with-SDL'
> '--with-leptonica'                                                              
>  *               ebuild.sh, line  534:  Called die
> 




This is what openfst installs:
# epm -ql openfst|grep \/lib
/usr/bin/libfstmain.so
/usr/include/fst/lib/fst.h
/usr/include/fst/lib/arc.h
/usr/include/fst/lib/compat.h
/usr/include/fst/lib/properties.h
/usr/include/fst/lib/register.h
/usr/include/fst/lib/symbol-table.h
/usr/include/fst/lib/util.h
/usr/lib/libfst.so
/usr/lib/libfstmain.so
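
Since the header is installed but configure still reports it as unusable, the compiler error recorded in config.log is the next thing to look at. The helper below is a minimal sketch for pulling that section out; the function name is made up for illustration, and it assumes the usual autoconf log layout in which each probe starts with a "checking ... usability" line.

```shell
# Print the config.log section for one header's "usability" probe.
# $1 = header as named by configure, $2 = path to config.log,
# $3 = number of following lines to show (default 20).
show_header_probe() {
    grep -F -A "${3:-20}" "checking $1 usability" "$2"
}

# Example (path taken from the error message above):
# show_header_probe fst/lib/fst.h \
#     /var/tmp/portage/app-text/ocropus-0.3.1/work/ocropus-0.3/config.log
```

The lines printed after the probe normally contain the exact compiler invocation and the first error, which should show why fst/lib/fst.h was rejected.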
Comment 15 Jose daLuz 2009-06-10 00:28:51 UTC
ocropus 0.4 alpha 4 has been released.
Comment 16 Daa Jaa 2009-08-17 15:08:50 UTC
Created attachment 201538 [details]
ebuild building http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz

I have successfully compiled OCRopus 0.4 on my gentoo system.
I will also attach the ebuild for =media-libs/iulib-0.4 (the synced tree is still stuck at =media-libs/iulib-0.3)
and the two patch files these ebuilds need.
Comment 17 Daa Jaa 2009-08-17 15:12:10 UTC
Created attachment 201540 [details]
ebuild to build version 0.4 of media-libs/iulib, whose sources are bundled with the OCRopus sources in http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz

=app-text/ocropus-0.4.ebuild needs >=media-libs/iulib-0.4, but the synced tree only has <=media-libs/iulib-0.3, so I made a separate ebuild.
Comment 18 Daa Jaa 2009-08-17 15:36:58 UTC
Created attachment 201545 [details, diff]
patch for http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz/ocropus-0.4/ocropus/genAM.py

This patch disables ocr-autoclean, as suggested in http://www.mail-archive.com/ocropus@googlegroups.com/msg00541.html, to fix the error "make[1]: *** No rule to make target `ocr-autoclean/ocr-orientation.cc', needed by `ocr-orientation.o'.  Stop."
Comment 19 Daa Jaa 2009-08-17 15:40:06 UTC
Created attachment 201547 [details]
patch used by =media-libs/iulib-0.4 for http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz/ocropus-0.4/iulib/utils/dgraphics_nosdl.cc

This patch allows ocropus and other users of iulib to work when the SDL support of iulib is disabled (by the user, or because configure decided it did not work).
Otherwise, if the SDL check fails, libiulibs.a lacks dsection_set and dactive, breaking ocropus-0.4.ebuild and every other build that includes /usr/include/iulib/iulib.h
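
Whether an installed iulib archive is affected can be checked by looking for the two symbols named above. This is only a sketch: the function name is hypothetical, the archive path in the example is an assumption, and it relies on nm from binutils being available.

```shell
# Report whether the two dgraphics symbols are present in a static library.
# $1 = path to the iulib archive.
check_dgraphics_symbols() {
    for sym in dsection_set dactive; do
        # grep -c prints 0 when nm fails or the symbol is absent.
        if [ "$(nm -C "$1" 2>/dev/null | grep -c "$sym")" -gt 0 ]; then
            echo "$sym: present"
        else
            echo "$sym: missing"
        fi
    done
}

# Example (archive location is an assumption):
# check_dgraphics_symbols /usr/lib/libiulib.a
```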
Comment 20 Jose daLuz 2009-08-19 23:06:57 UTC
The iulib build fails for me at automake:

aclocal.m4:16: warning: this file was generated for autoconf 2.61.
You have another version of autoconf.  It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically `autoreconf'.
configure.ac:8: version mismatch.  This is Automake 1.10.2,
configure.ac:8: but the definition used by this AM_INIT_AUTOMAKE
configure.ac:8: comes from Automake 1.10.1.  You should recreate
configure.ac:8: aclocal.m4 with aclocal and run automake again.
Comment 21 Jose daLuz 2009-08-20 22:27:03 UTC
Replacing "eautomake" in the iulib ebuild with "eautoreconf" seems to do the trick, it builds fine.
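
For anyone wanting to apply the same change to a local copy of the ebuild, a minimal sketch follows. The function name and the overlay path in the example are assumptions, and GNU sed is assumed for the in-place edit and the word-boundary syntax.

```shell
# Swap eautomake for eautoreconf in an ebuild, keeping a .bak copy.
# $1 = path to the ebuild.
use_eautoreconf() {
    sed -i.bak 's/\<eautomake\>/eautoreconf/g' "$1"
}

# Example with a hypothetical local overlay path:
# use_eautoreconf /usr/local/portage/media-libs/iulib/iulib-0.4.ebuild
# ebuild /usr/local/portage/media-libs/iulib/iulib-0.4.ebuild manifest
```

The manifest has to be regenerated afterwards because the ebuild's checksum changes.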

Now ocropus fails to build:


x86_64-pc-linux-gnu-g++ -DPACKAGE_NAME=\"ocropus\" -DPACKAGE_TARNAME=\"ocropus\" -DPACKAGE_VERSION=\"0.3\" -DPACKAGE_STRING=\"ocropus\ 0.3\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"ocropus\" -DVERSION=\"0.3\" -DSTDC_HEADERS=1 -DHAVE_SYS_WAIT_H=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_FLOAT_H=1 -DHAVE_MALLOC_H=1 -DHAVE_STDINT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_WCHAR_H=1 -DHAVE_LIBZ=1 -DHAVE_LIBPNG=1 -DHAVE_LIBJPEG=1 -DHAVE_LIBTIFF=1 -DHAVE_LIBIULIB=1 -DHAVE_LIBPTHREAD=1 -DHAVE_LIBTESSERACT_FULL=1 -DHAVE_LIBGSLCBLAS=1 -DHAVE_LIBGSL=1 -DHAVE_LIBSDL=1 -DHAVE_LIBGOMP=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_DUP2=1 -DHAVE_MEMSET=1 -DHAVE_SQRT=1 -DHAVE_STRCHR=1 -DHAVE_STRDUP=1 -DHAVE_STRRCHR=1 -I.  -I./include -I./ocr-utils -I/usr/include -I/usr/include/tesseract  -I/usr/include/tesseract -DHAVE_TESSERACT   -march=native -O2 -pipe -ggdb -fopenmp -Wall -Wno-sign-compare -Wno-write-strings -Wno-deprecated -march=native -O2 -pipe -ggdb -fopenmp -MT xml-entities.o -MD -MP -MF .deps/xml-entities.Tpo -c -o xml-entities.o `test -f './ocr-utils/xml-entities.cc' || echo './'`./ocr-utils/xml-entities.cc
./ocr-utils/xml-entities.cc: In function 'void ocropus::xml_unescape(colib::nustring&, const char*)':
./ocr-utils/xml-entities.cc:119: error: invalid conversion from 'const char*' to 'char*'
make[1]: *** [xml-entities.o] Error 1
make[1]: Leaving directory `/var/tmp/paludis/app-text-ocropus-0.4/work/ocropus-0.4'
make: *** [all-recursive] Error 1
/usr/libexec/paludis/utils/emake: emake returned error 2

!!! ERROR in app-text/ocropus-0.4:

I will attach the build log.
Comment 22 Jose daLuz 2009-08-20 22:28:46 UTC
Created attachment 201821 [details]
Build log from failed ocropus-0.4.0 build

Portage 2.2_rc38 (default/linux/amd64/2008.0/desktop, gcc-4.4.1, glibc-2.10.1-r0, 2.6.30.4 x86_64)
=================================================================
System uname: Linux-2.6.30.4-x86_64-Intel-R-_Core-TM-2_Quad_CPU_Q9650_@_3.00GHz-with-gentoo-2.0.1
Timestamp of tree: Wed, 19 Aug 2009 21:45:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     4.0_p28
dev-java/java-config: 2.1.8-r1
dev-lang/python:     2.5.4-r2, 2.6.2-r1, 3.1.1
dev-util/ccache:     2.4-r8
dev-util/cmake:      2.6.4-r2
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.4.3-r3
sys-apps/sandbox:    2.0
sys-devel/autoconf:  2.13, 2.63-r1
sys-devel/automake:  1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2, 1.11
sys-devel/binutils:  2.19.51.0.14
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.30-r1
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -ggdb"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=native -O2 -pipe -ggdb"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests ccache distlocks fixpackages parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1"
LINGUAS="en"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/overlays/zugaina /usr/local/overlays/THE /usr/local/overlays/sunrise /usr/local/overlays/gnome /usr/local/overlays/desktop-effects /usr/local/overlays/vmware /usr/local/overlays/mozilla /usr/local/overlays/ikelos /usr/local/overlays/java-overlay /usr/local/overlays/berkano /usr/local/overlays/gcc-porting /usr/local/overlays/myoverlay"
SYNC="rsync://rsync.us.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa amd64 amr avahi bash-completion berkdb bluetooth branding bzip2 cairo cdda cdr cli cracklib crypt css cups dbus dri dts dvd dvdr eds emboss encode esd evo expat fam ffmpeg flac fortran gdbm gif gnome gnome-keyring gpm gstreamer gtk hal iconv ipv6 isdnlog java jpeg lcms ldap libnotify mad mikmod mmx mono mp3 mp4 mpeg mudflap multilib nautilus ncurses networkmanager nls nptl nptlonly nsplugin ogg opengl openmp paludis pam pcre pdf perl pic png policykit ppds pppd pulseaudio python quicktime readline reflection ruby samba sdl session spell spl sqlite sse sse2 ssl startup-notification svg sysfs tcpd theora tiff tracker truetype unicode usb userlocales vcd vorbis x264 xcb xml xorg xulrunner xv xvid zlib" ALSA_CARDS="cmipci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" USERLAND="GNU" VIDEO_CARDS="nvidia"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 23 Jose daLuz 2009-08-20 22:29:48 UTC
Created attachment 201823 [details]
Output of "paludis --info ocropus"
Comment 24 Jose daLuz 2009-08-20 22:35:19 UTC
I believe this would be one of the glibc 2.10 failures tracked in bug 270353 and explained in this blog posting from flameeyes: http://blog.flameeyes.eu/2009/05/24/c-libraries-galore
Comment 25 Daa Jaa 2009-09-03 12:49:52 UTC
Created attachment 203042 [details]
empty attachment created by mistake.

I have added some more dependencies. It now compiles on amd64 and x86, with USE="interactive sdl spell -lua -openfst" and runs smoothly. Results are poor (90%) for foreign languages such as French.
I am not sure that it is actually calling tesseract, as tesseract has better recognition rates. I am now looking into the source code.

app-text/cuneiform/cuneiform-0.7.ebuild has recognition rates being either 99% (good) or 0% (it fails, produces a useless Hocr-html file)
Comment 26 Daa Jaa 2009-09-03 12:56:23 UTC
Created attachment 203043 [details]
media-libs/iulib-0.4-r1.ebuild ebuild working on x86 and amd64 compiling http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz

I have added some more dependencies. It now compiles on amd64 and x86, with USE="sdl" (SDL support fails on my Gentoo system, but iulib continues anyway and keeps ocropus happy, albeit with messages like "no image display, since dgraphics disabled in iulib").
Comment 27 Daa Jaa 2009-09-03 13:00:00 UTC
Created attachment 203045 [details]
app-text/ocropus-0.4-r1.ebuild ebuild working on x86 and amd64 compiling http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz

[this is a dupe, previous attachment was empty by mistake]

I have added some more dependencies. It now compiles on amd64 and x86, with USE="interactive sdl spell -lua -openfst" and runs smoothly. Results are poor (90%) for foreign languages such as French.
I am not sure that it is actually calling tesseract, as tesseract has better recognition rates. I am now looking into the source code.

app-text/cuneiform/cuneiform-0.7.ebuild has recognition rates being either 99% (good) or 0% (it fails, produces a useless Hocr-html file).
Comment 28 Daa Jaa 2009-09-03 13:03:54 UTC
#26 : this ebuild is for media-libs/iulib-0.4-r1.ebuild, not app-text/ocropus-0.4-r1.ebuild. The sources of iulib are nonetheless in http://ocropus.googlecode.com/files/ocropus-0.4.tar.gz.
Comment 29 Daa Jaa 2009-09-03 13:26:40 UTC
Created attachment 203050 [details]
/var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log

I did not have the problem, and I think indeed that #24 is right. I recompiled it with `ebuild /usr/local/portage/app-text/ocropus/ocropus-0.4-r1.ebuild install`, and provide my /var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log

I do not have paludis. See emerge --info on next attachment.

(I wrongly replaced ocropus-0.4-r1 with ocropus-0.4 in this attachment)
Comment 30 Daa Jaa 2009-09-03 13:28:06 UTC
Created attachment 203051 [details]
/var/tmp/portage/app-text/ocropus-0.4-r1/temp/build.log

emerge --info.

(I wrongly replaced ocropus-0.4-r1 with ocropus-0.4 in this attachment)
Comment 31 Daa Jaa 2009-09-03 13:31:55 UTC
I do not know whether (and how) my app-text/ocropus-0.4-r1.ebuild should take bug 270353 into account.
Comment 32 Daa Jaa 2009-09-03 13:39:05 UTC
Created attachment 203053 [details]
source file of utility to help ocropus-0.4-r1 produce pdf files.

This is a small file so I put it right where it will be useful.
Comment 33 Daa Jaa 2009-09-03 13:59:18 UTC
Created attachment 203059 [details]
app-text/hocrtopdf-0.0.ebuild to create PDF using `ocropus buildhtml dir`

This is an undocumented converter to the .pdf format from the .hocr format produced by app-text/ocropus-0.4-r1 on any image.

Usage: hocrtopdf inputHocrFile fontHelveticaORCourier outputPdfFile imagedpix imagedpiy imagepixelwidth imagepixelheight oneifisotropic

Example: converting an image to a pdf file, with optical character recognition:

size=`file b.png | sed -e 's:^[^0-9]*::' -e 's:,.*::'`
sudo emerge -nav '=app-text/ocropus-0.4-r1' '=sys-apps/file-4.23'
rm -rf dir.ocropus
ocropus book2pages dir.ocropus b.png
ocropus pages2lines dir.ocropus
ocropus lines2fsts dir.ocropus
ocropus fsts2text dir.ocropus
ocropus buildhtml dir.ocropus | tr 'A\012' '\012A' | sed 's% Transitional//ENA   http://www%Transitional//EN"A   "http://www%' | tr 'A\012' '\012A' > b.hocr
hocrtopdf b.hocr Helvetica b.pdf 300 300 ${size%% x*} ${size##*x } 1

Bugs:
The built-in help and comments in hocrtopdf are completely outdated.
This is a patched version of http://xplus3.net/downloads/HocrConverter.gz -- I still have to ask for permission.

Output of this example:
 ~ $ size=`file b.png | sed -e 's:^[^0-9]*::' -e 's:,.*::'`
 ~ $ ocropus book2pages dir.ocropus b.png
[info] page 1
 ~ $ ocropus pages2lines dir.ocropus
no image display, since dgraphics disabled in iulib
[info] page 1
[info] #lines = 25
 ~ $ ocropus lines2fsts dir.ocropus
no image display, since dgraphics disabled in iulib
[info] rate nan errs 0 ntrue 0 npred 0 lines 0 nogt 24
 ~ $ ocropus fsts2text dir.ocropus
[info] dir.ocropus/0001/0001.fst (0/24)
[info] dir.ocropus/0001/0007.fst        99999996802856924650656260769173209088.000000
[info] dir.ocropus/0001/0018.fst        99999996802856924650656260769173209088.000000
 ~ $ ocropus buildhtml dir.ocropus | tr 'A\012' '\012A' | sed 's% Transitional//ENA   http://www%Transitional//EN"A   "http://www%' | tr 'A\012' '\012A' > b.hocr
 ~ $ hocrtopdf b.hocr Helvetica b.pdf 300 300 ${size%% x*} ${size##*x } 1
/usr/lib/python2.6/site-packages/reportlab/pdfgen/canvas.py:17: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
 ~ $ ls -l b.pdf
-rw-r--r-- 1 me me 3399 Sep  3 15:55 b.pdf
 ~ $ file b.pdf
b.pdf: PDF document, version 1.3
 ~ $
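
The way the example derives the pixel width and height from the output of file can be followed step by step in the sketch below. The sample line stands in for the real output of `file b.png`; its exact wording is an assumption and may differ between versions of sys-apps/file.

```shell
# Sample `file` output for a PNG (assumed wording, for illustration only).
sample='b.png: PNG image data, 2550 x 3300, 8-bit/color RGB, non-interlaced'

# Same sed expressions as in the example: drop everything before the first
# digit, then everything from the first comma on.
size=$(printf '%s\n' "$sample" | sed -e 's:^[^0-9]*::' -e 's:,.*::')

width=${size%% x*}    # remove the longest suffix starting at " x"
height=${size##*x }   # remove the longest prefix ending at "x "

echo "size=[$size] width=$width height=$height"
```

Note that the first sed expression stops at the first digit anywhere in the line, so a file name containing digits (page01.png, say) would yield a wrong size.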
Comment 34 Daa Jaa 2009-09-03 14:10:23 UTC
I just asked for permission to patch on http://xplus3.net/2009/04/02/convert-hocr-to-pdf/#comment-663
Comment 35 Daa Jaa 2009-09-04 09:42:40 UTC
Created attachment 203108 [details]
app-text/hocrtopdf-0.1.tgz to help app-text/ocropus-0.4-r1 produce pdf files

Ok, licence cleared (see http://xplus3.net/2009/04/02/convert-hocr-to-pdf/#comment-664 :-) and command-line help updated.
Comment 36 Daa Jaa 2009-09-04 11:19:14 UTC
Created attachment 203111 [details]
app-text/hocrtopdf-0.1.ebuild to make PDF from the html output of OCRopus

Version bump, with correct LICENSE and SRC_URI. hocrtopdf is alpha, so run emerge with GENTOO_MIRRORS="" to use this ebuild.
Comment 37 Jose daLuz 2009-09-12 17:24:20 UTC
(In reply to comment #29)
> I did not have the problem, and I think indeed that #24 is right. 
I have the same issue with 0.4-r1. Per the links I provided, this is an issue with glibc 2.10 only; your emerge --info shows glibc 2.5, so you will not encounter it.
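
A quick way to tell which side of that line a given system falls on is sketched below. The helper name is made up for illustration; the check relies on getconf GNU_LIBC_VERSION, which is only meaningful on glibc systems, and on the -V option of GNU sort.

```shell
# True when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# getconf prints e.g. "glibc 2.10.1" on glibc systems; keep only the number.
libc_version=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}') || libc_version=""

if [ -n "$libc_version" ] && version_ge "$libc_version" 2.10; then
    echo "glibc $libc_version: the const char* build failure is likely to show up"
else
    echo "glibc ${libc_version:-unknown}: the 0.4 sources may build unpatched"
fi
```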

Comment 38 Josh McClain 2009-09-18 17:40:03 UTC
Other than ocropus not building with lua enabled (another completely separate topic), everything built fine.

However, when I run hocrtopdf, built from hocrtopdf-0.1.ebuild, I get this:

$ hocrtopdf b.hocr Helvetica b.pdf 300 300 2550 3300 1
/usr/lib/python2.6/site-packages/reportlab/pdfgen/canvas.py:17: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
Traceback (most recent call last):
  File "/usr/local/bin/hocrtopdf", line 188, in <module>
    hocr = HocrConverter(sys.argv[1])
  File "/usr/local/bin/hocrtopdf", line 58, in __init__
    self.parse_hocr(hocrFileName)
  File "/usr/local/bin/hocrtopdf", line 103, in parse_hocr
    self.hocr.parse(hocrFileName)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
    parser.feed(data)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: syntax error: line 3, column 5

I'm not sure what this is:   xml.parsers.expat.ExpatError
Comment 39 Josh McClain 2009-09-18 17:43:19 UTC
I should have mentioned...

hocrtopdf shows two dependencies.  Here are the versions I'm using...

dev-python/reportlab-2.1
dev-python/pyxml-0.8.4-r2
Comment 40 Daa Jaa 2009-09-21 15:04:03 UTC
Your error message is "xml.parsers.expat.ExpatError: syntax error: line 3, column 5".
Can you show me the beginning of the 3rd line of the file given as the first argument to hocrtopdf?
Comment 41 Josh McClain 2009-09-22 15:55:25 UTC
Sorry, it was actually "column 59":

xml.parsers.expat.ExpatError: syntax error: line 3, column 59


$ head -3 b.hocr 
<!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN
   http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Comment 42 Daa Jaa 2009-09-23 11:53:24 UTC
#41:
Try this:
sed -e '2s:$:":' -e '3s:h:"h:' -i b.hocr
before using hocrtopdf.

The first lines of the patched b.hocr should then read:
$ head -3 b.hocr
<!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This is a bug in ocropus that I work around each time with sed (see my comment
#33). You may want to submit a patch to ocropus; the relevant code is around line 199 of
/var/tmp/portage/app-text/ocropus-0.4/work/ocropus-0.4/ocropus/ocr-commands/ocr-commands.cc
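
The sed call above adds the quotes unconditionally, so running it twice would add them twice. A guarded variant is sketched here; the function name is hypothetical, it assumes the DOCTYPE has exactly the three-line shape shown in comment 41, and it uses GNU sed for the in-place edit.

```shell
# Add the two missing double quotes to the DOCTYPE of an hOCR file in place.
# Only lines that still lack the quote are touched, so running it twice is
# harmless. $1 = hOCR file.
fix_hocr_doctype() {
    sed -i \
        -e '2s|//EN$|//EN"|' \
        -e '3s|^\([[:space:]]*\)http|\1"http|' \
        "$1"
}

# Example:
# fix_hocr_doctype b.hocr && head -3 b.hocr
```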
Comment 43 Pavel Denisov 2009-10-28 12:23:36 UTC
Created attachment 208505 [details, diff]
Patch for building with glibc-2.10
Comment 44 Pavel Denisov 2009-10-28 12:24:32 UTC
Created attachment 208506 [details]
Updated ebuild for building with glibc-2.10
Comment 45 Silvio 2010-05-12 14:59:11 UTC
Created attachment 231227 [details]
build.log error

Today it asked me to remerge ocropus and it gets this error in ebuilding it.
Comment 46 Daa Jaa 2010-05-20 09:33:34 UTC
Created attachment 232215 [details]
output of emerge -p --update --newuse --tree --deep ocropus

@Silvio, from #45: "Today it asked me to remerge ocropus and it gets this error in ebuilding it."
The last lines of your build.log show two undefined identifiers, TIFFHeader and TIFF_VERSION.

The search http://www.google.fr/search?q=TIFFHeader+%22tiff_version%22 gives me
http://code.google.com/p/ocropus/issues/detail?id=112#c1 which contains a patch about
TIFFHeaderClassic and TIFF_VERSION_CLASSIC.

The search http://www.google.fr/search?q=TIFFHeaderClassic+%22TIFF_VERSION_CLASSIC%22
shows that this patch reflects a possible change in libtiff and media-libs/tiff.

Please:
(1) write here to say if this patch helped you.
(2) write here which version of media-libs/tiff obsoleted TIFFHeader and TIFF_VERSION.

If (1) does not work, you may run emerge -p --update --newuse --tree --deep ocropus and compare your output to mine, attached here; look for tiff and for the lines close to the ocropus lines.
It might be better to compare /var/log/emerge.log, but I do not know how to do that. You may also post your /var/log/emerge.log and the output of emerge --info here.
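
To help answer (2) locally, one can check which names the installed header provides. This is a rough sketch: tiff_header_api is a hypothetical name, /usr/include/tiff.h is an assumed default location, and the check relies on newer libtiff defining TIFF_VERSION_CLASSIC where older releases defined TIFF_VERSION, as in the patch linked above.

```shell
# Hypothetical helper: report which TIFF header names a given tiff.h provides.
tiff_header_api() {
    header=${1:-/usr/include/tiff.h}    # assumed default location
    if grep -q 'TIFF_VERSION_CLASSIC' "$header"; then
        echo classic    # newer naming: TIFFHeaderClassic / TIFF_VERSION_CLASSIC
    elif grep -q 'TIFF_VERSION' "$header"; then
        echo legacy     # older naming: TIFFHeader / TIFF_VERSION
    else
        echo unknown
    fi
}
```

If it prints "classic", the patch is probably needed; the installed version of media-libs/tiff can then be read from the output of emerge -pv media-libs/tiff.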
Comment 47 Pavel Denisov 2010-05-21 10:24:02 UTC
The patch solved the problem with the changed libtiff.
Thank you very much, Daa Jaa.

(In reply to comment #46)

> The search http://www.google.fr/search?q=TIFFHeader+%22tiff_version%22 gives me
> http://code.google.com/p/ocropus/issues/detail?id=112#c1 which contains a patch
> about
> TIFFHeaderClassic and TIFF_VERSION_CLASSIC.
> 
Comment 48 Pavel Denisov 2010-05-21 12:01:58 UTC
Created attachment 232343 [details]
Updated iulib ebuild using mercurial and scons
Comment 49 Pavel Denisov 2010-05-21 12:03:38 UTC
Created attachment 232345 [details]
Updated ocropus ebuild using mercurial and scons
Comment 50 Pavel Denisov 2010-05-21 12:04:19 UTC
Created attachment 232347 [details, diff]
Patch for changed libtiff
Comment 51 Daa Jaa 2010-05-21 20:10:50 UTC
In response to Pavel https://bugs.gentoo.org/show_bug.cgi?id=185810#c49 :
Could you specify the exact Mercurial revisions you tested your ebuilds against? The current Mercurial revisions of ocropus and iulib are 65011c70b3 and e5f183e0bf, but I guess you tested different ones.

Alternatively, would it be hard to produce ebuilds pinned to versions of ocropus and iulib that will still be downloadable in a few months?
Comment 53 Daa Jaa 2010-05-21 20:12:42 UTC
Sorry to everyone for the duplicate comment. Shame on me.
Comment 54 Pavel Denisov 2010-05-21 21:29:24 UTC
I have 65011c70b3d7 and e5f183e0bf9e, so I guess I tested against the same versions.
It should not be hard to modify these ebuilds to build a specific Mercurial tag, as described at http://code.google.com/p/ocropus/wiki/InstallTranscript
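
As a sketch of what pinning could look like in a live ebuild: the mercurial eclass reads EHG_REVISION, so a fixed revision or tag can be set there. The lines below are not taken from the attached ebuilds. The repository URI is an assumption, and so is the guess that 65011c70b3d7 is the ocropus revision (with e5f183e0bf9e belonging to iulib).

```shell
# Sketch of the relevant lines of a pinned ocropus ebuild (not the attached one).
inherit mercurial

# Assumed repository URI; adjust to whatever the attached ebuild uses.
EHG_REPO_URI="https://code.google.com/p/ocropus/"
# Assumed to be the ocropus revision mentioned above; a tag name works here too.
EHG_REVISION="65011c70b3d7"
```

The iulib ebuild would get the same treatment with its own URI and revision.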
Comment 55 Daa Jaa 2010-05-25 09:34:10 UTC
Thank you very much Pavel!
Comment 56 gentoo@danielquinn.org 2010-07-08 22:08:49 UTC
Created attachment 238081 [details]
Update to include sdl-image

Installation will fail without media-libs/sdl-image enabled as well.
Comment 57 Sean Langford 2010-08-06 13:55:58 UTC
Note: I had to remove the libtiff patch to get this to work.

Thanks everyone for these ebuilds!!
Comment 58 Chí-Thanh Christopher Nguyễn gentoo-dev 2010-09-14 12:09:55 UTC
*** Bug 329063 has been marked as a duplicate of this bug. ***
Comment 59 Daa Jaa 2010-09-14 14:47:47 UTC
We just learned that another ebuild was made by gpo-overlay.
http://gpo.zugaina.org/app-text/ocropus

Who is the maintainer of this ebuild?
Comment 60 dE 2010-09-14 15:32:50 UTC
If we had one, this would already be in portage.
Comment 61 Leho Kraav (:macmaN @lkraav) 2010-11-08 23:10:39 UTC
applying no-as-needed fixed both openfst-1.2.5 and ocropus-0.4-r1 re bug 329063 (undefined SDL_MapRGB).
Comment 62 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2011-01-14 15:00:04 UTC
The ebuild has been removed from Sunrise.
Comment 63 lalebarde 2011-10-08 12:00:39 UTC
Hi all,
I failed to build ocropus. I tried several of the attachments. Can someone please post a tarball with ocropus, hocrtopdf and iulib ebuilds and files (patches) that work? Just something to untar into one's own overlay and emerge.
Comment 65 Clemmitt M. Sigler 2011-11-22 04:29:12 UTC
Hello all Gentoo users interested in ocropus,

This bug seems to have gone silent in 2011.  Is no one using ocropus under Gentoo any more?  I wonder if it's because of the linking error which seems to have persisted.  Here are some comments documenting what I've done; please correct me wherever I've made a mistake!  TIA.

In emerging app-text/ocropus-0.4-r1, I cannot get past the libiulib.a(dgraphics.o) problems, thusly:
========
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dclear(int)':
(.text+0x22b): undefined reference to `SDL_MapRGB'
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dclear(int)':
(.text+0x23c): undefined reference to `SDL_FillRect'
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dclear(int)':
(.text+0x25c): undefined reference to `SDL_UpdateRect'
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dwait()':
(.text+0x294): undefined reference to `SDL_WaitEvent'
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dinit(int, int, bool)':
(.text+0x5a1): undefined reference to `SDL_Init'
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dinit(int, int, bool)':
(.text+0x5b8): undefined reference to `SDL_SetVideoMode'
/usr/lib/libiulib.a(dgraphics.o): In function `iulib::dflush()':
(.text+0x2d8): undefined reference to `SDL_Flip'
========

I've tried Daa Jaa's utils.dgraphics_nosdl.cc.patch for iulib, updating its version to media-libs/iulib-0.4-r1 via his corresponding ebuild; please see Comment 26.  However, the above error persists.  It seems to me that this must be a problem with iulib ver. 0.4 being out of date in relation to what ocropus requires(?).  I suppose my next step is to try the live mercurial builds posted by Pavel Denisov; please see Comment 48 and Comment 49, and also Comment 50.

My overall goal is to build app-text/gscan2pdf (see http://gpo.zugaina.org/Search?search=gscan2pdf) with support for ocropus under Gentoo.  Building it with support for cuneiform and tesseract seems to be working.

Note that ocropus ver. 0.4 won't compile with tesseract ver. 3.0x.  If I downgrade tesseract to the stable version, app-text/tesseract-2.04-r1, at least the ocropus source compiles, but then it fails to resolve references as above.

Also note that tesseract ver. 3.0x depends on leptonica.  However, media-libs/leptonica conflicts with app-text/tesseract-2.04-r1.  When leptonica is installed (as far back as media-libs/leptonica-1.62) compilation of tesseract-2.04-r1 fails like so:
========
leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetHalftoneMask(Pix*, Pix**, Boxa**, Pixa**, bool)':
leptonica_pageseg.cpp:69: error: 'int32' was not declared in this scope
leptonica_pageseg.cpp:69: error: expected ';' before 'debug'
leptonica_pageseg.cpp:73: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetTextlineMask(Pix*, Pix**, Pix**, Boxa**, Pixa**, bool)':
leptonica_pageseg.cpp:139: error: 'int32' was not declared in this scope
leptonica_pageseg.cpp:139: error: expected ';' before 'debug'
leptonica_pageseg.cpp:143: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetTextblockMask(Pix*, Pix**, Boxa**, Pixa**, bool)':
leptonica_pageseg.cpp:211: error: 'int32' was not declared in this scope
leptonica_pageseg.cpp:211: error: expected ';' before 'debug'
leptonica_pageseg.cpp:220: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp: In static member function 'static bool LeptonicaPageSeg::GetAllRegions(Pix*, Pix**, Pix**, Pix**, bool)':
leptonica_pageseg.cpp:273: error: 'int32' was not declared in this scope
leptonica_pageseg.cpp:273: error: expected ';' before 'w'
leptonica_pageseg.cpp:274: error: 'w' was not declared in this scope
leptonica_pageseg.cpp:274: error: 'h' was not declared in this scope
leptonica_pageseg.cpp:275: error: expected ';' before 'debug'
leptonica_pageseg.cpp:288: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp:293: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp:298: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp:302: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp:311: error: 'debug' was not declared in this scope
leptonica_pageseg.cpp:320: error: 'debug' was not declared in this scope
/usr/include/liblept/leptprotos.h:553: error: too few arguments to function 'PIX* pixRenderRandomCmapPtaa(PIX*, PTAA*, l_int32, l_int32, l_int32)'
leptonica_pageseg.cpp:322: error: at this point in file
leptonica_pageseg.cpp:332: error: 'debug' was not declared in this scope
========

These dependency conflicts need to be fixed in the various ebuilds.  As possible, I'll try to make the needed fixes and post patches (opening new bugs as appropriate).

(In reply to comment #61)
> applying no-as-needed fixed both openfst-1.2.5 and ocropus-0.4-r1
> re bug 329063 (undefined SDL_MapRGB).

I found this comment intriguing.  I did a couple of quick searches but never was able to find a patch called something like "no-as-needed" to fix the undefined SDL_MapRGB problem.  If you're reading this, Leho Kraav, could you please provide us a pointer so that we can better follow your Comment 61?  TIA.

Clemmitt
Comment 66 Leho Kraav (:macmaN @lkraav) 2011-11-22 06:02:09 UTC
Created attachment 293377 [details]
ocropus-0.4-r1.ebuild, with no-as-needed

> (In reply to comment #61)
> > applying no-as-needed fixed both openfst-1.2.5 and ocropus-0.4-r1
> > re bug 329063 (undefined SDL_MapRGB).
> 
> I found this comment intriguing.  I did a couple of quick searches but never
> was able to find a patch called something like "no-as-needed" to fix the
> undefined SDL_MapRGB problem.  If you're reading this, Leho Kraav, could you
> please provide us a pointer so that we can better follow your Comment 61?  TIA.

while i've put in zero play with ocropus in the past year and therefore don't really know about the current state of affairs, "no-as-needed" means adding this to your ebuild:

pkg_setup() {
    append-ldflags $(no-as-needed)
}

i'm attaching the ebuild i have in my overlay sitting from last year.
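
For readers who have not seen these helpers before: both append-ldflags and no-as-needed come from the flag-o-matic eclass, so the ebuild has to inherit it. A minimal sketch of just the relevant lines follows; the rest of the ebuild is omitted, and it is only an assumption that the attached ebuild is laid out the same way.

```shell
# Sketch: the lines an ebuild needs for the workaround above.
inherit flag-o-matic    # provides append-ldflags and no-as-needed

pkg_setup() {
    # no-as-needed prints -Wl,--no-as-needed when GNU ld is in use,
    # and append-ldflags adds that to LDFLAGS for the whole build.
    append-ldflags $(no-as-needed)
}
```

This does not fix the link order in the upstream build system; it only tells the linker to keep libraries such as SDL even when they appear before the objects that use them.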
Comment 67 Clemmitt M. Sigler 2011-11-22 15:35:47 UTC
(In reply to comment #66)
> Created attachment 293377 [details]
> ocropus-0.4-r1.ebuild, with no-as-needed
> 
> "no-as-needed" means adding this to your ebuild:
> 
> pkg_setup() {
>     append-ldflags $(no-as-needed)
> }
> 
> i'm attaching the ebuild i have in my overlay sitting from last year.

Ah, the wonders of Gentoo!  On what other system could you fix a package build problem so simply and cleanly?!

Using this ebuild with the no-as-needed setting, ocropus now builds.  I can run 'ocropus page sample.png' (sample.tif, etc.), but the accuracy of conversion to text is quite inferior to the output directly produced by cuneiform-1.1.0, and tesseract-2.04-r1 or tesseract-3.01.  (Caveat: I don't fully understand ocropus.  It appears to be a powerful and somewhat complex beast.  I believe one needs to employ a workflow of the various ocropus commands to do the job properly; it may also need training, of course.)

As I'm an over-perfectionist, I've made a couple of tiny changes to the ebuild Leho just posted.  I'll attach them as a patch against his ebuild next.

With app-text/cuneiform-1.1.0, app-text/tesseract-2.04-r1, and app-text/ocropus-0.4-r1 installed, I've emerged app-text/gscan2pdf-1.0.0-r1 with USE flags adf, unpaper, xdg, cuneiform, tesseract and ocropus.  I've run gscan2pdf; it uses cuneiform and tesseract for OCR, but even with the proper USE flag turned on it doesn't offer ocropus as an OCR back-end option.  (Also, the display within gscan2pdf of the output from cuneiform and tesseract leaves something to be desired, but that may be due to the fact that I don't know what I'm doing.)

Thanks to all for input on this ebuild.  As I'm able, I'll post more info on other ebuilds I've patched and open new bugs on them as appropriate.

Clemmitt
Comment 68 Clemmitt M. Sigler 2011-11-22 15:38:08 UTC
Created attachment 293413 [details, diff]
Small patch to Leho Kraav's ebuild posted in Comment 66.
Comment 69 Leho Kraav (:macmaN @lkraav) 2011-11-22 15:40:31 UTC
you might want to read up on as-needed [1]. i think it was kind of a two way street re who exactly broke who when, but i guess as of today at the very latest it's safe to say that the package is more broken, since mostly everything else now builds with as-needed. other than that, always good to see someone have interest in an obscure but cool and useful package and be willing to put in some work to figure stuff out.

 [1]: http://www.gentoo.org/proj/en/qa/asneeded.xml
Comment 70 Daa Jaa 2011-11-28 11:36:54 UTC
Created attachment 294049 [details]
source files for app-text/image2text-pdf-0.0.1.ebuild

Hello, and congratulations, Clemmitt, on all your efforts.

I spent some time working on the recognition rate problems of ocropus. As you seem to be putting effort into this as well, here is the result. It is NOT EASY to read, but I find it useful very often.

The ebuild will follow.
Comment 71 Daa Jaa 2011-11-28 11:48:46 UTC
Created attachment 294051 [details]
source files for app-text/image2text-pdf-0.0.1.ebuild

CORRECT TAR FILE. Sorry, anonymizing .tar format is not straightforward.
Comment 72 Daa Jaa 2011-11-28 12:19:14 UTC
Created attachment 294053 [details]
app-text/image2text-pdf-0.0.1.ebuild makes OCRopus, tesseract, and cuneiform vote; Automated in a particular multi-image setting.

This ebuild addresses the recognition rate problems of ocropus by collecting votes from OCRopus, tesseract, and cuneiform for every line framed by OCRopus. I find it useful very often. The coding style is awful: the language is /bin/bash, and comments are scarce and in French, ...

Warning: only tested with app-text/ocropus-0.4 (as stated in the .ebuild); please report adaptations to newer versions of ocropus here.

Put this file in /usr/local/portage/app-text/image2text-pdf/image2text-pdf-0.0.1.ebuild and use it with `PORTDIR_OVERLAY=/usr/local/portage emerge -av app-text/image2text-pdf`. You may first need to run `echo '>=app-text/image2text-pdf-0.0 ~amd64' >> /etc/portage/package.keywords` once, and also download my /usr/local/portage/app-text/image2text-pdf/Manifest, provided as the next attachment.
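
The overlay layout can be sketched as a small helper. It assumes the usual category/package directory structure, which matches the Manifest path /usr/local/portage/app-text/image2text-pdf/Manifest mentioned in this comment. install_ebuild is a hypothetical name, not a Portage tool.

```shell
# Hypothetical helper: copy a downloaded ebuild into <overlay>/<category>/<package>/
install_ebuild() {
    overlay=$1
    category=$2
    ebuild=$3
    name=$(basename "$ebuild" .ebuild)    # e.g. image2text-pdf-0.0.1
    package=${name%-[0-9]*}               # strip the version suffix -> image2text-pdf
    mkdir -p "$overlay/$category/$package" &&
        cp "$ebuild" "$overlay/$category/$package/"
}
```

For example, `install_ebuild /usr/local/portage app-text image2text-pdf-0.0.1.ebuild` places the file where the emerge command above expects it.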
Comment 73 Daa Jaa 2011-11-28 12:22:58 UTC
Created attachment 294055 [details]
/usr/local/portage/app-text/image2text-pdf/Manifest contains checksums for two previous attachments

Checksums you may want to put in /usr/local/portage/app-text/image2text-pdf/Manifest
Comment 74 Daa Jaa 2011-11-28 12:29:08 UTC
Created attachment 294057 [details]
app-text/scan4image2text-pdf-0.0.0.tgz, automation of the multi-image setting according to what is typed on a USB-connected HP LaserJet 3020. Needs app-text/image2text-pdf, which may be installed on another computer.

I have also chosen to release the shell code that reads the numeric keypad of the HP LaserJet 3020 and interprets it, making automated scans and then feeding them to app-text/image2text-pdf.
app-text/image2text-pdf may be installed on another, more powerful computer.

CAREFUL: the coding style is awful, and I made no ebuild for it. I find it useful very often.

This may help in understanding the usage of /usr/local/bin/.post.scan, which is packaged in app-text/image2text-pdf
Comment 75 Daa Jaa 2011-11-28 12:31:44 UTC
I forgot to mention that there is a bounty (1000 $) for making tesseract report the pixel position of every recognized character. Contact me for more information.
Comment 76 Leho Kraav (:macmaN @lkraav) 2012-07-21 08:30:36 UTC
ocropus-0.5.4 has been released in the meantime. Anyone here still interested in this?
Comment 77 Pavel Denisov 2012-07-21 13:33:29 UTC
I'm playing with it. It seems to bundle a custom version of media-libs/iulib, but I haven't had time yet to figure out the differences.
Comment 78 lalebarde 2012-10-23 16:29:56 UTC
Hi all,

OCRopus 0.6 is available. From upstream:

"It features much simpler installation, fewer dependencies, and improved character recognition rates. This is the first all-Python release."

Installation procedure:
"
    $ hg clone -r ocropus-0.6 https://code.google.com/p/ocropus
    $ cd ocropus/ocropy
    $ sudo apt-get install $(cat PACKAGES)
    $ python setup.py download_models
    $ sudo python setup.py install
    $ ./run-test
"

I tried to adapt the 0.4 ebuild. To the best of my limited knowledge, I used:

    EHG_REPO_URI="https://code.google.com/p/ocropus/"
    SRC_URI=""

I can create the manifest, but when I emerge, I get this error:

 * ERROR: app-text/ocropus-0.6 failed (unpack phase):
 *   Nothing passed to the 'unpack' command
 * 
 * Call stack:
 *          ebuild.sh, line   85:  Called src_unpack
 *        environment, line 2860:  Called unpack
 *   phase-helpers.sh, line  261:  Called die
 * The specific snippet of code:
 *   	[ -z "$*" ] && die "Nothing passed to the 'unpack' command"

I'm sure this would be a piece of cake for someone with ebuild knowledge, especially since it is now a pure Python application.

If someone could do it...
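
A possible explanation, offered as a guess since the 0.6 ebuild itself was not posted: the call stack shows src_unpack calling unpack, and with SRC_URI empty there is nothing in ${A} to unpack. A live ebuild normally inherits the mercurial eclass and lets it do the checkout. A minimal sketch, with the tag name taken from the upstream hg clone command above:

```shell
# Sketch only; a guess at the fix, not a tested ebuild.
inherit mercurial                      # the eclass supplies its own src_unpack

EHG_REPO_URI="https://code.google.com/p/ocropus/"
EHG_REVISION="ocropus-0.6"             # same tag as in 'hg clone -r ocropus-0.6'
SRC_URI=""

# If the ebuild defines src_unpack itself, it must not call 'unpack ${A}'
# (A is empty here); it can delegate to the eclass instead.
src_unpack() {
    mercurial_src_unpack
}
```

Since upstream installs from the ocropy subdirectory, S would presumably also have to point there, but that part is untested.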
Comment 79 Michael Klapproth 2014-07-01 01:04:47 UTC
Created attachment 380022 [details]
ebuild for ocropus 0.7 with source git

ebuild for ocropus 0.7 with source git.

this is my first python/git ebuild. i'm open to any suggestions for improvement ;)