Bug 343211 - app-text/tesseract 3.00 released
Summary: app-text/tesseract 3.00 released
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages (show other bugs)
Hardware: All Linux
Importance: High normal with 4 votes (vote)
Assignee: Patrick McLean
URL: http://code.google.com/p/tesseract-oc...
Whiteboard:
Keywords:
Duplicates: 359349 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-10-29 06:47 UTC by Jesse Adelman
Modified: 2012-10-09 04:15 UTC (History)
CC List: 18 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
Tesseract v3.00 (rev 1) (tesseract-3.00.ebuild,3.97 KB, text/plain)
2010-11-06 21:21 UTC, Henrik
Details
/home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]' (environment,87.39 KB, text/plain)
2010-11-06 22:27 UTC, Jesse Adelman
Details
Tesseract v3.00 (rev 2) (tesseract-3.00.ebuild,4.13 KB, text/plain)
2010-11-07 02:32 UTC, Henrik
Details
Tesseract v3.00 ebuild (rev 3) (tesseract-3.00.ebuild,4.30 KB, text/plain)
2010-11-30 14:49 UTC, simon
Details
metadata for Tesseract v3.00 with use descriptions (metadata.xml,2.61 KB, text/plain)
2010-11-30 14:51 UTC, simon
Details
tesseract-3.01.ebuild (tesseract-3.01.ebuild,3.88 KB, text/plain)
2011-11-23 00:46 UTC, Clemmitt M. Sigler
Details
tesseract-3.01-r1.ebuild (tesseract-3.01-r1.ebuild,3.98 KB, text/plain)
2012-05-12 01:15 UTC, David
Details
backport of r686 (tesseract-3.01-r686-fix-graphics_disabled-build.patch,23.21 KB, patch)
2012-05-16 21:31 UTC, Raphaël Droz
Details | Diff
yet another ebuild : tesseract-ocr-3.01.ebuild (tesseract-ocr-3.01.ebuild,3.83 KB, text/plain)
2012-05-16 22:11 UTC, Raphaël Droz
Details
tesseract-ocr-9999.ebuild (tesseract-ocr-9999.ebuild,2.04 KB, text/plain)
2012-07-12 20:35 UTC, James Le Cuirot
Details
tesseract-ocr-9999.ebuild (tesseract-ocr-9999.ebuild,2.02 KB, text/plain)
2012-08-06 15:09 UTC, James Le Cuirot
Details
tesseract-3.0.1.ebuild (doc and osd flag added) (tesseract-3.01.ebuild,3.89 KB, text/plain)
2012-08-11 08:11 UTC, Samuel Bauer
Details
app-text/tesseract-3.01.ebuild (now with automake fix and swe-frak traineddata) (tesseract-3.01.ebuild,3.94 KB, text/plain)
2012-09-13 00:09 UTC, the_mgt
Details

Description Jesse Adelman 2010-10-29 06:47:43 UTC
Released on September 30th. Source tarball, and lots of language data file updates. Thanks!

---
Tesseract release notes Sep 30 2010 - V3.00

    * Preparations for thread safety:
          o Changed TessBaseAPI methods to be non-static
          o Created a class hierarchy for the directories to hold instance data, and began moving code into the classes.
          o Moved thresholding code to a separate class. 
    * Added major new page layout analysis module.
    * Added HOCR output.
    * Added Leptonica as main image I/O and handling. Currently optional, but in future releases linking with Leptonica will be mandatory.
    * Ambiguity table rewritten to allow definite replacements in place of fix_quotes.
    * Added TessdataManager to combine data files into a single file.
    * Some dead code deleted.
    * VC++6 no longer supported. It can't cope with the use of templates.
    * Many more languages added.
    * Doxygenation of most of the function header comments.
Comment 1 Henrik 2010-11-06 21:21:32 UTC
Created attachment 253431 [details]
Tesseract v3.00 (rev 1)
Comment 2 Jesse Adelman 2010-11-06 22:25:21 UTC
Hrm. Starts compile, but src_prepare gets an error:

/home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]'

I'll post the environment.
Comment 3 Jesse Adelman 2010-11-06 22:27:14 UTC
Created attachment 253441 [details]
/home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]'

Attached environment. Re:

">>> Preparing source in /home/jesse/tmp/portage/app-text/tesseract-3.00/work/tesseract-3.00 ...
/home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]'
>>> Source prepared."
Comment 4 Jesse Adelman 2010-11-06 22:48:40 UTC
Probably related to the environment/src_prep problem:
---
$ tesseract /usr/share/doc/tesseract-3.00/examples/phototest.tif boo.out
Error openning data file /usr/share/tessdata/eng.traineddata
---

Indeed, equery f tesseract does not show this file. I'm thinking that the file is set by "en", but the failed test line omits installing the language files. 
Comment 5 Jonathan Callen (RETIRED) gentoo-dev 2010-11-06 23:24:23 UTC
the error is in this line:

	if [ ! use zh && ! use in && ! use sv && ! use ro && ! use sl && ! use sr && ! use tl && ! use tr && ! use hu && ! use fi && ! use it && ! use nl && ! use no && ! use ja && ! use vi && ! use es && ! use uk && ! use fr && ! use sk && ! use ko && ! use el && ! use ru && ! use pt && ! use bg && ! use lv && ! use lt && ! use pl && ! use de && ! use da && ! use cs && ! use ca && ! use en ]; then

you need to get rid of the square brackets there (and it's still not quite what it should be at that point...)
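
A minimal sketch of that check without the square brackets, written as a loop instead of a long && chain. The use function below is only a stand-in for Portage's helper so the snippet runs outside an ebuild; ENABLED_FLAGS and no_language_selected are hypothetical names, and the flag list is shortened for illustration.

```bash
# Stand-in for Portage's `use` helper (assumption: a real ebuild gets this
# from the package manager). ENABLED_FLAGS is a hypothetical variable.
ENABLED_FLAGS="${ENABLED_FLAGS:-}"
use() {
    local flag
    for flag in ${ENABLED_FLAGS}; do
        [ "${flag}" = "$1" ] && return 0
    done
    return 1
}

# Succeeds only when none of the given flags is enabled.
no_language_selected() {
    local flag
    for flag in "$@"; do
        if use "${flag}"; then
            return 1
        fi
    done
    return 0
}

# `if` runs the command directly; wrapping it in [ ... ] is what broke.
if no_language_selected zh sv ro en; then
    echo "no language flag enabled"
else
    echo "at least one language flag enabled"
fi
```

The point is that `use` is a command whose exit status `if` can test directly, so no `[` is involved at all.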
Comment 6 Henrik 2010-11-06 23:53:15 UTC
(In reply to comment #5)
> the error is in this line:
> 
>         if [ ! use zh && ! use in && ! use sv && ! use ro && ! use sl && ! use
> sr && ! use tl && ! use tr && ! use hu && ! use fi && ! use it && ! use nl && !
> use no && ! use ja && ! use vi && ! use es && ! use uk && ! use fr && ! use sk
> && ! use ko && ! use el && ! use ru && ! use pt && ! use bg && ! use lv && !
> use lt && ! use pl && ! use de && ! use da && ! use cs && ! use ca && ! use en
> ]; then
> 
Okay I will look at.

Data files will delete the installation in place of the copied.



> you need to get rid of the square brackets there (and it's still not quite what
> it should be at that point...)
> 

Comment 7 Jesse Adelman 2010-11-06 23:57:34 UTC
> Okay I will look at.
> 
> Data files will delete the installation in place of the copied.
> 

Sorry, what do you mean here?
Comment 8 Henrik 2010-11-07 00:09:12 UTC
(In reply to comment #7)
> > Okay I will look at.
> > 
> > Data files will delete the installation in place of the copied.
> > 
> 
> Sorry, what do you mean here?
> 

There is an issue: the language packs get deleted during installation.
Comment 9 Henrik 2010-11-07 02:32:13 UTC
Created attachment 253477 [details]
Tesseract v3.00 (rev 2)

Works with TIF images. The rest will be reviewed during the week.

-Note- http://code.google.com/p/tesseract-ocr/wiki/ReadMe
Without additional libraries, Tesseract can only read uncompressed TIFF (and some versions of BMP). Up to version 2.04, you can add libtiff-dev. See the FAQ question on compressed TIFF for installation instructions. Version 3.00 will support additional formats via Leptonica, but requires more libraries to be added.

sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev

You also need to install leptonica. There is an apt-get package (name unknown), or the sources are at http://www.leptonica.org/ The instructions at http://www.leptonica.org/source/README.html are clear, but basically it is the usual
Comment 10 simon 2010-11-30 14:49:24 UTC
Created attachment 255947 [details]
Tesseract v3.00 ebuild (rev 3)

Support for traditional Chinese and Simplified Chinese should be two different use flags, IMHO.
Same for Danish and Danish (Fraktur).
Added use flag de_frak for German (Fraktur).
Comment 11 simon 2010-11-30 14:51:26 UTC
Created attachment 255949 [details]
metadata for Tesseract v3.00 with use descriptions

added use description for supported languages to metadata file
Comment 12 James Le Cuirot gentoo-dev 2011-03-15 11:58:07 UTC
There is a solid ebuild for Leptonica in bug #297101. Please add this as a dependency.
Comment 13 James Le Cuirot gentoo-dev 2011-03-25 21:52:27 UTC
Not seen pkg_pretend before. That's new in EAPI 4 but EAPI is still set to 2.
Comment 14 Thomas Kahle (RETIRED) gentoo-dev 2011-09-14 09:46:16 UTC
*** Bug 359349 has been marked as a duplicate of this bug. ***
Comment 15 Clemmitt M. Sigler 2011-11-23 00:46:59 UTC
Created attachment 293463 [details]
tesseract-3.01.ebuild

This ebuild is based not on Simon Wagner's (see Comment 10) but on Nirbheek Chauhan's, see http://gpo.zugaina.org/Overlays/nirbheek/app-text/tesseract .  I made a couple of changes for ver. 3.01 to improve dependency and fix a build problem.

Clemmitt
Comment 16 Sushant Sinha 2012-01-04 15:59:10 UTC
When is this going to hit Portage?
Comment 17 Viktor Yu. Kovalskii 2012-01-26 13:44:23 UTC
(In reply to comment #15)
> Created attachment 293463 [details]
> tesseract-3.01.ebuild
> 
> This ebuild is based not on Simon Wagner's (see Comment 10) but on Nirbheek
> Chauhan's, see http://gpo.zugaina.org/Overlays/nirbheek/app-text/tesseract .  I
> made a couple of changes for ver. 3.01 to improve dependency and fix a build
> problem.
> 
> Clemmitt

I tried to install tesseract using your tesseract-3.01.ebuild, but it fails with:

Failed Running automake

After adding two lines to the ebuild before eautomake:

mkdir m4
eautoreconf || die "eautoreconf failed"

the compilation succeeds.

I have installed =sys-devel/automake-1.11.2-r1
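
A rough sketch of how the two added lines might sit in src_prepare. The eautoreconf defined here is only a stub so the snippet runs outside Portage; a real ebuild would get it from autotools.eclass via inherit autotools, and the scratch directory is purely for demonstration.

```bash
# Stub standing in for eautoreconf from autotools.eclass (assumption: the
# real ebuild inherits autotools and does not define this itself).
eautoreconf() {
    echo "eautoreconf would run in ${PWD}"
}

src_prepare() {
    # Create m4/ before regenerating the build system, as in the fix above.
    # -p keeps this harmless if the directory already exists.
    mkdir -p m4 || return 1
    eautoreconf
}

# Demonstration in a scratch directory instead of the real ${S}.
workdir=$(mktemp -d)
( cd "${workdir}" && src_prepare )
```

The real eautoreconf aborts the build itself on failure, so the explicit || die should not be needed.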
Comment 18 Stefan Briesenick (RETIRED) gentoo-dev 2012-02-16 01:59:41 UTC
the unversioned traineddata-files are problematic. They can change w/o warning, and then all checksums are incorrect. I wouldn't do that!

But I try to adapt your ebuild and include only languages with versioned tarballs. Other languages should be downloaded manually by the user.

nonetheless, it's time to bump!
Comment 19 Rodrigo Severo 2012-04-16 20:00:44 UTC
(In reply to comment #18)
> the unversioned traineddata-files are problematic. They can change w/o
> warning, and then all checksums are incorrect. I wouldn't do that!

AFAICT, when they are bumped, they are converted into versioned files (as per all 3.01 versions). Why not include them all now? Either way, when (if) the unversioned files are bumped, tesseract's ebuild will have to be fixed. If you include the unversioned versions now, users will be able to take advantage of them more easily.
Comment 20 Rodrigo Severo 2012-04-16 20:06:34 UTC
Just tried the 3.01 ebuild from Clemmitt M. Sigler. It seems to be working fine.
Comment 21 David 2012-05-12 01:13:16 UTC
(In reply to comment #17)
> (In reply to comment #15)
> > Created attachment 293463 [details]
> > tesseract-3.01.ebuild
> 
> I tried to install tesseract using your tesseract-3.01.ebuild, but it fails with:
> 
> Failed Running automake
> 
> After adding two lines to the ebuild before eautomake:
> 
> mkdir m4
> eautoreconf || die "eautoreconf failed"
> 
> the compilation succeeds.
> 
> I have installed =sys-devel/automake-1.11.2-r1

I concur.  This ebuild did not work for me.  It gave "Failed Running automake" for me as well.  

Viktor's proposed fix solved it for me.  Attaching revised ebuild.
Comment 22 David 2012-05-12 01:15:22 UTC
Created attachment 311467 [details]
tesseract-3.01-r1.ebuild
Comment 23 David 2012-05-12 01:16:57 UTC
Comment on attachment 311467 [details]
tesseract-3.01-r1.ebuild

ebuild fixing "Failed running automake" build error
Comment 24 Raphaël Droz 2012-05-16 19:03:56 UTC
I was going to write one ebuild when I saw this report.
Some notes :

* rename to tesseract-ocr to match upstream name ?
* AFAICT there is not 3.01-r1, only 3.01

* I would write DEPEND as :
DEPEND="media-libs/leptonica[zlib]
	media-libs/leptonica[tiff?]
	media-libs/leptonica[jpeg?]
	media-libs/leptonica[png?]
	media-libs/leptonica[webp?]"

= leptonica[zlib] mandatory, then according to tesseract select useflags

* we may prefer 3.01 data files for supported languages (RtL) so (if ebuild named tesseract-ocr-3.01.ebuild) :

SRC_URI="http://tesseract-ocr.googlecode.com/files/tesseract-${PV}.tar.gz
	http://tesseract-ocr.googlecode.com/files/${P}.eng.tar.gz
	linguas_ar? ( http://tesseract-ocr.googlecode.com/files/${P}.ara.tar.gz )
	linguas_he? (
		http://tesseract-ocr.googlecode.com/files/${P}.heb.tar.gz
		http://tesseract-ocr.googlecode.com/files/${P}.heb-com.tar.gz
	)
	linguas_hi? ( http://tesseract-ocr.googlecode.com/files/${P}.hin.tar.gz )
	linguas_sk? ( http://tesseract-ocr.googlecode.com/files/${P}.slk-frak.tar.gz )
	linguas_th? ( http://tesseract-ocr.googlecode.com/files/${P}.tha.tar.gz )"

* ./configure :
 * I don't see a gettext option
 * there is no libtiff option anymore (leptonica)
 * disable-graphics by default (unless all the dep' [1] are provided)
 * why disable-dependency-tracking ??

* the || die are not needed


[1] https://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging
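
As a sketch only, the conditional SRC_URI entries above could also be generated from a small lookup table instead of being written out by hand. The language-to-file mapping is copied from the SRC_URI in this comment; lang_files and gen_src_uri are hypothetical helper names, and P and BASE are set by hand here just so the snippet runs outside Portage.

```bash
# Set by hand for the demonstration; in an ebuild P comes from Portage.
P="tesseract-ocr-3.01"
BASE="http://tesseract-ocr.googlecode.com/files"

# Map a LINGUAS code to upstream data-file names (hypothetical helper;
# the mapping mirrors the SRC_URI listed in this comment).
lang_files() {
    case "$1" in
        ar) echo "ara" ;;
        he) echo "heb heb-com" ;;
        hi) echo "hin" ;;
        sk) echo "slk-frak" ;;
        th) echo "tha" ;;
        *)  return 1 ;;
    esac
}

# Print one "linguas_xx? ( ... )" group per known language code.
gen_src_uri() {
    local code name files
    for code in "$@"; do
        files=$(lang_files "${code}") || continue   # skip unknown codes
        printf 'linguas_%s? (' "${code}"
        for name in ${files}; do
            printf ' %s/%s.%s.tar.gz' "${BASE}" "${P}" "${name}"
        done
        printf ' )\n'
    done
}

gen_src_uri ar he hi sk th
```

Whether this is clearer than the explicit list is a matter of taste; with only five languages the hand-written form is arguably just as readable.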
Comment 25 James Le Cuirot gentoo-dev 2012-05-16 19:48:26 UTC
I disagree with your DEPEND suggestion. We don't do this with SDL or ffmpeg or...
Comment 26 Raphaël Droz 2012-05-16 20:25:20 UTC
(In reply to comment #25)
> I disagree with your DEPEND suggestion. We don't do this with SDL or ffmpeg
> or...

You're right, the following is better:
DEPEND="media-libs/leptonica[zlib,tiff?,jpeg?,png?,webp?]"
(like in mpd, mplayer, gegl, virtual/jpeg...)
Comment 27 Raphaël Droz 2012-05-16 20:46:03 UTC
https://code.google.com/p/tesseract-ocr/source/detail?r=686 seems hard to backport. So Google ScrollView support (--enable-graphics) seems mandatory for now.
Comment 28 Raphaël Droz 2012-05-16 21:31:28 UTC
Created attachment 312061 [details, diff]
backport of r686

Let's try this one, derived from https://code.google.com/p/tesseract-ocr/source/detail?r=686#
Comment 29 Raphaël Droz 2012-05-16 22:11:51 UTC
Created attachment 312063 [details]
yet another ebuild : tesseract-ocr-3.01.ebuild

Forget the above backport attempt
But try this ebuild instead.

Problems so far:
* --enable-graphics mandatory
* linguas and package may be handled in a better fashion; that's bug #287373, which may be solved by (re)introducing linguas_en
* a purely native-language install (no English data files) has not been tested
* webp not tested
* can't get +scrollview to default to "on"
* not sure $(sed -i 's!po/Makefile.in!!' configure.ac) is even needed at all
Comment 30 James Le Cuirot gentoo-dev 2012-07-12 20:35:48 UTC
Created attachment 318032 [details]
tesseract-ocr-9999.ebuild

I needed to try out 3.02, which isn't out yet, so I made an SVN ebuild. The language files are in the repository so they don't need to be downloaded separately and I managed to do some Bash trickery to handle the installation of these. I also made a couple of other minor improvements.
Comment 31 James Le Cuirot gentoo-dev 2012-08-06 15:09:30 UTC
Created attachment 320564 [details]
tesseract-ocr-9999.ebuild

Just removing a redundant insinto.
Comment 32 Samuel Bauer 2012-08-11 08:11:03 UTC
Created attachment 320974 [details]
tesseract-3.0.1.ebuild (doc and osd flag added)

Another proposed ebuild; this one supports doc and osd.
configure options picked from attachment 312063 [details]
Comment 33 the_mgt 2012-09-12 23:41:29 UTC
Ok, can we haz verzion bump plz?!?!?!?!

This bug has been open for nearly two years now, which means that version 3 of the package has also been out for two years... YAGF makes heavy use of the new features and language packs; I contributed some patches to YAGF in order to harness the powers of the frak packages. (I also added a complete German translation.)

So, do this:
1. take the latest ebuild attached to this bug
2. change eautomake to eautoreconf (as the autogen.sh of the tarball hints)
3. add swe-frak.traineddata.gz to the languages
4. version bump YAGF and be happy
5. commit to tree

kthxby

PS: feel free to add me as proxy maintainer to this package
Comment 34 the_mgt 2012-09-13 00:09:34 UTC
Created attachment 323630 [details]
app-text/tesseract-3.01.ebuild (now with automake fix and swe-frak traineddata)

Modification of the previously attached ebuild, just added:
1. swe-frak.traineddata.gz
2. changed to eautoreconf
Comment 35 James Le Cuirot gentoo-dev 2012-09-13 14:12:33 UTC
We might as well wait for 3.02 now because its release is imminent. Zdenko Podobný told me he'd ask Ray Smith's permission to do a community release after the recent patches (which I helped with) are finalised. No one seems to have anything more to add regarding the patches so I'm expecting them to be merged and a release to happen any day now. I'll give him a poke later.
Comment 36 pavel sanda 2012-09-14 11:32:06 UTC
> PS: feel free to add me as proxy maintainer to this package

If you are willing to do proxy maintenance it would be best to write mail to proxy-maint@gentoo.org.
Comment 37 Thomas Kahle (RETIRED) gentoo-dev 2012-09-20 22:29:20 UTC
(In reply to comment #36)
> > PS: feel free to add me as proxy maintainer to this package
> 
> If you are willing to do proxy maintenance it would be best to write mail to
> proxy-maint@gentoo.org.

I'm also interested in this package.  I'll check out the ebuild and maybe do the bump if nothing happens in the next few days.  I don't think we should wait for 3.02 since we can always do another bump (which should be simple).
Comment 38 James Le Cuirot gentoo-dev 2012-09-21 09:57:02 UTC
From https://groups.google.com/d/msg/tesseract-dev/KGkf_oqO3xU/1sDfNiV7T7AJ...

"Ray sent me info this week about fixing some open issues (you can see activities on issues at feed [1]) for 3.02 release, so I expect 3.02 release soon. 

Nevertheless - If Gentoo is under time pressure I would suggest to go Debian/Ubuntu way - take current 3.02 source and use it instead of 3.01. IMO 3.02 is tested quite well."
Comment 39 Thomas Kahle (RETIRED) gentoo-dev 2012-10-09 04:15:12 UTC
Hi.  I bumped this package in the tree.  Thank you all for your contributions.
Please check the new version.  I added the_mgt as the maintainer in metadata.xml.  This means that bugs may be assigned to you (CCed to me).  Let's see how this goes.  Please open new bugs for new issues.

+  09 Oct 2012; Thomas Kahle <tomka@gentoo.org> +tesseract-3.01.ebuild,
+  metadata.xml:
+  Bump to 3.01 per bug 343211. Thanks to all contributors.