Released on September 30th. Source tarball and lots of language data file updates. Thanks!
---
Tesseract release notes Sep 30 2010 - V3.00
* Preparations for thread safety:
  o Changed TessBaseAPI methods to be non-static
  o Created a class hierarchy for the directories to hold instance data, and began moving code into the classes.
  o Moved thresholding code to a separate class.
* Added major new page layout analysis module.
* Added HOCR output.
* Added Leptonica as main image I/O and handling. Currently optional, but in future releases linking with Leptonica will be mandatory.
* Ambiguity table rewritten to allow definite replacements in place of fix_quotes.
* Added TessdataManager to combine data files into a single file.
* Some dead code deleted.
* VC++6 no longer supported. It can't cope with the use of templates.
* Many more languages added.
* Doxygenation of most of the function header comments.
Created attachment 253431 Tesseract v3.00 (rev 1)
Hrm. It starts to compile, but src_prepare gets an error:
/home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]'
I'll post the environment.
Created attachment 253441 /home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]'
Attached environment. Re:
>>> Preparing source in /home/jesse/tmp/portage/app-text/tesseract-3.00/work/tesseract-3.00 ...
/home/jesse/tmp/portage/app-text/tesseract-3.00/temp/environment: line 2350: [: missing `]'
>>> Source prepared.
Probably related to the environment/src_prepare problem:
---
$ tesseract /usr/share/doc/tesseract-3.00/examples/phototest.tif boo.out
Error openning data file /usr/share/tessdata/eng.traineddata
---
Indeed, equery f tesseract does not show this file. I suspect the file is controlled by the "en" USE flag, but the failing test line skips installing the language files.
The error is in this line:
if [ ! use zh && ! use in && ! use sv && ! use ro && ! use sl && ! use sr && ! use tl && ! use tr && ! use hu && ! use fi && ! use it && ! use nl && ! use no && ! use ja && ! use vi && ! use es && ! use uk && ! use fr && ! use sk && ! use ko && ! use el && ! use ru && ! use pt && ! use bg && ! use lv && ! use lt && ! use pl && ! use de && ! use da && ! use cs && ! use ca && ! use en ]; then
You need to get rid of the square brackets there (and it's still not quite what it should be at that point...)
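A minimal sketch of the fix, assuming Bash: the brackets invoke the `[` test command, so `[ ! use zh` ends at the first `&&` and never sees a closing `]`, which is exactly the "missing `]'" error. Dropping the brackets lets `if` test the exit status of `use` directly. The `use` function here is a hypothetical stub standing in for Portage's helper so the snippet runs outside an ebuild, and `no_language_selected` is an illustrative name, not something taken from the attached ebuilds.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Portage's use() so this runs outside an ebuild:
# succeeds when the flag appears in the space-separated $USE.
use() {
	[[ " ${USE} " == *" $1 "* ]]
}

# Broken:  if [ ! use zh && ! use in ... ]; then
#   '[' only receives "! use zh", has no closing ']' and fails with
#   "[: missing `]'".
# Fixed:   if ! use zh && ! use in && ... && ! use en; then
# The same check written as a loop over the language flags:
no_language_selected() {
	local lang
	for lang in zh in sv ro sl sr tl tr hu fi it nl no ja vi es uk fr sk \
		ko el ru pt bg lv lt pl de da cs ca en; do
		use "${lang}" && return 1   # at least one language flag is enabled
	done
	return 0                        # no language flag is enabled
}

USE="tiff"
if no_language_selected; then
	echo "no language flag enabled"
fi
```

The loop form avoids a 32-term `&&` chain, which makes it harder to drop a flag or a `!` by accident when languages are added.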
(In reply to comment #5)
> the error is in this line:
>
> if [ ! use zh && ! use in && ! use sv && ! use ro && ! use sl && ! use sr && ! use tl && ! use tr && ! use hu && ! use fi && ! use it && ! use nl && ! use no && ! use ja && ! use vi && ! use es && ! use uk && ! use fr && ! use sk && ! use ko && ! use el && ! use ru && ! use pt && ! use bg && ! use lv && ! use lt && ! use pl && ! use de && ! use da && ! use cs && ! use ca && ! use en ]; then
>
> you need to get rid of the square brackets there (and it's still not quite what
> it should be at that point...)

Okay, I will look at it.
Data files will delete the installation in place of the copied.
> Okay, I will look at it.
>
> Data files will delete the installation in place of the copied.

Sorry, what do you mean here?
(In reply to comment #7)
> > Okay, I will look at it.
> >
> > Data files will delete the installation in place of the copied.
>
> Sorry, what do you mean here?

There is an issue where the language packs are deleted during installation.
Created attachment 253477 Tesseract v3.00 (rev 2)
Works with TIF images. The rest will be reviewed during the week.
-Note- http://code.google.com/p/tesseract-ocr/wiki/ReadMe
Without additional libraries, Tesseract can only read uncompressed TIFF (and some versions of BMP). Up to version 2.04, you can add libtiff-dev. See the FAQ question on compressed TIFF for installation instructions.
Version 3.00 will support additional formats via Leptonica, but requires more libraries to be added:
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev
You also need to install leptonica. There is an apt-get package (name unknown), or the sources are at http://www.leptonica.org/
The instructions at http://www.leptonica.org/source/README.html are clear, but basically it is the usual
Created attachment 255947 Tesseract v3.00 ebuild (rev 3)
Support for Traditional Chinese and Simplified Chinese should be two different USE flags, IMHO. Same for Danish and Danish (Fraktur). Added USE flag de_frak for German (Fraktur).
Created attachment 255949 metadata for Tesseract v3.00 with use descriptions
Added USE descriptions for the supported languages to the metadata file.
There is a solid ebuild for Leptonica in bug #297101. Please add this as a dependency.
I haven't seen pkg_pretend before. It's new in EAPI 4, but EAPI is still set to 2.
*** Bug 359349 has been marked as a duplicate of this bug. ***
Created attachment 293463 tesseract-3.01.ebuild
This ebuild is based not on Simon Wagner's (see Comment 10) but on Nirbheek Chauhan's, see http://gpo.zugaina.org/Overlays/nirbheek/app-text/tesseract . I made a couple of changes for ver. 3.01 to improve the dependencies and fix a build problem.
Clemmitt
When is this going to hit the Portage tree?
(In reply to comment #15)
> Created attachment 293463
> tesseract-3.01.ebuild
>
> This ebuild is based not on Simon Wagner's (see Comment 10) but on Nirbheek
> Chauhan's, see http://gpo.zugaina.org/Overlays/nirbheek/app-text/tesseract . I
> made a couple of changes for ver. 3.01 to improve the dependencies and fix a build
> problem.
>
> Clemmitt

I tried to install tesseract using your tesseract-3.01.ebuild, but got:

Failed Running automake

After adding two lines to the ebuild before eautomake:

mkdir m4
eautoreconf || die "eautoreconf failed"

it compiles successfully.

I have =sys-devel/automake-1.11.2-r1 installed.
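A sketch of how this workaround might sit in `src_prepare`, assuming the ebuild inherits `autotools` (which provides `eautoreconf`). Presumably the autotools files reference an `m4/` macro directory that the 3.01 tarball doesn't ship, which would explain why creating it lets automake proceed; that cause is an assumption, not something confirmed in this thread. `eautoreconf` already calls `die` on failure (the "Failed Running automake" message appears to come from that), so the extra `|| die` after it is redundant.

```shell
# Ebuild fragment (sketch). Assumes "inherit autotools" in the ebuild header.
src_prepare() {
	# Create the macro directory the autotools files seem to expect;
	# -p keeps this harmless if a later tarball ships m4/ itself.
	mkdir -p m4 || die "mkdir m4 failed"
	# Regenerate the build system; eautoreconf dies on its own on failure.
	eautoreconf
}
```

Running `eautoreconf` instead of a lone `eautomake` also regenerates `aclocal.m4` and `configure`, which matches the later suggestion in this bug to follow the tarball's autogen.sh.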
The unversioned traineddata files are problematic. They can change without warning, and then all the checksums become incorrect. I wouldn't do that! But I'll try to adapt your ebuild to include only languages with versioned tarballs. Other languages should be downloaded manually by the user. Nonetheless, it's time to bump!
(In reply to comment #18)
> The unversioned traineddata files are problematic. They can change without
> warning, and then all the checksums become incorrect. I wouldn't do that!

AFAICT, when they are bumped, they are converted into versioned files (as with all the 3.01 versions). Why not include them all now? Either way, when (if) the unversioned files are bumped, tesseract's ebuild will have to be fixed. If you include the unversioned files now, users will be able to take advantage of them more easily.
Just tried the 3.01 ebuild from Clemmitt M. Sigler. It seems to be working fine.
(In reply to comment #17)
> (In reply to comment #15)
> > Created attachment 293463
> > tesseract-3.01.ebuild
> >
> I tried to install tesseract using your tesseract-3.01.ebuild, but got:
>
> Failed Running automake
>
> After adding two lines to the ebuild before eautomake:
>
> mkdir m4
> eautoreconf || die "eautoreconf failed"
>
> it compiles successfully.
>
> I have =sys-devel/automake-1.11.2-r1 installed.

I concur. This ebuild did not work for me either; it also failed with "Failed Running automake". Victor's proposed fix solved it for me. Attaching a revised ebuild.
Created attachment 311467 tesseract-3.01-r1.ebuild
Comment on attachment 311467 tesseract-3.01-r1.ebuild
Ebuild fixing the "Failed running automake" build error
I was going to write an ebuild when I saw this report. Some notes:
* rename to tesseract-ocr to match the upstream name?
* AFAICT there is no 3.01-r1, only 3.01
* I would write DEPEND as:
  DEPEND="media-libs/leptonica[zlib]
      media-libs/leptonica[tiff?]
      media-libs/leptonica[jpeg?]
      media-libs/leptonica[png?]
      media-libs/leptonica[webp?]"
  i.e. leptonica[zlib] is mandatory, and the other flags follow tesseract's USE flags
* we may prefer the 3.01 data files for the supported languages (RtL), so (if the ebuild is named tesseract-ocr-3.01.ebuild):
  SRC_URI="http://tesseract-ocr.googlecode.com/files/tesseract-${PV}.tar.gz
      http://tesseract-ocr.googlecode.com/files/${P}.eng.tar.gz
      linguas_ar? ( http://tesseract-ocr.googlecode.com/files/${P}.ara.tar.gz )
      linguas_he? ( http://tesseract-ocr.googlecode.com/files/${P}.heb.tar.gz
          http://tesseract-ocr.googlecode.com/files/${P}.heb-com.tar.gz )
      linguas_hi? ( http://tesseract-ocr.googlecode.com/files/${P}.hin.tar.gz )
      linguas_sk? ( http://tesseract-ocr.googlecode.com/files/${P}.slk-frak.tar.gz )
      linguas_th? ( http://tesseract-ocr.googlecode.com/files/${P}.tha.tar.gz )"
* ./configure:
  * I don't see a gettext option
  * there is no libtiff option anymore (leptonica handles it)
  * --disable-graphics by default (unless all the deps [1] are provided)
  * why --disable-dependency-tracking??
  * the || die calls are not needed
[1] https://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging
I disagree with your DEPEND suggestion. We don't do this with SDL or ffmpeg or...
(In reply to comment #25)
> I disagree with your DEPEND suggestion. We don't do this with SDL or ffmpeg
> or...

You're right, the following is better:
DEPEND="media-libs/leptonica[zlib,tiff?,jpeg?,png?,webp?]"
(like in mpd, mplayer, gegl, virtual/jpeg...)
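To illustrate what the conditional form means (note that USE dependencies are comma-separated), here is a hypothetical Bash helper that computes which leptonica flags a given tesseract USE setting would require. Per the Package Manager Specification, a bare flag is always required, while `flag?` requires the flag on leptonica only when it is enabled on tesseract. The `use` stub and `required_flags` are illustrative names only; Portage performs this resolution itself.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Portage's use(): checks the space-separated $USE.
use() {
	[[ " ${USE} " == *" $1 "* ]]
}

# Given a comma-separated USE-dependency list such as "zlib,tiff?,jpeg?",
# print the flags that must be enabled on the dependency.
required_flags() {
	local dep flag
	local -a deps out=()
	IFS=',' read -ra deps <<< "$1"
	for dep in "${deps[@]}"; do
		if [[ ${dep} == *\? ]]; then
			# "flag?": required only if the flag is enabled on this package
			flag=${dep%\?}
			if use "${flag}"; then
				out+=("${flag}")
			fi
		else
			# bare "flag": always required
			out+=("${dep}")
		fi
	done
	echo "${out[*]}"
}

USE="tiff png"
required_flags "zlib,tiff?,jpeg?,png?,webp?"   # prints: zlib tiff png
```

With an empty USE, the same call prints only `zlib`, which is the "leptonica[zlib] mandatory" part of the earlier suggestion.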
https://code.google.com/p/tesseract-ocr/source/detail?r=686 seems hard to backport. So Google ScrollView support (--enable-graphics) seems mandatory for now.
Created attachment 312061 backport of r686
Let's try this one, derived from https://code.google.com/p/tesseract-ocr/source/detail?r=686
Created attachment 312063 yet another ebuild: tesseract-ocr-3.01.ebuild
Forget the above backport attempt, and try this ebuild instead. Problems so far:
* --enable-graphics is mandatory
* linguas and the package may be handled in a better fashion; that's bug #287373, which may be solved by (re)introducing linguas_en
* purely native language (no English datafiles) has not been tested
* webp not tested
* can't get +scrollview to default to "on"
* not sure $(sed -i 's!po/Makefile.in!!' configure.ac) is even needed at all
Created attachment 318032 tesseract-ocr-9999.ebuild
I needed to try out 3.02, which isn't out yet, so I made an SVN ebuild. The language files are in the repository, so they don't need to be downloaded separately, and I managed to do some Bash trickery to handle the installation of these. I also made a couple of other minor improvements.
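The attachment's "Bash trickery" isn't reproduced here; the following is only a hypothetical sketch of one way to pick out the traineddata files matching the user's LINGUAS. The `TESS_LANG` mapping and the `select_traineddata` function are illustrative names, and the mapping covers just a few languages using tesseract's three-letter data file names (eng, deu, ...). It needs bash 4 or later for the associative array.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from Gentoo LINGUAS codes to tesseract data file names
# (only a few languages shown; requires bash >= 4 for associative arrays).
declare -A TESS_LANG=(
	[de]="deu deu-frak"
	[en]="eng"
	[fr]="fra"
	[sk]="slk slk-frak"
)

# Print the traineddata files in directory $1 that match the codes in $LINGUAS.
# Codes without a mapping, and mapped files that don't exist, are skipped.
select_traineddata() {
	local dir=$1 code name
	for code in ${LINGUAS}; do
		for name in ${TESS_LANG[${code}]}; do
			if [[ -f ${dir}/${name}.traineddata ]]; then
				echo "${dir}/${name}.traineddata"
			fi
		done
	done
}
```

Inside an ebuild, the output could then be handed to `doins` after `insinto /usr/share/tessdata`, the directory tesseract looked in earlier in this bug.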
Created attachment 320564 tesseract-ocr-9999.ebuild
Just removing a redundant insinto.
Created attachment 320974 tesseract-3.0.1.ebuild (doc and osd flag added)
Another proposed ebuild; this one supports the doc and osd flags. The configure options were picked from attachment 312063.
Ok, can we haz verzion bump plz?!?!?!?!
This bug has been open for nearly two years now, which means the package has also been out in version 3 for two years...
YAGF makes profound use of the new features and language packs. I contributed some patches to YAGF in order to harness the powers of the frak packages. (I also added a complete German translation.)
So, do this:
1. take the latest ebuild attached to this bug
2. change eautomake to eautoreconf (as the autogen.sh of the tarball hints)
3. add swe-frak.traineddata.gz to the languages
4. version bump YAGF and be happy
5. commit to tree
kthxby
PS: feel free to add me as proxy maintainer to this package
Created attachment 323630 app-text/tesseract-3.01.ebuild (now with automake fix and swe-frak traineddata)
Modification of the previously attached ebuild; I just:
1. added swe-frak.traineddata.gz
2. changed to eautoreconf
We might as well wait for 3.02 now because its release is imminent. Zdenko Podobný told me he'd ask Ray Smith's permission to do a community release after the recent patches (which I helped with) are finalised. No one seems to have anything more to add regarding the patches, so I'm expecting them to be merged and a release to happen any day now. I'll give him a poke later.
> PS: feel free to add me as proxy maintainer to this package

If you are willing to do proxy maintenance, it would be best to send an email to proxy-maint@gentoo.org.
(In reply to comment #36)
> > PS: feel free to add me as proxy maintainer to this package
>
> If you are willing to do proxy maintenance, it would be best to send an email to
> proxy-maint@gentoo.org.

I'm also interested in this package. I'll check out the ebuild and maybe do the bump if nothing happens in the next few days. I don't think we should wait for 3.02, since we can always do another bump (which should be simple).
From https://groups.google.com/d/msg/tesseract-dev/KGkf_oqO3xU/1sDfNiV7T7AJ... "Ray sent me info this week about fixing some open issues (you can see activities on issues at feed [1]) for 3.02 release, so I expect 3.02 release soon. Nevertheless - If Gentoo is under time pressure I would suggest to go Debian/Ubuntu way - take current 3.02 source and use it instead of 3.01. IMO 3.02 is tested quite well."
Hi. I bumped this package in the tree. Thank you all for your contributions. Please check the new version. I added the_mgt as the maintainer in metadata.xml. This means that bugs may be assigned to you (CCed to me). Let's see how this goes. Please open new bugs for new issues.

+  09 Oct 2012; Thomas Kahle <tomka@gentoo.org> +tesseract-3.01.ebuild,
+  metadata.xml:
+  Bump to 3.01 per bug 343211. Thanks to all contributors.