Bug 686944 - app-text/tesseract: bump to 4.0.0
Summary: app-text/tesseract: bump to 4.0.0
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Bernard Cafarelli
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-29 13:12 UTC by Marek Szuba (RETIRED)
Modified: 2019-06-04 14:42 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
tesseract-4.0.0_from_beta4.diff (tesseract-4.0.0_from_beta4.diff,1.23 KB, patch)
2019-05-29 13:12 UTC, Marek Szuba (RETIRED)
Updated version of the "use system piccolo2d" patch (tesseract-4.00.00-use-system-piccolo2d.patch,516 bytes, patch)
2019-05-29 13:14 UTC, Marek Szuba (RETIRED)
Do not violate network sandbox by trying to fetch JARs at compile time (tesseract-4.00.00-no-fetch-jars.patch,828 bytes, patch)
2019-05-29 13:15 UTC, Marek Szuba (RETIRED)
Fix build rule for manpages (tesseract-4.0.0-manpages.patch,1.11 KB, patch)
2019-05-29 18:24 UTC, Chris Mayo

Description Marek Szuba (RETIRED) archtester gentoo-dev 2019-05-29 13:12:57 UTC
Created attachment 577978 [details, diff]
tesseract-4.0.0_from_beta4.diff

tesseract-4.0.0 was released in late October 2018, yet the latest version we have in Gentoo is 4.0.0_beta4 (which is another 3 months older), and even that one is masked. Fortunately, a trivial update of the beta4 ebuild is MOSTLY sufficient to get 4.0.0 installed and functional, the exceptions being:
 - USE=scrollview does not work because it now requires another Java library, jaxb-api, which we do not presently have packaged in Gentoo;
 - for some reason all man pages end up being HTML rather than nroff.

BTW, in the long run it might make sense to add a USE flag allowing users to choose between the slower, best-quality training data from tessdata, which we presently use, and the faster, best-value-for-money, integerised training data from tessdata_fast, which upstream now recommends for most users. That said, let us concentrate on simply having 4.0.0 available.
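
To illustrate the idea only, here is a rough sketch of how such a choice could map to the two upstream repositories. The flag name `fast-models` and both helper functions are hypothetical; nothing like this exists in the current ebuild, and the GitHub base URL is assumed from the upstream project location.

```shell
# Hypothetical helper: map a flag state to the upstream training-data repository.
# "fast-models" is an assumed flag name, not one defined in the Gentoo tree.
tessdata_repo() {
	case "$1" in
		fast-models) echo "tessdata_fast" ;;  # integerised, faster models
		*)           echo "tessdata" ;;       # slower, best-quality models (current default)
	esac
}

# Hypothetical helper: build the base URI for the selected repository.
tessdata_base_uri() {
	echo "https://github.com/tesseract-ocr/$(tessdata_repo "$1")"
}
```

Calling `tessdata_base_uri fast-models` would select the tessdata_fast repository, while any other argument falls back to tessdata, which keeps the present behaviour as the default.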
Comment 1 Marek Szuba (RETIRED) archtester gentoo-dev 2019-05-29 13:14:11 UTC
Created attachment 577980 [details, diff]
Updated version of the "use system piccolo2d" patch
Comment 2 Marek Szuba (RETIRED) archtester gentoo-dev 2019-05-29 13:15:50 UTC
Created attachment 577982 [details, diff]
Do not violate network sandbox by trying to fetch JARs at compile time
Comment 3 Chris Mayo 2019-05-29 18:24:41 UTC
Created attachment 578012 [details, diff]
Fix build rule for manpages

man page generation fixed with attached upstream patch:

https://github.com/tesseract-ocr/tesseract/commit/39ed30ad834a43cf403f88158c6db7a96f1bed29

Uses asciidoc and xsltproc.
Comment 4 Chris Mayo 2019-05-29 18:30:57 UTC
> 
> Uses asciidoc and xsltproc.

and app-text/docbook-xsl-stylesheets
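
For reference, a rough sketch of how these three tools might be listed as build-time dependencies in the ebuild. It assumes they are needed only at build time and that xsltproc is provided by dev-libs/libxslt; using DEPEND rather than BDEPEND is likewise an assumption that depends on the ebuild's EAPI, and none of this is taken from the committed ebuild.

```shell
# Assumed build-time dependencies for regenerating the man pages.
# DEPEND vs. BDEPEND depends on the EAPI in use; this is only a sketch.
DEPEND="
	app-text/asciidoc
	app-text/docbook-xsl-stylesheets
	dev-libs/libxslt
"
```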
Comment 5 Bernard Cafarelli gentoo-dev 2019-06-03 07:17:02 UTC
I had hoped to fix bug #663564 before the 4.0 release, as it is exactly about the training data. But you are right, let's get 4.0 in first; we can make it better later.

Thanks for testing on top of beta4 and both patches, I will test and get it in tree ASAP!
Comment 6 Bernard Cafarelli gentoo-dev 2019-06-04 14:32:05 UTC
Recapping scrollview issues for extra visibility:

scrollview is an additional debug tool written in Java that has always required patching to use system Java jar files and system install paths, plus extra ebuild maintenance.
4.0 now requires a new jar, jaxb-api, which is not available in the tree for now, and would require new patches.
As it is a seldom-used debug feature, I dropped it from the 4.0.0 bump (we can always restore it later if there is demand for it), simplifying ebuild content and maintenance.
Comment 7 Larry the Git Cow gentoo-dev 2019-06-04 14:42:50 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=98f4080a82c54d39d0a6c646649ca47fe9c7d649

commit 98f4080a82c54d39d0a6c646649ca47fe9c7d649
Author:     Bernard Cafarelli <voyageur@gentoo.org>
AuthorDate: 2019-06-04 14:35:30 +0000
Commit:     Bernard Cafarelli <voyageur@gentoo.org>
CommitDate: 2019-06-04 14:39:53 +0000

    app-text/tesseract: 4.0.0 bump
    
    Thanks marecki and Chris Mayo for the help
    This version does not provide scrollview anymore, see bug for details
    
    Closes: https://bugs.gentoo.org/686944
    Package-Manager: Portage-2.3.67, Repoman-2.3.14
    Signed-off-by: Bernard Cafarelli <voyageur@gentoo.org>

 app-text/tesseract/Manifest                        |   1 +
 .../tesseract/files/tesseract-4.0.0-manpages.patch |  49 ++++++++
 app-text/tesseract/tesseract-4.0.0.ebuild          | 129 +++++++++++++++++++++
 3 files changed, 179 insertions(+)