Created attachment 577978 [details, diff] tesseract-4.0.0_from_beta4.diff

tesseract-4.0.0 was released in late October 2018, yet the latest version we have in Gentoo is 4.0.0_beta4 (which is another 3 months older), and even that one is masked. Fortunately, a trivial update of the beta4 ebuild is MOSTLY sufficient to get 4.0.0 installed and functional, the exceptions being:
- USE=scrollview does not work because it now requires another Java library, jaxb-api, which we do not presently have packaged in Gentoo;
- for some reason all man pages end up being HTML rather than nroff.

BTW, in the long run it might make sense to add a USE flag allowing users to choose between the slower, best-quality training data from tessdata, which we presently use, and the faster, best-value-for-money, integerised training data from tessdata_fast, which upstream now recommends for most users. That said, let us concentrate on simply having 4.0.0 available.
Created attachment 577980 [details, diff] Updated version of the "use system piccolo2d" patch
Created attachment 577982 [details, diff] Do not violate network sandbox by trying to fetch JARs at compile time
Created attachment 578012 [details, diff] Fix build rule for manpages

Man page generation is fixed by the attached upstream patch: https://github.com/tesseract-ocr/tesseract/commit/39ed30ad834a43cf403f88158c6db7a96f1bed29
It uses asciidoc and xsltproc.
> > Uses asciidoc and xsltproc.
and app-text/docbook-xsl-stylesheets
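For reference, a minimal sketch of how the build-time tools named above might be declared in the ebuild. It assumes EAPI 7 (hence BDEPEND) and that xsltproc is the one provided by dev-libs/libxslt; the ebuild that actually gets committed may declare these differently.

```shell
# Hypothetical ebuild fragment: tools needed only at build time
# to regenerate the man pages as nroff instead of HTML.
# Assumption: EAPI 7, so build-host tools go into BDEPEND.
BDEPEND="
	app-text/asciidoc
	app-text/docbook-xsl-stylesheets
	dev-libs/libxslt
"
```

dev-libs/libxslt is listed because that package installs the xsltproc binary.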
I hoped to fix bug #663564 before the 4.0 release; it is exactly about the training data. But you are right, let's get 4.0 in first; we can make it better later. Thanks for testing on top of beta4 and both patches, I will test and get it in the tree ASAP!
Recapping the scrollview issues for extra visibility: scrollview is an additional debug tool written in Java that has always required patching to use system Java jar files and system install paths, plus extra ebuild maintenance. 4.0 now requires a new jar, jaxb-api, which is not available in the tree for now, and would require new patches. As it is a seldom-used debug feature, I dropped it from the 4.0.0 bump (we can always restore it later if there is demand for it), simplifying ebuild content and maintenance.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=98f4080a82c54d39d0a6c646649ca47fe9c7d649

commit 98f4080a82c54d39d0a6c646649ca47fe9c7d649
Author:     Bernard Cafarelli <voyageur@gentoo.org>
AuthorDate: 2019-06-04 14:35:30 +0000
Commit:     Bernard Cafarelli <voyageur@gentoo.org>
CommitDate: 2019-06-04 14:39:53 +0000

    app-text/tesseract: 4.0.0 bump

    Thanks marecki and Chris Mayo for the help
    This version does not provide scrollview anymore, see bug for details

    Closes: https://bugs.gentoo.org/686944
    Package-Manager: Portage-2.3.67, Repoman-2.3.14
    Signed-off-by: Bernard Cafarelli <voyageur@gentoo.org>

 app-text/tesseract/Manifest | 1 +
 .../tesseract/files/tesseract-4.0.0-manpages.patch | 49 ++++++++
 app-text/tesseract/tesseract-4.0.0.ebuild | 129 +++++++++++++++++++++
 3 files changed, 179 insertions(+)