Bug 815226 - app-text/tesseract - add support for tessdata_fast pretrained languages
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High enhancement
Assignee: Bernard Cafarelli
Reported: 2021-09-28 09:55 UTC by Ladislav Zitka
Modified: 2021-12-22 13:09 UTC

Description Ladislav Zitka 2021-09-28 09:55:52 UTC
To support languages other than English, we need to download the trained data files from this repository:
https://github.com/tesseract-ocr/tessdata_fast

and put them into the directory:
/usr/share/tessdata/

Since there are over 100 languages, I suggest simply adding a new USE flag (USE=tessdata_fast or USE=tessdata) that installs everything from the repository.
Comment 1 Bernard Cafarelli gentoo-dev 2021-12-22 13:09:22 UTC
Hi, this feature has been available for many releases already. tesseract depends on one of the tessdata packages:

RDEPEND="${COMMON_DEPEND}
    || (
        >=app-text/tessdata_fast-4.0.0
        >=app-text/tessdata_best-4.0.0
        >=app-text/tessdata_legacy-4.0.0
    )"

(By default, the _fast version is used.)
These packages have L10N flags for all available languages, for example:

% emerge -pv tessdata_fast

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R    ] app-text/tessdata_fast-4.1.0::gentoo  USE="osd" L10N="en fr -af -am -ar -as -az -be -bg -bn -bo -br -bs -ca -ceb -chr -co -cs -cy -da -de -dv -dz -el -eo -es -et -eu -fa -fi -fil -fo -fy -ga -gd -gl -gu -he -hi -hr -ht -hu -hy -id -is -it -iu -ja -jv -ka -kk -km -kmr-Latn -kn -ko -ky -la -lb -lo -lt -lv -mi -mk -ml -mn -mr -ms -mt -my -ne -nl -no -oc -or -pa -pl -ps -pt -qu -ro -ru -sa -sd -si -sk -sl -sq -sr -su -sv -sw -syc -ta -te -tg -th -ti -to -tr -tt -ug -uk -ur -uz -vi -yi -yo -zh" 20 443 KiB

Total: 1 package (1 reinstall), Size of downloads: 20 443 KiB
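
To make the L10N mechanism above concrete, here is a minimal sketch of how extra languages might be enabled for the data package. The helper `l10n_use_line` is hypothetical (it is not part of Portage), and the codes `de` and `cs` are only examples taken from the L10N list in the emerge output. It assumes the usual Portage convention that a USE_EXPAND value such as `de` under L10N corresponds to the flag `l10n_de` in package.use.

```shell
#!/bin/sh
# Hypothetical helper (not part of Portage): print a package.use line
# that enables the given L10N codes for one package.
l10n_use_line() {
    pkg="$1"
    shift
    line="$pkg"
    for code in "$@"; do
        # USE_EXPAND values are written as lowercase prefix + value.
        line="$line l10n_$code"
    done
    printf '%s\n' "$line"
}

# Example: request German and Czech data for tessdata_fast.
l10n_use_line app-text/tessdata_fast de cs

# As root, the line could then be appended to a package.use file and the
# package rebuilt (the file name "tessdata" is an assumption; adjust it
# if /etc/portage/package.use is a single file on your system):
#   l10n_use_line app-text/tessdata_fast de cs >> /etc/portage/package.use/tessdata
#   emerge --ask --oneshot app-text/tessdata_fast
```

After the rebuild, `tesseract --list-langs` should show which trained data files are installed.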