To support languages other than English (eng), we need to download the files from the repository https://github.com/tesseract-ocr/tessdata_fast and put them into the directory /usr/share/tessdata/.
Since there are over 100 languages, I suggest simply adding a new USE flag, tessdata_fast or tessdata, and letting it install everything from the repository.
Hi,

this feature has been available for many releases already. tesseract depends on one of the tessdata packages:

RDEPEND="${COMMON_DEPEND}
	|| (
		>=app-text/tessdata_fast-4.0.0
		>=app-text/tessdata_best-4.0.0
		>=app-text/tessdata_legacy-4.0.0
	)"

(by default, the _fast version will be used)

These packages have L10N variables for all possible languages, for example:

% emerge -pv tessdata_fast

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R    ] app-text/tessdata_fast-4.1.0::gentoo  USE="osd" L10N="en fr -af -am -ar -as -az -be -bg -bn -bo -br -bs -ca -ceb -chr -co -cs -cy -da -de -dv -dz -el -eo -es -et -eu -fa -fi -fil -fo -fy -ga -gd -gl -gu -he -hi -hr -ht -hu -hy -id -is -it -iu -ja -jv -ka -kk -km -kmr-Latn -kn -ko -ky -la -lb -lo -lt -lv -mi -mk -ml -mn -mr -ms -mt -my -ne -nl -no -oc -or -pa -pl -ps -pt -qu -ro -ru -sa -sd -si -sk -sl -sq -sr -su -sv -sw -syc -ta -te -tg -th -ti -to -tr -tt -ug -uk -ur -uz -vi -yi -yo -zh" 20 443 KiB

Total: 1 package (1 reinstall), Size of downloads: 20 443 KiB
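
As a rough sketch of how additional languages could be enabled: L10N is a USE_EXPAND variable, so its values can be set per package as l10n_* flags. The example below assumes that /etc/portage/package.use is a directory, uses a hypothetical file name (tessdata), and picks German and Russian purely for illustration; adjust all three to your setup.

```shell
# Assumption: /etc/portage/package.use is a directory; "tessdata" is a hypothetical file name.
# Enable the German and Russian data files for the _fast models (run as root).
echo "app-text/tessdata_fast l10n_de l10n_ru" >> /etc/portage/package.use/tessdata

# Rebuild the data package so the newly enabled languages get installed.
emerge --ask --oneshot app-text/tessdata_fast

# Check which languages tesseract can now see.
tesseract --list-langs
```

Setting L10N globally in /etc/portage/make.conf should work as well, but that affects every package that honours L10N, not just the tessdata ones.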