Bug 892485 - app-text/tessdata_fast and app-text/tessdata_best: add option to install language-neutral "script" training data
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Bernard Cafarelli
Reported: 2023-01-29 06:40 UTC by GB
Modified: 2023-01-29 20:05 UTC
Description GB 2023-01-29 06:40:45 UTC
Tesseract's repositories provide not only language-specific training data but also script-specific data (e.g. Latin, Cyrillic). This can be useful when OCRing text that has unconventional diacritics, e.g. German text with accents.

The mapping via L10N is nice, but even enabling every L10N flag doesn't install all available tessdata: the script-specific files are left out.

Reproducible: Always

Steps to Reproduce:
1. Try to install tessdata_best or tessdata_fast
Actual Results:  
USE flags (via L10N) only allow installation of language-specific data.

Expected Results:  
There should be a way to install script-specific training data.

The script-specific files are in https://github.com/tesseract-ocr/tessdata/tree/main/script and https://github.com/tesseract-ocr/tessdata_best/tree/main/script
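
Until the ebuilds offer this, the script data can be fetched by hand. Below is a minimal sketch of that workaround, not a proposal for the ebuild itself. It assumes GitHub's raw-file URL layout for the repositories linked above, and that tesseract accepts `-l script/<Name>` when the file sits in a `script/` subdirectory of the directory passed to `--tessdata-dir`. The helper names (`script_url`, `fetch_script`, `tesseract_command`) are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path
from urllib.request import urlretrieve

# Assumption: raw-file URL layout on GitHub for the tessdata repositories.
RAW_URL = "https://github.com/tesseract-ocr/{repo}/raw/main/script/{script}.traineddata"


def script_url(script: str, repo: str = "tessdata_best") -> str:
    """Build the download URL for one script model, e.g. 'Latin'."""
    return RAW_URL.format(repo=repo, script=script)


def fetch_script(script: str, tessdata_dir: Path, repo: str = "tessdata_best") -> Path:
    """Download <script>.traineddata into <tessdata_dir>/script/ and return its path."""
    target = tessdata_dir / "script" / f"{script}.traineddata"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # skip the download if the file is already there
        urlretrieve(script_url(script, repo), target)
    return target


def tesseract_command(image: str, out_base: str, script: str, tessdata_dir: Path) -> list[str]:
    """Command line that selects the script model via -l script/<Name>."""
    return [
        "tesseract", image, out_base,
        "--tessdata-dir", str(tessdata_dir),
        "-l", f"script/{script}",
    ]
```

For example, `fetch_script("Latin", Path.home() / "tessdata")` followed by running the list returned from `tesseract_command("scan.png", "scan", "Latin", Path.home() / "tessdata")` via `subprocess.run` should OCR `scan.png` with the Latin script model, provided the assumptions above hold.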