If tesseract is built without 'linguas_en', /usr/share/tessdata/eng.unicharset is not usable, but tesseract will still try to use it instead of the chosen language. For example, if tesseract has been merged with USE="-linguas_en linguas_fr", tesseract will fail with this message: "Unable to load unicharset file /usr/share/tessdata/eng.unicharset". To be able to use tesseract, you have to add "-l fr" at the end of the command. That's pretty annoying for scripts that call tesseract, like net-misc/plowshare. I think the best solution would be to always install linguas_en, or to find a way with upstream to fall back to a working file when eng isn't available. Versions: tested with >=2.03
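As a stopgap for scripts, the fallback idea above could be sketched on the caller's side. This is only an illustration, not the fix applied to the package: the function name pick_tess_lang is hypothetical, and it assumes the tesseract 2.x data layout where each installed language has a <lang>.unicharset file in the tessdata directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract or the ebuild):
# print a language code for which data is actually installed.
# Assumes one <lang>.unicharset file per language in the tessdata directory.
pick_tess_lang() {
    dir="${1:-/usr/share/tessdata}"

    # Prefer English when its data is present, since it is tesseract's default.
    if [ -f "$dir/eng.unicharset" ]; then
        echo eng
        return 0
    fi

    # Otherwise fall back to the first installed language, if any.
    for f in "$dir"/*.unicharset; do
        [ -f "$f" ] || continue   # the glob matched nothing
        basename "$f" .unicharset
        return 0
    done

    # No language data found at all.
    return 1
}

# Possible use in a script (image.tif and out are placeholder names):
#   lang=$(pick_tess_lang) && tesseract image.tif out -l "$lang"
```

A script using something like this would not need to hardcode "-l" for a particular locale, but it still relies on at least one language being installed.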
Patrick, I can fix this issue. Just let me know if you agree with the way I want to fix it (see previous comment).
This bug has been open for nearly 3 months. I will wait a few more days, then fix it myself. Please let me know if you don't want me to touch your package.
Fixed in 2.04-r1. I've added myself as a maintainer, so I'm reassigning this bug to myself. The bug will be closed once 2.04-r1 is stable.
2.04-r1 has been stable for a few months now.