| Summary: | app-text/pdfsandwich-0.0.7 - text is placed incorrectly | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Juergen Rose <rose> |
| Component: | Current packages | Assignee: | Thomas Kahle (RETIRED) <tomka> |
| Status: | RESOLVED OBSOLETE | | |
| Severity: | normal | | |
| Priority: | Normal | | |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |

Description
Juergen Rose
2013-03-21 16:05:46 UTC
BTW, if I use pdfocr.rb or the shell script pdfocr.sh (http://blog.konradvoelkel.de/2010/01/linux-ocr-and-pdf-problem-solved/) under Ubuntu or Mint, which as far as I understand use cuneiform and hocr2pdf from ExactImage, I get the same behaviour. Could it be that the coordinates for the text boxes generated by cuneiform and tesseract are different from those expected by hocr2pdf?

(In reply to comment #1)
> BTW, if I use pdfocr.rb or the shell script pdfocr.sh
> (http://blog.konradvoelkel.de/2010/01/linux-ocr-and-pdf-problem-solved/)
> under Ubuntu or Mint, which as far as I understand use cuneiform and
> hocr2pdf from ExactImage, I get the same behaviour. Could it be that the
> coordinates for the text boxes generated by cuneiform and tesseract are
> different from those expected by hocr2pdf?

The simple answer is yes. What you can usually try is to run pdfsandwich with the -verbose option. Then you can see the command-line arguments it passes to its minions and fiddle with those. Generally, we as Gentoo can't do much about it.

I've just bumped pdfsandwich to version 0.0.8. Please try this new version. If it does not work, your next option would be to contact the author.

P.S. cuneiform is evil. It is vulnerable to buffer overflow exploits if you process forged PDFs.

New versions of everything involved have been added to the tree.
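
On the coordinate question raised in the comments above: hOCR output stores each text box as `bbox x0 y0 x1 y1` in image pixels with the origin at the top-left corner, while PDF user space is measured in points (1/72 inch) from the bottom-left corner. The following is only a minimal sketch of that conversion, to show how a wrong resolution or a missed vertical flip would shift the text layer. It is not taken from hocr2pdf or pdfsandwich; the function name `hocr_bbox_to_pdf` and its parameters `page_height_px` and `dpi` are hypothetical and chosen for illustration.

```python
import re


def hocr_bbox_to_pdf(title, page_height_px, dpi):
    """Convert an hOCR bbox (pixels, top-left origin) to PDF points
    (bottom-left origin). Hypothetical helper, for illustration only."""
    match = re.search(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", title)
    if match is None:
        raise ValueError("no bbox found in title attribute: %r" % title)
    x0, y0, x1, y1 = (int(group) for group in match.groups())

    # Pixels to points: one inch is 72 points and `dpi` pixels.
    left = x0 * 72.0 / dpi
    right = x1 * 72.0 / dpi

    # Flip the vertical axis: hOCR counts down from the top of the image,
    # PDF counts up from the bottom of the page.
    bottom = (page_height_px - y1) * 72.0 / dpi
    top = (page_height_px - y0) * 72.0 / dpi
    return (left, bottom, right, top)


if __name__ == "__main__":
    # A word box on a page scanned at an assumed 300 dpi, 3300 pixels tall.
    print(hocr_bbox_to_pdf("bbox 300 600 900 660", page_height_px=3300, dpi=300))
```

If the OCR engine and the PDF writer disagree about the scan resolution or the page height, every box is scaled or shifted by the same amount, which would look like consistently misplaced text.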