Bug 549816 - =media-libs/leptonica-1.72-r1: file not found in pyocr/tesseract with ADD_LEPTONICA_SUBDIR=1
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: James Le Cuirot
Reported: 2015-05-18 14:07 UTC by Bernard Cafarelli
Modified: 2015-06-05 14:55 UTC
Description Bernard Cafarelli gentoo-dev 2015-05-18 14:07:21 UTC
leptonica-1.72 now has ADD_LEPTONICA_SUBDIR enabled, which breaks OCR in pyocr with tesseract. To reproduce with pyocr installed, run in Python:
> from PIL import Image
> from pyocr.tesseract import image_to_string
> print(image_to_string(Image.open('test.jpg')))

pyocr uses a NamedTemporaryFile to hold a temporary BMP conversion of the image (like "/tmp/tess_Mu50wu.bmp") and then calls the tesseract binary on it. leptonica then fails with "Error in fopenReadStream: file not found", even though the file exists on the file system and is readable.
With ADD_LEPTONICA_SUBDIR disabled, it finds and reads the file correctly.
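
For readers unfamiliar with the pattern described above, here is a minimal sketch of it: write the image to a NamedTemporaryFile, then hand the path to an external program. This is not pyocr's actual code. The names run_on_temp_bmp, build_argv and check_readable_argv are hypothetical, and a child Python process stands in for the tesseract binary so that the sketch runs without tesseract installed.

```python
import subprocess
import sys
import tempfile


def run_on_temp_bmp(image_bytes, build_argv):
    """Write image_bytes to a temporary .bmp file and run a command on it.

    build_argv is a hypothetical hook: it receives the temporary file's
    path and returns the argument list to execute.
    """
    # The prefix/suffix mirror the "/tmp/tess_Mu50wu.bmp" name in the report.
    with tempfile.NamedTemporaryFile(prefix="tess_", suffix=".bmp") as tmp:
        tmp.write(image_bytes)
        tmp.flush()  # ensure the child process sees the full contents
        result = subprocess.run(
            build_argv(tmp.name),
            capture_output=True,
            text=True,
            check=True,
        )
    # The temporary file is deleted when the with-block exits.
    return result.stdout


def check_readable_argv(path):
    # Stand-in for the OCR binary (assumption): a child Python process
    # that reports whether it can open and read the file it was handed.
    script = "import sys; open(sys.argv[1], 'rb').read(); print('readable')"
    return [sys.executable, "-c", script, path]


if __name__ == "__main__":
    print(run_on_temp_bmp(b"BM", check_readable_argv).strip())
```

The point of the stand-in is that an ordinary child process can read the temporary file while it exists, which matches the observation that the file is present and readable when leptonica reports it missing.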

I'm not sure where the problem ultimately lies (probably somewhere in the path logic in utils.c), so I have just disabled ADD_LEPTONICA_SUBDIR locally for now.
Comment 1 James Le Cuirot gentoo-dev 2015-05-29 12:35:01 UTC
CC'ing Michael, who just e-mailed me about this.

Sorry for the delay, I've been busy fighting issues in Java land. The handling of /tmp is, in my opinion, the ugliest part of Leptonica. If you run the test suite, it totally spams /tmp with hundreds of megabytes of files and intentionally doesn't clean them up afterwards. If you enable ADD_LEPTONICA_SUBDIR, then it at least puts this mess under a single subdirectory that you can easily delete. I didn't realise this would have any effect outside the test suite. If it respected the TMPDIR environment variable, then it wouldn't be so bad because Portage would clean this up for you, but it's hardcoded to use /tmp. I will raise this point with upstream before the next release. I'm still not sure why writing a few files in a cross-platform manner in this day and age is seemingly so hard, but I would need to look a little closer at the code to find out. I at least wanted to look a little closer at what it's trying to do before disabling this option in the ebuild. I will try to find the time tonight.
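
To illustrate what "respecting TMPDIR" could look like, here is a rough Python sketch, not Leptonica's C code. The function names temp_base and scratch_dir and the default subdirectory name are made up for the example. The base directory comes from the TMPDIR environment variable when it is set and falls back to /tmp otherwise.

```python
import os


def temp_base():
    """Return the base temp directory: $TMPDIR if set and non-empty, else /tmp."""
    return os.environ.get("TMPDIR") or "/tmp"


def scratch_dir(subdir="leptonica-sketch"):
    """Create and return a single scratch subdirectory under the temp base.

    The subdirectory name is a hypothetical placeholder; confining output to
    one directory is what makes it easy to delete afterwards.
    """
    path = os.path.join(temp_base(), subdir)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    print(scratch_dir())
```

With this approach, a caller that points TMPDIR at its own working area gets all scratch files inside that area and can remove them along with it.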
Comment 2 Michael Stypa 2015-05-29 12:57:59 UTC
I filed the issue in Leptonica's tracker:

https://code.google.com/p/leptonica/issues/detail?id=110

Regards,
Michael
Comment 3 James Le Cuirot gentoo-dev 2015-06-04 23:01:45 UTC
+*leptonica-1.72-r2 (04 Jun 2015)
+
+  04 Jun 2015; James Le Cuirot <chewi@gentoo.org> +leptonica-1.72-r2.ebuild,
+  -leptonica-1.72-r1.ebuild:
+  Undo use of ADD_LEPTONICA_SUBDIR. I didn't realise this had any effect outside
+  of the test suite and it probably broke every other usage of the library.
+  Unfortunately the test suite spams tons of files to /tmp and intentionally
+  leaves them there. Setting ADD_LEPTONICA_SUBDIR at least confined the mess to
+  a single directory. Quite frankly, Leptonica should use something like glib
+  instead of badly reinventing the wheel. I will speak to upstream and at least
+  try to get it to respect TMPDIR, which would allow Portage to clean up the
+  mess.
Comment 4 Bernard Cafarelli gentoo-dev 2015-06-05 14:55:08 UTC
Thanks for looking into this :) 1.72-r2 works fine with pyocr (until this gets sorted out in a cleaner way upstream).