leptonica-1.72 now has ADD_LEPTONICA_SUBDIR enabled, which breaks OCR in pyocr with tesseract.

To reproduce with pyocr installed, in python:

> from PIL import Image
> from pyocr.tesseract import image_to_string
> print(image_to_string(Image.open('test.jpg')))

pyocr uses a NamedTemporaryFile to hold a temporary BMP conversion of the image (e.g. "/tmp/tess_Mu50wu.bmp") and then calls the tesseract binary on it. leptonica then fails with "Error in fopenReadStream: file not found", even though the file is on the file system and readable.

With ADD_LEPTONICA_SUBDIR disabled, it finds and reads the file correctly.

I'm not sure where the problem ultimately lies (probably in the path-handling logic in utils.c), so I have just disabled ADD_LEPTONICA_SUBDIR locally for now.
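For anyone who wants to reproduce this without pyocr in the way, here is a rough sketch of the pattern described above: write the image to a NamedTemporaryFile, then call the tesseract binary on that path. This is not pyocr's actual code. The function name run_tesseract_on_bytes and the injectable runner parameter are made up for illustration, and the use of "stdout" as the output base assumes a tesseract version that supports it.

```python
import os
import subprocess
import tempfile


def run_tesseract_on_bytes(bmp_bytes, runner=subprocess.check_output):
    """Write bmp_bytes to a temporary .bmp file and run tesseract on it.

    Hypothetical helper, not pyocr's implementation. `runner` is injectable
    so the temp-file handling can be exercised without tesseract installed.
    """
    with tempfile.NamedTemporaryFile(prefix="tess_", suffix=".bmp") as tmp:
        tmp.write(bmp_bytes)
        tmp.flush()  # make sure the data is on disk before the child reads it

        # At this point the file exists and is readable, which is why a
        # "file not found" from fopenReadStream is unexpected.
        if not os.access(tmp.name, os.R_OK):
            raise RuntimeError("temporary file is not readable: " + tmp.name)

        # Assumption: "tesseract <image> stdout" prints the text to stdout.
        return runner(["tesseract", tmp.name, "stdout"])
```

If the path passed to the binary is readable here and leptonica still reports "file not found", the path is presumably being rewritten somewhere inside the library rather than by the caller.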
CC'ing Michael, who just e-mailed me about this. Sorry for the delay, I've been busy fighting issues in Java land.

The handling of /tmp is, in my opinion, the ugliest part of Leptonica. If you run the test suite, it spams /tmp with hundreds of megabytes of files and intentionally doesn't clean them up afterwards. If you enable ADD_LEPTONICA_SUBDIR then it at least puts this mess under a single subdirectory that you can easily delete. I didn't realise this would have any effect outside the test suite.

If it respected the TMPDIR environment variable then it wouldn't be so bad, because Portage would clean this up for you, but it's hardcoded to use /tmp. I will raise this point with upstream before the next release. I'm still not sure why writing a few files in a cross-platform manner is seemingly so hard in this day and age, but I would need to look a little closer at the code to find out.

I at least wanted to look a little closer at what it's trying to do before disabling this option in the ebuild. I will try to find the time tonight.
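To make the TMPDIR point concrete, here is a small sketch of the behaviour I'd like to see: use TMPDIR when it is set and fall back to /tmp otherwise. It is written in Python purely for illustration and is not Leptonica's code. The names scratch_dir and scratch_path are hypothetical.

```python
import os


def scratch_dir(environ=os.environ, default="/tmp"):
    """Return TMPDIR if it is set and non-empty, otherwise the default."""
    return environ.get("TMPDIR") or default


def scratch_path(name, environ=os.environ):
    """Build a path for a scratch file inside the chosen directory."""
    return os.path.join(scratch_dir(environ), name)
```

With something like this, a build that exports TMPDIR would keep the test suite's files inside its own temporary directory, while everyone else would still get /tmp.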
I filed the issue in Leptonica's tracker: https://code.google.com/p/leptonica/issues/detail?id=110

Regards,
Michael
+*leptonica-1.72-r2 (04 Jun 2015)
+
+  04 Jun 2015; James Le Cuirot <chewi@gentoo.org> +leptonica-1.72-r2.ebuild,
+  -leptonica-1.72-r1.ebuild:
+  Undo use of ADD_LEPTONICA_SUBDIR. I didn't realise this had any effect outside
+  of the test suite and it probably broke every other usage of the library.
+  Unfortunately the test suite spams tons of files to /tmp and intentionally
+  leaves them there. Setting ADD_LEPTONICA_SUBDIR at least confined the mess to
+  a single directory. Quite frankly, Leptonica should use something like glib
+  instead of badly reinventing the wheel. I will speak to upstream and at least
+  try to get it to respect TMPDIR, which would allow Portage to clean up the
+  mess.
Thanks for looking into this :) 1.72-r2 works fine with pyocr (until this gets sorted out in a cleaner way upstream).