Bug#: 146390
Status: RESOLVED
Resolution: FIXED
Assigned To: Default Assignee for New Packages <maintainer-wanted@gentoo.org>
Reporter: curtis <dylan38@gmail.com>

Filename                        Description                     Type        Creator         Created                Size
tesseract-1.0.ebuild            Ebuild                          text/plain  Dan Davis       2006-09-05 09:59 0000  885 bytes
tesseract-1.0.xterm-path.patch  Xterm path patch                patch       Dan Davis       2006-09-05 10:01 0000  629 bytes
tesseract-1.02.ebuild           tesseract-1.02.ebuild           text/plain  Patrick McLean  2007-02-02 03:58 0000  1015 bytes
tesseract-1.02.ebuild           tesseract-1.02.ebuild           text/plain  Patrick McLean  2007-02-02 04:41 0000  1022 bytes
tesseract-1.02.02022007.ebuild  tesseract-1.02.02022007.ebuild  text/plain  Patrick McLean  2007-02-02 16:13 0000  1.11 KB
tesseract-1.02.02022007.ebuild  tesseract-1.02.02022007.ebuild  text/plain  Patrick McLean  2007-02-02 16:21 0000  1.11 KB

Bug 146390 blocks: 158445

Description:   Opened: 2006-09-05 06:24 0000
Google just released Tesseract OCR, an optical character recognition package
originally developed by Hewlett-Packard and now released as open source
software at SourceForge (http://sourceforge.net/projects/tesseract-ocr). I
would very much like to try it out and hope it can be included as an ebuild.

------- Comment #1 From Dan Davis 2006-09-05 09:59:41 0000 -------
Created an attachment (id=96085)
Ebuild

I tried to make this as robust as I could, but since this is my first ebuild,
I'm sure there are things I could've done differently.

I have mine in: ${PORTDIR_OVERLAY}/media-gfx/tesseract

------- Comment #2 From Dan Davis 2006-09-05 10:01:12 0000 -------
Created an attachment (id=96086)
Xterm path patch

Place this in the 'files' subdir. It corrects a hardcoded path in the source
code, changing /usr/bin/X11/xterm to /usr/bin/xterm.

Mine is installed in ${PORTDIR_OVERLAY}/media-gfx/tesseract/files
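
For anyone who would rather not carry the patch file, the same path change can be sketched with sed. This is only an illustration: the fix_xterm_path name and the demo file are made up, and the source file that actually contains the path is not named in this report. It also assumes GNU sed for the -i option.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the hardcoded xterm path in one file.
# Which tesseract source file to pass in is an assumption; the report
# does not say where the path lives.
fix_xterm_path() {
    sed -i 's|/usr/bin/X11/xterm|/usr/bin/xterm|g' "$1"
}

# Demo on a throwaway file, not on the real tesseract sources.
demo=$(mktemp)
printf 'char *cmd = "/usr/bin/X11/xterm";\n' > "$demo"
fix_xterm_path "$demo"
cat "$demo"
rm -f "$demo"
```

A patch in the 'files' subdir is probably still the better choice for an ebuild, since it fails loudly if upstream changes the line.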

------- Comment #3 From Thomas Gatliff 2006-09-06 11:07:03 0000 -------
Putting the ebuild aside, were you able to get tesseract processing correctly? 
Can you run the test image?

------- Comment #4 From Paul Kronenwetter 2006-10-04 10:01:48 0000 -------
I'll say that I cannot.  I've just downloaded and compiled the package (v1.01)
from SF and it "hangs" using 100% CPU.  Strace reveals this as the last few
things it does when invoked as:
tesseract phototest.tif phototest batch

write(2, "Tesseract Open Source OCR Engine"..., 33) = 33
open("phototest.tif", O_RDONLY|O_LARGEFILE) = 6
read(6, "II*\0\10\0\0\0", 8)            = 8
fstat64(6, {st_mode=S_IFREG|0644, st_size=38668, ...}) = 0
mmap2(NULL, 38668, PROT_READ, MAP_SHARED, 6, 0) = 0xb7f64000
fstat64(6, {st_mode=S_IFREG|0644, st_size=38668, ...}) = 0
brk(0x86b6000)                          = 0x86b6000
munmap(0xb7f64000, 38668)               = 0
close(6)                                = 0
open("phototest.bl", O_RDONLY)          = -1 ENOENT (No such file or directory)
open("phototest.vec", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("phototest.uzn", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("phototest.pd", O_RDONLY)          = -1 ENOENT (No such file or directory)
times({tms_utime=10, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 1838221478
brk(0x86d7000)                          = 0x86d7000
times({tms_utime=12, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 1838221479

------- Comment #5 From Christopher J. Madsen 2006-10-19 13:12:44 0000 -------
I tried this ebuild with tesseract 1.02, and it seems to work.  Accuracy on
phototest.tif was 100%.  Unfortunately, the other pages I've tried haven't
fared so well; I get a lot of typos.  But it is producing recognizable text.

Note that the ebuild says LICENSE="GPL-1" when it should have been
LICENSE="Apache-2.0".

------- Comment #6 From Zeth 2006-12-12 17:21:48 0000 -------
There are some useful scripts here that might help make it more user-friendly:
http://www.groklaw.net/article.php?story=20061210115516438

It is already in Ubuntu so this might help:

http://packages.ubuntu.com/feisty/graphics/tesseract-ocr

------- Comment #7 From Juan 2006-12-25 15:01:44 0000 -------
I would like to confirm that tesseract-ocr works fine in conjunction with
mail-filter/spamassassin-fuzzyocr-3.5.0.

From this ebuild request: https://bugs.gentoo.org/show_bug.cgi?id=158445

However, since the filename on SourceForge is tesseract-<ver>.tar.gz, I created
it as media-gfx/tesseract/tesseract-1.02.ebuild in my Portage overlay. This was
easiest for me.
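
For others who want to try the attachments, the overlay layout described in this report can be sketched as below. The install_into_overlay name and the scratch directories are made up for the example; the media-gfx/tesseract category and the 'files' subdir are the ones mentioned in the comments above.

```shell
#!/bin/sh
# Hypothetical helper: copy an ebuild (and optionally a patch) into an
# overlay tree laid out as <overlay>/media-gfx/tesseract[/files].
install_into_overlay() {
    overlay=$1
    ebuild_file=$2
    patch_file=${3:-}
    dest="$overlay/media-gfx/tesseract"
    mkdir -p "$dest/files"
    cp "$ebuild_file" "$dest/"
    if [ -n "$patch_file" ]; then
        cp "$patch_file" "$dest/files/"
    fi
}

# Demo with empty stand-in files in scratch directories.
overlay_demo=$(mktemp -d)
src_demo=$(mktemp -d)
: > "$src_demo/tesseract-1.02.ebuild"
install_into_overlay "$overlay_demo" "$src_demo/tesseract-1.02.ebuild"
ls "$overlay_demo/media-gfx/tesseract"
rm -rf "$overlay_demo" "$src_demo"
```

In real use the first argument would be your ${PORTDIR_OVERLAY} directory, and the digest would presumably still need to be generated with the ebuild tool before emerging.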

------- Comment #8 From Patrick McLean 2007-02-02 03:58:02 0000 -------
Created an attachment (id=108906)
tesseract-1.02.ebuild

New ebuild for tesseract. It installs tesseract to /usr/lib/tesseract and
installs a wrapper in /usr/bin (this is more the Gentoo way of doing things).
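
The attached ebuild is not reproduced in this report, so the following is only a guess at what such a wrapper could look like: a small script that hands its arguments to the real binary kept elsewhere. The make_wrapper name and the demo paths are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: write a wrapper script at $2 that runs the binary at $1.
make_wrapper() {
    real_bin=$1
    wrapper=$2
    {
        printf '#!/bin/sh\n'
        printf '# Forward all arguments to the real binary.\n'
        printf 'exec "%s" "$@"\n' "$real_bin"
    } > "$wrapper"
    chmod +x "$wrapper"
}

# Demo in a scratch directory with a stand-in binary instead of tesseract.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho "stand-in got: $*"\n' > "$scratch/real"
chmod +x "$scratch/real"
make_wrapper "$scratch/real" "$scratch/wrapper"
"$scratch/wrapper" phototest.tif phototest batch
rm -rf "$scratch"
```

In the ebuild's case the first argument would presumably be a path under /usr/lib/tesseract and the second a path under /usr/bin.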

------- Comment #9 From Patrick McLean 2007-02-02 04:41:02 0000 -------
Created an attachment (id=108907)
tesseract-1.02.ebuild

This needs some error trapping.

------- Comment #10 From Patrick McLean 2007-02-02 16:13:53 0000 -------
Created an attachment (id=108940)
tesseract-1.02.02022007.ebuild

Ebuild for a CVS pull of tesseract. Unlike the release version, this one
compiles cleanly on amd64. It has been cleaned up quite a bit and should be
ready for Portage.

------- Comment #11 From Patrick McLean 2007-02-02 16:21:27 0000 -------
Created an attachment (id=108943)
tesseract-1.02.02022007.ebuild

Oops, this one actually works properly.

------- Comment #12 From Patrick McLean 2007-02-02 21:35:47 0000 -------
Added to CVS as app-text/tesseract.

------- Comment #13 From Robert Buchholz 2007-02-03 01:44:48 0000 -------
Patrick, you were just a few minutes too early: 1.0.3 was released half an hour ago ,-)
