Bug 158445 - Update: mail-filter/spamassassin-fuzzyocr-3.5.0_rc1 & app-text/gocr-0.43
Summary: Update: mail-filter/spamassassin-fuzzyocr-3.5.0_rc1 & app-text/gocr-0.43
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Other
Importance: High enhancement
Assignee: Tom Knight (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on: 146390 170430
Blocks:
Reported: 2006-12-18 03:54 UTC by Jacob Lindberg
Modified: 2007-05-20 19:34 UTC
CC List: 17 users

See Also:
Package list:
Runtime testing required: ---


Attachments
spamassassin-fuzzyocr-3.5.0_rc1.ebuild (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,1.89 KB, text/plain)
2006-12-18 03:55 UTC, Jacob Lindberg
patchset2.patch (patch for spamassassin-fuzzyocr) (patchset2.patch,17.79 KB, patch)
2006-12-18 03:55 UTC, Jacob Lindberg
New: dev-perl/MLDBM-Sync (MLDBM-Sync-0.30.ebuild) (MLDBM-Sync-0.30.ebuild,547 bytes, text/plain)
2006-12-18 03:56 UTC, Jacob Lindberg
app-text/gocr (gocr-0.43.ebuild) (gocr-0.43.ebuild,1.27 KB, text/plain)
2006-12-18 03:57 UTC, Jacob Lindberg
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,1.90 KB, text/plain)
2006-12-18 04:29 UTC, Jacob Lindberg
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,1.90 KB, text/plain)
2006-12-18 04:29 UTC, Jacob Lindberg
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,1.90 KB, text/plain)
2006-12-18 04:29 UTC, Jacob Lindberg
fuzzy-ocr with various USE flags (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,2.20 KB, text/plain)
2006-12-25 19:49 UTC, Juan
New ebuild (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,2.42 KB, text/plain)
2006-12-26 15:24 UTC, Jacob Lindberg
Enable tesseract config patch (enabletesseract.patch,806 bytes, patch)
2006-12-26 15:25 UTC, Jacob Lindberg
Disable ocrad in config if ! use patch (noocrad.patch,1.28 KB, patch)
2006-12-26 15:26 UTC, Jacob Lindberg
Newest tesseract config patch (enabletesseract.patch,656 bytes, patch)
2006-12-26 16:16 UTC, Juan
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with support for 3 OCR engines (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,3.36 KB, text/plain)
2006-12-26 17:01 UTC, Juan
Disable gocr in config if ! use patch (disablegocr.patch,574 bytes, patch)
2006-12-26 17:02 UTC, Juan
tie-cache ebuild for possible tie-cache dependency.... (Tie-Cache-0.17.ebuild,540 bytes, text/plain)
2006-12-26 23:37 UTC, Juan
The latest ebuild (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,3.98 KB, text/plain)
2006-12-28 01:31 UTC, Jacob Lindberg
The renamed disableocrad.patch (disableocrad.patch,1.28 KB, patch)
2006-12-28 01:32 UTC, Jacob Lindberg
The logrotate file (fuzzyocr.logrotate,194 bytes, text/plain)
2006-12-28 01:32 UTC, Jacob Lindberg
spamassassin-fuzzyocr-3.5.0_rc1-r1.ebuild (spamassassin-fuzzyocr-3.5.0_rc1-r1.ebuild,4.33 KB, text/plain)
2007-01-02 04:12 UTC, Jacob Lindberg
patchset1.patch (patchset1.patch,3.81 KB, patch)
2007-01-02 04:29 UTC, Jacob Lindberg
patchset3.patch (patchset3.patch,17.68 KB, patch)
2007-01-02 04:30 UTC, Jacob Lindberg
postgresql.patch (postgresql.patch,36.71 KB, patch)
2007-01-04 09:56 UTC, Juan
postgresql.patch (postgresql.patch,36.71 KB, patch)
2007-01-04 10:16 UTC, Juan
ebuild for review (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,4.57 KB, text/plain)
2007-01-04 10:21 UTC, Juan
ebuild for review (spamassassin-fuzzyocr-3.5.0_rc1.ebuild,4.60 KB, text/plain)
2007-01-04 10:27 UTC, Juan
test ebuild for sql hash storage (spamassassin-fuzzyocr-3.5.1.ebuild,3.22 KB, text/plain)
2007-01-28 21:50 UTC, Paul B. Henson
spamassassin-fuzzyocr-3.5.1.ebuild (spamassassin-fuzzyocr-3.5.1.ebuild,4.33 KB, text/plain)
2007-02-02 03:54 UTC, Patrick McLean
spamassassin-fuzzyocr-3.5.1.ebuild (spamassassin-fuzzyocr-3.5.1.ebuild,4.30 KB, text/plain)
2007-02-02 16:30 UTC, Patrick McLean
MLDBM-Sync-0.30.ebuild (MLDBM-Sync-0.30.ebuild,433 bytes, text/plain)
2007-02-06 22:42 UTC, Tom Knight (RETIRED)
Files modified to work with postgres instead of mysql (postgres.tar.bz2,15.77 KB, application/octet-stream)
2007-05-20 19:34 UTC, aelber

Description Jacob Lindberg 2006-12-18 03:54:25 UTC
I had to create all this in one bug, otherwise it wouldn't make sense.

An RC of this fine tool was released, so I decided to ebuild it.

There is a new perl-module dependency called dev-perl/MLDBM-Sync. I have created a separate ebuild for this (it doesn't exist at the moment). Please put that in the tree also. Without MLDBM 'spamassassin --lint' will nag about missing it.

I have changed the warning about app-text/gocr since I have created a new ebuild for this also. This fixes the old segfaulting and heavy loading problem.
Comment 1 Jacob Lindberg 2006-12-18 03:55:21 UTC
Created attachment 104267 [details]
spamassassin-fuzzyocr-3.5.0_rc1.ebuild
Comment 2 Jacob Lindberg 2006-12-18 03:55:54 UTC
Created attachment 104268 [details, diff]
patchset2.patch (patch for spamassassin-fuzzyocr)
Comment 3 Jacob Lindberg 2006-12-18 03:56:41 UTC
Created attachment 104269 [details]
New: dev-perl/MLDBM-Sync (MLDBM-Sync-0.30.ebuild)
Comment 4 Jacob Lindberg 2006-12-18 03:57:17 UTC
Created attachment 104270 [details]
app-text/gocr (gocr-0.43.ebuild)
Comment 5 Jacob Lindberg 2006-12-18 03:58:47 UTC
-Without MLDBM 'spamassassin --lint' will nag about missing it.
+Without MLDBM-Sync 'spamassassin --lint' will nag about missing it.
Comment 6 Jacob Lindberg 2006-12-18 04:29:23 UTC
Created attachment 104274 [details]
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI

I forgot dev-perl/DBI as a dependency. Here it is included.
Comment 7 Jacob Lindberg 2006-12-18 04:29:35 UTC
Created attachment 104275 [details]
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI

I forgot dev-perl/DBI as a dependency. Here it is included.
Comment 8 Jacob Lindberg 2006-12-18 04:29:46 UTC
Created attachment 104276 [details]
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI

I forgot dev-perl/DBI as dependency. Here it is included.
Comment 9 Jacob Lindberg 2006-12-18 04:41:29 UTC
Comment on attachment 104275 [details]
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with dev-perl/DBI

Obsoleted since bugs.gentoo.org was momo..
Comment 10 Michael Kefeder 2006-12-20 01:38:53 UTC
Thanks for all the work. I can confirm that installing with your ebuilds successfully completed.

Now I hope the plugin does what it promises ;)
Comment 11 Jacob Lindberg 2006-12-20 06:58:31 UTC
Good to hear.

I had some issues with the old one. Scoring was not that good. This one seems to be doing the job alright ;)
Comment 12 Juan 2006-12-25 14:52:21 UTC
Is it possible for spamassassin-fuzzyocr to include USE flags for the following (optional) packages:

app-text/ocrad (already in portage)
media-gfx/gifsicle (already in portage)

There are some commented settings in FuzzyOcr.cf that would require these packages to be installed, if uncommented.

Also, this requested ebuild deserves as much attention as the rest, since it can also be an optional package:

media-gfx/tesseract-ocr

Bug report for the tesseract-ocr ebuild @ https://bugs.gentoo.org/show_bug.cgi?id=146390

Note that the source package is named tesseract-<version>.

Lastly, this is most likely an upstream bug but when used with ocrad, my log reports the following which occurs with all 4 ocrad scansets:

2006-12-25 14:45:30 [19871] Errors in Scanset "ocrad-decolorize"
2006-12-25 14:45:30 [19871] Return code: 1, Error: /usr/bin/ocrad: invalid option -- s
                      Try `/usr/bin/ocrad --help' for more information.

2006-12-25 14:45:30 [19871] Skipping scanset because of errors, trying next...

I looked through all the fuzzy-ocr perl modules looking for the line to patch but couldn't find it.
Comment 13 Juan 2006-12-25 14:55:52 UTC
I should also add that this fine SpamAssassin plugin set (fuzzy-ocr 3.5.0 / gocr 0.43 / MLDBM-Sync) works just fine, regardless of the ocrad errors.
Comment 14 Juan 2006-12-25 16:17:54 UTC
The ocrad errors I was receiving are documented at:

http://fuzzyocr.own-hero.net/wiki/OcradWrongParameters
Comment 15 Juan 2006-12-25 19:49:46 UTC
Created attachment 104727 [details]
fuzzy-ocr with various USE flags

I have modified the current ebuild as it does not provide the flexibility available with fuzzy-ocr.

This ebuild contains the following USE flags:

dbm gocr log mysql ocrad tesseract

I am not an ebuild dev but this works fine for me. The following conditions must be met:

If you enable gocr, you must use the gocr-0.43 ebuild.
If you enable dbm, you must use the MLDBM-Sync ebuild.
If you enable tesseract, you must use this ebuild => https://bugs.gentoo.org/show_bug.cgi?id=146390
If you use ocrad, you can use ocrad currently in portage or use this: => https://bugs.gentoo.org/show_bug.cgi?id=154579

The ebuild could probably use some extra touches such as commenting/uncommenting out scansets (gocr/ocrad/tesseract) from FuzzyOcr.scansets.
Comment 16 Jacob Lindberg 2006-12-26 15:24:10 UTC
Juan,

Good work. I discovered the same things just before leaving for Christmas.

Concerning the "Return code: 1, Error: /usr/bin/ocrad: invalid option -- s" the fix on the web page has already been applied in the scanset file. It's a matter of the version of ocrad. If you use 0.10 instead of 0.15, you will be missing "  -s, --scale=[-]<n>       scale input image by [1/]<n>" in options.

You already fixed this with:
ocrad? ( >=app-text/ocrad-0.14 )
in the ebuild :-)

My log now:
2006-12-26 23:33:00 [2348] Scanset Order: ocrad(0) ocrad-invert(0) ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
2006-12-26 23:33:00 [2378] Exec  : /usr/bin/ocrad -s5 /var/amavis/tmp/.spamassassin23483W6okRtmp/pic01.gif.pnm
2006-12-26 23:33:00 [2348] Saved pid: 2378
2006-12-26 23:33:00 [2378] Stdout: >/var/amavis/tmp/.spamassassin23483W6okRtmp/scanset.ocrad.out
2006-12-26 23:33:00 [2378] Stderr: >/var/amavis/tmp/.spamassassin23483W6okRtmp/scanset.ocrad.err

No problem here.

About the USE flags. I like the idea, but we can't make both ocrad and gocr USE flags, since the plugin doesn't make any sense without one of them. I suggest making gocr a static dependency and ocrad a USE flag. Gocr was static in the earlier version.

Since MLDBM, MLDBM-Sync and DB_File are very much required, I have removed the dbm USE flag again. The config checks for these and complains if they don't exist, meaning you lose functionality if they are not there.

I have added perl-core/DB_File as dependency also since it was in the config too.

I have added ">=app-text/gocr-0.43" as dependency and removed the warning about earlier buggy version now that we are forcing a good version.

I also did some patching when enabling tesseract and disabling ocrad.

Next step is the mysql part.

Tell me what you think so far.

Comment 17 Jacob Lindberg 2006-12-26 15:24:59 UTC
Created attachment 104771 [details]
New ebuild
Comment 18 Jacob Lindberg 2006-12-26 15:25:36 UTC
Created attachment 104773 [details, diff]
Enable tesseract config patch
Comment 19 Jacob Lindberg 2006-12-26 15:26:12 UTC
Created attachment 104774 [details, diff]
Disable ocrad in config if ! use patch
Comment 20 Juan 2006-12-26 15:41:04 UTC
Jacob,

Awesome, I am going to leave my production server in peace and install the ebuilds related to this bug report on my laptop to test the hell out of it.

FYI, I am adding PostgreSQL support to FuzzyOCR myself so expect a pgsql USE flag in the near future (hopefully for the 3.5 stable release).

Juan
Comment 21 Juan 2006-12-26 15:48:19 UTC
gocr is not required, so it doesn't make any sense to make it a dependency. One OCR engine is required, and there are 3 to choose from. I think it would make more sense to nag the user when all 3 OCR flags are disabled OR simply make gocr a dependency *IF* all 3 OCR flags are disabled.

But let's not lock people into having to install gocr. =)
Comment 22 Juan 2006-12-26 16:16:45 UTC
Created attachment 104776 [details, diff]
Newest tesseract config patch

For some reason, the current tesseract patch doesn't work for me... I've attached the one I created that works (for me)...
Comment 23 Jacob Lindberg 2006-12-26 16:34:29 UTC
Juan,

Of course you are right about the number of OCR engines. Let's do something about that.
(gocr, ocrad and tesseract).

Good to hear about PostgreSQL support. Most probably a lot of people will love that.

I need to get some sleep now. I will look into the 3 OCR issue tomorrow. It's 1:30 am here.

About the tesseract patch, this is the only patch I didn't test! I admit that. I just did a diff. 

Is the tesseract software any good?
Comment 24 Juan 2006-12-26 16:58:04 UTC
Jacob,

Cool. Well, I am one step ahead of you as I have created the newest ebuild to support all 3 OCR engines and nag if all OCR engine USE flags are disabled (perhaps a better approach is in order here)....

I currently have Fuzzy on my production server using PgSQL. It's a nasty, quick and super dirty hack but it works for now. Since I now have fuzzy on my laptop, I'll be able to add PgSQL support more easily.

About tesseract.. I think it's probably just as good as ocrad. It was open-sourced last year by HP and UNLV for what it's worth... Both ocrad and tesseract catch my custom spam images I've made for testing....
Comment 25 Juan 2006-12-26 17:01:20 UTC
Created attachment 104778 [details]
spamassassin-fuzzyocr-3.5.0_rc1.ebuild with support for 3 OCR engines

Newest ebuild that simply makes gocr a USE flag. If all 3 OCR engine USE flags are disabled, the ebuild complains then dies.
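A minimal standalone sketch of such a check, assuming it lives in pkg_setup(). The `use`, `eerror` and `die` definitions below are stubs standing in for Portage's own helpers so the snippet runs outside emerge, and `check_ocr_engines` is a hypothetical helper name, not taken from the attached ebuild:

```bash
#!/bin/bash
# Stubs standing in for Portage helpers (only so this runs outside emerge).
use()    { [[ " ${USE} " == *" $1 "* ]]; }
eerror() { echo " * $*" >&2; }
die()    { eerror "$*"; exit 1; }

# Hypothetical helper: succeed if at least one OCR engine flag is enabled.
check_ocr_engines() {
    if ! use gocr && ! use ocrad && ! use tesseract; then
        eerror "FuzzyOcr needs at least one OCR engine."
        eerror "Enable at least one of the USE flags: gocr ocrad tesseract"
        return 1
    fi
}

# Abort the merge early if no engine was selected.
pkg_setup() {
    check_ocr_engines || die "no OCR engine selected"
}
```

With USE="ocrad mysql" the check passes, while USE="mysql log" makes pkg_setup() die. Later EAPIs (4 and up) can express the same rule declaratively with REQUIRED_USE="|| ( gocr ocrad tesseract )".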
Comment 26 Juan 2006-12-26 17:02:09 UTC
Created attachment 104779 [details, diff]
Disable gocr in config if ! use patch
Comment 27 Juan 2006-12-26 18:54:18 UTC
>> Jacob wrote:
    Next step is the mysql part.

Oh yea.

My only suggestion would be to drop the SQL schemas into fuzzy's home dir (in /etc/mail/sa) and point users to those files with a post-install message, eh?
Comment 28 Juan 2006-12-26 23:37:39 UTC
Created attachment 104790 [details]
tie-cache ebuild for possible tie-cache dependency....

So apparently, the perl module Tie-Cache is required. But I don't get it. Tie-Cache is not in portage and I never installed it on my server, yet Fuzzy works there without issues. Moving on to my laptop, I had to manually install Tie-Cache to get Fuzzy to work.

Is Tie-Cache masked as some other package in portage? If not, it is required so an ebuild for Tie-Cache will be needed.

In case it is, here is the overlay ebuild for Tie-Cache 0.17

Can anyone confirm this? FuzzyOcr source does call on tie-cache so... hmm
Comment 29 Juan 2006-12-27 00:04:39 UTC
Comment on attachment 104790 [details]
tie-cache ebuild for possible tie-cache dependency....

># Copyright 1999-2006 Gentoo Foundation
># Distributed under the terms of the GNU General Public License v2
># $Header: /var/cvsroot/gentoo-x86/perl-core/Tie-Cache/Tie-Cache-0.17.ebuild,v 1.9 2006/08/04 13:30:56 mcummings Exp $
>
>inherit perl-module
>
>DESCRIPTION="The Perl LRU Cache Memory Module"
>HOMEPAGE="http://search.cpan.org/~chamas/Tie-Cache-0.17/Cache.pm"
>SRC_URI="mirror://cpan/authors/id/C/CH/CHAMAS/${P}.tar.gz"
>
>LICENSE="|| ( Artistic GPL-2 )"
>SLOT="0"
>KEYWORDS="~x86"
>IUSE=""
>
>SRC_TEST="do"
>
>DEPEND="dev-lang/perl"
Comment 30 Jacob Lindberg 2006-12-27 06:13:25 UTC
Juan,

Isn't it tesseract that depends on Tie::Cache? I'm not using tesseract at the moment, and I don't see any warnings or any requirement for it from FuzzyOcr. Please test that.

About the SQL files, good idea!

Please make sure that x86, ppc and ppc64 are included as KEYWORDS in your ebuild(s). I'm using all 3 archs when testing :)

Tomorrow I will be finished with another update of the ebuild, which also includes enabling log and logrotate as USE flags. Right now the log doesn't do much. I will also restructure DEPEND and RDEPEND, since they are a mess right now :)
Comment 31 Juan 2006-12-27 12:18:43 UTC
Jacob,

FYI: I removed SA/Fuzzy and all dependencies so as to start fresh. As of now, all is working as expected.

On Tie::Cache... When I received this error, I must admit that I had installed Fuzzy and applied my PgSQL patches before doing ANY testing to confirm functionality. Installing Tie::Cache solved the issue that time. Now, after a fresh, squeaky clean install of SA and Fuzzy (no PgSQL patches, but using my ebuild), there are no Tie::Cache related errors. I then patched Fuzzy with my PgSQL files and still got no error. However, I did end up installing Storable on my laptop (Storable is installed on the server), but I don't see why a missing Storable would give me Tie::Cache related errors. I assume Storable would be used for file-based hashing(????).

I have ocrad and tesseract as the OCR engines, so it appears that Tie::Cache isn't a requirement for tesseract, since there are no more errors. All I can say is that it's weird and that I cannot reproduce the errors.

On KEYWORDS... I will be sure to remember to add more than just my arch.. hehe..

On Log-Agent... I'm not so sure where/how this comes into place. But on Fuzzy's site, he states that Log-Agent *might* be required for MLDBM-Sync but that some users have reported no issues without it. I have removed it on this new install and have no errors/issues without it. It might be safe to remove that as a DEP. You can read about it here: http://fuzzyocr.own-hero.net/wiki/Installation-3.5.x

The log USE flag should have been log-agent and not log, my bad.

Lastly, and off topic.. Are you using MySQL for hashing? If so, do you get the following error which occurs when repeat offending images are updated in the hash table (of course, you'd see DBD::mysql):

warn: DBD::Pg::db do failed: ERROR: syntax error at or near "check" at character

I find it weird that I would get a simple update query error that wasn't caught in MySQL testing, since the SQL queries are very basic. In the case above, this is the query, which doesn't seem incorrect in any way, as all the columns exist, as does the image being updated:

update hash set match = '1', check = '1167249571' where key = '255:255:255:255:173820::0:0:0:0:14680'
Comment 32 Juan 2006-12-27 17:52:21 UTC
Jacob,

My SQL error was Pg specific. It appears 'check' is a reserved word so I changed 'check' to 'last_seen' and it works. So no worries about errors...

In any case, I have submitted my PgSQL patchset to the devs @ FuzzyOcr. Hopefully it'll make it for the 3.5 stable release...

http://fuzzyocr.own-hero.net/ticket/34

It works as it should with PgSQL. Can't comment on MySQL functionality. This project kicks ass!
Comment 33 Juan 2006-12-27 17:55:29 UTC
Since you're fine tuning the ebuild, don't forget to add DBD-mysql when USE mysql is enabled. Same for PgSQL once that gets into source.

Storable appears to be required when hashing to files, not SQL. Can you confirm this?
Comment 34 Jacob Lindberg 2006-12-28 01:30:53 UTC
Juan,

Nice job there. I hope to see your patch go into stable 3.5 :)

Do you think we should add your patch to the ebuild?

I took your ebuild as reference. Now we are in 'sync'.

I have made some changes and enhancements to the ebuild now. Here is a small ChangeLog:

--------
- Changed dev-db/mysql to dev-perl/DBD-mysql in the mysql USE flag
- Added perl-core/Storable to RDEPEND
- Removed dev-perl/Log-Agent from DEPEND
- Changed the eerror to a more user-friendly message
- Added USE flag log which will change "#focr_logfile /tmp/FuzzyOcr.log" to "focr_logfile /var/log/FuzzyOcr.log" in FuzzyOcr.cf
- Renamed noocrad.patch to disableocrad.patch in files
- Changed DEPEND to only consist of dev-lang/perl and >=mail-filter/spamassassin-3.0.0. The rest is in RDEPEND, since it is not needed when building the package
- Added /var/lib/FuzzyOcr to handle all file dbs + changing /etc/mail/spamassassin to /var/lib/FuzzyOcr in FuzzyOcr.cf
--------
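The two FuzzyOcr.cf edits in this ChangeLog could look roughly like the sed calls below. This is a sketch only: `fuzzyocr_cf_edits` is a hypothetical helper, and it assumes that only the file-database entries (paths ending in `.db`) should move to /var/lib/FuzzyOcr, leaving any other files referenced under /etc/mail/spamassassin where they are.

```bash
#!/bin/bash
# Hypothetical helper: apply the FuzzyOcr.cf edits listed in the ChangeLog.
#   $1 = path to FuzzyOcr.cf
#   $2 = "yes" if the log USE flag is enabled
fuzzyocr_cf_edits() {
    local cf="$1" use_log="$2"

    # Assumption: file-database paths end in ".db"; move only those lines
    # from /etc/mail/spamassassin/ to /var/lib/FuzzyOcr/.
    sed -i -e '/\.db$/ s:/etc/mail/spamassassin/:/var/lib/FuzzyOcr/:' "${cf}"

    # With USE=log, uncomment the logfile option and point it at /var/log.
    if [[ ${use_log} == yes ]]; then
        sed -i -e 's:^#focr_logfile /tmp/FuzzyOcr.log:focr_logfile /var/log/FuzzyOcr.log:' "${cf}"
    fi
}
```

Inside the ebuild, `use log` would decide the second argument. Note the separate `-i -e`: GNU sed treats `-ie` as "edit in place with backup suffix e".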

As you can see I removed the Log::Agent, and used the log USE flag for enabling logging from FuzzyOcr. I hope this is okay with you.

I have enabled image hashing (option 2) in my test setup and it all works like a charm. I even moved it to production now. 

I can't use mysql since my servers are too loaded for doing SQL queries at the moment. I will have to trust you, Juan, on that one :)

I found this in Hashes.pm: "use MLDBM qw(DB_File Storable);" and this in Config.pm: "use constant HAS_STORABLE => eval { require Storable; };". So you are obviously right about Storable as dependency.

Are we missing something else?

Comment 35 Jacob Lindberg 2006-12-28 01:31:40 UTC
Created attachment 104837 [details]
The latest ebuild
Comment 36 Jacob Lindberg 2006-12-28 01:32:22 UTC
Created attachment 104838 [details, diff]
The renamed disableocrad.patch
Comment 37 Jacob Lindberg 2006-12-28 01:32:50 UTC
Created attachment 104839 [details]
The logrotate file
Comment 38 Juan 2006-12-28 01:57:04 UTC
Jacob,

Ebuild looks nice. I think an ebuild could be created with my patch to at least have it tested. I do have mine in production working fine, but it would be nice to have input from others to pass on to the Fuzzy devs if needed. USE postgres would need to be added, as well as a new DEP, DBD-Pg. I think that might be best, since I don't want to be the only Pg tester for the world to depend on... =)

Also, pkg_postinst should copy the SQL files somewhere and instruct the user to use/import files located in X dir. Not sure where to drop those files though...

It's 2am. I'm off to bed.

Juan
Comment 39 Juan 2006-12-28 10:54:54 UTC
Jacob,

The newest ebuild works great!

Now, I've been trying to apply my patch to the ebuild but am failing miserably. Is there anything special that I need to do to the pgsql.patch file I drop into ${FILESDIR}?

It is failing at:

 * Applying pgsql.patch ...

 * Failed Patch: pgsql.patch !
Comment 40 Jacob Lindberg 2007-01-01 22:34:25 UTC
Juan,

When I look through your patch it changed the logging facility in the config file. This is something your patch should not do. It should give the ability to use pgsql, but nothing else.

Can you create a new patch? Or provide me the one you want to use? 

I will help you make it work in the ebuild.
Comment 41 Vieri 2007-01-02 02:02:06 UTC
Hi,
I'm new to this plugin but am really interested to try it out.
I'm resorting to this ebuild because the official website says that the "stable" version is not recommended.

Your latest ebuild has:
epatch "${FILESDIR}"/patchset2.patch
The web site lists a patchset3.
I suppose it should be updated.

Also I saw that you are using the amavis user permissions on some files. My system doesn't use amavis. Is it necessary?
Comment 42 Jacob Lindberg 2007-01-02 04:11:53 UTC
Hi,

Thanks for your observations. I have created a new ebuild with all patches (1,2,3), and a warning about the amavis user. I need to do some thinking about this issue, since we can't make sure that the amavis user actually exists.
Comment 43 Jacob Lindberg 2007-01-02 04:12:52 UTC
Created attachment 105149 [details]
spamassassin-fuzzyocr-3.5.0_rc1-r1.ebuild
Comment 44 Jacob Lindberg 2007-01-02 04:29:30 UTC
Created attachment 105150 [details, diff]
patchset1.patch
Comment 45 Jacob Lindberg 2007-01-02 04:30:13 UTC
Created attachment 105151 [details, diff]
patchset3.patch
Comment 46 Vieri 2007-01-02 05:48:13 UTC
(In reply to comment #42)

thanks.
I also noticed that the ebuild requires:
>=mail-filter/spamassassin-3.0.0

however fuzzyocr-3.5 seems to require version 3.1.4 or higher (http://fuzzyocr.own-hero.net/wiki/Installation-3.5.x).
Comment 47 Vieri 2007-01-02 05:50:41 UTC
(In reply to comment #46)
> (In reply to comment #42)
[EDIT]: I just saw the "if has_version '<mail-filter/spamassassin-3.1.4';"
Comment 48 Jacob Lindberg 2007-01-02 23:03:45 UTC
Hi again

Well, about spamassassin: the oldest version available in portage is 3.1.3. This will most probably disappear before this ebuild goes into the tree.
Comment 49 Jacob Lindberg 2007-01-02 23:38:23 UTC
And by the way:

        # if we're using spamassassin < 3.1.4 we need to set this variable
        # ("-i -e", not "-ie": GNU sed would read "e" as a backup suffix)
        if has_version '<mail-filter/spamassassin-3.1.4'; then
            sed -i -e "s:^#focr_pre314 0.0:focr_pre314 1:" FuzzyOcr.cf
        fi

...
Comment 50 Vieri 2007-01-03 00:37:21 UTC
(In reply to comment #42)
> a warning about the amavis user. I need to do some thinking about
> this issue, since we can't make sure that the amavis user actually exists.

How about moving fperms and fowners to pkg_config() so that the user can specify which user spamassassin is running under? I find it tricky for the ebuild to correctly autodetect the spamassassin system user but if you find a way then that would be great.
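A minimal sketch of the user-selection part of such a pkg_config(), assuming a hypothetical FUZZYOCR_USER variable that defaults to amavis. It only verifies that the account exists (via id(1)) before anything would be chowned; the actual chown is shown as a comment.

```bash
#!/bin/bash
# Hypothetical: FUZZYOCR_USER is not an existing Portage or FuzzyOcr variable.

# id(1) exits non-zero when the account does not exist.
fuzzyocr_user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print the account that should own FuzzyOcr's data files, or fail.
fuzzyocr_pick_user() {
    local user="${FUZZYOCR_USER:-amavis}"
    if fuzzyocr_user_exists "${user}"; then
        echo "${user}"
    else
        echo "user '${user}' not found; set FUZZYOCR_USER" >&2
        return 1
    fi
}

# In a real pkg_config() the result would then be used, e.g.:
#   owner=$(fuzzyocr_pick_user) || die "no valid FuzzyOcr user"
#   chown -R "${owner}" "${ROOT}"/var/lib/FuzzyOcr
```

Users would then run `emerge --config` for the package with FUZZYOCR_USER set to whatever account spamassassin runs as, instead of the ebuild guessing.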
Comment 51 Juan 2007-01-03 10:24:54 UTC
(In reply to comment #40)
> Juan,
> 
> When I look through your patch it changed the logging facility in the config
> file. This is something your patch should not do. It should give the ability to
> use pgsql, but nothing else.
> 
> Can you create a new patch? Or provide me the one you want to use? 
> 
> I will help you make it work in the ebuild.
> 

Jacob,

I will post my PostgreSQL patch in a bit (later today).
Comment 52 Juan 2007-01-04 09:56:55 UTC
Created attachment 105396 [details, diff]
postgresql.patch
Comment 53 Juan 2007-01-04 10:16:00 UTC
Created attachment 105398 [details, diff]
postgresql.patch
Comment 54 Juan 2007-01-04 10:21:53 UTC
Created attachment 105399 [details]
ebuild for review

Jacob,

I've attached a new ebuild for you to look at (see below). I've also uploaded the postgresql patch that I've been trying to get to work. 

Also, a couple of things about the ebuild. If you're going to assume everyone uses amavis, add an amavis flag. I don't use amavis so I have no amavis user.

So you're keeping both log and logrotate flags? I think that is rather redundant; we should probably stick with logrotate.
Comment 55 Juan 2007-01-04 10:27:44 UTC
Created attachment 105400 [details]
ebuild for review
Comment 56 Paul B. Henson 2007-01-28 01:41:58 UTC
I'm putting together a new postfix/amavisd-new/clamav/spamassassin system, and came across your ebuild in progress. A few initial comments:

FuzzyOCR 3.5.1 is out, so the 3 patchsets are obsolete. Looks like the tarball has a -devel in the name now.

If you're only storing hashes to SQL, I don't think there's a need for the DBM packages. Why not put back the dbm use flag? It seems the three choices are: -dbm -*sql, no hashing; dbm -*sql, depend on dbm packages, local file hash storage; -dbm *sql, depend on DBI/appropriate DBD, SQL hash storage. The current ebuild depends on dev-perl/MLDBM-Sync and dev-perl/DBI regardless of use flags, which will result in extra cruft installed. It looks like perl-core/Storable is only needed for dbm support too. I'm planning on storing hashes in mysql, and don't want to install unnecessary packages (one of the things I like about Gentoo versus precompiled dists is that flexibility). I guess a fourth choice would be dbm *sql, install it all...
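Expressed in ebuild dependency syntax, that split might look like the sketch below. The atoms are the ones named elsewhere in this bug; the `postgres` USE flag is an assumption, since PostgreSQL support was still an unmerged patch at this point.

```bash
# Sketch only: conditional RDEPEND following the dbm / *sql split above.
# The "postgres" USE flag is an assumption (the PgSQL patch was not merged yet).
RDEPEND="dev-lang/perl
	gocr? ( >=app-text/gocr-0.43 )
	ocrad? ( >=app-text/ocrad-0.14 )
	tesseract? ( app-text/tesseract )
	dbm? (
		dev-perl/MLDBM
		dev-perl/MLDBM-Sync
		perl-core/DB_File
		perl-core/Storable
	)
	mysql? ( dev-perl/DBI dev-perl/DBD-mysql )
	postgres? ( dev-perl/DBI dev-perl/DBD-Pg )"
```

With USE="mysql -dbm", only DBI and DBD-mysql are pulled in for hash storage, which is the "no extra cruft" case described above.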

Where does the direct dependency on virtual/perl-Digest-MD5 come from? I don't see anything in the FuzzyOCR code itself that uses it.

How about media-gfx/imagemagick? FuzzyOCR doesn't seem to depend on it directly. It looks like it is only needed for tesseract support? If so, it should only be included if tesseract is used.

/var/lib/FuzzyOcr is only needed when dbm is used.

Just a personal opinion, but I'm not sure why the default for the .words, .scansets, and .preps files is /etc/spamassassin rather than /etc/spamassassin/FuzzyOcr. FuzzyOcr.cf itself clearly needs to go into /etc/spamassassin, but the other files seem better located in the subdir. I'll probably install them there and update the .cf file. Actually, the files currently going into /etc/spamassassin/FuzzyOcr seem to be perl modules, not config files. Why shouldn't those go into ${VENDOR_LIB}/FuzzyOcr with all the other perl modules? Makes more sense than /etc. Looks like the ebuild already installs FuzzyOcr.pm into the spamassassin plugin dir instead of /etc/spamassassin, might as well relocate the other modules to a more appropriate spot.

Well, I guess I'll go see how well my tweaked ebuild works out.

Thanks...
Comment 57 Jacob Lindberg 2007-01-28 09:55:23 UTC
Sounds good to me. I kind of ran out of time to finish this project, at least for the moment. I still have it running in production, though.
Comment 58 Marco Nierlich 2007-01-28 10:04:56 UTC
Paul, would you mind attaching your tweaked ebuild?
Comment 59 Paul B. Henson 2007-01-28 21:50:28 UTC
Created attachment 108421 [details]
test ebuild for sql hash storage
Comment 60 Paul B. Henson 2007-01-28 21:54:03 UTC
Ok, I attached the ebuild I've been playing with. Note it is not meant as a replacement for the last proposed ebuild, I've only tested it with the use flags I wanted, and I ripped out some of the logging stuff I didn't need rather than trying to fix it. Also, it doesn't automatically fix the paths in FuzzyOcr.cf like it should, I edited that file by hand afterward. However, I am running it on a test system successfully storing hashes into mysql, without any dbm related packages, with perl code located in /usr/lib/perl, and no complaints/problems so far.
Comment 61 Patrick McLean gentoo-dev 2007-02-02 03:54:58 UTC
Created attachment 108904 [details]
spamassassin-fuzzyocr-3.5.1.ebuild

Ebuild for spamassassin-fuzzyocr-3.5.1, this is fixed up a bit, mostly small stuff from the previous ebuilds posted here.

The postgresql patch doesn't apply anymore; I have it commented out for now. If you make up a new one for me, I can add it in again. I will give this a few days of testing, hopefully get a new postgres patch, and then talk to tomk about adding this to portage.
Comment 62 Patrick McLean gentoo-dev 2007-02-02 16:30:34 UTC
Created attachment 108944 [details]
spamassassin-fuzzyocr-3.5.1.ebuild

Some cleanups, change the tesseract dep from media-gfx/tesseract to app-text/tesseract since all the other OCR apps in portage are in app-text.
Comment 63 Tom Knight (RETIRED) gentoo-dev 2007-02-02 18:58:30 UTC
Sorry guys, been really busy recently. Thanks for all the work you've put into the ebuild(s), I'll have a look this weekend.
Comment 64 Jacob Lindberg 2007-02-03 21:09:47 UTC
Patrick, thanks for continuing this project. I will see if I get some time next week to help you out.
Comment 65 Tom Knight (RETIRED) gentoo-dev 2007-02-06 22:39:04 UTC
Comment on attachment 104270 [details]
app-text/gocr (gocr-0.43.ebuild)

0.43 has been added to the tree, see bug 145624.
Comment 66 Tom Knight (RETIRED) gentoo-dev 2007-02-06 22:42:43 UTC
Created attachment 109385 [details]
MLDBM-Sync-0.30.ebuild

Fixed the LICENSE and KEYWORDS (we can't add arches which we haven't tested on although we can request that the arch teams add their ~ARCH keywords)
Comment 67 Jacob Lindberg 2007-02-07 07:46:52 UTC
So my ppc, ppc64 and x86 setups don't count for MLDBM-Sync?

Please fix this again, Tom. It has been tested fully and actually running at the moment :-)
Comment 68 Tom Knight (RETIRED) gentoo-dev 2007-02-07 17:42:23 UTC
(In reply to comment #67)
> So my ppc, ppc64 and x86 setups doesn't count for MLDBM-Sync?
> 
> Please fix this again, Tom. It has been tested fully and actually running at
> the moment :-)
> 

Although you've tested it on those arches it's Gentoo policy that when adding a new package only ~ARCH keywords for arches that the dev(s) has tested it on should be included.

Once it's been added I'll file another bug to get the arch teams to add their ~ARCH keywords if it works correctly on those arches, I'll mention that you've tested it on those arches.
Comment 69 Jacob Lindberg 2007-02-08 11:24:39 UTC
Tom, 
Okay; not a problem.
Comment 70 Tom Knight (RETIRED) gentoo-dev 2007-02-23 16:22:15 UTC
I've tested this out and made a few modifications to the ebuild, once the SPARC team have added keywords for the required dependencies in bug 168060 bug 168062 and bug 168063 (which I've been told will be done by tomorrow evening) I'll add the 3.5.1 ebuild to the tree.

Thanks for everyone's patience and hard work that's gone into this.
Comment 71 Paul B. Henson 2007-02-23 22:13:36 UTC
Tom,

Glad to hear this is about to go into portage. It doesn't look like you posted the final version of the ebuild you plan to add, I was just wondering if you had the chance to incorporate any of the suggestions I made in comments #56/60.

Thanks much...
Comment 72 Tom Knight (RETIRED) gentoo-dev 2007-03-11 15:53:50 UTC
(In reply to comment #71)
> I was just wondering if you
> had the chance to incorporate any of the suggestions I made in comments #56/60.

Yes, I've removed the unneeded requirements on virtual/perl-Digest-MD5 and media-gfx/imagemagick; they were left over from the previous version and are no longer needed.

I've also re-added the dbm USE flag to control the requirements needed for hashing support.
Comment 73 Jason Phillips 2007-03-12 14:55:32 UTC
(In reply to comment #72)
> Yes, I've removed the un-needed requirements on virtual/perl-Digest-MD5 and
> media-gfx/imagemagick they were left over from the previous version and are no
> longer needed.
> I've also re-added the dbm USE flag to control the requirements needed for
> hashing support.

Hi Tom. If the ebuild isn't going into Portage shortly, would you mind posting your latest version here? Thanks, Jason.
Comment 74 Tom Knight (RETIRED) gentoo-dev 2007-03-12 19:03:02 UTC
(In reply to comment #73)
> Hi Tom. If the ebuild isn't going into Portage shortly, would you mind posting
> your latest version here? Thanks, Jason.
> 

Too late, I've just added it to the tree :) It will show up on the mirrors within the next hour. Thanks to everyone who helped out.
Comment 75 aelber 2007-05-20 19:34:34 UTC
Created attachment 119839 [details]
Files modified to work with postgres instead of mysql

These are a sql schema, Config.pm, Hashing.pm, and FuzzyOcr.pm based on Juan's postgres patch.  I haven't included the cf since there's no change.  There are two changes in these files from Juan's patchset that should be noted:

First, the sql file does not try to drop the prior schema, and it assumes the database user will be "spamassassin" rather than FuzzyOCR.  This conforms to the instructions used when setting spamassassin itself to use Postgres.

Second, this disables the ability to use mysql.  Why does it do that?  Because the code to check installation of the right DBD:: class doesn't work right, and otherwise it throws an error every time it starts.