Bug 185718 - mail-filter/dspam-3.8.0-r2 - hash storage backend not installed
Bug#: 185718 Product:  Gentoo Linux Version: unspecified Platform: All
OS/Version: Linux Status: RESOLVED Severity: major Priority: P2
Resolution: FIXED Assigned To: mrness@gentoo.org Reported By: steeeeeveee@gmx.net
Component: Ebuilds
URL: 
Summary: mail-filter/dspam-3.8.0-r2 - hash storage backend not installed
Keywords:  
Status Whiteboard: 
Opened: 2007-07-18 00:09 +0000
Description:
1) I am trying to use DSPAM with the PostgreSQL driver (USE flag postgres), but
the driver does not get installed: /usr/lib/dspam/ is completely empty.

2) The hash driver does not get compiled when emerging DSPAM. The hash driver
could/should always be compiled, since it does not depend on any
database backend and is always available in DSPAM. DSPAM supports compiling
with multiple drivers, which can be switched in dspam.conf.
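
A minimal sketch of what "always compile the hash driver" could mean at build time, assuming DSPAM's ./configure accepts a comma-separated list via --with-storage-driver (as in DSPAM 3.8). The helper storage_drivers and its USE-flag-to-driver mapping are hypothetical, for illustration only, and not taken from the actual ebuild.

```shell
#!/bin/sh
# Hypothetical helper: builds the value for
#   ./configure --with-storage-driver=...
# hash_drv is always included; one extra driver is appended per
# selected backend (mapping is an assumption for illustration).
storage_drivers() {
    drivers="hash_drv"
    for backend in "$@"; do
        case "$backend" in
            postgres) drivers="$drivers,pgsql_drv" ;;
            mysql)    drivers="$drivers,mysql_drv" ;;
            *) echo "unknown backend: $backend" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$drivers"
}
```

Usage, e.g. with only the postgres backend selected: ./configure --with-storage-driver="$(storage_drivers postgres)" would request hash_drv,pgsql_drv, after which StorageDriver in dspam.conf picks the active one.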

Reproducible: Always

------- Comment #1 From Jakub Moc (RETIRED) 2007-07-18 06:14:36 +0000 -------
(In reply to comment #0)
> 2) The hash driver does not get compiled when emerging DSPAM. The hash driver
> could/should always be compiled, since it does not depend on any
> database backend and is always available in DSPAM.

Erm, nope. It is intentionally disabled if you select anything else, because
it's buggy. Read the ewarn in the ebuild.

------- Comment #2 From Alin Năstac 2007-07-18 06:33:39 +0000 -------
(In reply to comment #0)
> 1) I am trying to use DSPAM with the PostgreSQL driver (USE flag postgres), but
> the driver does not get installed: /usr/lib/dspam/ is completely empty.

When you select only one database backend, the driver is compiled as a static
library, so no .so file is installed.

> 2) The hash driver does not get compiled when emerging DSPAM. The hash driver
> could/should always be compiled, since it does not depend on any
> database backend and is always available in DSPAM. DSPAM supports compiling
> with multiple drivers, which can be switched in dspam.conf.

Jakub already replied to this point.

*** This bug has been marked as a duplicate of bug 156077 ***

------- Comment #3 From steveb 2007-07-18 11:29:27 +0000 -------
(In reply to comment #1)
> (In reply to comment #0)
> > 2) The hash driver does not get compiled when emerging DSPAM. The hash driver
> > could/should always be compiled, since it does not depend on any
> > database backend and is always available in DSPAM.
> 
> Erm, nope. It is intentionally disabled if you select anything else, because
> it's buggy. Read the ewarn in the ebuild.
> 
No! The hash driver is not buggy. Are you referring to bug #179400?
The hash driver is not suited to every tokenizer and PValue. It exists only
for Tokenizer SBPH with PValue markov. No other driver can be used when you
set the tokenizer to SBPH and PValue to markov. That's the reason the hash
driver is there.

The other backends are suited to all tokenizers except SBPH.

If you use the hash driver with anything other than SBPH, the driver will
only save one class in the storage engine: either spam or ham, depending on
which one you hit first. It will save only one class, never both.

The hash driver is more or less a one-to-one copy of Bill Yerazunis's CSS (CRM
Sparse Spectra), and it is heavily used in CRM114 (and in OSBF-Lua). Jonathan
Zdziarski added it to DSPAM some time ago, and its only purpose is to serve as
a storage engine for CRM114's SBPH (Sparse Binary Polynomial Hashing)
tokenizer. It has its purpose, and I think you should re-add it to the DSPAM
ebuild.

Using the hash driver with anything other than SBPH will lead to issues as
described in bug #179400.

This is all described in the DSPAM documentation (at least I think so; maybe
I have been working with DSPAM, CRM114 and OSBF for too long and read this on
the mailing list or somewhere else, but I really think it is described in the
documentation).

------- Comment #4 From Alin Năstac 2007-07-19 08:35:10 +0000 -------
Fixed in -r3. The hash storage driver is now always installed, and I've added
ewarns at the beginning of pkg_postinst() about how to use it. Also, when no
other backend is installed, dspam.conf will have the following characteristics:
 - no StorageDriver selected (since hash is the only backend available, it will
be built as a static library)
 - Tokenizer sbph
 - PValue markov
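
For reference, a hash-only dspam.conf along those lines might contain roughly the following. This is a sketch assuming the standard DSPAM 3.8 directive names (StorageDriver, Tokenizer, PValue); the commented-out library path is an assumption, not copied from the file the ebuild installs.

```
# Sketch of a hash-only dspam.conf excerpt (assumed, not the shipped file).
# No active StorageDriver line: with a single backend the driver is linked
# statically, so there is no .so to point at. Path below is an assumption.
#StorageDriver /usr/lib/dspam/libhash_drv.so
Tokenizer sbph
PValue markov
```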

Thank you very much for the info!

------- Comment #5 From steveb 2007-07-21 18:14:21 +0000 -------
(In reply to comment #4)
> Fixed in -r3. The hash storage driver is now always installed, and I've added
> ewarns at the beginning of pkg_postinst() about how to use it. Also, when no
> other backend is installed, dspam.conf will have the following characteristics:
>  - no StorageDriver selected (since hash is the only backend available, it will
> be built as a static library)
>  - Tokenizer sbph
>  - PValue markov
> 
> Thank you very much for the info!
> 

Markov is the killer. Using Markov with anything other than the hash driver is
a showstopper.

Anyway... you made me rethink the cron job for DSPAM. We need to rework that
beast to handle more scenarios. I have rewritten the script for my own use to
handle all compiled/installed drivers and added some other things to it (like
vacuuming the DSPAM tables in PostgreSQL, compressing CSS files for the hash
driver, and purging signatures for the hash driver).

Should I post the new job here or should I open a new bug?

// SteveB

------- Comment #6 From Alin Năstac 2007-07-22 03:09:18 +0000 -------
(In reply to comment #5)
> Markov is the killer. Using Markov with anything other than the hash driver is
> a showstopper.

Doug had problems with the hash driver, and I doubt he was using markov (I
suspect he was using the default dspam.conf).


> Should I post the new job here or should I open a new bug?

Open a new bug, but don't expect a response from me for a couple of weeks,
because I'm on vacation. You got this reply only because I have insomnia and
found an open wireless network at my hotel. ;)