The Bayesian filter with SpamAssassin needs DB_File to work, but this is currently only pulled in with the berkdb USE flag. Without berkdb, DB_File is unavailable and SpamAssassin is untrainable.
What solution are you looking for?
DB_File is stated as optional in SpamAssassin's INSTALL file:
- DB_File (from CPAN, included in many distributions)
Used to store data on-disk, for the Bayes-style logic, TxRep, and
auto-whitelist. *Much* more efficient than the other standard Perl
database packages. Strongly recommended.
And it does state that you'll need it for "Bayes-style logic" (i.e., training). However, training is not required to run SpamAssassin.
DB_File is Perl's module for accessing berkdb.
DB_File is a module which allows Perl programs to make use of the facilities provided by Berkeley DB [...]
(virtual/perl-DB_File just requires that dev-lang/perl have USE="berkdb" set.)
The real problem is that USE=berkdb needs to be enabled on dev-lang/perl, or you need to use MySQL/PostgreSQL in order to have a Bayes DB.
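A quick way to see whether the installed Perl can load DB_File at all is to ask Perl to import it. This is only a rough check under the assumption that `perl` is on the PATH; the function name `check_db_file` is made up for illustration.

```shell
# Hypothetical helper: report whether Perl can load DB_File
# (the Berkeley DB binding the Bayes store relies on).
check_db_file() {
    if perl -MDB_File -e 1 2>/dev/null; then
        echo "DB_File available"
    else
        echo "DB_File missing"
    fi
}

check_db_file
```

If this reports "DB_File missing", rebuilding dev-lang/perl with USE="berkdb" is the likely fix.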
I'm still not following what the issue or desired solution is.
If you want berkdb storage, you set berkdb (which then propagates +berkdb to dev-lang/perl via virtual/perl-DB_File, right?).
If you want SQL storage, you set one of the SQL flags.
If you're the 1% that just wants downloaded rules -- there's nothing that needs to be stored, so you don't have to set any of them.
In my case, my setup just broke after upgrading perl, because berkdb is now dropped. You could argue that that's my fault because of the berkdb flag change, and I'll buy that. The fact remains that the Bayesian filter only works with berkdb (not gdbm), and that Bayes support broke unexpectedly.
Perhaps something like USE=bayes on spamassassin could have a required use of one or more of berkdb and sql flags. I think the most important change is that by default it worked (because berkdb) and now it doesn't.
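As a rough sketch of that idea, the ebuild could tie a bayes flag to the storage flags through REQUIRED_USE. The flag names below (bayes, mysql, postgres, sqlite) are assumptions for illustration and may not match the actual ebuild or the pull request.

```shell
# Sketch only: a default-on bayes flag that requires at least one storage backend.
# The flag names are assumed, not taken from the real ebuild.
IUSE="+bayes berkdb mysql postgres sqlite"
REQUIRED_USE="bayes? ( || ( berkdb mysql postgres sqlite ) )"
```

With this in place, Portage would refuse to build with USE=bayes unless one of the listed backends is also enabled.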
(In reply to Fabian Groffen from comment #4)
> Perhaps something like USE=bayes on spamassassin could have a required use
> of one or more of berkdb and sql flags. I think the most important change
> is that by default it worked (because berkdb) and now it doesn't.
@grobian, could I get any input you might have on https://github.com/gentoo/gentoo/pull/21801 ? Thanks.
I think the change suggested in the pull-request makes it explicit that some deps are necessary to enable bayes support.
Since bayes is auto-enabled, I support the +bayes construct. I think none of the required flags are enabled by default though, so it will trigger a choice to be made by the user.
Perhaps, it would be better to also change the defaults in local.cf:
use_bayes 1/0 (based on USE=bayes)
and for SQL perhaps commented out suggestions for:
I recently migrated from db to mysql, which took hours (almost a day), so it isn't a task to be taken lightly.
I guess many people who do not use sa-learn have not noticed that their bayes setup got broken recently; perhaps this ebuild change will also prompt them to review their setup.
That pull request doesn't seem to have been well received.
How would you feel instead about a post-install message that warns that bayes support is unavailable if none of the storage USE flags are set?
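Such a message might look roughly like the following. This is a sketch, not the actual ebuild: it assumes the storage flags are named berkdb, mysql, postgres and sqlite, and it relies on the `use` and `ewarn` helpers that Portage provides to ebuilds.

```shell
pkg_postinst() {
    # Warn only when none of the (assumed) storage backends is enabled.
    if ! use berkdb && ! use mysql && ! use postgres && ! use sqlite; then
        ewarn "Bayes support is unavailable: no storage USE flag is set."
        ewarn "Enable one of berkdb, mysql, postgres or sqlite to use sa-learn."
    fi
}
```

Unlike a REQUIRED_USE constraint, this does not block the build; it only informs the user after installation.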
It just needs a bit more work to actually enable/disable bayes in the config, such that people have to enable bayes explicitly (and then have a db option available).
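One possible shape for that, as a sketch: a small helper that rewrites the `use_bayes` line in local.cf. `set_use_bayes` is a hypothetical name, and how the ebuild would call it (for example, depending on a bayes USE flag) is left open here.

```shell
# Hypothetical helper, not part of the existing ebuild.
# $1: path to local.cf, $2: 1 to enable Bayes, 0 to disable it.
set_use_bayes() {
    local cf="$1" value="$2"
    local tmp="${cf}.tmp"
    # Drop any existing use_bayes line, commented out or not,
    # then append a single line with the requested value.
    sed -E '/^[[:space:]]*#?[[:space:]]*use_bayes[[:space:]]/d' "${cf}" > "${tmp}"
    printf 'use_bayes %s\n' "${value}" >> "${tmp}"
    mv "${tmp}" "${cf}"
}
```

For example, `set_use_bayes /etc/mail/spamassassin/local.cf 0` would leave Bayes disabled until the user turns it on explicitly; the path shown is an assumption about where local.cf is installed.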