Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 175317 - SpamAssassin bayes MySQL store suboptimally indexed in mailfiltering guide
Summary: SpamAssassin bayes MySQL store suboptimally indexed in mailfiltering guide
Status: RESOLVED FIXED
Alias: None
Product: [OLD] Docs on www.gentoo.org
Classification: Unclassified
Component: Other documents (show other bugs)
Hardware: All Linux
: High minor (vote)
Assignee: nm (RETIRED)
URL: http://www.gentoo.org/doc/en/mailfilt...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-20 08:02 UTC by Janne Pikkarainen
Modified: 2007-04-25 07:35 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Janne Pikkarainen 2007-04-20 08:02:56 UTC
I just found out that bayes_token table in mailfiltering guide codelisting 10.2 is not very suitably indexed. 

---
CREATE TABLE bayes_token (
          id                    int(11)         NOT NULL default '0',
          token                 char(5)         NOT NULL default '',
          spam_count            int(11)         NOT NULL default '0',
          ham_count             int(11)         NOT NULL default '0',
          atime                 int(11)         NOT NULL default '0',
          PRIMARY KEY           (id, token)
          ) TYPE=MyISAM;
---

In my case SpamAssassin is every now and then performing queries against atime column. That is not indexed at all in the mailfiltering guide. Slow queries logged includes SELECTS like this one:

---
SELECT min(atime) FROM bayes_token WHERE id = '1';
---

My bayes_token table contains about 8,6 million records, so a full table scan takes a while. Right after I created a new index with

---
CREATE INDEX id_atime_index ON bayes_token(id, atime);
---

MySQL started to perform A LOT faster. Check this out:

---
mysql> SELECT min(atime) FROM bayes_token WHERE id = '1';
+------------+
| min(atime) |
+------------+
| 1176345812 |
+------------+
1 row in set (0.00 sec)

mysql> SELECT min(atime) FROM bayes_token IGNORE INDEX (id_atime_index) WHERE id = '1';
+------------+
| min(atime) |
+------------+
| 1176345812 |
+------------+
1 row in set (19.96 sec)
---

My suggestion to this problem is to initially create table with this command:

---
CREATE TABLE bayes_token (
          id                    int(11)         NOT NULL default '0',
          token                 char(5)         NOT NULL default '',
          spam_count            int(11)         NOT NULL default '0',
          ham_count             int(11)         NOT NULL default '0',
          atime                 int(11)         NOT NULL default '0',
          PRIMARY KEY           (id, token),
          INDEX                 (id, atime)
          ) TYPE=MyISAM;
---

Reproducible: Always
Comment 1 nm (RETIRED) gentoo-dev 2007-04-20 15:30:28 UTC
Preliminary feedback on #gentoo-dev seems to agree that adding an index is helpful.
Comment 2 nm (RETIRED) gentoo-dev 2007-04-25 07:27:06 UTC
Thanks for your suggestion. Fixed in CVS.
Comment 3 Janne Pikkarainen 2007-04-25 07:35:22 UTC
Thank you very much!