Bug 199700 - mail-filter/dspam segfaults if postgres is used
|
Bug#:
199700
|
Product: Gentoo Linux
|
Version: unspecified
|
Platform: All
|
|
OS/Version: Linux
|
Status: RESOLVED
|
Severity: major
|
Priority: P2
|
|
Resolution: FIXED
|
Assigned To: mrness@gentoo.org
|
Reported By: jmjoseph@andrew.cmu.edu
|
|
Component: Applications
|
|
|
URL:
|
|
Summary: mail-filter/dspam segfaults if postgres is used
|
|
Keywords:
|
|
Status Whiteboard:
|
|
Opened: 2007-11-19 20:23 0000
|
If the postgres driver is used, dspam segfaults on very many messages. I
initially expected a migration from mysql to postgres to be at fault, but
training new users (with no information in the database) also fails. I did not
experience these crashes when the mysql driver was used. This defect is
serious enough that dspam cannot be used.
I have attached one such crash-prone message, include a little information from
gdb, and list contents of dspam.debug below. I'd greatly appreciate any help
with this issue and would be happy to assist in any way.
For the record, I am running dspam 3.8.0-r7 along with a postgres 8.0.13
server. Note that the particular crash below occurs only when showing factors,
but disabling this only delays the crash to the next time CTX->factors->first
is accessed.
Note that I had originally posted this to the dspam-dev mailing list, but found
the project to be inactive.
GDB:
Short story: CTX->factors->first->ptr at dspam.c:3092 does not appear to be
initialized
----------------------------------------------------------------
Starting program: /usr/bin/dspam --debug --user jacob --mode=notrain --client
--stdout --deliver=innocent,spam <
/home/jacob/dbg_mail-linux-cluster-bounces@redhat.com-1194995056
[Thread debugging using libthread_db enabled]
[New Thread -1212545344 (LWP 17038)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212545344 (LWP 17038)]
add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
3092 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
(gdb) list
3087 node_ft = c_nt_first(CTX->factors, &c_ft);
3088 while(node_ft != NULL) {
3089 struct dspam_factor *f = (struct dspam_factor *)
node_ft->ptr;
3090 if (f) {
3091 strlcat(data, ",\n\t", sizeof(data));
3092 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
3093 f->token_name, f->value);
3094 strlcat(data, scratch, sizeof(data));
3095 }
3096 node_ft = c_nt_next(CTX->factors, &c_ft);
(gdb) bt
#0 add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
#1 0x08053442 in process_message (ATX=0xbfe51b18, message=0x8062bb8,
username=0x8062930 "jacob", result_string=0xbfe519d0) at dspam.c:727
#2 0x08054295 in process_users (ATX=0xbfe51b18, message=0x8062b10)
at dspam.c:1797
#3 0x08054f30 in main (argc=Cannot access memory at address 0x0
) at dspam.c:258
(gdb) print CTX->factors
$11 = (struct nt *) 0x806e688
(gdb) print *CTX->factors
$12 = {
first = 0x806e570,
insert = 0x616d6c69,
items = 1700146542,
nodetype = 1869181810
}
(gdb) print *CTX->factors->first
$13 = {
ptr = 0x38,
next = 0x20
}
-----------------------------------------------------------
dspam.debug
-----------------------------------------------------------
17124: [11/14/2007 00:19:02] No QuarantineAgent option found. Using standard
quarantine.
17124: [11/14/2007 00:19:02] DSPAM Instance Startup
17124: [11/14/2007 00:19:02] input args: /usr/bin/dspam --debug --user jacob
--mode=notrain --client --stdout --deliver=innocent,spam
17124: [11/14/2007 00:19:02] pass-thru args:
17124: [11/14/2007 00:19:02] processing user jacob
17124: [11/14/2007 00:19:02] uid = 0, euid = 0, gid = 0, egid = 503
17124: [11/14/2007 00:19:02] loading preferences for user jacob
17124: [11/14/2007 00:19:02] Loading preferences for uid 680
17124: [11/14/2007 00:19:02] Loading preferences for uid 0
17124: [11/14/2007 00:19:02] default preferences empty. reverting to dspam.conf
preferences.
17124: [11/14/2007 00:19:02] Loading preferences from dspam.conf
17124: [11/14/2007 00:19:02] using /var/spool/dspam/opt-in/local/jacob.dspam as
path
17124: [11/14/2007 00:19:02] using /var/spool/dspam/opt-out/local/jacob.nodspam
as path
17124: [11/14/2007 00:19:02] sedation level set to: 0
17124: [11/14/2007 00:19:02] Connecting to 127.0.0.1:3310 for virus check
17124: [11/14/2007 00:19:02] Loading 278 BNR patterns
17124: [11/14/2007 00:19:02] bnr reported snr of 6.597
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.05_0.40_
0.01000 0s 3i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.40_0.40_
0.01000 0s 7i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.05_0.05_
0.01000 0s 41i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.00_0.00_0.05_
0.01000 0s 38i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.40_0.05_
0.01000 0s 3i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.05_0.40_
0.01000 0s 7i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.40_0.40_
0.01000 0s 8i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.t| 0.01000 0s 46i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.05_0.05_
0.01000 0s 3i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.40_0.05_
0.01000 0s 7i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.00_0.05_0.05_
0.01000 0s 38i
17124: [11/14/2007 00:19:02] Whitelist threshold: 10
17124: [11/14/2007 00:19:02] [burton] [0.001190] is+> (1frq, 0s, 614i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002888] https+// (1frq, 11s, 2778i)
17124: [11/14/2007 00:19:02] [burton] [0.003946] X-Mailman-Version*2.1.5 (1frq,
29s, 5355i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.006364] https (1frq, 26s, 2970i)
17124: [11/14/2007 00:19:02] [burton] [0.006518] List-Post*<mailto (1frq, 62s,
6913i)
17124: [11/14/2007 00:19:02] [burton] [0.006790] List-Help*request (1frq, 55s,
5886i)
17124: [11/14/2007 00:19:02] [burton] [0.008260] Errors-To*bounces (1frq, 67s,
5885i)
17124: [11/14/2007 00:19:02] [burton] [0.008264] Sender*bounces (1frq, 67s,
5882i)
17124: [11/14/2007 00:19:02] [burton] [0.008533] Url*redhat (1frq, 0s, 85i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Help*cluster+request
(1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Post*cluster+redhat.com>
(1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634]
List-Subscribe*<https+//www.redhat.com/mailman/listinfo/linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] cluster+redhat (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Subject*[Linux+cluster] (1frq,
0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] list+Linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Sender*linux+cluster (1frq,
0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634]
List-Subscribe*//www.redhat.com/mailman/listinfo/linux+cluster> (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] redhat+com (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Return-Path*cluster+bounces
(1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] Burton-Bayesian Probability: 0.000000 Samples: 27
17124: [11/14/2007 00:19:02] no factors specified; using default
17124: [11/14/2007 00:19:02] Result Confidence: 0.99
17124: [11/14/2007 00:19:02] [burton] [0.001190] is+> (1frq, 0s, 614i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002888] https+// (1frq, 11s, 2778i)
17124: [11/14/2007 00:19:02] [burton] [0.003946] X-Mailman-Version*2.1.5 (1frq,
29s, 5355i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.006364] https (1frq, 26s, 2970i)
17124: [11/14/2007 00:19:02] [burton] [0.006518] List-Post*<mailto (1frq, 62s,
6913i)
17124: [11/14/2007 00:19:02] [burton] [0.006790] List-Help*request (1frq, 55s,
5886i)
17124: [11/14/2007 00:19:02] [burton] [0.008260] Errors-To*bounces (1frq, 67s,
5885i)
17124: [11/14/2007 00:19:02] [burton] [0.008264] Sender*bounces (1frq, 67s,
5882i)
17124: [11/14/2007 00:19:02] [burton] [0.008533] Url*redhat (1frq, 0s, 85i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Help*cluster+request
(1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Post*cluster+redhat.com>
(1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634]
List-Subscribe*<https+//www.redhat.com/mailman/listinfo/linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] cluster+redhat (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Subject*[Linux+cluster] (1frq,
0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] list+Linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Sender*linux+cluster (1frq,
0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634]
List-Subscribe*//www.redhat.com/mailman/listinfo/linux+cluster> (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] redhat+com (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Return-Path*cluster+bounces
(1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] Burton-Bayesian Probability: 0.000000 Samples: 27
17124: [11/14/2007 00:19:02] Result Confidence: 0.99
17124: [11/14/2007 00:19:02] BNR Decision Concurs
17124: [11/14/2007 00:19:02] total processing time: 0.08533s
17124: [11/14/2007 00:19:02] saving signature as 473a8546171241968551123
17124: [11/14/2007 00:19:02] libdspam returned probability of 0.000000
17124: [11/14/2007 00:19:02] message result: NOT SPAM
--------------------------------------------------------------
Please attach your dspam.conf. I also need the output of "equery uses
mail-filter/dspam" command.
I have attached my dspam.conf, as well as a (1-message) training corpus that
will crash from an empty database. In particular, clearing the user 'jacob'
from the database, and running dspam_train on the attached tarball crashes
immediately. I am running 'dspam_train jacob t_spam t_nonspam'.
Thanks for the assistance. Note that I have received a reply on the
dspam-users mailing list, but no resolution as of yet. I will post here if any
is found.
# equery uses mail-filter/dspam
[ Searching for packages matching mail-filter/dspam... ]
[ Colour Code : set unset ]
[ Legend : Left column (U) - USE flags from make.conf ]
[ : Right column (I) - USE flags packages was installed with ]
[ Found these USE variables for mail-filter/dspam-3.8.0-r7 ]
U I
+ + clamav : Adds support for Clam AntiVirus software (usually with a
plugin)
+ + daemon : Enable support for DSPAM to run in --daemon mode
+ + debug : Enable extra debug codepaths, like asserts and extra
output. If you want to get meaningful backtraces see
http://www.gentoo.org/proj/en/qa/backtraces.xml .
- - large-domain : Builds for large domain rather than for domain scale
+ + ldap : Adds LDAP support (Lightweight Directory Access Protocol)
+ + mysql : Adds mySQL Database support
+ + postgres : Adds support for the postgresql database
- - sqlite : Adds support for sqlite - embedded sql database
- - syslog : Enables support for syslog
- - user-homedirs : Build with user homedir support
- - virtual-users : Build with virtual-users support
Created an attachment (id=136539) [details]
training corpus
This tarball contains a single good message. When dspam_train is used with it,
and an empty database schema, I observe the described segmentation fault
immediately.
I am unable to reproduce.
Test environment:
- postgres-8.0.13 installed with USE="doc kerberos nls pam perl python
readline ssl tcl test xml zlib
- dspam-3.8.0-r7 instaled with USE="clamav daemon ldap mysql postgres sqlite"
- setup postgresql server using emerge --config postgresql
- start postgresql server
- install dspam; replace dspam.conf with the attached conf
- start dspam service
- run dspam_train mrness t_spam t_nonspam
Result: work as expected
Output:
dspam_train mrness t_spam t_nonspam
Taking Snapshot...
mrness
TP: 0 TN: 0 FP: 0 FN: 0 SC: 0 NC: 0
Training t_nonspam / t_spam corpora...
[test: nonspam] 1195138440.M148448P15692V0000000 result: PASS
TRAINING COMPLETE
Training Snapshot:
mrness
TP: 0 TN: 1 FP: 0 FN: 0 SC: 0 NC: 0
SHR: 100.00% HSR: 0.00% OCA: 100.00%
Overall Statistics:
mrness
TP: 0 TN: 1 FP: 0 FN: 0 SC: 0 NC: 0
SHR: 100.00% HSR: 0.00% OCA: 100.00%
Did I've missed something?
Ah, I forgot the step between "install dspam" and "start dspam":
- configure postgresql database by running emerge --config dspam
@Jacob Joseph: Could you modify your ebuild and add the verbose debug option to
it and re-emerge DSPAM and then post the output of the new debug log?
Changes needed to be done:
$(use_enable syslog) \
$(use_enable debug) \
$(use_enable debug bnr-debug) \
+ $(use_enable debug verbose-debug) \
--enable-long-usernames \
--with-dspam-group=dspam \
--with-dspam-home-group=dspam \
Steve, Alin, thank you for your testing.
I have tested further mysql on this and other machines. I have tested with
3.8.0-r7 and 3.8.0-r8, with verbose debugging on. With repeated compilations,
and repeated runs, it does seem that dspam, and dspam_train crash only
sometimes. The backtrace is always as previously posted. While the posted
training corpus sometimes does not cause a crash, some other message always
does. For obvious reasons, I cannot post all of my training corpus to fully
illustrate this.
I will attach the verbose debug log shortly.
Created an attachment (id=137659) [details]
Message associated with the above verbose output
Command line used:
/usr/bin/dspam --user jacob --deliver=summary --stdout <
'1195501649.M638339P24186V0000000000000302I002DB2C4_5967.jjoseph.org,S=29974:2,S'
(In reply to comment #9 #10 #11)
>
Are you fixed on PGSql or would you consider moving to MySQL as an option?
The reason fro the crashes with MySQL are probably:
- In DSPAM you MaxMessageSize (let's abbreviate this to MMS) xxx
- In MySQL you have max_allowed_packet (let's abbreviate this to MAP) xxx
- You got a mail with total size (let's abbreviate this to TS) xxx
Now the problem is that DSPAM will create tokens and data for the message you
get. In one point it time it will insert the data into MySQL. Since one part of
the data inserted into MySQL is in binary (a blob) DSPAM will call MySQL C API
functions to escape the binary data. This C API calls need maximum double
amount of memory plus one byte then the real binary data has. Now the problem
starts if:
((size of binary data) * 2 + 1) + (space needed for the insert SQL commands) >
MAP
Another problem is if:
((MMS * 2 + 1) > MAP
This is one reason why DSPAM in the current release can crash.
Another reason is that when DSPAM does tokenize the mail it searches the
storage backend for tokens maching the new tokenized tokens. It creates a huge
SQL query to search for the tokens. The generation of the SQL query can grow
pretty big. Depending how much tokens you have for the user or any group the
user belongs to. If you have a lot of tokens DSPAM can easy create a SQL query
bigger then MAP and fail when executing that query. Unfortunately DSPAM then
closes the handle to the storage backend and becomes unusable until you restart
the daemon. Or if you don't use the client/server (or agent) mode it will just
crash the DSPAM binary and not deliver any result.
Does this sound logical to you?
I have taken some time to fix that issue in the MySQL driver. But not in
PostgreSQL. I first want to get MySQL running and then focus on PostgreSQL.
If you are not limited in what storage engine you use in DSPAM, then I would
offer you to post here a patch for MySQL driver in DSPAM. I am testing since 3
days this driver on my end and it works. But I am not 100% finished with it. If
you can wait then I would be very thankful. I promise to soon post the patch
here.
Is that okay with you?
In the mean time you could switch to MySQL and increase max_allowed_packet to
lets say 128MB or any thing above 60MB and then test again. I am very confident
that you will have less crashes. Do decrease the MaxMessageSize as well. Try to
stay with max_allowed_packet to at least MaxMessageSize * 3.
btw: The message you posted has already DSPAM headers. If you want to train
correctly, then you need to wipe out the DSPAM headers and the DSPAM signature
in the body (if you have any).
Running something like this here should clean most of the irrelevant stuff from
your corpus:
for foo in *
do
sed -i
"/^\(X\-Quarantine\-ID:\|X\-OSBF\-Lua\-Score:\|X\-CRM114\-[a-zA-Z]*:\|X\-DKIM:\|X\-Virus\-Scanned:\|X\-Greylist:\|X\-DCC\-.*\-Metrics:\|X\-Virus\-Status:\|X\-Delivery\-Agent:\|Received\-SPF:\|X\-policyd\-weight:\|X\-Spam\-[^:]*:\)
.*$/d;/^X\-Amavis\-OS\-Fingerprint:/,+1d;/^X\-DSPAM\-Result\:/,/^X\-DSPAM\-Signature:
[0-9a-f,]*$/d;s/^Subject: \(ADV\|UNS\): /Subject: /;s/^Subject: \[SPAM\][\t
]*/Subject: /;s/^Subject: \[\(\+\|\-\)\{1,2\}\] /Subject:
/;s/\!DSPAM:[0-9]\{0,9\},\{0,1\}[0-9a-f]\{1,32\}\!//g" ${foo}
done
// SteveB
Steve,
I do appreciate your detailed reply. As I mentioned in my first message, all
of the tests mentioned here do not occur when mysql is used. Indeed, I only
encountered any of this upon migrating to postgres, my preferred database.
Without fault to dspam, I would be surprised to see that such things as the
maximum packet size would cause intermittent failure. It sounds very much like
memory corruption to me. Additionally, postgres reports no other error than
that the client dropped the connection. When the dspam daemon is used, it will
continue to service 'dspam --client...' with other messages.
As for the dspam headers being present in the message already, I mean these
messages only as an example of failure I was seeing on new, incoming messages.
I'd be happy to retest my corpus after their removal to either show this is a
bug, or eliminate it as a possibility.
My goal in posting this bug was two-fold. First, I'd like dspam to work for
me. Additionally, there is little point in someone else suffering similar
troubles.
I still would very much appreciate further suggestions to diagnose the problem
at hand. I'm unsure what more can be done to reproduce it your systems.
Thanks again. ~Jacob
Unfortunatelly no one (including upstream) were able to reproduce your
segfault. Since even you don't have a safe way to reproduce the bug I can only
conclude 2 things:
a) you are affected by some kind of race (btw, how many processors you
have?). I had in the past some bad experiences with gdb when I tried to debug
multi-threaded code, therefore I must ask you to compare gdb results with
strace.
b) your problem is caused by an external factor. Please verify your hardware
by running a complete memtest pass.
If you don't detect a hardware problem, please give complete step-by-step
instructions for reproducing the segfault starting from an empty database. If
you cannot obtain a segfault every time you try it, give us a frequency
estimate .
(In reply to comment #0)
> GDB:
> Short story: CTX->factors->first->ptr at dspam.c:3092 does not appear to be
> initialized
> ----------------------------------------------------------------
> Starting program: /usr/bin/dspam --debug --user jacob --mode=notrain --client
> --stdout --deliver=innocent,spam <
> /home/jacob/dbg_mail-linux-cluster-bounces@redhat.com-1194995056
> [Thread debugging using libthread_db enabled]
> [New Thread -1212545344 (LWP 17038)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1212545344 (LWP 17038)]
> add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
> 3092 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
> (gdb) list
> 3087 node_ft = c_nt_first(CTX->factors, &c_ft);
> 3088 while(node_ft != NULL) {
> 3089 struct dspam_factor *f = (struct dspam_factor *)
> node_ft->ptr;
> 3090 if (f) {
> 3091 strlcat(data, ",\n\t", sizeof(data));
> 3092 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
> 3093 f->token_name, f->value);
> 3094 strlcat(data, scratch, sizeof(data));
> 3095 }
> 3096 node_ft = c_nt_next(CTX->factors, &c_ft);
> (gdb) bt
> #0 add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
> #1 0x08053442 in process_message (ATX=0xbfe51b18, message=0x8062bb8,
> username=0x8062930 "jacob", result_string=0xbfe519d0) at dspam.c:727
> #2 0x08054295 in process_users (ATX=0xbfe51b18, message=0x8062b10)
> at dspam.c:1797
> #3 0x08054f30 in main (argc=Cannot access memory at address 0x0
> ) at dspam.c:258
> (gdb) print CTX->factors
> $11 = (struct nt *) 0x806e688
> (gdb) print *CTX->factors
> $12 = {
> first = 0x806e570,
> insert = 0x616d6c69,
> items = 1700146542,
> nodetype = 1869181810
> }
> (gdb) print *CTX->factors->first
> $13 = {
> ptr = 0x38,
> next = 0x20
> }
> -----------------------------------------------------------
>
How did you manage to get that backtrace in GDB? I can't get DSPAM to produce
me a backtrace when it crashes. How did you compile DSPAM? What CPU are you
using? Did you compiled other packages with the debug USE flag? Do you use PAM
and have you enabled core dumps in your system?
// SteveB
BTW: Could you emerge DSPAM without debug USE flag and try if your installation
works more stable then with debug USE flag?
Steve,
I'm unsure what aspect of debugging with gdb is failing for you, but I've
configured the 3.8.0-r8 ebuild with:
./configure --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man
--infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
--localstatedir=/var/lib --with-storage-driver=hash_drv,mysql_drv,pgsql_drv
--with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --enable-daemon
--enable-ldap --enable-clamav --disable-large-scale --enable-domain-scale
--disable-syslog --enable-debug --enable-bnr-debug --enable-verbose-debug
--enable-long-usernames --with-dspam-group=dspam --with-dspam-home-group=dspam
--with-dspam-mode=2511 --with-logdir=/var/log/dspam --disable-virtual-users
--enable-preferences-extension --disable-homedir
--with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql
--with-pgsql-includes=/usr/include/postgresql
--with-pgsql-libraries=/usr/lib/postgresql --build=i686-pc-linux-gnu
I'm also using CFLAGS="-march=athlon-xp -O1 -g -pipe". To be
absolutely clear, I have recompiled glibc, postgres, clamav, and
openldap with these CFLAGS. Much of the rest of the system has
"-march=athlon-xp -03 -pipe". You may also find it usefull to add
'nostrip' and 'noclean' to your FEATURES in /etc/make.conf. The
latter, in particular, will keep the sources around where gdb can find them.
To get a backtrace on a message that I know causes a segfault, I
perform the following:
<delete all tokens for the user jacob from the database, begin training>
# dspam_train jacob t_spam t_nonspam
<Wait for one message to segfault>
...
[test: spam ] 1195501649.M366542P24186V0000000 result: sh: line 1: 8860
Segmentation fault /usr/bin/dspam --user jacob --deliver=summary --stdout
<
't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
BROKEN result!!
...
<Verify we still segfault, and run in gdb>
# /usr/bin/dspam --user jacob --deliver=summary --stdout <
't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
Segmentation fault
# gdb /usr/bin/dspam
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run --user jacob --deliver=summary --stdout <
't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
Starting program: /usr/bin/dspam --user jacob --deliver=summary --stdout <
't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
[Thread debugging using libthread_db enabled]
[New Thread -1212250432 (LWP 8965)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212250432 (LWP 8965)]
add_xdspam_headers (CTX=0x807c728, ATX=0xbfc989c8) at dspam.c:3086
3086 struct dspam_factor *f = (struct dspam_factor *)
node_ft->ptr;
(gdb) bt
#0 add_xdspam_headers (CTX=0x807c728, ATX=0xbfc989c8) at dspam.c:3086
#1 0x08053672 in process_message (ATX=0xbfc989c8, message=0x80629f8,
username=0x8062930 "jacob", result_string=0xbfc98880) at dspam.c:724
#2 0x080544cb in process_users (ATX=0xbfc989c8, message=0x8062950) at
dspam.c:1794
#3 0x08055171 in main (argc=5, argv=0xbfc998a4) at dspam.c:255
(gdb) list
3081 if (CTX->factors != NULL) {
3082 snprintf(data, sizeof(data), "X-DSPAM-Factors: %d",
3083 CTX->factors->items);
3084 node_ft = c_nt_first(CTX->factors, &c_ft);
3085 while(node_ft != NULL) {
3086 struct dspam_factor *f = (struct dspam_factor *)
node_ft->ptr;
3087 if (f) {
3088 strlcat(data, ",\n\t", sizeof(data));
3089 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
3090 f->token_name, f->value);
(gdb)
Though the particular message involved in a crash seems to vary from
compile to compile, I always experience a crash on my training
corpus. When configured without any of the debug options (but the same
CFLAGS), I am not currently able to reproduce the above segfault thus far.
(In reply to comment #18)
> I have posted a large training corpus at
> http://www.jjoseph.org/files/training_corpus.tar.bz2. With the debug use flag,
> I have never been able to train on this data set without experiencing a
> reproducible segfault on a particular message.
>
I downloaded your corpus and started training on my DSPAM 3.8.0 with my own
improved/fixed MySQL driver.
The system is not the fastest in the world:
mail ~ # cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon (TM)
stepping : 4
cpu MHz : 1400.180
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 2803.66
clflush size : 32
mail ~ # free -m
total used free shared buffers cached
Mem: 1012 990 21 0 0 333
-/+ buffers/cache: 657 355
Swap: 1535 765 770
mail ~ #
So far no crashes with the MySQL backend:
-----------------------------------------
Training Snapshot:
jmjoseph
TP: 6807 TN: 880 FP: 12 FN: 75 SC: 25 NC: 0
SHR: 98.91% HSR: 1.35% OCA: 98.88%
Overall Statistics:
jmjoseph
TP: 6807 TN: 880 FP: 12 FN: 75 SC: 25 NC: 0
SHR: 98.91% HSR: 1.35% OCA: 98.88%
real 31m26.005s
user 0m58.616s
sys 0m44.377s
mail ~ # dspam_stats -s -H jmjoseph
jmjoseph:
TP True Positives: 6807
TN True Negatives: 880
FP False Positives: 12
FN False Negatives: 75
SC Spam Corpusfed: 25
NC Nonspam Corpusfed: 0
TL Training Left: 0
SHR Spam Hit Rate 98.91%
HSR Ham Strike Rate: 1.35%
PPV Positive predictive value: 99.82%
OCA Overall Accuracy: 98.88%
mail ~ #
-----------------------------------------
Tomorrow after work I will test with DSPAM 3.8.0-r8 and PostgreSQL and one
small patch I made one week ago for PostgreSQL.
The MySQL patch I made is not small. I added 511 lines and removed 249 lines in
the MySQL driver. While I was working on the MySQL driver I have looked inside
the PostgreSQL driver to see how things are there solved. Now I know that the
PostgreSQL driver is as well not in the best shape. I need to take my time and
do the same changes to PostgreSQL diver as I did for MySQL.
Created an attachment (id=138206) [details]
dspam-3.8.0-fix_bnr_debug.patch
This patch fixes (hopefully) the crash when using --enable-bnr-debug. Have fun
:)
@Alin: If you are going to include that patch into Gentoo, could you then as
well change the ebuild to have this:
# Debug build
if use debug; then
filter-flags "-fomit-frame-pointer" "-O?"
append-flags "-O0" "-ggdb"
fi
And this here:
$(use_enable debug bnr-debug) \
$(use_enable debug verbose-debug) \
--enable-verbose-debug does implicit set --enable-debug so it is not needed.
But does no harm if you do:
$(use_enable debug) \
$(use_enable debug bnr-debug) \
$(use_enable debug verbose-debug) \
Can you both test this patch and enable:
--enable-debug
--enable-bnr-debug
--enable-verbose-debug
And then check if DSPAM still crashes?
// SteveB
Steve, your patch does seem to fix my crashes. I'll post back if a more
complete test reveals any further news. Thank you. ~Jacob
(In reply to comment #21)
> Steve, your patch does seem to fix my crashes. I'll post back if a more
> complete test reveals any further news. Thank you. ~Jacob
>
Perfect! You have to thank Alin. Without him I would never know any thing about
this bug. Thanks Alin!
Fixed in -r9, thanks!
I've added $(use_enable debug verbose-debug), but I didn't mess with CFLAGS as
you suggested in comment #20. AFAIK users who want to debug a certain program
have to know how to mangle CFLAGS and FEATURES at package level. I see no
reason to override user's CFLAGS.
(In reply to comment #23)
> Fixed in -r9, thanks!
>
> I've added $(use_enable debug verbose-debug), but I didn't mess with CFLAGS as
> you suggested in comment #20. AFAIK users who want to debug a certain program
> have to know how to mangle CFLAGS and FEATURES at package level. I see no
> reason to override user's CFLAGS.
>
I am not on your line. If some one enables debug USE flag, then I would expect
that the package gets compiled with debug options. I would understand your
intervention if DSPAM would be the only package overwriting/filtering/enforcing
certain CFLAGS with debug USE flags. But we are not.
Allow me to ask the other way around: Why USE flags at all? If some one wants
to use certain switches (which are right now provided with the help of the USE
flags) then we could expect the user to know CFLAGS, FEATURES, LDFLAGS and
friends at package level.
My question does not make sense. I know. But I think not setting the CFLAGS for
a package where the user explicitly requests "debug" in the USE flags is as
well not making sense.
What do you think?
Debug flag is never used (or shouldn't be used) as a synonym to "build the
program with -O0 -gddb and don't strip the debug info". In this case, it
activates debug logging. Yeah, not exactly intuitive, but this is how upstream
decided to implement things. If I were the author, these would have been
accomplished through command line options, not compilation defines.
Anyway, the debug USE flag reason of being has been discussed over the time on
dev ml. I distinctly remember that adding -ggdb to CFLAGS is a no-no.
(In reply to comment #25)
> Debug flag is never used (or shouldn't be used) as a synonym to "build the
> program with -O0 -gddb and don't strip the debug info". In this case, it
> activates debug logging. Yeah, not exactly intuitive, but this is how upstream
> decided to implement things. If I were the author, these would have been
> accomplished through command line options, not compilation defines.
>
I did not say anything about "don't strip the debug info".
> Anyway, the debug USE flag reason of being has been discussed over the time on
> dev ml. I distinctly remember that adding -ggdb to CFLAGS is a no-no.
>
Well... then I have a nice list of such "no-no's":
grep -R "\-ggdb" /usr/portage/
Anyway... it is okay.