If the postgres driver is used, dspam segfaults on a large number of messages. I initially expected a migration from mysql to postgres to be at fault, but training new users (with no information in the database) also fails. I did not experience these crashes when the mysql driver was used. This defect is serious enough that dspam cannot be used.

I have attached one such crash-prone message, included a little information from gdb, and listed the contents of dspam.debug below. I'd greatly appreciate any help with this issue and would be happy to assist in any way.

For the record, I am running dspam 3.8.0-r7 along with a postgres 8.0.13 server.

Note that the particular crash below occurs only when showing factors, but disabling this only delays the crash to the next time CTX->factors->first is accessed.

Note that I had originally posted this to the dspam-dev mailing list, but found the project to be inactive.

GDB:
Short story: CTX->factors->first->ptr at dspam.c:3092 does not appear to be initialized
----------------------------------------------------------------
Starting program: /usr/bin/dspam --debug --user jacob --mode=notrain --client --stdout --deliver=innocent,spam < /home/jacob/dbg_mail-linux-cluster-bounces@redhat.com-1194995056
[Thread debugging using libthread_db enabled]
[New Thread -1212545344 (LWP 17038)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212545344 (LWP 17038)]
add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
3092              snprintf(scratch, sizeof(scratch), "%s, %2.5f",
(gdb) list
3087          node_ft = c_nt_first(CTX->factors, &c_ft);
3088          while(node_ft != NULL) {
3089            struct dspam_factor *f = (struct dspam_factor *) node_ft->ptr;
3090            if (f) {
3091              strlcat(data, ",\n\t", sizeof(data));
3092              snprintf(scratch, sizeof(scratch), "%s, %2.5f",
3093                       f->token_name, f->value);
3094              strlcat(data, scratch, sizeof(data));
3095            }
3096            node_ft = c_nt_next(CTX->factors, &c_ft);
(gdb) bt
#0  add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
#1  0x08053442 in process_message (ATX=0xbfe51b18, message=0x8062bb8, username=0x8062930 "jacob", result_string=0xbfe519d0) at dspam.c:727
#2  0x08054295 in process_users (ATX=0xbfe51b18, message=0x8062b10) at dspam.c:1797
#3  0x08054f30 in main (argc=Cannot access memory at address 0x0
) at dspam.c:258
(gdb) print CTX->factors
$11 = (struct nt *) 0x806e688
(gdb) print *CTX->factors
$12 = {
  first = 0x806e570,
  insert = 0x616d6c69,
  items = 1700146542,
  nodetype = 1869181810
}
(gdb) print *CTX->factors->first
$13 = {
  ptr = 0x38,
  next = 0x20
}
-----------------------------------------------------------

dspam.debug
-----------------------------------------------------------
17124: [11/14/2007 00:19:02] No QuarantineAgent option found. Using standard quarantine.
17124: [11/14/2007 00:19:02] DSPAM Instance Startup
17124: [11/14/2007 00:19:02] input args: /usr/bin/dspam --debug --user jacob --mode=notrain --client --stdout --deliver=innocent,spam
17124: [11/14/2007 00:19:02] pass-thru args:
17124: [11/14/2007 00:19:02] processing user jacob
17124: [11/14/2007 00:19:02] uid = 0, euid = 0, gid = 0, egid = 503
17124: [11/14/2007 00:19:02] loading preferences for user jacob
17124: [11/14/2007 00:19:02] Loading preferences for uid 680
17124: [11/14/2007 00:19:02] Loading preferences for uid 0
17124: [11/14/2007 00:19:02] default preferences empty. reverting to dspam.conf preferences.
17124: [11/14/2007 00:19:02] Loading preferences from dspam.conf
17124: [11/14/2007 00:19:02] using /var/spool/dspam/opt-in/local/jacob.dspam as path
17124: [11/14/2007 00:19:02] using /var/spool/dspam/opt-out/local/jacob.nodspam as path
17124: [11/14/2007 00:19:02] sedation level set to: 0
17124: [11/14/2007 00:19:02] Connecting to 127.0.0.1:3310 for virus check
17124: [11/14/2007 00:19:02] Loading 278 BNR patterns
17124: [11/14/2007 00:19:02] bnr reported snr of 6.597
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.05_0.40_ 0.01000 0s 3i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.40_0.40_ 0.01000 0s 7i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.05_0.05_ 0.01000 0s 41i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.00_0.00_0.05_ 0.01000 0s 38i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.05_0.40_0.05_ 0.01000 0s 3i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.05_0.40_ 0.01000 0s 7i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.40_0.40_ 0.01000 0s 8i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.t| 0.01000 0s 46i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.05_0.05_ 0.01000 0s 3i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.40_0.40_0.05_ 0.01000 0s 7i
17124: [11/14/2007 00:19:02] Interesting BNR Pattern: bnr.s|0.00_0.05_0.05_ 0.01000 0s 38i
17124: [11/14/2007 00:19:02] Whitelist threshold: 10
17124: [11/14/2007 00:19:02] [burton] [0.001190] is+> (1frq, 0s, 614i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002888] https+// (1frq, 11s, 2778i)
17124: [11/14/2007 00:19:02] [burton] [0.003946] X-Mailman-Version*2.1.5 (1frq, 29s, 5355i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.006364] https (1frq, 26s, 2970i)
17124: [11/14/2007 00:19:02] [burton] [0.006518] List-Post*<mailto (1frq, 62s, 6913i)
17124: [11/14/2007 00:19:02] [burton] [0.006790] List-Help*request (1frq, 55s, 5886i)
17124: [11/14/2007 00:19:02] [burton] [0.008260] Errors-To*bounces (1frq, 67s, 5885i)
17124: [11/14/2007 00:19:02] [burton] [0.008264] Sender*bounces (1frq, 67s, 5882i)
17124: [11/14/2007 00:19:02] [burton] [0.008533] Url*redhat (1frq, 0s, 85i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Help*cluster+request (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Post*cluster+redhat.com> (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Subscribe*<https+//www.redhat.com/mailman/listinfo/linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] cluster+redhat (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Subject*[Linux+cluster] (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] list+Linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Sender*linux+cluster (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Subscribe*//www.redhat.com/mailman/listinfo/linux+cluster> (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] redhat+com (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Return-Path*cluster+bounces (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] Burton-Bayesian Probability: 0.000000 Samples: 27
17124: [11/14/2007 00:19:02] no factors specified; using default
17124: [11/14/2007 00:19:02] Result Confidence: 0.99
17124: [11/14/2007 00:19:02] [burton] [0.001190] is+> (1frq, 0s, 614i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002290] >+I (2frq, 8s, 2550i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002382] >+The (2frq, 3s, 919i)
17124: [11/14/2007 00:19:02] [burton] [0.002888] https+// (1frq, 11s, 2778i)
17124: [11/14/2007 00:19:02] [burton] [0.003946] X-Mailman-Version*2.1.5 (1frq, 29s, 5355i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.004760] >+> (60frq, 38s, 5812i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.005031] wrote+> (2frq, 39s, 5643i)
17124: [11/14/2007 00:19:02] [burton] [0.006364] https (1frq, 26s, 2970i)
17124: [11/14/2007 00:19:02] [burton] [0.006518] List-Post*<mailto (1frq, 62s, 6913i)
17124: [11/14/2007 00:19:02] [burton] [0.006790] List-Help*request (1frq, 55s, 5886i)
17124: [11/14/2007 00:19:02] [burton] [0.008260] Errors-To*bounces (1frq, 67s, 5885i)
17124: [11/14/2007 00:19:02] [burton] [0.008264] Sender*bounces (1frq, 67s, 5882i)
17124: [11/14/2007 00:19:02] [burton] [0.008533] Url*redhat (1frq, 0s, 85i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Help*cluster+request (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Post*cluster+redhat.com> (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Subscribe*<https+//www.redhat.com/mailman/listinfo/linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] cluster+redhat (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Subject*[Linux+cluster] (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] list+Linux (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Sender*linux+cluster (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] List-Subscribe*//www.redhat.com/mailman/listinfo/linux+cluster> (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] redhat+com (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] [burton] [0.008634] Return-Path*cluster+bounces (1frq, 0s, 84i)
17124: [11/14/2007 00:19:02] Burton-Bayesian Probability: 0.000000 Samples: 27
17124: [11/14/2007 00:19:02] Result Confidence: 0.99
17124: [11/14/2007 00:19:02] BNR Decision Concurs
17124: [11/14/2007 00:19:02] total processing time: 0.08533s
17124: [11/14/2007 00:19:02] saving signature as 473a8546171241968551123
17124: [11/14/2007 00:19:02] libdspam returned probability of 0.000000
17124: [11/14/2007 00:19:02] message result: NOT SPAM
--------------------------------------------------------------
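A side note on the gdb output in the report above: the `insert`, `items` and `nodetype` values printed for `*CTX->factors` look more like text than like a pointer and two counters. The sketch below is only an illustration, not part of the original report. It assumes a 32-bit little-endian layout (the addresses in the backtrace are 32 bits wide) and that the three fields sit next to each other in memory in the order gdb prints them.

```python
import struct

# Field values copied from the "print *CTX->factors" output in the report.
FIELDS = {
    "insert": 0x616D6C69,
    "items": 1700146542,
    "nodetype": 1869181810,
}


def as_le_ascii(value: int) -> str:
    """Render a 32-bit value as the four bytes it occupies in little-endian memory."""
    raw = struct.pack("<I", value)
    # Replace non-printable bytes with '.' so the output stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Assumption: the fields are adjacent, so their bytes can be read as one run.
decoded = "".join(as_le_ascii(v) for v in FIELDS.values())

for name, value in FIELDS.items():
    print(f"{name:9s} 0x{value:08x} {as_le_ascii(value)!r}")
print("adjacent bytes:", repr(decoded))
```

Read this way, the twelve bytes spell "ilman-Versio", which matches part of the `X-Mailman-Version*2.1.5` token listed in dspam.debug. That would be consistent with the list header having been overwritten by, or its memory reused for, token text, but it is an observation rather than a diagnosis.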
Created attachment 136430 [details] An example message that crashes dspam
Please attach your dspam.conf. I also need the output of "equery uses mail-filter/dspam" command.
I have attached my dspam.conf, as well as a (1-message) training corpus that will crash from an empty database. In particular, clearing the user 'jacob' from the database, and running dspam_train on the attached tarball crashes immediately. I am running 'dspam_train jacob t_spam t_nonspam'.

Thanks for the assistance. Note that I have received a reply on the dspam-users mailing list, but no resolution as of yet. I will post here if any is found.

# equery uses mail-filter/dspam
[ Searching for packages matching mail-filter/dspam... ]
[ Colour Code : set unset ]
[ Legend : Left column  (U) - USE flags from make.conf              ]
[        : Right column (I) - USE flags packages was installed with ]
[ Found these USE variables for mail-filter/dspam-3.8.0-r7 ]
 U I
 + + clamav        : Adds support for Clam AntiVirus software (usually with a plugin)
 + + daemon        : Enable support for DSPAM to run in --daemon mode
 + + debug         : Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see http://www.gentoo.org/proj/en/qa/backtraces.xml .
 - - large-domain  : Builds for large domain rather than for domain scale
 + + ldap          : Adds LDAP support (Lightweight Directory Access Protocol)
 + + mysql         : Adds mySQL Database support
 + + postgres      : Adds support for the postgresql database
 - - sqlite        : Adds support for sqlite - embedded sql database
 - - syslog        : Enables support for syslog
 - - user-homedirs : Build with user homedir support
 - - virtual-users : Build with virtual-users support
Created attachment 136538 [details] offending dspam.conf
Created attachment 136539 [details] training corpus This tarball contains a single good message. When dspam_train is used with it, and an empty database schema, I observe the described segmentation fault immediately.
I am unable to reproduce.

Test environment:
- postgres-8.0.13 installed with USE="doc kerberos nls pam perl python readline ssl tcl test xml zlib"
- dspam-3.8.0-r7 installed with USE="clamav daemon ldap mysql postgres sqlite"
- setup postgresql server using emerge --config postgresql
- start postgresql server
- install dspam; replace dspam.conf with the attached conf
- start dspam service
- run dspam_train mrness t_spam t_nonspam

Result: works as expected

Output:
dspam_train mrness t_spam t_nonspam
Taking Snapshot...
        mrness TP: 0 TN: 0 FP: 0 FN: 0 SC: 0 NC: 0
Training t_nonspam / t_spam corpora...
[test: nonspam] 1195138440.M148448P15692V0000000 result: PASS
TRAINING COMPLETE

Training Snapshot:
        mrness TP: 0 TN: 1 FP: 0 FN: 0 SC: 0 NC: 0
        SHR: 100.00% HSR: 0.00% OCA: 100.00%

Overall Statistics:
        mrness TP: 0 TN: 1 FP: 0 FN: 0 SC: 0 NC: 0
        SHR: 100.00% HSR: 0.00% OCA: 100.00%

Did I miss something?
Ah, I forgot the step between "install dspam" and "start dspam": - configure postgresql database by running emerge --config dspam
@Jacob Joseph: Could you modify your ebuild to add the verbose debug option, re-emerge DSPAM, and then post the output of the new debug log?

Changes needed:
        $(use_enable syslog) \
        $(use_enable debug) \
        $(use_enable debug bnr-debug) \
+       $(use_enable debug verbose-debug) \
        --enable-long-usernames \
        --with-dspam-group=dspam \
        --with-dspam-home-group=dspam \
Steve, Alin, thank you for your testing. I have tested further mysql on this and other machines. I have tested with 3.8.0-r7 and 3.8.0-r8, with verbose debugging on. With repeated compilations, and repeated runs, it does seem that dspam, and dspam_train crash only sometimes. The backtrace is always as previously posted. While the posted training corpus sometimes does not cause a crash, some other message always does. For obvious reasons, I cannot post all of my training corpus to fully illustrate this. I will attach the verbose debug log shortly.
Created attachment 137657 [details] verbose debug output during a crash
Created attachment 137659 [details] Message associated with the above verbose output Command line used: /usr/bin/dspam --user jacob --deliver=summary --stdout < '1195501649.M638339P24186V0000000000000302I002DB2C4_5967.jjoseph.org,S=29974:2,S'
(In reply to comment #9 #10 #11)
>

Are you fixed on PGSql or would you consider moving to MySQL as an option?

The reasons for the crashes with MySQL are probably:
- In DSPAM you have MaxMessageSize (let's abbreviate this to MMS) xxx
- In MySQL you have max_allowed_packet (let's abbreviate this to MAP) xxx
- You got a mail with total size (let's abbreviate this to TS) xxx

Now the problem is that DSPAM will create tokens and data for the message you get. At some point it will insert the data into MySQL. Since one part of the data inserted into MySQL is binary (a blob), DSPAM calls MySQL C API functions to escape the binary data. These C API calls need, at most, twice the memory of the raw binary data plus one byte.

Now the problem starts if:
((size of binary data) * 2 + 1) + (space needed for the insert SQL commands) > MAP

Another problem is if:
(MMS * 2 + 1) > MAP

This is one reason why DSPAM in the current release can crash. Another reason is that when DSPAM tokenizes the mail, it searches the storage backend for tokens matching the newly generated tokens. It creates a huge SQL query to search for the tokens. The generated SQL query can grow pretty big, depending on how many tokens you have for the user or any group the user belongs to. If you have a lot of tokens, DSPAM can easily create a SQL query bigger than MAP and fail when executing that query. Unfortunately DSPAM then closes the handle to the storage backend and becomes unusable until you restart the daemon. Or, if you don't use the client/server (or agent) mode, it will just crash the DSPAM binary and not deliver any result.

Does this sound logical to you?

I have taken some time to fix that issue in the MySQL driver, but not in PostgreSQL. I first want to get MySQL running and then focus on PostgreSQL. If you are not limited in what storage engine you use in DSPAM, then I would offer to post a patch for the MySQL driver in DSPAM here. I have been testing this driver on my end for 3 days and it works, but I am not 100% finished with it. If you can wait then I would be very thankful. I promise to post the patch here soon. Is that okay with you?

In the meantime you could switch to MySQL and increase max_allowed_packet to, let's say, 128MB or anything above 60MB and then test again. I am very confident that you will have fewer crashes. Do decrease MaxMessageSize as well. Try to keep max_allowed_packet at no less than MaxMessageSize * 3.

btw: The message you posted already has DSPAM headers. If you want to train correctly, then you need to wipe out the DSPAM headers and the DSPAM signature in the body (if you have any). Running something like this should clean most of the irrelevant stuff from your corpus:

for foo in *
do
  sed -i "/^\(X\-Quarantine\-ID:\|X\-OSBF\-Lua\-Score:\|X\-CRM114\-[a-zA-Z]*:\|X\-DKIM:\|X\-Virus\-Scanned:\|X\-Greylist:\|X\-DCC\-.*\-Metrics:\|X\-Virus\-Status:\|X\-Delivery\-Agent:\|Received\-SPF:\|X\-policyd\-weight:\|X\-Spam\-[^:]*:\) .*$/d;/^X\-Amavis\-OS\-Fingerprint:/,+1d;/^X\-DSPAM\-Result\:/,/^X\-DSPAM\-Signature: [0-9a-f,]*$/d;s/^Subject: \(ADV\|UNS\): /Subject: /;s/^Subject: \[SPAM\][\t ]*/Subject: /;s/^Subject: \[\(\+\|\-\)\{1,2\}\] /Subject: /;s/\!DSPAM:[0-9]\{0,9\},\{0,1\}[0-9a-f]\{1,32\}\!//g" "${foo}"
done

// SteveB
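The packet-size condition from the comment above can be written out as a small calculation. This is a rough sketch only: the message size and SQL overhead below are hypothetical numbers picked to show the arithmetic, and treating the blob as being as large as MaxMessageSize is the worst case implied by the second inequality, not a measured figure.

```python
def escaped_buffer_size(raw_len: int) -> int:
    """Worst-case size after escaping: every byte doubled, plus one terminating byte."""
    return raw_len * 2 + 1


def insert_fits(blob_len: int, sql_overhead: int, max_allowed_packet: int) -> bool:
    """True if the escaped blob plus the surrounding SQL text stays within the limit."""
    return escaped_buffer_size(blob_len) + sql_overhead <= max_allowed_packet


MIB = 1024 * 1024

# Hypothetical settings, for illustration only.
max_message_size = 4 * MIB
sql_overhead = 512

for max_allowed_packet in (1 * MIB, 8 * MIB, 3 * max_message_size):
    ok = insert_fits(max_message_size, sql_overhead, max_allowed_packet)
    print(f"max_allowed_packet={max_allowed_packet:>9d} fits={ok}")
```

With these numbers, a limit of exactly twice the message size is still too small, because of the extra byte and the SQL text around the blob, while the suggested factor of three leaves headroom.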
Steve, I do appreciate your detailed reply.

As I mentioned in my first message, none of the crashes mentioned here occur when mysql is used. Indeed, I only encountered any of this upon migrating to postgres, my preferred database.

Without fault to dspam, I would be surprised to see that such things as the maximum packet size would cause intermittent failure. It sounds very much like memory corruption to me. Additionally, postgres reports no other error than that the client dropped the connection. When the dspam daemon is used, it will continue to service 'dspam --client...' with other messages.

As for the dspam headers being present in the message already, I meant these messages only as an example of the failure I was seeing on new, incoming messages. I'd be happy to retest my corpus after their removal to either show this is a bug, or eliminate it as a possibility.

My goal in posting this bug was two-fold. First, I'd like dspam to work for me. Additionally, there is little point in someone else suffering similar troubles.

I still would very much appreciate further suggestions to diagnose the problem at hand. I'm unsure what more can be done to reproduce it on your systems.

Thanks again.
~Jacob
Unfortunately no one (including upstream) was able to reproduce your segfault. Since even you don't have a reliable way to reproduce the bug, I can only conclude 2 things:

a) you are affected by some kind of race (btw, how many processors do you have?). I had some bad experiences in the past with gdb when I tried to debug multi-threaded code, therefore I must ask you to compare gdb results with strace.

b) your problem is caused by an external factor. Please verify your hardware by running a complete memtest pass.

If you don't detect a hardware problem, please give complete step-by-step instructions for reproducing the segfault starting from an empty database. If you cannot obtain a segfault every time you try it, give us a frequency estimate.
(In reply to comment #0)
> GDB:
> Short story: CTX->factors->first->ptr at dspam.c:3092 does not appear to be
> initialized
> ----------------------------------------------------------------
> Starting program: /usr/bin/dspam --debug --user jacob --mode=notrain --client
> --stdout --deliver=innocent,spam <
> /home/jacob/dbg_mail-linux-cluster-bounces@redhat.com-1194995056
> [Thread debugging using libthread_db enabled]
> [New Thread -1212545344 (LWP 17038)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1212545344 (LWP 17038)]
> add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
> 3092 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
> (gdb) list
> 3087 node_ft = c_nt_first(CTX->factors, &c_ft);
> 3088 while(node_ft != NULL) {
> 3089 struct dspam_factor *f = (struct dspam_factor *)
> node_ft->ptr;
> 3090 if (f) {
> 3091 strlcat(data, ",\n\t", sizeof(data));
> 3092 snprintf(scratch, sizeof(scratch), "%s, %2.5f",
> 3093 f->token_name, f->value);
> 3094 strlcat(data, scratch, sizeof(data));
> 3095 }
> 3096 node_ft = c_nt_next(CTX->factors, &c_ft);
> (gdb) bt
> #0 add_xdspam_headers (CTX=0x80705e0, ATX=0xbfe51b18) at dspam.c:3092
> #1 0x08053442 in process_message (ATX=0xbfe51b18, message=0x8062bb8,
> username=0x8062930 "jacob", result_string=0xbfe519d0) at dspam.c:727
> #2 0x08054295 in process_users (ATX=0xbfe51b18, message=0x8062b10)
> at dspam.c:1797
> #3 0x08054f30 in main (argc=Cannot access memory at address 0x0
> ) at dspam.c:258
> (gdb) print CTX->factors
> $11 = (struct nt *) 0x806e688
> (gdb) print *CTX->factors
> $12 = {
> first = 0x806e570,
> insert = 0x616d6c69,
> items = 1700146542,
> nodetype = 1869181810
> }
> (gdb) print *CTX->factors->first
> $13 = {
> ptr = 0x38,
> next = 0x20
> }
> -----------------------------------------------------------
>

How did you manage to get that backtrace in GDB? I can't get DSPAM to produce a backtrace for me when it crashes. How did you compile DSPAM? What CPU are you using? Did you compile other packages with the debug USE flag? Do you use PAM, and have you enabled core dumps on your system?

// SteveB
BTW: Could you emerge DSPAM without the debug USE flag and check whether your installation is more stable than with the debug USE flag?
Steve, I'm unsure what aspect of debugging with gdb is failing for you, but I've configured the 3.8.0-r8 ebuild with:

./configure --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --with-storage-driver=hash_drv,mysql_drv,pgsql_drv --with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --enable-daemon --enable-ldap --enable-clamav --disable-large-scale --enable-domain-scale --disable-syslog --enable-debug --enable-bnr-debug --enable-verbose-debug --enable-long-usernames --with-dspam-group=dspam --with-dspam-home-group=dspam --with-dspam-mode=2511 --with-logdir=/var/log/dspam --disable-virtual-users --enable-preferences-extension --disable-homedir --with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql --with-pgsql-includes=/usr/include/postgresql --with-pgsql-libraries=/usr/lib/postgresql --build=i686-pc-linux-gnu

I'm also using CFLAGS="-march=athlon-xp -O1 -g -pipe". To be absolutely clear, I have recompiled glibc, postgres, clamav, and openldap with these CFLAGS. Much of the rest of the system has "-march=athlon-xp -O3 -pipe".

You may also find it useful to add 'nostrip' and 'noclean' to your FEATURES in /etc/make.conf. The latter, in particular, will keep the sources around where gdb can find them.

To get a backtrace on a message that I know causes a segfault, I perform the following:

<delete all tokens for the user jacob from the database, begin training>
# dspam_train jacob t_spam t_nonspam

<Wait for one message to segfault>
...
[test: spam ] 1195501649.M366542P24186V0000000 result: sh: line 1: 8860 Segmentation fault /usr/bin/dspam --user jacob --deliver=summary --stdout < 't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
BROKEN result!!
...

<Verify we still segfault, and run in gdb>
# /usr/bin/dspam --user jacob --deliver=summary --stdout < 't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
Segmentation fault

# gdb /usr/bin/dspam
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run --user jacob --deliver=summary --stdout < 't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
Starting program: /usr/bin/dspam --user jacob --deliver=summary --stdout < 't_spam/1195501649.M366542P24186V0000000000000302I002DB0F9_5508.jjoseph.org,S=24044:2,S'
[Thread debugging using libthread_db enabled]
[New Thread -1212250432 (LWP 8965)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212250432 (LWP 8965)]
add_xdspam_headers (CTX=0x807c728, ATX=0xbfc989c8) at dspam.c:3086
3086            struct dspam_factor *f = (struct dspam_factor *) node_ft->ptr;
(gdb) bt
#0  add_xdspam_headers (CTX=0x807c728, ATX=0xbfc989c8) at dspam.c:3086
#1  0x08053672 in process_message (ATX=0xbfc989c8, message=0x80629f8, username=0x8062930 "jacob", result_string=0xbfc98880) at dspam.c:724
#2  0x080544cb in process_users (ATX=0xbfc989c8, message=0x8062950) at dspam.c:1794
#3  0x08055171 in main (argc=5, argv=0xbfc998a4) at dspam.c:255
(gdb) list
3081        if (CTX->factors != NULL) {
3082          snprintf(data, sizeof(data), "X-DSPAM-Factors: %d",
3083                   CTX->factors->items);
3084          node_ft = c_nt_first(CTX->factors, &c_ft);
3085          while(node_ft != NULL) {
3086            struct dspam_factor *f = (struct dspam_factor *) node_ft->ptr;
3087            if (f) {
3088              strlcat(data, ",\n\t", sizeof(data));
3089              snprintf(scratch, sizeof(scratch), "%s, %2.5f",
3090                       f->token_name, f->value);
(gdb)

Though the particular message involved in a crash seems to vary from compile to compile, I always experience a crash on my training corpus.

When configured without any of the debug options (but the same CFLAGS), I have so far been unable to reproduce the above segfault.
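The "wait for one message to segfault, then rerun it" step described above could be automated roughly as follows. This is a sketch, not a tested dspam tool: `find_crashing_messages` is a hypothetical helper name, the command line is the one quoted in the comment, and each run goes through dspam's normal processing, so it touches the user's data just as the original command does.

```python
import signal
import subprocess
from pathlib import Path


def find_crashing_messages(corpus_dir, command):
    """Feed each file in corpus_dir to command on stdin; return those killed by SIGSEGV."""
    crashed = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as message:
            result = subprocess.run(
                command,
                stdin=message,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        # On POSIX, a negative return code is the number of the signal
        # that terminated the child process.
        if result.returncode == -signal.SIGSEGV:
            crashed.append(path)
    return crashed


# Example invocation, using the command line quoted in the comment above:
# crashed = find_crashing_messages(
#     "t_spam",
#     ["/usr/bin/dspam", "--user", "jacob", "--deliver=summary", "--stdout"],
# )
# for path in crashed:
#     print(path)
```

Running the helper several times over the same corpus and counting how often each file appears would also give the frequency estimate requested earlier in the thread.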
I have posted a large training corpus at http://www.jjoseph.org/files/training_corpus.tar.bz2. With the debug use flag, I have never been able to train on this data set without experiencing a reproducible segfault on a particular message.
(In reply to comment #18)
> I have posted a large training corpus at
> http://www.jjoseph.org/files/training_corpus.tar.bz2. With the debug use flag,
> I have never been able to train on this data set without experiencing a
> reproducible segfault on a particular message.
>

I downloaded your corpus and started training on my DSPAM 3.8.0 with my own improved/fixed MySQL driver. The system is not the fastest in the world:

mail ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon (TM)
stepping        : 4
cpu MHz         : 1400.180
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 2803.66
clflush size    : 32

mail ~ # free -m
             total       used       free     shared    buffers     cached
Mem:          1012        990         21          0          0        333
-/+ buffers/cache:        657        355
Swap:         1535        765        770
mail ~ #

So far no crashes with the MySQL backend:
-----------------------------------------
Training Snapshot:
        jmjoseph TP: 6807 TN: 880 FP: 12 FN: 75 SC: 25 NC: 0
        SHR: 98.91% HSR: 1.35% OCA: 98.88%

Overall Statistics:
        jmjoseph TP: 6807 TN: 880 FP: 12 FN: 75 SC: 25 NC: 0
        SHR: 98.91% HSR: 1.35% OCA: 98.88%

real    31m26.005s
user    0m58.616s
sys     0m44.377s
mail ~ # dspam_stats -s -H jmjoseph
jmjoseph:
        TP True Positives:              6807
        TN True Negatives:              880
        FP False Positives:             12
        FN False Negatives:             75
        SC Spam Corpusfed:              25
        NC Nonspam Corpusfed:           0
        TL Training Left:               0
        SHR Spam Hit Rate               98.91%
        HSR Ham Strike Rate:            1.35%
        PPV Positive predictive value:  99.82%
        OCA Overall Accuracy:           98.88%
mail ~ #
-----------------------------------------

Tomorrow after work I will test with DSPAM 3.8.0-r8 and PostgreSQL, plus one small patch I made a week ago for PostgreSQL. The MySQL patch I made is not small: I added 511 lines and removed 249 lines in the MySQL driver. While I was working on the MySQL driver, I looked inside the PostgreSQL driver to see how things are solved there. Now I know that the PostgreSQL driver is not in the best shape either. I need to take my time and make the same changes to the PostgreSQL driver as I did for MySQL.
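For readers unfamiliar with the abbreviations in the statistics above, the percentages can be reproduced from the four counters. The formulas below are the usual textbook definitions and are an assumption here: they match the numbers dspam_stats printed, but they were not taken from the dspam source, and the sketch does not handle a zero denominator (as in the earlier one-message run).

```python
def accuracy_stats(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Percentages derived from the four classification counters."""
    return {
        "SHR": 100.0 * tp / (tp + fn),                   # spam hit rate
        "HSR": 100.0 * fp / (fp + tn),                   # ham strike rate
        "PPV": 100.0 * tp / (tp + fp),                   # positive predictive value
        "OCA": 100.0 * (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
    }


# Counters from the dspam_stats output above.
stats = accuracy_stats(tp=6807, tn=880, fp=12, fn=75)
for name, value in stats.items():
    print(f"{name}: {value:.2f}%")
```

Rounded to two decimals this gives 98.91, 1.35, 99.82 and 98.88, the same values as in the training summary.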
Created attachment 138206 [details, diff]
dspam-3.8.0-fix_bnr_debug.patch

This patch fixes (hopefully) the crash when using --enable-bnr-debug. Have fun :)

@Alin: If you are going to include that patch in Gentoo, could you also change the ebuild to have this:

        # Debug build
        if use debug; then
                filter-flags "-fomit-frame-pointer" "-O?"
                append-flags "-O0" "-ggdb"
        fi

And this here:

        $(use_enable debug bnr-debug) \
        $(use_enable debug verbose-debug) \

--enable-verbose-debug implicitly sets --enable-debug, so the latter is not needed. But it does no harm if you do:

        $(use_enable debug) \
        $(use_enable debug bnr-debug) \
        $(use_enable debug verbose-debug) \

Can you both test this patch with the following enabled:

        --enable-debug --enable-bnr-debug --enable-verbose-debug

And then check if DSPAM still crashes?

// SteveB
Steve, your patch does seem to fix my crashes. I'll post back if a more complete test reveals any further news. Thank you. ~Jacob
(In reply to comment #21)
> Steve, your patch does seem to fix my crashes. I'll post back if a more
> complete test reveals any further news. Thank you. ~Jacob
>

Perfect! You have to thank Alin. Without him I would never have known anything about this bug. Thanks Alin!
Fixed in -r9, thanks! I've added $(use_enable debug verbose-debug), but I didn't mess with CFLAGS as you suggested in comment #20. AFAIK users who want to debug a certain program have to know how to mangle CFLAGS and FEATURES at package level. I see no reason to override user's CFLAGS.
(In reply to comment #23)
> Fixed in -r9, thanks!
>
> I've added $(use_enable debug verbose-debug), but I didn't mess with CFLAGS as
> you suggested in comment #20. AFAIK users who want to debug a certain program
> have to know how to mangle CFLAGS and FEATURES at package level. I see no
> reason to override user's CFLAGS.
>

I don't agree with you here. If someone enables the debug USE flag, then I would expect the package to get compiled with debug options. I would understand your objection if DSPAM were the only package overwriting/filtering/enforcing certain CFLAGS with the debug USE flag. But it is not.

Allow me to ask the other way around: Why have USE flags at all? If someone wants to use certain switches (which are right now provided with the help of the USE flags), then we could expect the user to know CFLAGS, FEATURES, LDFLAGS and friends at package level.

My question does not make sense, I know. But I think not setting the CFLAGS for a package where the user explicitly requests "debug" in the USE flags does not make sense either. What do you think?
The debug flag is never used (or shouldn't be used) as a synonym for "build the program with -O0 -ggdb and don't strip the debug info". In this case, it activates debug logging. Yeah, not exactly intuitive, but this is how upstream decided to implement things. If I were the author, this would have been accomplished through command line options, not compilation defines.

Anyway, the rationale for the debug USE flag has been discussed over time on the dev ml. I distinctly remember that adding -ggdb to CFLAGS is a no-no.
(In reply to comment #25)
> Debug flag is never used (or shouldn't be used) as a synonym to "build the
> program with -O0 -ggdb and don't strip the debug info". In this case, it
> activates debug logging. Yeah, not exactly intuitive, but this is how upstream
> decided to implement things. If I were the author, these would have been
> accomplished through command line options, not compilation defines.
>

I did not say anything about "don't strip the debug info".

> Anyway, the debug USE flag reason of being has been discussed over the time on
> dev ml. I distinctly remember that adding -ggdb to CFLAGS is a no-no.
>

Well... then I have a nice list of such "no-no's":
grep -R "\-ggdb" /usr/portage/

Anyway... it is okay.