Bug 252565 - mail-filter/spamassassin-3.2.1-r1: error in language files
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High major
Assignee: Gentoo Perl team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-26 08:12 UTC by Andrey Petukhov
Modified: 2009-01-27 10:04 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
Fixed languages file (languages,99.40 KB, text/plain)
2008-12-26 08:16 UTC, Andrey Petukhov
testcase - mail message UTF-8 RU (utf8.msg,153.53 KB, text/plain)
2008-12-26 08:40 UTC, Andrey Petukhov
ru.iso-8859-5.lm (ru.iso-8859-5.lm,1.58 KB, text/plain)
2009-01-23 08:40 UTC, Andrey Petukhov

Description Andrey Petukhov 2008-12-26 08:12:31 UTC
Mail::SpamAssassin::Plugin::TextCat: the file
/usr/share/spamassassin/languages
contains wrong tokens for ru.iso-8859-5 (UTF-8).
After switching to "normalize_charset 1",
TextCat becomes useless:
it never detects the language.
When I replaced the file with the one from http://www.phpclasses.org/browse/file/14651.html,
which contains tokens for ru.iso-8859-5 (UTF-8),
everything worked fine.
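
One way to see why a language model built for one encoding could stop matching after charset normalization is to compare the byte n-grams of the same text in both encodings. The following is only a sketch in Python, not SpamAssassin's TextCat code: the helper `byte_ngrams` and the sample sentence are made up for illustration, and it assumes that matching is done on raw bytes.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping n-byte sequences in data (hypothetical helper)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical sample text; any lowercase Russian phrase would do.
sample = "привет как дела"

iso_bytes = sample.encode("iso-8859-5")   # one byte per Cyrillic letter
utf8_bytes = sample.encode("utf-8")       # two bytes per Cyrillic letter

iso_grams = byte_ngrams(iso_bytes)
utf8_grams = byte_ngrams(utf8_bytes)

# Bigrams that occur in both encodings of the same sentence.
shared = set(iso_grams) & set(utf8_grams)

print("ISO-8859-5 length:", len(iso_bytes))
print("UTF-8 length:", len(utf8_bytes))
print("shared bigrams:", sorted(shared))
```

For this sample the two encodings share no byte bigrams, so a model trained on one would have nothing to match in the other. Whether the shipped `languages` file is actually affected in this way is the reporter's claim, not something this sketch proves.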


Reproducible: Always

Steps to Reproduce:
1. wget http://www1.uralpress.ru/my/utf8.msg
2. sa-learn -D --spam utf8.msg 2>&1 | grep textcat
   dbg: textcat: can't determine language uniquely enough
3. wget http://www1.uralpress.ru/my/languages
4. cp languages /usr/share/spamassassin/
5. sa-learn -D --spam utf8.msg 2>&1 | grep textcat
   dbg: textcat: language possibly: ru.iso-8859-5
Actual Results:  
dbg: textcat: can't determine language uniquely enough

Expected Results:  
dbg: textcat: language possibly: ru.iso-8859-5
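
The before/after comparison in the steps above comes down to reading the `textcat` lines in the debug output. As a rough sketch in Python (the helper names `textcat_lines` and `detected_language` are invented here, and the line format is assumed to match the two debug lines quoted in this report), the check could be scripted like this:

```python
from typing import List, Optional

# Assumed marker, taken from the debug line quoted in this report.
LANGUAGE_PREFIX = "textcat: language possibly: "


def textcat_lines(debug_output: str) -> List[str]:
    """Return the debug lines that mention textcat, like `grep textcat`."""
    return [line.strip() for line in debug_output.splitlines() if "textcat" in line]


def detected_language(debug_output: str) -> Optional[str]:
    """Return the language TextCat reported, or None if it reported none."""
    for line in textcat_lines(debug_output):
        if LANGUAGE_PREFIX in line:
            return line.split(LANGUAGE_PREFIX, 1)[1].strip()
    return None


before = "dbg: textcat: can't determine language uniquely enough"
after = "dbg: textcat: language possibly: ru.iso-8859-5"

print(detected_language(before))
print(detected_language(after))
```

The text passed in would be the captured output of `sa-learn -D --spam utf8.msg 2>&1`; the sketch deliberately does not run that command itself.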

/etc/spamassassin/v310.pre:
loadplugin Mail::SpamAssassin::Plugin::TextCat



Portage 2.1.4.5 (default/linux/x86/2008.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r9 i686)
=================================================================
System uname: 2.6.25-gentoo-r9 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
Timestamp of tree: Fri, 26 Dec 2008 01:00:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p33
dev-lang/python:     2.5.2-r7
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.61-r2
sys-devel/automake:  1.9.6-r2, 1.10.1-r1
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=k8 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=k8 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://mirror.bytemark.co.uk/gentoo/ http://gentoo.tups.lv/source/ http://trumpetti.atm.tut.fi/gentoo/"
LANG="ru_RU.UTF-8"
LC_ALL=""
LDFLAGS="-Wl,-O1"
LINGUAS="ru"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl apache2 bash-completion bzip2 cli cracklib crypt cups dri gdbm gpm iconv isdnlog jpeg logrotate midi mmx mudflap mysql mysqli nls nptl nptlonly openmp pam pcre perl png pppd python readline reflection session spl sse sse2 ssl sysfs tcpd truetype unicode vhosts x86 xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_dbd authn_default authn_file authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers imagemap log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling ssl unique_id vhost_alias" APACHE2_MPMS="prefork" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="ru" USERLAND="GNU" VIDEO_CARDS="fbdev glint i810 intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Andrey Petukhov 2008-12-26 08:16:53 UTC
Created attachment 176408
Fixed languages file
Comment 2 Andrey Petukhov 2008-12-26 08:40:36 UTC
Created attachment 176411
testcase - mail message UTF-8 RU
Comment 3 Andrey Petukhov 2009-01-23 06:12:47 UTC
(In reply to comment #0)

> 1.wget http://www1.uralpress.ru/my/utf8.msg
> wget http://www1.uralpress.ru/my/languages

Please use files from attachments.
Comment 4 Azamat H. Hackimov 2009-01-23 07:24:30 UTC
This is a misconfigured installation.

normalize_charset is not required on a UTF-8 aware system: all messages are normalized to UTF-8 by default.

ISO-8859-5 is NOT UTF-8. The messages are not available, so I can't reproduce the bug.

I think this is an INVALID bug.
Comment 5 Andrey Petukhov 2009-01-23 07:50:35 UTC
(In reply to comment #4)
> This is a misconfigured installation.
> 
> normalize_charset is not required on a UTF-8 aware system: all messages
> are normalized to UTF-8 by default.
> 
> ISO-8859-5 is NOT UTF-8. The messages are not available, so I can't reproduce the bug.
> 
> I think this is an INVALID bug.
> 

See comment #3: the messages are in the attachments.
ISO-8859-5 is the charset detected by SpamAssassin; the message itself was encoded in UTF-8.
For normalize_charset, see http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html#language_options
Comment 6 Andrey Petukhov 2009-01-23 08:23:57 UTC
(In reply to comment #4)
> This is a misconfigured installation.
> I think this is an INVALID bug.

You think. This is great.
Try to reproduce or prove it's INVALID.
Comment 7 Andrey Petukhov 2009-01-23 08:40:11 UTC
Created attachment 179423
ru.iso-8859-5.lm