An ASCII text file beginning with the characters "Vé" is identified as an MPEG-4 LOAS file. As we do not accept MPEG files in e-mails (via Mailscanner), text messages get blocked. We write in Portuguese, which is why we use accents.

Reproducible: Always

Steps to Reproduce:
1. Create a text file test.txt beginning with the letters Vé.
2. Execute 'file test.txt'.
3. It returns "test.txt: MPEG-4 LOAS".

Expected Results:
It should identify the file as an ASCII text file.
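How the "é" from step 1 ends up on disk depends on the terminal's encoding. The minimal Python sketch below writes the same text in two encodings so that 'file' can be run on each. It assumes the reporter's terminal was producing ISO-8859-1 (Latin-1) rather than UTF-8, and the file names test-latin1.txt and test-utf8.txt are hypothetical.

```python
from pathlib import Path

# The text from the report; "é" is its only non-ASCII character.
TEXT = "Vé abcdefg\n"

# Hypothetical file names, one per candidate terminal encoding.
VARIANTS = {
    "test-latin1.txt": "iso-8859-1",  # "é" becomes the single byte 0xE9
    "test-utf8.txt": "utf-8",         # "é" becomes the two bytes 0xC3 0xA9
}

for name, encoding in VARIANTS.items():
    data = TEXT.encode(encoding)
    Path(name).write_bytes(data)
    # Show the leading bytes, which are what magic tests at offset 0 inspect.
    print(f"{name}: {data[:4].hex(' ')}")
```

Running 'file test-latin1.txt test-utf8.txt' afterwards should show whether only the Latin-1 variant is reported as MPEG-4 LOAS on the file versions discussed here.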
1) Is this a regression relative to the current stable version?
2) Please post "emerge --info" too.
3) Attach a sample file; I cannot reproduce the problem:

jeroen@elmer ~ $ eversion sys-apps/file
sys-apps/file-4.26
jeroen@elmer ~ $ file /keeps/gentoo/bugs/245951.txt
/keeps/gentoo/bugs/245951.txt: UTF-8 Unicode text
jeroen@elmer ~ $ cat /keeps/gentoo/bugs/245951.txt
Vé
Created attachment 171018: text file, which identifies as MPEG
I tried it on two different servers, one with file 4.26, the other with 4.23.

Server 1:

fw ~ # echo "Vé abcdefg" > test.txt
fw ~ # file test.txt
test.txt: MPEG-4 LOAS
fw ~ # file --version
file-4.26
magic file from /usr/share/misc/file/magic

Server 2:

srv franz # file test.txt
test.txt: MPEG-4 LOAS
srv franz # file --version
file-4.23
magic file from /usr/share/misc/file/magic

The emerge --info output:

fw ~ # emerge --info
Portage 2.1.4.5 (default-linux/x86/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r4SMP i686)
=================================================================
System uname: 2.6.24-gentoo-r4SMP i686 Intel(R) XEON(TM) CPU 2.00GHz
Timestamp of tree: Fri, 07 Nov 2008 05:15:01 +0000
app-shells/bash: 3.2_p17-r1
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python: 2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.61-r2
sys-devel/automake: 1.7.9-r1, 1.10.1
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /opt/openfire/resources/security/ /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.samerica.gentoo.org/gentoo-portage"
USE="acl bash-completion berkdb cli cracklib crypt dri fortran gdbm gpm iconv isdnlog logrotate midi mudflap ncurses nls nptl nptlonly openmp pam pcre perl pppd python readline reflection sasl session spl ssl tcpd unicode x86 xorg zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
USERLAND="GNU"
VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i740 i810 imstt intel mach64 mga neomagic nsc nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
Confirmed.
Cannot confirm here:

orzel@berlioz /tmp% echo "Vé abcdefg" > test.txt
orzel@berlioz /tmp% file test.txt
test.txt: UTF-8 Unicode text
orzel@berlioz /tmp% file --version
file-4.26

System uname: Linux-2.6.27-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_4200+-with-glibc2.2.5
Timestamp of tree: Sat, 20 Dec 2008 02:36:01 +0000
distcc 3.1 x86_64-pc-linux-gnu [disabled]
ccache version 2.4 [disabled]
app-shells/bash: 3.2_p48
dev-java/java-config: 1.3.7-r1, 2.1.6-r1
dev-lang/python: 2.5.2-r8
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache: 2.4-r8
dev-util/cmake: 2.6.2
sys-apps/baselayout: 2.0.0
sys-apps/openrc: 0.3.0-r1
sys-apps/sandbox: 1.2.18.1-r3
sys-devel/autoconf: 2.13, 2.63
sys-devel/automake: 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.19
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 2.2.6a
virtual/os-headers: 2.6.27-r2
ABI="amd64"
ACCEPT_KEYWORDS="amd64 ~amd64"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
ANT_HOME="/usr/share/ant"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ARCH="amd64"
ASFLAGS_x86="--32"
AUTOCLEAN="yes"
BASH_ENV="/.profile"
CBUILD="x86_64-pc-linux-gnu"
CDEFINE_amd64="__x86_64__"
CDEFINE_x86="__i386__"
CFLAGS="-march=native -O3 -pipe -msse3"
CFLAGS_x86="-m32"
CHOST="x86_64-pc-linux-gnu"
That's because the file in question isn't ASCII. You can't expect non-ASCII files to be detected as ASCII. That second character there, "é", makes the file ISO-8859-1 (in ISO-8859-1, "é" is the single byte 0xE9, which is outside the ASCII range 0x00-0x7F).
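To illustrate why the encoding matters, this small Python sketch (not part of any tool mentioned in this bug) encodes the reporter's text both ways and checks whether the resulting bytes are pure ASCII:

```python
text = "Vé abcdefg\n"

latin1 = text.encode("iso-8859-1")  # "é" -> 0xE9 (one byte)
utf8 = text.encode("utf-8")         # "é" -> 0xC3 0xA9 (two bytes)

# bytes.isascii() (Python 3.7+) is True only if every byte is below 0x80.
print(latin1.isascii(), utf8.isascii())  # False False
print(latin1[:2].hex(), utf8[:3].hex())  # 56e9 56c3a9
```

Neither encoding is ASCII, but the two byte sequences differ, and only the bytes matter to 'file'.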
OK, it's non-ASCII. But why is it recognized as MPEG-4, and not as ISO-8859-1 or UTF-8 Unicode text? My problem is that we block MPEG files on our mail server (with Mailscanner, Postfix, etc.), and some text files like these are blocked as MPEG files. Could something be wrong with my internationalization settings?
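Because that is the signature for MPEG LOAS files. Just look at the magic database:

# Live MPEG-4 audio streams (instead of RTP FlexMux)
0    beshort&0xFFE0    0x56E0    MPEG-4 LOAS

The test reads the first two bytes as a big-endian short and clears the low five bits. "V" is 0x56 and "é" in ISO-8859-1 is 0xE9, so the file starts with 0x56E9, and 0x56E9 & 0xFFE0 = 0x56E0, which matches. In UTF-8, "é" is 0xC3 0xA9, giving 0x56C3 & 0xFFE0 = 0x56C0, which does not match; that is why the same text saved as UTF-8 is reported as UTF-8 Unicode text.

If you use ISO-8859-* files without anything around them, you risk having them detected as any of a variety of file formats. That's simply how binary files work.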
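The magic line "0 beshort&0xFFE0 0x56E0" can be re-implemented in a few lines to see exactly which inputs trip it. This is a minimal Python sketch of that single rule only; the function name looks_like_loas is hypothetical, and the real 'file' checks many other magic entries, so it is an illustration rather than a model of file's full behavior.

```python
import struct

LOAS_MASK = 0xFFE0
LOAS_VALUE = 0x56E0


def looks_like_loas(data: bytes) -> bool:
    """Apply only the magic test `0 beshort&0xFFE0 0x56E0`."""
    if len(data) < 2:
        return False
    # Big-endian 16-bit value at offset 0 (read unsigned here; the sign does
    # not matter when the first byte is 0x56).
    (value,) = struct.unpack(">H", data[:2])
    return (value & LOAS_MASK) == LOAS_VALUE


text = "Vé abcdefg\n"
print(looks_like_loas(text.encode("iso-8859-1")))  # True:  0x56E9 & 0xFFE0 == 0x56E0
print(looks_like_loas(text.encode("utf-8")))       # False: 0x56C3 & 0xFFE0 == 0x56C0

# Every possible second byte that, following "V", satisfies the test.
hits = [b for b in range(256) if looks_like_loas(b"V" + bytes([b]))]
print(hex(hits[0]), hex(hits[-1]), len(hits))  # 0xe0 0xff 32
```

In ISO-8859-1, bytes 0xE0 through 0xFF are the characters à through ÿ, which include ã, á, â, ç, é, ê, í, ó, ô, õ and ú, so any Latin-1 text that starts with "V" followed by one of them matches this rule.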