Bug 861089 - sys-apps/file-5.42 has encoding problem with UTF-8 printing file names
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard: Should be fixed in unkeyworded 5.42-r...
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-25 22:34 UTC by Joerg Schaible
Modified: 2022-08-23 20:04 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Description Joerg Schaible 2022-07-25 22:34:15 UTC
On a UTF-8 file system, file prints the name of the examined file with an erroneous decoding. Example:

$ file "Video/Die große Verführung.mp4"
Video/Die gro\303\237e Verf\303\274hrung.: ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]

After installing file-4.51 again, the command produces the following correct output:

$ file "Video/Die große Verführung.mp4"
Video/Die große Verführung.mp4: ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]
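
For reference, the octal escapes in the broken output appear to be the raw UTF-8 bytes of the original characters, printed one byte at a time. A minimal Python sketch to confirm this (the helper name unescape_octal is hypothetical and not part of file; it assumes the escaped text is otherwise plain ASCII):

```python
import re


def unescape_octal(escaped: str) -> str:
    """Turn backslash-octal escapes such as \\303\\237 back into UTF-8 text."""
    # Work on bytes so each three-digit escape maps to exactly one byte.
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        escaped.encode("ascii"),
    )
    # The recovered byte sequence is then decoded as UTF-8 in one step.
    return raw.decode("utf-8")


if __name__ == "__main__":
    print(unescape_octal(r"Video/Die gro\303\237e Verf\303\274hrung."))
```

Here \303\237 is the byte pair 0xC3 0x9F (ß) and \303\274 is 0xC3 0xBC (ü), so the name itself is intact on disk and only the printing is affected.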

Reproducible: Always

Steps to Reproduce:
1. install file-4.52
2. examine a file on a UTF-8 file system whose name contains non-ASCII characters

Actual Results:  
non-ASCII chars are displayed with an erroneous decoding

Expected Results:  
non-ASCII chars should not be decoded at all

$ emerge --info
Portage 3.0.30 (python 3.10.5-final-0, default/linux/amd64/17.1, gcc-11.3.0, glibc-2.34-r13, 5.15.52-gentoo-x86_64 x86_64)
=================================================================
System uname: Linux-5.15.52-gentoo-x86_64-x86_64-AMD_Ryzen_5_3400G_with_Radeon_Vega_Graphics-with-glibc2.34
KiB Mem:    14237116 total,    354332 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Fri, 22 Jul 2022 19:46:40 +0000
Head commit of repository gentoo: 1236825c6d6467d11bcbee96d9adfee3621075e4

sh bash 5.1_p16
ld GNU ld (Gentoo 2.37_p1 p2) 2.37
app-misc/pax-utils:        1.3.4::gentoo
app-shells/bash:           5.1_p16::gentoo
dev-java/java-config:      2.3.1::gentoo
dev-lang/perl:             5.34.1-r3::gentoo
dev-lang/python:           3.10.5::gentoo
dev-util/cmake:            3.22.4::gentoo
dev-util/meson:            0.62.2::gentoo
sys-apps/baselayout:       2.8::gentoo
sys-apps/openrc:           0.44.10::gentoo
sys-apps/sandbox:          2.29::gentoo
sys-devel/autoconf:        2.71-r1::gentoo
sys-devel/automake:        1.16.5::gentoo
sys-devel/binutils:        2.37_p1-r2::gentoo
sys-devel/binutils-config: 5.4.1::gentoo
sys-devel/gcc:             11.3.0::gentoo
sys-devel/gcc-config:      2.5-r1::gentoo
sys-devel/libtool:         2.4.7::gentoo
sys-devel/make:            4.3::gentoo
sys-kernel/linux-headers:  5.15-r3::gentoo (virtual/os-headers)
sys-libs/glibc:            2.34-r13::gentoo
Repositories:

gentoo
    location: /var/db/portage/tree/central
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo.git
    priority: -1000

local
    location: /var/db/portage/tree/local
    masters: gentoo

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -Os -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.1/ext-active/ /etc/php/cgi-php8.1/ext-active/ /etc/php/cli-php8.1/ext-active/ /etc/php/fpm-php8.1/ext-active/ /etc/php/phpdbg-php8.1/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -Os -pipe"
DISTDIR="/var/db/portage/distfiles/bobbel"
EMERGE_DEFAULT_OPTS="--autounmask=n --with-bdeps=y"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://sunsite.cnlab-switch.ch/mirror/gentoo/ http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ http://ftp.heanet.ie/pub/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en de"
MAKEOPTS="-j7 -s"
PKGDIR="/var/db/portage/packages/core2duo"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp/portage"
SHELL="/bin/bash"
USE="acl alsa amd64 bzip2 caps cli crypt cups dbus dri fam fortran gdbm gpm iconv icu idn ipv6 jpeg jpeg2k libglvnd libtirpc logrotate multilib ncurses nptl pam pcre png readline seccomp split-usr ssl syslog tiff unicode usb vim-syntax xattr zlib zstd" ABI_X86="64" ADA_TARGET="gnat_2020" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 aes avx avx2 f16c fma3 pclmul popcnt rdrand sha sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-4 php8-0 php8-1" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_10" PYTHON_TARGETS="python3_10" RUBY_TARGETS="ruby27" SANE_BACKENDS="net" USERLAND="GNU" VIDEO_CARDS="vesa" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LEX, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-07-26 01:38:22 UTC
Could you try file-9999? If it's still broken, could you report upstream?
Comment 2 aleck 2022-08-03 10:12:05 UTC
Same problem here. I think there's a typo in the version given in the description: the affected version is sys-apps/file-5.42.
Comment 3 aleck 2022-08-03 10:21:10 UTC
seems to be fixed in 9999
Comment 5 Bernd Feige 2022-08-08 10:38:50 UTC
Note that I was bitten by this because this bug causes the "ext" file action mechanism of app-misc/mc to fail if there is any wide character in the path. For example, I have a directory "9€_Ticket" below which PDF files are stored, and showing the PDF files with MC (i.e., starting the PDF viewer on the "View" action) silently does not work. Nearly reported an MC bug for this...
I can confirm that the MC problem is fixed by sys-apps/file-9999.

Sam: I applied the three patches (they don't apply cleanly just because of the FILE_RCSID updates...) but they are not sufficient ("file: mbrtowc.c:104: __mbrtowc: Assertion `__mbsinit (data.__statep)' failed.").
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-08-15 01:09:46 UTC
(In reply to Bernd Feige from comment #5)
> Note that I was bitten by this because this bug causes the "ext" file action
> mechanism of app-misc/mc to fail if there is any wide character in the path.
> For example, I have a directory "9€_Ticket" below which PDF files are
> stored, and showing the PDF files with MC (i.e., starting the PDF viewer on
> the "View" action) silently does not work. Nearly reported an MC bug for
> this...
> I can confirm that the MC problem is fixed by sys-apps/file-9999.
> 
> Sam: I applied the three patches (they don't apply cleanly just because of
> the FILE_RCSID updates...) but they are not sufficient ("file:
> mbrtowc.c:104: __mbrtowc: Assertion `__mbsinit (data.__statep)' failed.").

Thanks. I got it to apply with https://github.com/file/file/commit/7e59d34206d7c962e093d4239e5367a2cd8b7623 thrown in first (and dropping all RCS hunks, of course) but then it just hung on any input. Bleh.
Comment 7 Larry the Git Cow gentoo-dev 2022-08-16 02:30:45 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=253ca90f3f968a03ea6fff8f0011cf411764b22e

commit 253ca90f3f968a03ea6fff8f0011cf411764b22e
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-08-16 02:28:55 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-08-16 02:29:57 +0000

    sys-apps/file: backport unicode handling fixes to 5.42
    
    Temporarily unkeyworded given I had a few issues before I threw
    in a few extra patches. Want to give it a test run for a day
    or so myself first before keywording.
    
    Bug: https://bugs.gentoo.org/861089
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/file/file-5.42-r1.ebuild                 | 162 +++++++++
 sys-apps/file/files/file-5.42-unicode-fixes.patch | 414 ++++++++++++++++++++++
 2 files changed, 576 insertions(+)
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-08-16 02:31:38 UTC
Could you guys try out 5.42-r1 (unkeyworded for now) and let me know if it solves your issues, and if it works OK in general? Thanks.
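
For anyone testing, a rough Python sketch of a check along these lines (not an official test; the function names are hypothetical, and it assumes a UTF-8 locale and the default "name: description" output format of file):

```python
import subprocess


def name_printed_intact(path: str, output_line: str) -> bool:
    """Return True if a line of file output starts with the unmodified path and a colon."""
    return output_line.startswith(path + ":")


def check(path: str) -> bool:
    """Run the installed file binary on path and check how the name is printed."""
    # "--" keeps unusual file names from being parsed as options.
    result = subprocess.run(
        ["file", "--", path], capture_output=True, text=True, check=False
    )
    return name_printed_intact(path, result.stdout)


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, "OK" if check(p) else "BROKEN")
```

Running it against a path with non-ASCII characters, such as the one from the description, should report OK once the name is printed unescaped.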
Comment 9 Larry the Git Cow gentoo-dev 2022-08-16 02:34:54 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=72038d02f171d9e3defd2c054f857869f84e9287

commit 72038d02f171d9e3defd2c054f857869f84e9287
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-08-16 02:34:40 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-08-16 02:34:40 +0000

    sys-apps/file: drop 5.42 back to ~arch
    
    Issues with handling unicode.
    
    Bug: https://bugs.gentoo.org/861089
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/file/file-5.42.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 10 Larry the Git Cow gentoo-dev 2022-08-22 18:05:50 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=da324cbb488b8a50aaa3e26af5ea0120535eefee

commit da324cbb488b8a50aaa3e26af5ea0120535eefee
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-08-22 18:04:56 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-08-22 18:05:25 +0000

    sys-apps/file: rekeyword 5.42-r1 w/ unicode fixes
    
    Closes: https://bugs.gentoo.org/861089
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/file/file-5.42-r1.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 11 aleck 2022-08-23 19:54:17 UTC
Hello Sam,

sorry, was on holiday. Works like a charm. 

Thank you very much

Jan
Comment 12 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-08-23 20:04:46 UTC
(In reply to aleck from comment #11)
> Hello Sam,
> 
> sorry, was on holiday. Works like a charm. 
> 
> Thank you very much
> 
> Jan

No problem at all and big thank you for confirming!