file prints the name of an examined file on a UTF-8 file system with erroneous decoding. Example:

$ file "Video/Die große Verführung.mp4"
Video/Die gro\303\237e Verf\303\274hrung.: ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]

After installing file-4.51 again, the command produces the following correct output:

$ file "Video/Die große Verführung.mp4"
Video/Die große Verführung.mp4: ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]

Reproducible: Always

Steps to Reproduce:
1. install file-4.52
2. examine a file on a UTF-8 file system whose name contains non-ASCII characters

Actual Results:
non-ASCII chars are displayed with an erroneous decoding

Expected Results:
non-ASCII chars should not be decoded at all

$ emerge --info
Portage 3.0.30 (python 3.10.5-final-0, default/linux/amd64/17.1, gcc-11.3.0, glibc-2.34-r13, 5.15.52-gentoo-x86_64 x86_64)
=================================================================
System uname: Linux-5.15.52-gentoo-x86_64-x86_64-AMD_Ryzen_5_3400G_with_Radeon_Vega_Graphics-with-glibc2.34
KiB Mem: 14237116 total, 354332 free
KiB Swap: 0 total, 0 free
Timestamp of repository gentoo: Fri, 22 Jul 2022 19:46:40 +0000
Head commit of repository gentoo: 1236825c6d6467d11bcbee96d9adfee3621075e4
sh bash 5.1_p16
ld GNU ld (Gentoo 2.37_p1 p2) 2.37
app-misc/pax-utils: 1.3.4::gentoo
app-shells/bash: 5.1_p16::gentoo
dev-java/java-config: 2.3.1::gentoo
dev-lang/perl: 5.34.1-r3::gentoo
dev-lang/python: 3.10.5::gentoo
dev-util/cmake: 3.22.4::gentoo
dev-util/meson: 0.62.2::gentoo
sys-apps/baselayout: 2.8::gentoo
sys-apps/openrc: 0.44.10::gentoo
sys-apps/sandbox: 2.29::gentoo
sys-devel/autoconf: 2.71-r1::gentoo
sys-devel/automake: 1.16.5::gentoo
sys-devel/binutils: 2.37_p1-r2::gentoo
sys-devel/binutils-config: 5.4.1::gentoo
sys-devel/gcc: 11.3.0::gentoo
sys-devel/gcc-config: 2.5-r1::gentoo
sys-devel/libtool: 2.4.7::gentoo
sys-devel/make: 4.3::gentoo
sys-kernel/linux-headers: 5.15-r3::gentoo (virtual/os-headers)
sys-libs/glibc: 2.34-r13::gentoo
Repositories:

gentoo
    location: /var/db/portage/tree/central
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo.git
    priority: -1000

local
    location: /var/db/portage/tree/local
    masters: gentoo

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -Os -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.1/ext-active/ /etc/php/cgi-php8.1/ext-active/ /etc/php/cli-php8.1/ext-active/ /etc/php/fpm-php8.1/ext-active/ /etc/php/phpdbg-php8.1/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -Os -pipe"
DISTDIR="/var/db/portage/distfiles/bobbel"
EMERGE_DEFAULT_OPTS="--autounmask=n --with-bdeps=y"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://sunsite.cnlab-switch.ch/mirror/gentoo/ http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ http://ftp.heanet.ie/pub/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en de"
MAKEOPTS="-j7 -s"
PKGDIR="/var/db/portage/packages/core2duo"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp/portage"
SHELL="/bin/bash"
USE="acl alsa amd64 bzip2 caps cli crypt cups dbus dri fam fortran gdbm gpm iconv icu idn ipv6 jpeg jpeg2k libglvnd libtirpc logrotate multilib ncurses nptl pam pcre png readline seccomp split-usr ssl syslog tiff unicode usb vim-syntax xattr zlib zstd" ABI_X86="64" ADA_TARGET="gnat_2020" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 aes avx avx2 f16c fma3 pclmul popcnt rdrand sha sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-4 php8-0 php8-1" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_10" PYTHON_TARGETS="python3_10" RUBY_TARGETS="ruby27" SANE_BACKENDS="net" USERLAND="GNU" VIDEO_CARDS="vesa" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LEX, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
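
For anyone puzzling over the escaped output in the report: the sequences \303\237 and \303\274 are the raw UTF-8 bytes of "ß" and "ü" written as backslash-octal escapes, so the name is intact but printed byte by byte. A minimal Python sketch that reverses the escaping follows; the helper name unescape_octal is made up for illustration and is not part of file or libmagic, and it assumes the escaped bytes are valid UTF-8.

```python
import re


def unescape_octal(text: str) -> str:
    """Turn backslash-octal escapes (e.g. \\303\\237) back into UTF-8 text.

    Hypothetical helper for illustration only. Assumes every escape is a
    three-digit octal value in the range 000-377 (one byte).
    """
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # one escaped byte
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> single byte
        text.encode("utf-8"),
    )
    return raw.decode("utf-8")


# The escaped name exactly as it appears in the report above.
escaped = r"Video/Die gro\303\237e Verf\303\274hrung."
print(unescape_octal(escaped))
```

This only helps with reading the broken output by hand; it is not a workaround for programs that parse the output of file.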
Could you try file-9999? If it's still broken, could you report upstream?
Same problem here. I think it's a typo in the version: the affected version is sys-apps/file-5.42.
Seems to be fixed in 9999.
I'm guessing the fixes are:
- https://github.com/file/file/commit/c80065fe6900be5e794941e29b32440e9969b1c3
- https://github.com/file/file/commit/d471022b2772071877895759f209f2c346757a4c
- https://github.com/file/file/commit/441ac2b15508909e82ad467960df4ac0adf9644c
Note that I was bitten by this because this bug causes the "ext" file action mechanism of app-misc/mc to fail if there is any wide character in the path. For example, I have a directory "9€_Ticket" below which PDF files are stored, and showing the PDF files with MC (i.e., starting the PDF viewer on the "View" action) silently does not work. Nearly reported an MC bug for this...
I can confirm that the MC problem is fixed by sys-apps/file-9999.

Sam: I applied the three patches (they don't apply cleanly just because of the FILE_RCSID updates...) but they are not sufficient ("file: mbrtowc.c:104: __mbrtowc: Assertion `__mbsinit (data.__statep)' failed.").
(In reply to Bernd Feige from comment #5)
> Note that I was bitten by this because this bug causes the "ext" file action
> mechanism of app-misc/mc to fail if there is any wide character in the path.
> For example, I have a directory "9€_Ticket" below which PDF files are
> stored, and showing the PDF files with MC (i.e., starting the PDF viewer on
> the "View" action) silently does not work. Nearly reported an MC bug for
> this...
> I can confirm that the MC problem is fixed by sys-apps/file-9999.
>
> Sam: I applied the three patches (they don't apply cleanly just because of
> the FILE_RCSID updates...) but they are not sufficient ("file:
> mbrtowc.c:104: __mbrtowc: Assertion `__mbsinit (data.__statep)' failed.").

Thanks. I got it to apply with https://github.com/file/file/commit/7e59d34206d7c962e093d4239e5367a2cd8b7623 thrown in first (and dropping all RCS hunks, of course) but then it just hung on any input. Bleh.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=253ca90f3f968a03ea6fff8f0011cf411764b22e

commit 253ca90f3f968a03ea6fff8f0011cf411764b22e
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-08-16 02:28:55 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-08-16 02:29:57 +0000

    sys-apps/file: backport unicode handling fixes to 5.42

    Temporarily unkeyworded given I had a few issues before I threw in a few
    extra patches. Want to give it a test run for a day or so myself first
    before keywording.

    Bug: https://bugs.gentoo.org/861089
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/file/file-5.42-r1.ebuild                 | 162 +++++++++
 sys-apps/file/files/file-5.42-unicode-fixes.patch | 414 ++++++++++++++++++++++
 2 files changed, 576 insertions(+)
Could you guys try out 5.42-r1 (unkeyworded for now) and let me know if it solves your issues, and if it works OK in general? Thanks.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=72038d02f171d9e3defd2c054f857869f84e9287

commit 72038d02f171d9e3defd2c054f857869f84e9287
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-08-16 02:34:40 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-08-16 02:34:40 +0000

    sys-apps/file: drop 5.42 back to ~arch

    Issues with handling unicode.

    Bug: https://bugs.gentoo.org/861089
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/file/file-5.42.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=da324cbb488b8a50aaa3e26af5ea0120535eefee

commit da324cbb488b8a50aaa3e26af5ea0120535eefee
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-08-22 18:04:56 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-08-22 18:05:25 +0000

    sys-apps/file: rekeyword 5.42-r1 w/ unicode fixes

    Closes: https://bugs.gentoo.org/861089
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/file/file-5.42-r1.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Hello Sam,

sorry, was on holiday. Works like a charm.

Thank you very much

Jan
(In reply to aleck from comment #11)
> Hello Sam,
>
> sorry, was on holiday. Works like a charm.
>
> Thank you very much
>
> Jan

No problem at all and big thank you for confirming!